The image below shows the arrival rate of requests over a 24-hour period (PDF).
Instead of plotting each individual second, the period is broken into intervals, and a box plot (or box-and-whisker plot) is used to summarize the values within each interval.
The detail above shows a number of intervals. The top and bottom edges of each box are the upper and lower quartiles, respectively. The line in the middle is the median. The upper and lower endpoints of the dotted lines are the maximum and minimum non-outlier (extreme) values, respectively. Finally, the circles are outliers.
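To make the parts of a box concrete, here is a minimal Ruby sketch that computes the same summary for one interval. It is only an illustration: `quantile` and `box_stats` are hypothetical helpers, the quartiles use simple linear interpolation, and the whiskers assume the 1.5 × IQR rule (the default `range` in R's boxplot). R computes its hinges slightly differently, so values can differ a little on small samples.

```ruby
# Linear-interpolation quantile of an already sorted array (q between 0 and 1).
def quantile(sorted, q)
  pos = (sorted.length - 1) * q
  lo = pos.floor
  hi = pos.ceil
  sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo)
end

# Summarize one interval of per-second rates the way a box plot draws it.
# Assumption: a value is an outlier if it lies more than 1.5 * IQR outside the box.
# Expects a non-empty array of numbers.
def box_stats(values)
  s = values.sort
  q1, median, q3 = [0.25, 0.5, 0.75].map { |q| quantile(s, q) }
  fence = 1.5 * (q3 - q1)
  inside, outliers = s.partition { |v| v >= q1 - fence && v <= q3 + fence }
  { lower_whisker: inside.first, q1: q1, median: median, q3: q3,
    upper_whisker: inside.last, outliers: outliers }
end

stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 50])
puts stats
```

In this made-up sample the value 50 ends up as a circle (outlier), and the upper whisker stops at 8, the largest value that is still inside the fence.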
To determine the arrival rate, import a list of time stamps. These don't need to carry any real meaning; each one just has to represent the moment (second) at which a request was made. Duplicates are expected whenever multiple requests were made during the same second.
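The idea of turning duplicate time stamps into a rate can be sketched in a few lines of Ruby. This is not the 'arrivalrate' function used later in this post; `arrival_rate` is a hypothetical stand-in, and filling seconds without requests with zero is an assumption of the sketch.

```ruby
# Hypothetical helper: count requests per second from integer time stamps.
# Returns one count for every second between the first and last time stamp,
# including zeros for seconds in which no request arrived (an assumption).
def arrival_rate(timestamps)
  return [] if timestamps.empty?
  counts = Hash.new(0)
  timestamps.each { |t| counts[t] += 1 }
  first, last = timestamps.minmax
  (first..last).map { |sec| counts[sec] }
end

p arrival_rate([100, 100, 101, 103, 103, 103])  # => [2, 1, 0, 3]
```

The input does not need to be sorted, since the counts are keyed by second.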
To parse them out of an Apache log, use sed:
sed -n 's/^\([^ ]*\) [^ ]* [^ ]* [^ ]* \[\([^ ]*\) [^ ]*\] .*$/\2\t\1/p'
and Ruby to make a valid timestamp (R could parse this, but it saves memory to do it before the read.table):
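#!/opt/local/bin/ruby
# Reads "timestamp<TAB>host" lines on stdin and writes "epoch<TAB>host".
while gets    # gets returns nil at end of input; readline would raise EOFError
  line = $_.split( /\t/ )
  tm = line[ 0 ].split( /:/ )    # e.g. "10/Oct/2000", "13", "55", "36"
  dt = tm[ 0 ].split( /\// )     # e.g. "10", "Oct", "2000"
  # Time::mktime accepts three-letter month names such as "Oct"
  t = Time::mktime( dt[ 2 ], dt[ 1 ], dt[ 0 ], tm[ 1 ], tm[ 2 ], tm[ 3 ] )
  printf( "%i\t%s", t.to_i, line[ 1 ] )    # line[ 1 ] still ends with the newline
end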
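As an alternative to the sed and Ruby steps above, both can be sketched as a single Ruby pass. This is only a sketch under assumptions: `LOG_LINE` and `parse_log_line` are hypothetical names, the host is taken to be the first field, and the time stamp is taken to be the first bracketed field in the form `10/Oct/2000:13:55:36 -0700`. Unlike the script above, it also honors the time zone offset from the log.

```ruby
require 'time'  # provides Time.strptime

# Host is the first field; the time stamp is the first [bracketed] field.
LOG_LINE = /\A(\S+) .*?\[([^\]]+)\]/

# Returns [epoch_seconds, host], or nil if the line does not match.
def parse_log_line(line)
  m = LOG_LINE.match(line)
  return nil unless m
  t = Time.strptime(m[2], '%d/%b/%Y:%H:%M:%S %z')
  [t.to_i, m[1]]
end

sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'
epoch, host = parse_log_line(sample)
puts "#{epoch}\t#{host}"
```

To process a whole log, the same function could be called for each line of standard input, skipping lines for which it returns nil.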
Then import your log:
stats <- read.table( "timestamps.log" )
See this for the 'arrivalrate' function.
rate <- arrivalrate( stats$V1 )
plot.rate.boxplot( rate )
Where:
plot.rate.boxplot <- function( rate, freq=60*15, start_time=0, pdf=F ) {
  if( pdf ) pdf( file="arrival-rate-boxplot.pdf", height=11, width=16 )
  # interval number for each second (assumes rate[1] is the first second)
  index <- floor( ( seq_along( rate ) - 1 ) / freq )
  names <- unique( index * freq )
  # label each interval with its start time as HH:MM
  names <- sapply( names, function( x ) sprintf( "%02d:%02d", floor((start_time + x)/(60*60))%%24, floor((start_time + x)/60)%%60 ) )
  boxplot( rate ~ index, names=names, ylab="requests/sec", xlab=paste( freq/60, "minute intervals" ), main="Requests per Second During Each Time Interval" )
  if( pdf ) dev.off()
}
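The interval arithmetic in the function above may be easier to follow outside of the plotting call. Here is a rough Ruby rendering of just that part; `interval_label` and `bucket_rates` are hypothetical names, and `start_time` is assumed to be the number of seconds after midnight at which the data starts.

```ruby
# Label for the interval that begins `offset` seconds after `start_time`
# (seconds after midnight), formatted as HH:MM and wrapping at 24 hours.
def interval_label(offset, start_time = 0)
  secs = start_time + offset
  format('%02d:%02d', (secs / 3600) % 24, (secs / 60) % 60)
end

# Group per-second rates into consecutive intervals of `freq` seconds.
# Returns an array of [label, rates_in_interval] pairs; the last interval
# may be shorter than the others.
def bucket_rates(rates, freq = 60 * 15, start_time = 0)
  rates.each_slice(freq).each_with_index.map do |slice, i|
    [interval_label(i * freq, start_time), slice]
  end
end

# 150 seconds of made-up data in one-minute intervals.
bucket_rates(Array.new(150, 1), 60).each do |label, slice|
  puts "#{label} #{slice.length} samples"
end
```

Each pair corresponds to one box in the plot: the label is what appears on the x axis, and the slice holds the values that the box summarizes.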