Simple Arrival Rate Log Analysis with R

Quick note on using box plots to show arrival rates during specified interval periods.

The image below shows the arrival rate of requests over a 24-hour period.

r-arrival-rate.png

Instead of plotting each individual second, the period is broken into intervals, and a box plot (or box-and-whisker plot) summarizes the per-second values within each interval.

r-arrival-rate-zoom.png

The above detail shows a number of intervals. The top and bottom edges of each box are the upper and lower quartiles, respectively. The line in the middle is the median. The upper and lower endpoints of the dashed lines (the whiskers) are the largest and smallest values that are not outliers. Finally, the circles are the outliers.
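To make those elements concrete, here is a minimal sketch using R's boxplot.stats on a made-up vector of per-second request counts (the values are purely illustrative). Note that R computes the box edges as hinges, which are close to, but not always identical to, the quartiles.

```r
# Hypothetical per-second request counts for a single interval (made-up values)
rate <- c(2, 3, 3, 4, 4, 5, 5, 6, 7, 20)

s <- boxplot.stats(rate)

# s$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
s$stats
# 2.0 3.0 4.5 6.0 7.0

# s$out: the values drawn as circles (beyond 1.5 times the box height)
s$out
# 20
```

Here the single value 20 lies more than 1.5 box heights above the upper hinge, so it is drawn as a circle and the upper whisker stops at 7.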

To determine the arrival rate, import a list of timestamps. These don't need to carry any real meaning; each one just needs to represent the moment (to the second) at which a request was made. Duplicates are expected whenever multiple requests arrive during the same second.
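As a rough illustration of why duplicates matter, the sketch below counts how many timestamps fall on each second. The helper name count_per_second and the sample timestamps are hypothetical; this is not the 'arrivalrate' function used later in this post, only one plausible way to get a requests-per-second vector.

```r
# Hypothetical helper (not the original 'arrivalrate'): requests per second
# for every second between the first and last timestamp, idle seconds included.
count_per_second <- function(timestamps) {
  offsets <- timestamps - min(timestamps) + 1   # 1-based offset in seconds
  tabulate(offsets, nbins = max(offsets))       # count duplicates per second
}

ts <- c(100, 100, 101, 103, 103, 103)  # made-up epoch seconds
count_per_second(ts)
# 2 1 0 3
```

Seconds with no requests show up as zeros, which keeps the position in the vector aligned with elapsed time.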

To parse them out of an Apache log, use sed:
sed -n 's/^\([^ ]*\) [^ ]* [^ ]* [^ ]* \[\([^ ]*\) [^ ]*\] .*$/\2\t\1/p'
and Ruby to convert each date into an epoch timestamp (R could parse the dates itself, but it saves memory to do it before the read.table):
#!/opt/local/bin/ruby

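# 'gets' returns nil at end of input, whereas 'readline' raises EOFError
while gets

  # Input fields: "dd/Mon/yyyy:HH:MM:SS<TAB>client"
  line = $_.split( /\t/ )
  tm = line[ 0 ].split( /:/ )
  dt = tm[ 0 ].split( /\// )
  # Take the month from the log (Time.mktime accepts names such as "Jun")
  # rather than hard-coding it
  t = Time.mktime( dt[ 2 ], dt[ 1 ], dt[ 0 ], tm[ 1 ], tm[ 2 ], tm[ 3 ] )

  # line[ 1 ] still ends with the newline read from the input
  printf( "%i\t%s", t.to_i, line[ 1 ] )

end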
Then import your log:
stats <- read.table( "timestamps.log" )
The 'arrivalrate' function is defined separately. Apply it to the timestamp column and plot the result:
rate <- arrivalrate( stats$V1 )
plot.rate.boxplot( rate )
Where:
plot.rate.boxplot <- function( rate, freq=60*15, start_time=0, pdf=F ) {
  if( pdf ) pdf( file="arrival-rate-boxplot.pdf", height=11, width=16 )

  # Assign each per-second sample to a 0-based interval of 'freq' seconds
  index <- floor( (seq_along( rate ) - 1)/freq )
  names <- unique( index * freq )
  # Label each interval with its start time as HH:MM
  names <- sapply( names, function( x ) sprintf( "%02d:%02d", floor((start_time + x)/(60*60))%%24, floor((start_time + x)/60)%%60 ) )

  boxplot( rate ~ index, names=names, ylab="requests/sec", xlab=paste( freq/60, "minute intervals" ), main="Requests per Second During Each Time Interval" )

  if( pdf ) dev.off()	
}
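
The interval grouping and labelling inside the function can be checked in isolation. The following is a minimal sketch, assuming 45 minutes of per-second samples (a made-up length) and the default 15-minute interval; the variable names are illustrative only.

```r
freq <- 60 * 15      # interval length in seconds
start_time <- 0      # offset of the first sample from midnight, in seconds
n <- 2700            # 45 minutes of per-second samples (made-up length)

# 0-based interval number for each sample: 900 zeros, 900 ones, 900 twos
index <- (seq_len(n) - 1) %/% freq

# Start of each interval in seconds, formatted as HH:MM
starts <- unique(index) * freq + start_time
labels <- sprintf("%02d:%02d", (starts %/% 3600) %% 24, (starts %/% 60) %% 60)
labels
# "00:00" "00:15" "00:30"
```

Each label marks the start of its interval, and every interval holds exactly freq samples except possibly the last one.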
