Cleaning Squid Logs


Here are two quick Unix commands that clean a Squid log file for import into R, which is very useful when you need to fine-tune your cache strategy.

# read in some.log, squeeze repeated spaces, keep fields 1, 4 and 7
# (timestamp, result code and URL, assuming Squid's native log format)
cat some.log | tr -s ' ' | cut -d' ' -f1,4,7 > log.clean

# strip off the millisecond values from the timestamp, drop the result code,
# shorten the requested URL (needs GNU sed for -r and the I flag)
sed -r -n -e 's/^([[:digit:]]*)[^ ]* *[^ ]* *http:\/\/somedomainname(.*)$/\1 \2/Ip' log.clean > log.import

This creates a two-column file. The first column is a timestamp at one-second granularity. The second is the part of the URL after the domain, that is, the path and query string. Lines that do not match the domain are dropped. If you have a very large log file, tweak the regular expression so that this last column is as small as possible.
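To check the two commands before running them on a real log, here is a minimal sketch that feeds them a few made-up lines. The sample values and the sample.* file names are assumptions for illustration only; the layout follows Squid's native access.log format, and the commands themselves are the ones shown earlier.

```shell
# Hypothetical sample in Squid's native access.log layout (made-up values).
cat > sample.log <<'EOF'
1286536309.450    180 192.168.0.10 TCP_MISS/200 4120 GET http://somedomainname/index.html?id=1 - DIRECT/10.0.0.5 text/html
1286536310.012     12 192.168.0.11 TCP_HIT/200 4120 GET http://SomeDomainName/index.html?id=1 - NONE/- text/html
1286536311.730     95 192.168.0.10 TCP_MISS/200 812 GET http://otherdomain/logo.png - DIRECT/10.0.0.9 image/png
EOF

# Step 1: squeeze the padding and keep timestamp, result code and URL.
cat sample.log | tr -s ' ' | cut -d' ' -f1,4,7 > sample.clean

# Step 2: drop milliseconds and result code, keep what follows the domain.
sed -r -n -e 's/^([[:digit:]]*)[^ ]* *[^ ]* *http:\/\/somedomainname(.*)$/\1 \2/Ip' sample.clean > sample.import

cat sample.import
# 1286536309 /index.html?id=1
# 1286536310 /index.html?id=1
```

The third sample line is for a different domain, so it does not appear in the output. The second line matches despite its mixed case because of the I flag.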

Now simulate your cache settings with R.
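Before moving to R, a rough sanity check can be done in the shell. The sketch below is not the R simulation itself. It assumes a deliberately simple model: an unbounded cache where every entry expires a fixed number of seconds after it was fetched. The ttl value, the replay.* file names and the sample rows are all hypothetical.

```shell
# Hypothetical input in the two-column layout described above (made-up values).
cat > replay.import <<'EOF'
1286536309 /index.html?id=1
1286536310 /index.html?id=1
1286536400 /index.html?id=1
1286536401 /logo.png
1286536405 /logo.png
EOF

# Replay the requests against an unbounded cache where each entry expires
# ttl seconds after it was fetched. ttl=60 is an arbitrary example value.
awk -v ttl=60 '
{
    ts = $1; url = $2
    if ((url in fetched) && ts - fetched[url] < ttl) {
        hits++
    } else {
        misses++
        fetched[url] = ts   # miss: fetch again and restart the TTL clock
    }
}
END {
    total = hits + misses
    if (total > 0)
        printf "hits=%d misses=%d ratio=%.2f\n", hits, misses, hits / total
}' replay.import > replay.stats

cat replay.stats
# hits=2 misses=3 ratio=0.40
```

Rerunning with different ttl values gives a quick feel for how sensitive the hit ratio is to expiry time. Anything involving cache size or eviction order is better left to the R simulation.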
