
Processing Large Log Files

In my job, there are a few really strange circumstances I have to deal with on a daily basis.

Well, today I thought I’d share just one of those challenges: huge log files!

Huge Log Files
For some reason (please don’t ask me why), our hosting company “can’t” (or doesn’t know how to) export daily log files for me. So what do they do? They export one log file every month. It’s zipped, which does help with downloading, but even with a high-speed connection it takes a while to download more than 700 MB.

So what to do? If it were up to me, we would switch to a new hosting company, but it’s not, so how do I process log files this big?

I use a great free log analyzer called Analog. It’s extremely powerful and highly configurable. It’s also the only program I could find that handles files this large; nothing else even came close.

Analog can process zipped files, which really makes life easy, but unfortunately it won’t handle them above a certain size (somewhere over 300 MB), so I have to unzip the already huge 700 MB file. As you can see in the image, the unzipped file is 11.59 GB, and unzipping it is the only way I can get Analog to process files of this size. It takes a few minutes to finish, but Analog burns through it like a champ.
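Analog’s size limit on zipped input doesn’t apply to a script of your own, which can read the compressed file as a stream and never write the 11.59 GB version to disk. Below is a minimal Python sketch, assuming the monthly export is gzip-compressed. The post only says “zipped”; a .zip archive would need Python’s zipfile module instead. `iter_log_lines`, `count_lines` and the file name in the comment are hypothetical.

```python
import gzip
import sys
from typing import IO, Iterator, Union


def iter_log_lines(source: Union[str, IO[bytes]]) -> Iterator[str]:
    """Yield lines from a gzip-compressed log without unpacking it to disk.

    gzip decompresses as it reads, so memory use stays small even when
    the uncompressed log would be many gigabytes.
    """
    # "rt" opens in text mode; errors="replace" keeps one stray bad byte
    # from aborting a multi-gigabyte run.
    with gzip.open(source, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")


def count_lines(source: Union[str, IO[bytes]]) -> int:
    """Count the log lines (one per request in an access log)."""
    return sum(1 for _ in iter_log_lines(source))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_lines.py access_log_2007-07.gz
    print(count_lines(sys.argv[1]))
```

Because `iter_log_lines` yields one line at a time, anything built on top of it (filters, counters, splitters) inherits the same small memory footprint.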

Of course we use web-based analytics on this site too, but there’s only so much web-based analytics will show.

So if you’ve got some super huge log files, now you know how to process them.




7 Responses to “Processing Large Log Files”

  1. Pierre Far Says:

    Good tip, Badi. However, even Analog is not good enough when you’re chasing down a specific question, like scrapers or who’s accessing the CSS or JS files. For that, I use grep, which I run on the command line in Windows. It’s (relatively) very fast, and I use output piping to whittle the relevant log lines down into smaller and smaller text files. The end result is a huge multi-GB log file narrowed down to about 2-3 MB. Then you can do a proper analysis.

    Pierre
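Pierre’s pipeline of successive grep filters can also be written as a chain of lazy filters in Python, one of the scripting languages Hamlet mentions further down. This is a minimal sketch, assuming the lines come one at a time from an open log file. `contains`, `contains_any` and `narrow` are hypothetical helper names, and the sample lines are invented.

```python
from typing import Callable, Iterable, Iterator

Predicate = Callable[[str], bool]


def contains(text: str) -> Predicate:
    """Match lines containing text, like a plain `grep text`."""
    return lambda line: text in line


def contains_any(*texts: str) -> Predicate:
    """Match lines containing at least one of the given texts."""
    return lambda line: any(t in line for t in texts)


def narrow(lines: Iterable[str], *filters: Predicate) -> Iterator[str]:
    """Keep only lines that pass every filter, like `grep a | grep b | ...`.

    filter() is lazy, so each line flows through all stages in turn
    instead of being written to intermediate text files.
    """
    result: Iterable[str] = lines
    for keep in filters:
        result = filter(keep, result)
    return iter(result)


# Invented sample lines in Apache Common Log Format.
sample = [
    '1.2.3.4 - - [10/Jul/2007:10:00:00 -0700] "GET /style.css HTTP/1.1" 200 512',
    '1.2.3.4 - - [10/Jul/2007:10:00:01 -0700] "GET /index.html HTTP/1.1" 200 2048',
    '5.6.7.8 - - [11/Jul/2007:09:15:42 -0700] "GET /app.js HTTP/1.1" 200 1024',
]

# Who requested CSS or JS files on 10 July 2007?
hits = list(narrow(sample, contains("[10/Jul/2007:"), contains_any(".css", ".js")))
```

Like grep, these are plain substring matches, so `.js` would also match `.json`. Tighten the predicates if that matters for your question.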



  2. Yuri Says:

    You may want to check the following thread:

    http://www.cre8asiteforums.com/forums/index.php?showtopic=51682

    It has links and some discussion on processing large log files (never mind the thread title).

    By the way, why don’t you correct the tab index for the “verify email address” field? It currently comes after the website field.

    Cheers.



  3. Ryan Says:

    Pierre has it right: find “grep”, read the manual page (http://www.die.net/doc/linux/man/man1/grep.1.html) and specify the common terms you need to look up. If I wanted to search a log file for this month, I would do the following: “grep 2007-07 logfile.txt >output.txt”

    If you run the tool on Windows, you may need to call it “grep.exe” (shudders). I think the output redirection “>” should still work, though if you want to see the results on screen, you can skip the “>output.txt” part.

    Also, if you have the ability to modify the files, I would delete them after downloading so you can start fresh each time. If you’re running on Unix or Linux, the file should be recreated on the next hit the website receives. A utility called “logrotate” does this if it is enabled on the server.

    Ciao,
    Ryan
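A plain substring search like the grep above matches the term anywhere on the line, which can mislead for numeric fields. For example, searching for 404 also matches a successful request whose response happened to be 404 bytes long. This minimal sketch parses each line before counting, assuming Apache’s Common Log Format (the “combined” format adds fields at the end, which the pattern ignores). `CLF`, `status_counts` and `paths_with_status` are hypothetical names, and the sample lines are invented.

```python
import re
from collections import Counter
from typing import Iterable

# Common Log Format: host ident user [timestamp] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def status_counts(lines: Iterable[str]) -> Counter:
    """Count responses per HTTP status code, skipping unparseable lines."""
    counts: Counter = Counter()
    for line in lines:
        m = CLF.match(line)
        if m:
            counts[m.group(4)] += 1
    return counts


def paths_with_status(lines: Iterable[str], status: str = "404") -> Counter:
    """Count requested paths for one status code, e.g. the most common 404s."""
    paths: Counter = Counter()
    for line in lines:
        m = CLF.match(line)
        if m and m.group(4) == status:
            parts = m.group(3).split()  # normally "METHOD /path PROTOCOL"
            if len(parts) >= 2:
                paths[parts[1]] += 1
    return paths


sample = [
    '1.2.3.4 - - [10/Jul/2007:10:00:00 -0700] "GET /old-page.html HTTP/1.1" 404 209',
    # A 200 response that is 404 bytes long: `grep 404` would match this too.
    '1.2.3.4 - - [10/Jul/2007:10:00:01 -0700] "GET /index.html HTTP/1.1" 200 404',
]
```

`paths_with_status(lines).most_common(10)` then gives the ten most frequently requested missing pages.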



  4. B Jones Says:

    My case is pretty unique. I do use grep (zgrep) when I just need to see specific things, for example, 404 requests. That does work extremely well.

    I’ll share a little more about my situation. Our web host will not give me shell access (period). And it’s definitely not my choice to stay with them. The site has been hosted on this server for years, and the owner is extremely loyal and hesitant to change.

    I have used grep (actually zgrep) to output daily log files (in an attempt to see daily stats) from the one huge file (after downloading), but that just takes way too long, and also makes my poor little laptop huff and puff :)

    These huge log files are for one month. We’re talking around 100K daily visits.
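If each day is extracted with its own zgrep run, the whole month is decompressed and scanned about 30 times. A single pass that sends each line to its day’s file reads it only once. This is a minimal sketch, assuming Apache-style timestamps such as `[10/Jul/2007:10:00:00 -0700]`. `split_by_day` and the `YYYY-MM-DD.log` naming are hypothetical.

```python
import re
from pathlib import Path
from typing import IO, Dict, Iterable

# Date part of an Apache timestamp such as "[10/Jul/2007:10:00:00 -0700]"
DAY = re.compile(r'\[(\d{2})/([A-Za-z]{3})/(\d{4}):')
MONTHS = {name: num for num, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def split_by_day(lines: Iterable[str], out_dir: str) -> Dict[str, int]:
    """Write each line to out_dir/YYYY-MM-DD.log in a single pass.

    Returns the number of lines written per day. One output file stays
    open per day seen, which is about 31 for a monthly log.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    handles: Dict[str, IO[str]] = {}
    counts: Dict[str, int] = {}
    try:
        for line in lines:
            m = DAY.search(line)
            if not m or m.group(2) not in MONTHS:
                continue  # no recognisable timestamp: skip the line
            dd, mon, yyyy = m.groups()
            day = f"{yyyy}-{MONTHS[mon]:02d}-{dd}"
            if day not in handles:
                handles[day] = open(out / f"{day}.log", "w", encoding="utf-8")
            handles[day].write(line.rstrip("\n") + "\n")
            counts[day] = counts.get(day, 0) + 1
    finally:
        for fh in handles.values():
            fh.close()
    return counts
```

To run it on the compressed download, pass it an open text stream, e.g. `with gzip.open("access_log.gz", "rt") as fh: split_by_day(fh, "daily")` (file and directory names are placeholders). Days follow the server’s local timestamps, not UTC. Each output file is opened with "w", so a rerun overwrites the previous split.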



  5. Tom Says:

    Hi,

    That’s a crazily large log file to deal with. I used to work for a hosting company, and we rotated the logs daily. I’m sure you already know this, but if they’re using Apache, they can certainly do the same… Very frustrating for you, although if the site is getting 100K daily visits, I can see why the site owner doesn’t want to rock the boat :)



  6. Hamlet Batista Says:

    Grep is fine for simple cases, but for serious log mining I personally prefer to write my own custom scripts in Perl or Python. For large log files, I normally create a splitter script to divide the log into smaller files.
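Hamlet doesn’t show his splitter, so this is only a minimal Python sketch of the idea, assuming a fixed number of lines per piece. `split_into_chunks`, the one-million-line default and the `part-NNNN.log` naming are hypothetical choices. Splitting on line boundaries keeps every request intact, which a split by byte size would not guarantee.

```python
from pathlib import Path
from typing import IO, Iterable, List, Optional


def split_into_chunks(lines: Iterable[str], out_dir: str,
                      lines_per_chunk: int = 1_000_000,
                      prefix: str = "part") -> List[Path]:
    """Split a log into numbered files of at most lines_per_chunk lines.

    Lines are written one at a time, so memory use does not grow with the
    chunk size, and no log line is ever cut in half.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    fh: Optional[IO[str]] = None
    try:
        for i, line in enumerate(lines):
            if i % lines_per_chunk == 0:  # time to start a new chunk
                if fh is not None:
                    fh.close()
                path = out / f"{prefix}-{len(written):04d}.log"
                fh = open(path, "w", encoding="utf-8")
                written.append(path)
            fh.write(line.rstrip("\n") + "\n")
    finally:
        if fh is not None:
            fh.close()
    return written
```

On Unix-like systems, `split -l` does the same job from the command line. A script is mainly worth it when you want to combine splitting with other processing in the same pass.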



  7. Tavaris Says:

    I need to process 4 GB per day! Do you think this software can help?


