I’m working on a project that needs a parser for Apache log files. Specifically, I’m parsing the default “common” log files generated by Apache 2.
One difficulty is that log files get big quickly, and I only want to pull the most recent entries (starting from a specified date/time) without reading the whole file.
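To illustrate the general idea of finding a start date without scanning the whole file, here is a minimal sketch in Python. It is not the parser's actual PHP code, and the function names (`line_time`, `seek_to_time`) are made up for this example. It assumes the entries are in chronological order, that every line carries a bracketed timestamp in Apache's default format, and that the file is opened in binary mode. Under those assumptions a binary search over byte offsets only has to read a handful of lines, however large the log is.

```python
import io
import re
from datetime import datetime

# Apache's default timestamp, e.g. [10/Oct/2023:13:55:36 +0000].
# Note: %b relies on English month names (the default "C" locale).
TIME_RE = re.compile(rb"\[([^\]]+)\]")
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def line_time(line):
    """Return the timezone-aware timestamp of one raw log line (bytes)."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError("no timestamp found in line")
    return datetime.strptime(match.group(1).decode("ascii"), TIME_FORMAT)


def _next_line_start(f, offset):
    """Move f to the first line start at or after offset and return it."""
    if offset > 0:
        # Step back one byte: if it is a newline, offset is already a
        # line start; otherwise readline() skips the partial line.
        f.seek(offset - 1)
        f.readline()
    else:
        f.seek(0)
    return f.tell()


def seek_to_time(f, since):
    """Position f at the first entry whose timestamp is >= since.

    Assumes chronological order. Returns the byte offset, which equals
    the file size when no entry is recent enough.
    """
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()
    while lo < hi:
        mid = (lo + hi) // 2
        _next_line_start(f, mid)
        line = f.readline()
        if line and line_time(line) < since:
            lo = mid + 1   # entry is too old: the answer lies further on
        else:
            hi = mid       # entry is recent enough (or we hit end of file)
    return _next_line_start(f, lo)
```

After `seek_to_time(f, since)` returns, iterating over `f` yields only the entries from `since` onwards. The `since` value needs to be a timezone-aware `datetime` so that it can be compared with the parsed timestamps.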
The parser:
- Efficiently retrieves entries from a specific date onwards
- Exposes the elements of each log entry as an associative array (protocol, time, response code, path, referrer, etc.)
- Ignores HTTP hits that aren’t page views, based on a list of file extensions (there is perhaps room for improvement here)
- Is far more efficient than reading in the whole file when you only need a subset of the data in a large log
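As a rough illustration of the per-entry array and the extension filter, here is a small Python sketch. It is not taken from the parser (which is PHP): the field names, the regular expression and the `ASSET_EXTENSIONS` list are assumptions made for this example. The referrer and user agent only appear in Apache's "combined" format, so the pattern treats them as optional.

```python
import re
from posixpath import splitext
from urllib.parse import urlsplit

# Common Log Format, with the two extra "combined" fields optional.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)

# Hypothetical list of extensions that are not counted as page views.
ASSET_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg"}


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def is_page_view(entry):
    """Return False for hits on static assets, judged by file extension."""
    path = urlsplit(entry["path"]).path      # drop any query string
    extension = splitext(path)[1].lower()
    return extension not in ASSET_EXTENSIONS
```

Filtering on extensions is a blunt heuristic: it cannot tell a page apart from, say, an extensionless API endpoint, which is one place the approach could be improved.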
It’s on GitHub and quite well documented internally (phpDocs), so it may be useful for other projects too.