Hey folks!
Yes, I did read the FAQ and the manual, and I spent about an hour googling around. So far I have only come across outdated or no-longer-valid information.
Summary: I have Piwik up and running, even with the GeoIP stuff, and so far it works great. However, I can't modify the pages of the websites, so I have to rely on the Apache logs, which I have already modified to include the requested (sub)domain. Logs are in /var/log/http/$domain/$fqdn-access.log, e.g. /var/log/http/tree.com/stump.tree.com-access.log. New hosts and domains are always coming and going, so there is an unknown number of subdomains to handle.
I built my log-import script like this:
#!/usr/local/bin/bash
# Configuration
BIN="/usr/local/www/piwik/misc/log-analytics/import_logs.py"
URL="http://server/piwik/"
SMP="4"
EXTRA="--enable-http-errors --enable-http-redirects --enable-static --enable-reverse-dns --enable-bots --add-sites-new-hosts"

# Hand every access log to the importer
find /var/log/httpd/ -type f -iname "*access*" | xargs $BIN --url=$URL --recorders=$SMP $EXTRA
This actually adds new sites for... new sites (duh), as required. All nice and easy - yay!
Now the tricky part, also known as Problem (dun-dun-dun)...
The logfiles are deleted each month (actually backed up, then deleted). This also means that until then, the logs are not rotated. The aforementioned script runs every 3 hours and re-reads each log from the beginning, which, as far as I can see, results in duplicated entries. Hence the problem. (Is this a bug or a missing feature?)
I also noticed there is an archive.php script, whose purpose is currently a mystery to me. Does it delete the duplicated entries? After a run of archive.php, a site that shows 4 visits (really 2 visits, each counted twice because the log was imported twice) still shows 4. The count does drop if I delete all the piwik_archive_* tables, but uh... This would mean I'd have to:
- run the update script,
- run the archive script,
- drop all the archive tables,
every 3 hours!
I know I am missing something blatantly obvious here. The question is: how do I properly keep Piwik up to date from Apache logs that can't be rotated?
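One possible workaround, sketched here under assumptions and not tested against Piwik itself: remember how many bytes of each log have already been imported, and hand the importer only what was appended since. The helper name emit_new_log_data, the state directory and the ".offset" file naming are all made up for illustration. If a log is smaller than its stored offset, it is assumed to have been deleted and recreated by the monthly cleanup.

```shell
#!/bin/sh
# Hypothetical helper (not part of Piwik): print only the data appended to a
# log file since the previous call. The byte offset already handed out is kept
# in a small state file per log. The function body is a subshell "( ... )" so
# its variables stay private without relying on bash-only "local".
emit_new_log_data() (
    log="$1"          # log file to read
    state_dir="$2"    # assumed directory holding the *.offset state files

    # Flatten the full path into one file name so logs with the same base name
    # in different directories do not share a state file.
    state_file="$state_dir/$(printf '%s' "$log" | tr '/' '_').offset"

    size=$(wc -c < "$log" | tr -d '[:space:]')
    offset=0
    [ -f "$state_file" ] && offset=$(cat "$state_file")

    # Log is smaller than last time: assume it was deleted and recreated.
    [ "$size" -lt "$offset" ] && offset=0

    if [ "$size" -gt "$offset" ]; then
        # tail -c +N starts at byte N (1-based); head -c caps the output at
        # the size measured above, so data written meanwhile waits for the
        # next run instead of being imported twice.
        tail -c "+$((offset + 1))" "$log" | head -c "$((size - offset))"
    fi

    echo "$size" > "$state_file"
)

# Possible use with the variables from the import script above
# (/var/db/piwik-offsets is an assumed location):
#   tmp=$(mktemp)
#   emit_new_log_data "$logfile" /var/db/piwik-offsets > "$tmp"
#   [ -s "$tmp" ] && "$BIN" --url="$URL" --recorders="$SMP" $EXTRA "$tmp"
#   rm -f "$tmp"
```

A known limitation of this sketch: if Apache is halfway through writing a line when the size is measured, that line is split across two runs. Cutting the chunk at the last newline would avoid that.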
Thank you very much in advance,
great work with piwik,
-Christian.