Cheers everyone, forum first-timer here …
I'm importing server access logs into Piwik and have hit a wall. Every imported request is recorded with the date and time at which the import runs, so the dates in the access logs appear to be ignored.
I've done this kind of import before and, as far as I remember, it worked fine then.
I'm calling the importer like this:
python misc/log-analytics/import_logs.py --show-progress -dddddddd --url=http://mypiwik.example --idsite=<SITE_ID_HERE> --token-auth=<AUTH_TOKEN_HERE> /path/to/logfile.log
This produces the following output:
2015-07-16 00:16:03,629: [DEBUG] Accepted hostnames: all
2015-07-16 00:16:03,630: [DEBUG] Piwik URL is: http://mypiwik.example
2015-07-16 00:16:03,630: [DEBUG] Authentication token token_auth is: <AUTH_TOKEN_HERE>
2015-07-16 00:16:03,630: [DEBUG] Resolver: static
0 lines parsed, 0 lines recorded, 0 records/sec (avg), 0 records/sec (current)
2015-07-16 00:16:03,991: [DEBUG] Launched recorder
Parsing log /path/to/logfile.log...
2015-07-16 00:16:03,991: [DEBUG] Detecting the log format
2015-07-16 00:16:03,991: [DEBUG] Check format icecast2
2015-07-16 00:16:03,992: [DEBUG] Format icecast2 matches
2015-07-16 00:16:03,992: [DEBUG] Format match contains 9 groups
2015-07-16 00:16:03,992: [DEBUG] Check format w3c_extended
2015-07-16 00:16:03,993: [DEBUG] Format w3c_extended does not match
2015-07-16 00:16:03,993: [DEBUG] Check format iis
2015-07-16 00:16:03,993: [DEBUG] Format iis does not match
2015-07-16 00:16:03,993: [DEBUG] Check format common
2015-07-16 00:16:03,994: [DEBUG] Format common matches
2015-07-16 00:16:03,994: [DEBUG] Format match contains 6 groups
2015-07-16 00:16:03,994: [DEBUG] Check format common_vhost
2015-07-16 00:16:03,994: [DEBUG] Format common_vhost does not match
2015-07-16 00:16:03,994: [DEBUG] Check format nginx_json
2015-07-16 00:16:03,995: [DEBUG] Format nginx_json does not match
2015-07-16 00:16:03,995: [DEBUG] Check format s3
2015-07-16 00:16:03,995: [DEBUG] Format s3 does not match
2015-07-16 00:16:03,995: [DEBUG] Check format ncsa_extended
2015-07-16 00:16:03,996: [DEBUG] Format ncsa_extended matches
2015-07-16 00:16:03,996: [DEBUG] Format match contains 8 groups
2015-07-16 00:16:03,996: [DEBUG] Check format common_complete
2015-07-16 00:16:03,996: [DEBUG] Format common_complete does not match
2015-07-16 00:16:03,996: [DEBUG] Check format amazon_cloudfront
2015-07-16 00:16:03,996: [DEBUG] Format amazon_cloudfront does not match
2015-07-16 00:16:03,997: [DEBUG] Format icecast2 is the best match

Logs import summary
-------------------
    6 requests imported successfully
    0 requests were downloads
    3 requests ignored:
        0 HTTP errors
        0 HTTP redirects
        0 invalid log lines
        0 requests did not match any known site
        0 requests did not match any --hostname
        3 requests done by bots, search engines...
        0 requests to static resources (css, js, images, ico, ttf...)
        0 requests to file downloads did not match any --download-extensions

Website import summary
----------------------
    6 requests imported to 1 sites
        1 sites already existed
        0 sites were created:
    0 distinct hostnames did not match any existing site:

Performance summary
-------------------
    Total time: 0 seconds
    Requests imported per second: 15.45 requests per second

Processing your log data
------------------------
    In order for your logs to be processed by Piwik, you may need to run the following command:
     ./console core:archive --force-all-websites --force-all-periods=315576000 --force-date-last-n=1000 --url='http://mypiwik.example'
It doesn't matter if I run the suggested archiving command or not; the dates remain wrong (also double-verified by directly querying the database for those records).
A stripped down version of a tested log file looks like this (this very file generated the output above):
195.154.188.41 - - [15/Apr/2015:00:19:05 +0200] "GET /portfolio-view/novomania-2011-shanghai HTTP/1.0" 200 20322 "http://atelierschiefer.de/portfolio-view/novomania-2011-shanghai" "Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0" atelierschiefer.de
80.131.0.172 - - [15/Apr/2015:00:20:57 +0200] "GET /kontakt HTTP/1.1" 200 14691 "http://atelierschiefer.de/" "Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" atelierschiefer.de
131.0.172 - - [15/Apr/2015:00:21:10 +0200] "GET /aktuell HTTP/1.1" 200 18704 "http://atelierschiefer.de/kontakt" "Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" atelierschiefer.de
80.131.0.172 - - [15/Apr/2015:00:21:35 +0200] "GET /atelier HTTP/1.1" 200 18909 "http://atelierschiefer.de/aktuell" "Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" atelierschiefer.de
80.131.0.172 - - [15/Apr/2015:00:22:33 +0200] "GET /leistungen HTTP/1.1" 200 14449 "http://atelierschiefer.de/atelier" "Mozilla/5.0 (Windows NT 6.1; rv:37.0) Gecko/20100101 Firefox/37.0" atelierschiefer.de
104.167.106.73 - - [15/Apr/2015:04:54:17 +0200] "GET /portfolio-view/sports-up HTTP/1.1" 200 20895 "-" "Java/1.4.1_04" atelierschiefer.de
207.46.13.44 - - [15/Apr/2015:05:20:12 +0200] "GET /home/ HTTP/1.1" 200 406 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" www.atelierschiefer.de
210.65.193.73 - - [15/Apr/2015:05:39:53 +0200] "GET / HTTP/1.1" 200 18426 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0)" atelierschiefer.de
207.46.13.44 - - [15/Apr/2015:07:24:51 +0200] "GET /beratung/cage HTTP/1.1" 200 431 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" atelierschiefer.de
I poked around a bit in import_logs.py and it seems that the dates are at least correct up to the point where the whole lot is handed over to piwik.php. From there on things are out of my league to debug.
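For anyone wanting to check the timestamps independently of import_logs.py, here is a minimal sketch of such a check. It is not the importer's own parsing code: the regex and the helper name `parse_log_timestamp` are illustrative assumptions, and it only covers the bracketed `[dd/Mon/yyyy:HH:MM:SS +zzzz]` form seen in the sample log above.

```python
import re
from datetime import datetime

# Hypothetical helper, not taken from import_logs.py.
# Matches the bracketed timestamp of a common/combined-style access log line.
TIMESTAMP_RE = re.compile(
    r'\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]'
)


def parse_log_timestamp(line):
    """Return the timezone-aware datetime found in a log line, or None."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    # %b expects English month abbreviations under the default C locale.
    return datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")


sample = (
    '195.154.188.41 - - [15/Apr/2015:00:19:05 +0200] '
    '"GET /portfolio-view/novomania-2011-shanghai HTTP/1.0" 200 20322'
)
parsed = parse_log_timestamp(sample)
print(parsed.isoformat())  # 2015-04-15T00:19:05+02:00
```

If the value printed here is the April date while Piwik stores the import time, the timestamp is probably being lost after parsing rather than during it.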
Also, all the logs together come to around 150k lines, ranging from April to today, yet right now they only produce 280 visitors on the Dashboard. I'm aware that there's a lot of garbage in those logs and that the site isn't heavily visited, but this still seems very low. It may well change once the date issue is resolved.
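A rough plausibility check for the visitor count is to count distinct (client IP, day) pairs in the raw logs. The sketch below is only an approximation under stated assumptions: Piwik's own visit detection works differently (it filters bots and uses more than the IP), and the regex and the function name `count_ip_days` are hypothetical, not part of Piwik.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical pattern: client IP, two ignored fields, bracketed timestamp.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')


def count_ip_days(lines):
    """Count requests per (client IP, calendar day) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        ip, stamp = match.groups()
        try:
            day = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").date()
        except ValueError:
            continue  # skip lines with an unparseable timestamp
        counts[(ip, day)] += 1
    return counts


sample_lines = [
    '80.131.0.172 - - [15/Apr/2015:00:20:57 +0200] "GET /kontakt HTTP/1.1" 200 14691',
    '80.131.0.172 - - [15/Apr/2015:00:21:35 +0200] "GET /atelier HTTP/1.1" 200 18909',
    '210.65.193.73 - - [15/Apr/2015:05:39:53 +0200] "GET / HTTP/1.1" 200 18426',
]
counts = count_ip_days(sample_lines)
print(len(counts))  # 2 distinct (IP, day) pairs
```

To run it over a real file, pass the open file object instead of `sample_lines`. The result includes bots and other noise, so it is best read as a loose upper bound.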
Piwik is at version 2.14.0, PHP is 5.4.17.
If I can provide more information that might be helpful, please let me know.
Best and thanks in advance for any help
Matthias