Dear All,
I am using the following command, with a custom regular expression, to import log files:
python /var/www/piwik/misc/log-analytics/import_logs.py --url=https://XYZ /media/ezproxy/ezp20150729.log --idsite=35 --dry-run --log-format-regex='(?P<ip>.*)\s-\s[a-zA-Z0-9\-].*\[(?P<date>.*?) (?P<timezone>.*?)\] \"(?P<path>.*?)\"\s(?P<status>\S+) (?P<length>\S+)\s\"(?P<user_agent>.*?)\"\s\"(?P<referrer>.*?)\"' --recorders=4 --enable-http-errors --enable-http-redirects --download-extensions=csd,ccs,dmg,enf,ens,enz,7z,aac,arc,arj,asf,asx,avi,bin,csv,deb,doc,docx,exe,gzip,hqx,jar,mpg,mp2,mp3,mp4,mpeg,mov,movie,msi,msp,odb,odf,odg,odp,ibooks,ods,odt,ogg,ogv,pdf,phps,ppt,pptx,qt,qtm,ra,ram,rar,rpm,sea,sit,tar,tbz,bz2,tgz,torrent,txt,wav,wma,wmv,wpd,xls,xlsx,xml,xsd,z,zip,azw3,epub,mobi,apk,flv,gz
This works fine, except that downloads are not counted in the Piwik back end (the Downloads report is empty), whereas in --dry-run mode the downloaded files are recognized.
When I use --log-format-name=common instead of the regular expression, about one third of the lines are reported as unknown, but downloads are counted. However, with --log-format-name=common the browser types are not analyzed, whereas they are with the regular expression.
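A minimal Python sketch for checking what each named group captures: it applies the regex from the command (with the opening `[` escaped) to one made-up EZproxy-style line, plus a hypothetical variant that splits the quoted request line into method, path and protocol. The sample line is invented, not taken from ezp20150729.log, and it follows the regex in putting the user agent before the referrer.

```python
import re

# The --log-format-regex from the command, with the opening "[" escaped.
ORIGINAL = (r'(?P<ip>.*)\s-\s[a-zA-Z0-9\-].*\[(?P<date>.*?) (?P<timezone>.*?)\] '
            r'"(?P<path>.*?)"\s(?P<status>\S+) (?P<length>\S+)\s'
            r'"(?P<user_agent>.*?)"\s"(?P<referrer>.*?)"')

# Hypothetical variant: match "METHOD path PROTOCOL" so that only the
# path ends up in the "path" group.
VARIANT = ORIGINAL.replace(r'"(?P<path>.*?)"', r'"\S+ (?P<path>.*?) \S+"')

# Made-up sample line (NOT from the real log); user agent comes before
# the referrer, as the regex expects.
SAMPLE = ('192.0.2.10 - jdoe [29/Jul/2015:10:15:32 +0200] '
          '"GET /docs/paper.pdf HTTP/1.1" 200 52341 '
          '"Mozilla/5.0" "http://example.org/search"')

for name, pattern in (("original", ORIGINAL), ("variant", VARIANT)):
    match = re.match(pattern, SAMPLE)
    # Show exactly what the "path" group captured for this line.
    print(name, repr(match.group("path")) if match else "no match")
```

Running it prints `original 'GET /docs/paper.pdf HTTP/1.1'` and `variant '/docs/paper.pdf'`: as written, the `path` group holds the whole request line, including the method and protocol. If the real log lines have the same shape, that may be worth ruling out as a reason the downloads are missing.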
Does anyone have an idea how to solve these problems?
Thank you!
Best
mucctecc