I do a similar sort of processing of logs to create the metrics at 
http://ebible.org/metrics/. I use a custom Free Pascal program that is probably 
not useful to anyone else without modification, but let me know if you want to 
see it. The Sword module count is based on actual modules, not the individual 
parts. For privacy reasons, I don't keep the actual IP address after 
geolocation lookup, nor do I keep the exact latitude and longitude. I store an 
anonymous user ID based on a one-way cryptographic hash of the
IP address and browser identity, plus the country and a location point that is 
intentionally limited in resolution (even coarser than the free 
lite.ip2location.com data already is). I delete the original logs, 
then run some queries on the database to get the metrics. Bots are excluded 
both on the basis of well-known agent identifiers and by activity (i.e. massive 
numbers of downloads and page views by the same IP+agent combination make it 
look more like a web spider than a human). The result is arguably
devoid of any useful personally identifying information, especially anything 
that could endanger anyone in a creative access country. It probably doesn't 
answer all of the questions you might have, but it does give something of an 
idea of where digital Bibles are going, and in what formats. It also shows 
which translations are being downloaded and/or viewed online. However, if you 
just want to look at the map on that page and feel good about how far and wide 
the Bibles are going, you can do that. Then pat
yourself on the back, praise God for letting us do something useful for His 
Kingdom, then get back to work. ;-)
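
To make the anonymization steps above concrete, here is a minimal Python sketch. It is not the Free Pascal program, just an illustration under stated assumptions: the keyed HMAC-SHA256 (and the SECRET_KEY value), the one-degree grid, the bot marker list, and the max_hits threshold are all hypothetical choices; the description above only specifies a one-way cryptographic hash of IP address and browser identity, a reduced-resolution point, and bot exclusion by agent and by activity.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret. Keying the hash is an assumption of this sketch;
# the post only says "one-way cryptographic hash".
SECRET_KEY = b"replace-with-a-private-random-value"

# Illustrative substrings only, not a real bot list.
KNOWN_BOT_MARKERS = ("bot", "spider", "crawl")


def anonymous_id(ip: str, user_agent: str) -> str:
    """Return a one-way identifier derived from IP address + browser identity."""
    message = f"{ip}|{user_agent}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for storage; the length is arbitrary


def coarsen(lat: float, lon: float, step: float = 1.0) -> tuple:
    """Snap a coordinate to a grid of `step` degrees to limit its resolution."""
    return (round(lat / step) * step, round(lon / step) * step)


def is_known_bot(user_agent: str) -> bool:
    """Flag agents whose identifier contains a well-known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


def heavy_users(user_ids, max_hits: int = 1000) -> set:
    """Return anonymous IDs whose hit count looks more like a spider than a human."""
    counts = Counter(user_ids)
    return {uid for uid, n in counts.items() if n > max_hits}
```

In a pipeline along these lines, each log line would be reduced to (anonymous_id, country, coarsened point, item) before the raw log is deleted, and rows belonging to is_known_bot agents or heavy_users IDs would be dropped before running the metric queries.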


On 09/10/2017 08:45 PM, ref...@gmx.net wrote:
> There is one thing of which I am not sure how relevant it is, but the EU has 
> created legislation counting IP addresses as personal data (among many other 
> things) and making them subject to storage restrictions and other 
> limitations. Here in the UK it becomes law early next year. Sites designed 
> and intended to be transnational have in the past been pulled up for privacy 
> invading stuff. So I would hazard a guess that while CW is not based in the 
> EU, given that many of the collaborators are and some of the
> front-ends too, we need to consider this.
>
> Peter
>
> Sent from my mobile. Please forgive shortness, typos and weird autocorrects.
>
>
> -------- Original Message --------
> Subject: Re: [sword-devel] FYI geo IP lookups of repo access
> From: DM Smith
> To: SWORD Developers' Collaboration Forum
> CC:
>
>
>     The country information is interesting. I’ve found that bots also skew 
> the counts.
>
>     In my bin dir on the CW server, I have a perl program, moduleScrape.pl, 
> (~/bin/moduleScrape.pl) that slogs through the logs to figure out module 
> downloads counting each download once rather than by all the parts. It first 
> goes through the conf files to find the module in the repository and then 
> picks a single file for each module. Then it goes through the log files (ftp 
> and http) looking for downloads (including zip files) of modules. It tosses 
> hits by bots. The output format is normalized to:
>     Date    Module    Format    Transport    IP    Country    Simplified agent
>     Note IP is obscured here.
>     20150628        Easton  prt     FTP     xxx.xxx.xxx.xxx  United States   
> w4....@xiphos.org
>     20150628        PolGdanska      zip     HTTP    xxx.xxx.xxx.xxx    Poland 
>  Apache-HttpClient/UNAVAILABLE (java 1.4)
>
>     The program needs tweaking for each server as it “knows” CrossWire’s 
> repositories and its logs.
>
>     There are a bunch of flags that allow you to specify a date range, and it is 
> geared to find the last full month.
>
>     The program was started by J Ansorg and improved by N Carter.
>
>     I’ve also a program moduleStats, that runs this program and analyzes the 
> output to produce statistics about the modules.
>
>     Troy and I’ve been talking about tossing the data into a database.
>
>     DM
>
>
>>     On Sep 10, 2017, at 5:38 PM, Karl Kleinpaste <k...@kleinpaste.org> wrote:
>>
>>     Now and then I get curious about where all the accesses to 
>> ftp.xiphos.org come from.  This is a crude summary from my /var/log/xferlog 
>> since early August.  Counts of accesses can be gotten by substituting the 
>> last "uniq" stage of the pipeline with "uniq -c | sort -nr" but such counts 
>> are registering individual files accessed, which is not very informative, 
>> especially for modules that include dozens of image files.
>>
>>     cat xferlog* | cut -f7 -d' ' | sed -e s/::ffff:// | sort | uniq -c | 
>> sort -nr | awk '{ print $2 }' | fgrep . | while read ip ; do geoiplookup $ip 
>> ; done | grep 'GeoIP Country Edition' | sed -e 's/GeoIP Country Edition: //' 
>> | sort | uniq
>
>
>
> _______________________________________________
> sword-devel mailing list: sword-devel@crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page


-- 

Aloha,
Michael Johnson
PO BOX 881143 • PUKALANI HI 96788-1143 • USA
mljohnson.org <http://mljohnson.org> • Phone: +1 808-333-6921 • Skype: 
kahunapule


