On 1.11.2024 22:41:29, Larry Garfield wrote:
> In a similar vein to approving the use of software, Roman Pronskiy asked for my
> help putting together an RFC on collecting analytics for PHP.net.
> https://wiki.php.net/rfc/phpnet-analytics
> Of particular note:
> * This is self-hosted, first-party only. No third parties get data, so no
> third parties can do evil things with it.
> * There is no plan to collect any PII.
> * The goal is to figure out how to most efficiently spend Foundation money
> improving php.net, something that is sorely needed.
> Ideally we'd have this in place by the 8.4 release or shortly thereafter,
> though I realize that's a little tight on the timeline.
Hey Larry,
I have a couple of concerns and questions:

Is there a way to track analytics with only transient data? As in, is the
data that actually gets stored always already anonymized enough that it
would be unproblematic to share it with everyone?
Or possibly, is there a retention period for the raw data after which
only anonymized data remains?
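To make it concrete, something along these lines would be enough for most
of the stated goals (a minimal sketch, not a proposal; the function name
and the storage call are made up):

    import hashlib
    import os
    from datetime import date

    # Regenerated every day and never written to disk; once a day's salt
    # is gone, the stored visitor IDs cannot be traced back to an IP.
    DAILY_SALT = os.urandom(16)

    def anonymize_visitor(ip: str, user_agent: str) -> str:
        """Collapse a raw request into a short-lived, non-reversible visitor ID."""
        raw = f"{date.today().isoformat()}|{ip}|{user_agent}".encode()
        return hashlib.sha256(DAILY_SALT + raw).hexdigest()[:16]

    # Only the anonymized ID, the page and a timestamp would ever be stored:
    # store(anonymize_visitor(ip, ua), "/manual/en/function.strpos.php", now)

That still allows counting unique visitors and per-visitor navigation
within a day, while everything actually stored would be safe to publish.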
Do you actually have a plan for what to use that data for? The RFC mostly
talks about "high traffic". But does that mean anything? I look at a
documentation page because I need to look something specific up (what
was the order of arguments of strpos again?). I may only look at it
briefly. Maybe even often. But that carries absolutely zero signal on
whether the documentation page is good enough. In that case I don't look
at the comments either. Comments are something you rarely look at, mostly
only the first time you even use a function.
Also, I don't buy the argument that none of this can be derived from
server logs. Let's look at what the RFC names:
> Time-on-page
> Whether they read the whole page or just a part
> Whether they even saw comments
Yes, these need a client-side tracker. But I doubt the usefulness of the
signal. You don't know who reads the page. Is it someone who is already
familiar with PHP and is searching for a detail? They'll quickly jump to
just the one part. Is it someone who is new to PHP and trying to
understand it? They may well read the whole page. But you can't tell
which it is.
Quality of documentation is measured by whether it's possible to grasp
the information easily, not by how long or how completely a page is
being read.
> What percentage of users get to the docs through direct links vs the
> home page
That's something you can generally infer from server logs - was the home
page accessed from that IP right before another page was opened? It's
not as accurate, but for a general understanding of orders of magnitude
it's good enough.
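For illustration, roughly this (just a sketch, assuming combined-format
access logs; the file name and paths are made up):

    import re
    from collections import Counter

    # common/combined log format: ip - - [time] "GET /path HTTP/1.1" ...
    LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) HTTP')

    first_hit = {}  # first requested path per IP ~ entry point
    for line in open("access.log"):
        m = LOG_RE.match(line)
        if m:
            ip, path = m.groups()
            first_hit.setdefault(ip, path)

    entries = Counter("home page" if p == "/" else "direct link"
                      for p in first_hit.values())
    print(entries)  # rough direct-link vs home-page split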
> If users are hitting a single page per browser window or navigating
> through the site, and if the latter, how?
The number of windows needs a client-side tracker too. Knowing whether
the cross-referencing links (e.g. "See also") are used is possibly
relevant, and so is "what functions are looked up after this function".
> How much are users using the search function? Is it finding what
> they want, or is it just a crutch?
How much it is used is probably greppable from the server logs as well.
Whether they find what they want - I'm not sure how you'd determine
that. I search for something ... and possibly open a page. If that's not
what I wanted, I'll leave the site and e.g. use Google. If that's what I
wanted, I'll also stop looking after that page. Either way, the logs
look the same.
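If you wanted a heuristic anyway, the closest I can think of from logs
alone: an immediate second search probably means the first one failed.
A sketch (the /search.php path and the (ip, path) input format are
assumptions on my part):

    from collections import Counter

    def search_outcomes(requests):
        """requests: time-ordered (ip, path) tuples parsed from the logs,
        e.g. with the same regex as in the earlier sketches."""
        pending_search = set()  # IPs whose last request was the search page
        outcomes = Counter()
        for ip, path in requests:
            if path.startswith("/search.php"):
                if ip in pending_search:
                    outcomes["searched again"] += 1  # previous search likely failed
                pending_search.add(ip)
            elif ip in pending_search:
                outcomes["opened a result"] += 1
                pending_search.discard(ip)
        return outcomes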
> Do people use the translations alone, or do they use both the
> English site and other languages in tandem?
> Does anyone use multiple translations?
That's likely also determinable by server logs.
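E.g. counting how many visitors touch more than one manual language
(sketch; assumes /manual/<lang>/ URLs and the same assumed log format as
above):

    import re
    from collections import defaultdict

    LOG_RE = re.compile(r'^(\S+) .* "GET (\S+) HTTP')
    LANG_RE = re.compile(r'^/manual/([a-z]{2}(?:_[A-Z]{2})?)/')

    langs = defaultdict(set)  # which manual languages each IP touched
    for line in open("access.log"):
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, path = m.groups()
        lang = LANG_RE.match(path)
        if lang:
            langs[ip].add(lang.group(1))

    multi = sum(1 for s in langs.values() if len(s) > 1)
    print(multi, "visitors used more than one translation")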
And yeah, server logs are more locked down. But that's something you can
fix. I hope that the raw analytics data is just as locked down as the
server logs...
I get that "cached by another proxy" is a possible problem, but it's a
strawman I think. You don't need to be able to track all users, but just
many.
Overall I feel like the additional signal we can get specifically from a
JS tracker is comparatively low, to the point that it's not actually
worth it.
Bob