Hi Martijn,
I think this is a great initiative. Thank you for pursuing this. It allows
us to
a) generate better insights into the usage of Apache Flink and its
documentation, as shown in the video,
b) do this in a privacy-preserving way, and
c) act as a role model for other Apache projects on this matter.
Big +1. I am happy to help, if I can.
Cheers,
Konstantin
On Fri, Jan 14, 2022 at 11:21 AM Martijn Visser <mart...@ververica.com>
wrote:
Hi everyone,
The Flink website currently uses Google Analytics to track how visitors
interact with it. It provides insights into which documentation pages are
visited, how users navigate the website (which sequence of pages they
visit before leaving), whether they download Flink, etc. However, the
Apache Software Foundation discourages using Google Analytics [1] unless
certain requirements are met. The Flink website currently does not meet
those requirements.
I do believe that it's useful to understand what parts of a website are
important to users, what features are most frequently read up on, where
they get lost in the docs, etc., so we can better understand how users use
the system, the website, and the docs, and where to focus improvements next.
I would like to move the Flink website from Google Analytics to an
alternative as soon as possible. I would also be in favour of opening up
access to this data for everyone, since it is public data anyway.
For the past couple of months, I've been in conversation with ASF Legal
and ASF Infra, via the priv...@apache.org mailing list, about setting up a
privacy-friendly alternative to Google Analytics for all ASF projects
(unfortunately, I can't find a public web archive link for this). As part
of that discussion, I've tested the open source, self-hosted version of
Matomo [2], looking at its privacy implications and the functionality it
offers. You can watch a recording of that experiment [3] and view the test
setup I used [4].
The current status is that ASF Legal, ASF Infra and I have agreed to take
the next step on this project. This step means that:
* I set up Matomo on a VM provided by ASF Infra
* A new DNS name is created (either https://analytics.apache.org/ or
https://matomo.analytics.apache.org/) by ASF Infra
* The Flink website is adjusted to remove the Google Analytics tracking
and include the necessary JavaScript to allow tracking of the Flink
website and documentation in Matomo
If this test is successful, ASF Infra would take over the hosting of this
solution and provide it to all ASF projects.
I would like to understand from the Flink community:
1. Do you think this is a good idea?
2. If yes, I need a couple of PMC members to request a VM from Apache
Infra [5]
Best regards,
Martijn
https://twitter.com/MartijnVisser82
[1] https://privacy.apache.org/faq/committers.html
[2] https://matomo.org/
[3] https://drive.google.com/file/d/1yomYhLoyrzBW620bpn_dROiwyvSCzuvt/view?usp=sharing
[4] https://github.com/MartijnVisser/matomo-analytics
[5] https://infra.apache.org/vm-for-project.html
--
Konstantin Knauf
https://twitter.com/snntrable
https://github.com/knaufk