Igniters,

Just a quick update. I haven't gotten a response from ASF Legal on this thread and I frankly don't know how to proceed. What's the process for arriving at a decision point here?
Thanks!
--
Nikita Ivanov

On Mon, Jul 10, 2017 at 3:11 PM, Konstantin Boudnik <c...@apache.org> wrote:

> On Sat, Jul 08, 2017 at 11:04AM, Nikita Ivanov wrote:
> > Cos,
> > Based on my experience, having it off by default negates the entire
> > purpose... We need a statistically meaningful data set to make any
> > inferences from it. Moreover, if we are going to ask folks to turn it on,
> > it will significantly skew the resulting data set anyway and will not show
> > the full picture. I think "on" by default is the better option if we are
> > to collect usage stats to begin with.
>
> Yes, sure. But having this "on" by default is likely to expose us to another
> shit-storm down the road. An interesting dilemma to have indeed. In my
> experience, whenever I install something like a browser or an operating
> system, it asks if I want to make the particular piece of software better
> by sending back some anonymized stats. Basically, I am given a way to
> explicitly opt out if I wish.
>
> Turning the feature "on" by default is like saying: "we'll be collecting
> some stats, but if you don't want that you can go here and there and disable
> the collection. Oh, and by the way - you need to go and figure out the exact
> steps to disable it."
>
> > Also, I want to re-iterate this to avoid misunderstanding: there is no
> > proposal, nor will there be a technical way, to attribute collected data
> > back to a certain company. That's not what this is all about. We should
> > only be interested in aggregated stats (community size, geo information,
> > language information, components usage).
>
> Yes, I think it is clear, but it never hurts to re-iterate.
>
> Cos
>
> > Thoughts?
> >
> > --
> > Nikita Ivanov
> > Founder & CTO
> > GridGain Systems
> >
> > On Fri, Jul 7, 2017 at 8:17 PM, Konstantin Boudnik <c...@apache.org> wrote:
> >
> > > Actually, that should be OFF by default.
> > > It sounds like this will reduce the amount of data collected, but it
> > > would address the concerns of companies like Roman's. I know for sure
> > > that a few of my clients would sue my ass out of existence if I gave
> > > them a platform collecting their data-center info.
> > >
> > > Let's have it, set it off by default and document an easy way to turn
> > > it off.
> > > Then start making rounds asking our user base to share _some_ of the
> > > stats with the community, so we can track the growth of the install
> > > base, etc.
> > >
> > > Cos
> > >
> > > On Thu, Jul 06, 2017 at 08:20AM, Nikita Ivanov wrote:
> > > > The idea so far is to have a single system property in configuration
> > > > that turns this off completely. I envision that this will be
> > > > prominently featured on the Ignite website so that everyone who would
> > > > like to disable it can do so in seconds.
> > > >
> > > > Thoughts?
> > > >
> > > > --
> > > > Nikita Ivanov
> > > > Founder & CTO
> > > > GridGain Systems
> > > >
> > > > On Wed, Jul 5, 2017 at 9:27 PM, Roman Shtykh <rsht...@yahoo.com> wrote:
> > > >
> > > > > Nikita,
> > > > >
> > > > > Sending and storing (somewhere the company cannot securely handle)
> > > > > any information (OS version, IP addresses, etc.) that can be used to
> > > > > compromise the services would be unacceptable.
> > > > > Turning it off might be OK (possibly through the cluster settings,
> > > > > not via a globally accessible site), but the fact that there's a
> > > > > risk some information can leak outside (for any reason, starting
> > > > > with a human mistake) is scary.
> > > > >
> > > > > -- Roman
> > > > >
> > > > > On Thursday, July 6, 2017 12:38 PM, Nikita Ivanov
> > > > > <niva...@gridgain.com> wrote:
> > > > >
> > > > > Roman,
> > > > > Thanks for the feedback. What are those questions specifically?
> > > > > Are IP addresses and OS what is causing it?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > --
> > > > > Nikita Ivanov
> > > > > Founder & CTO
> > > > > GridGain Systems
> > > > >
> > > > > On Wed, Jul 5, 2017 at 6:15 PM, Roman Shtykh
> > > > > <rsht...@yahoo.com.invalid> wrote:
> > > > >
> > > > > Nikita,
> > > > >
> > > > > While this will help improve Ignite, it will prevent its adoption by
> > > > > many projects -- sending and retaining IP addresses, OS versions,
> > > > > etc. raises tons of questions when considering whether to use
> > > > > Ignite. Even if it can be opted out of.
> > > > > -- Roman
> > > > >
> > > > > On Thursday, July 6, 2017 5:38 AM, Nikita Ivanov
> > > > > <nivano...@gmail.com> wrote:
> > > > >
> > > > > Igniters,
> > > > > I would like to kick off the discussion on the idea of collecting
> > > > > Ignite usage statistics. The basic idea behind this is to better
> > > > > understand general and anonymous Ignite usage information to better
> > > > > calibrate community efforts in developing new features, improving
> > > > > existing ones, delivering better documentation - and in every other
> > > > > way to make our project a better software solution.
> > > > >
> > > > > Although such instrumentation is standard practice in commercially
> > > > > developed software, for an ASF project this could be a sensitive
> > > > > issue. Therefore I would like to initiate a full community
> > > > > discussion on how best to implement such a practice for the benefit
> > > > > of the project while ensuring the privacy protection of Ignite users.
> > > > >
> > > > > To ignite (pun intended) the discussion I'll outline below some of
> > > > > the basic thoughts that I have on this subject.
> > > > > They are here only to give an idea of what such instrumentation may
> > > > > potentially look like so that we can discuss the merits of this idea
> > > > > in a tangible context.
> > > > >
> > > > > Overview
> > > > > -------------
> > > > > Upon start and every hour thereafter each Ignite node will collect,
> > > > > encrypt and send usage statistics over HTTPS to the ASF-hosted
> > > > > server. That server will accept such HTTPS packets, decrypt them and
> > > > > store them in a time-series DB. A web interface will be provided to
> > > > > view the usage information.
> > > > >
> > > > > Opt-In or Opt-out
> > > > > -------------------------
> > > > > Opt-out. Ignite website will offer simple instructions (system
> > > > > property) on how to disable this instrumentation.
> > > > >
> > > > > Code, Infra, Access
> > > > > ---------------------------
> > > > > Ignite instrumentation will be part of the Ignite code base. The
> > > > > collection server will be a separate module in the Ignite code base
> > > > > (released separately from Ignite). The collection server will be
> > > > > hosted by ASF Infra.
> > > > >
> > > > > Usage statistics will be publicly accessible by anyone in the
> > > > > community.
> > > > >
> > > > > Private, Personal Data
> > > > > ------------------------------
> > > > > No private or personal data will ever be transferred. No emails,
> > > > > usernames, company names, grid names, etc.
> > > > >
> > > > > Data Retention
> > > > > --------------------
> > > > > All data will be retained for 1 year and deleted permanently
> > > > > thereafter.
> > > > >
> > > > > Usage Data
> > > > > ----------------
> > > > > The following data will be collected in each packet sent to the
> > > > > collection server:
> > > > > - GRID_SIZE (to align our testing environment with the most
> > > > >   frequent cluster sizes)
> > > > > - IP_ADDR (for general geo-tracking as well as to know what
> > > > >   documentation language should be a priority)
> > > > > - SES_ID (to track continuous uptime vs. re-starts)
> > > > > - USERNAME_TYPE (privileged username vs. standard, to track
> > > > >   production vs. dev/testing usage; note - this is not an actual
> > > > >   username)
> > > > > - OS_NAME
> > > > > - OS_VER
> > > > > - OS_ARCH
> > > > > - JAVA_VER
> > > > > - JAVA_VENDOR
> > > > > - COMP_SQL (whether or not this feature was used)
> > > > > - COMP_COMPUTE (whether or not this feature was used)
> > > > > - COMP_DATAGRID (whether or not this feature was used)
> > > > > - COMP_STREAMING (whether or not this feature was used)
> > > > > - COMP_IGFS (whether or not this feature was used)
> > > > > - COMP_SERVICE (whether or not this feature was used)
> > > > > - COMP_PERSISTENCE (whether or not this feature was used)
> > > > >
> > > > > Please let's discuss this idea. Everyone's comments and suggestions
> > > > > are *extremely* welcome.
> > > > >
> > > > > Thanks,
> > > > > Nikita Ivanov.
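
To make the packet and the on/off question discussed in this thread more tangible, here is a rough sketch in Python (Ignite itself is Java, so this only illustrates the shape of the idea, not proposed code). It assembles the fields listed under "Usage Data" and returns nothing when collection is switched off. The setting name `IGNITE_USAGE_STATS`, its "on"/"off" values, and both function names are hypothetical; the thread only says "a single system property". The `default_on` flag models the open question of opt-out (True) versus off by default (False). IP_ADDR is left out on the assumption that the collection server could read it from the connection.

```python
import platform

# Hypothetical setting name; the thread only mentions "a single system property".
SETTING = "IGNITE_USAGE_STATS"

# Component flags named in the proposal (COMP_SQL, COMP_COMPUTE, ...).
COMPONENTS = ("SQL", "COMPUTE", "DATAGRID", "STREAMING",
              "IGFS", "SERVICE", "PERSISTENCE")


def stats_enabled(settings, default_on=True):
    """An explicit "on"/"off" wins; otherwise fall back to the default."""
    value = str(settings.get(SETTING, "")).strip().lower()
    if value in ("on", "off"):
        return value == "on"
    return default_on


def build_usage_packet(grid_size, session_id, used_components,
                       java_ver, java_vendor, privileged=False,
                       settings=None, default_on=True):
    """Return the usage packet as a dict, or None if collection is disabled."""
    if not stats_enabled(settings or {}, default_on):
        return None

    unknown = set(used_components) - set(COMPONENTS)
    if unknown:
        raise ValueError("unknown components: %s" % sorted(unknown))

    packet = {
        "GRID_SIZE": grid_size,
        "SES_ID": session_id,
        # Only the kind of account is recorded, never the username itself.
        "USERNAME_TYPE": "privileged" if privileged else "standard",
        "OS_NAME": platform.system(),
        "OS_VER": platform.release(),
        "OS_ARCH": platform.machine(),
        "JAVA_VER": java_ver,
        "JAVA_VENDOR": java_vendor,
        # IP_ADDR is omitted: assumed to be taken from the connection server-side.
    }
    for name in COMPONENTS:
        packet["COMP_" + name] = name in used_components
    return packet
```

With `default_on=True` a node reports unless someone sets the switch to "off"; with `default_on=False` nothing is reported until someone sets it to "on", which is the trade-off between data volume and user consent argued above.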