Cos,
Based on my experience, having it off by default negates the entire
purpose. We need a statistically meaningful data set to make any inferences
from it. Moreover, if we have to ask folks to turn it on, that will
significantly skew the resulting data set anyway, and it will not show the
full picture. I think "on" by default is the better option if we are to
collect usage stats to begin with.

Also, I want to reiterate this to avoid misunderstanding: there is no
proposal for, nor will there be, a technical way to attribute collected data
back to a particular company. That's not what this is about. We should only
be interested in aggregated stats (community size, geo information, language
information, component usage).

Thoughts?

--
Nikita Ivanov
Founder & CTO
GridGain Systems

On Fri, Jul 7, 2017 at 8:17 PM, Konstantin Boudnik <c...@apache.org> wrote:

> Actually, that should be OFF by default. It sounds like this would reduce
> the amount of data collected, but it would address the concerns of
> companies like Roman's. I know for sure that a few of my clients would sue
> my ass out of existence if I gave them a platform that collects their
> data-center info.
>
> Let's have it, set it off by default, and document an easy way to turn it
> on. Then start making the rounds asking our user base to share _some_ of
> the stats with the community, so we can track the growth of the install
> base, etc.
>
> Cos
>
> On Thu, Jul 06, 2017 at 08:20AM, Nikita Ivanov wrote:
> > The idea so far is to have a single system property in configuration that
> > turns this off completely. I envision that this will be prominently
> > featured on the Ignite website so that everyone who would like to disable
> > it can do so in seconds.
> >
> > Thoughts?
> >
> > --
> > Nikita Ivanov
> > Founder & CTO
> > GridGain Systems
> >
> > On Wed, Jul 5, 2017 at 9:27 PM, Roman Shtykh <rsht...@yahoo.com> wrote:
> >
> > > Nikita,
> > >
> > > Sending and storing (somewhere the company cannot securely handle) any
> > > information (OS version, IP addresses, etc.) that can be used to
> > > compromise the services would be unacceptable.
> > > Turning it off might be OK (possibly through the cluster settings, not
> > > via a globally accessible site), but the fact that there is a risk that
> > > some information can leak outside (for any reason, starting with human
> > > error) is scary.
> > >
> > > -- Roman
> > >
> > >
> > >
> > >
> > > On Thursday, July 6, 2017 12:38 PM, Nikita Ivanov <niva...@gridgain.com>
> > > wrote:
> > >
> > >
> > > Roman,
> > > Thanks for the feedback. What are those questions specifically? Are the
> > > IP addresses and OS version what is causing them?
> > >
> > > Thanks!
> > >
> > > --
> > > Nikita Ivanov
> > > Founder & CTO
> > > GridGain Systems
> > >
> > > On Wed, Jul 5, 2017 at 6:15 PM, Roman Shtykh <rsht...@yahoo.com.invalid>
> > > wrote:
> > >
> > > Nikita,
> > >
> > > While this will help improve Ignite, it will prevent its adoption by
> > > many projects -- sending and retaining IP addresses, OS versions, etc.
> > > raises tons of questions when considering whether to use Ignite, even if
> > > one can opt out.
> > > -- Roman
> > >
> > >
> > > On Thursday, July 6, 2017 5:38 AM, Nikita Ivanov <nivano...@gmail.com>
> > > wrote:
> > >
> > >
> > > Igniters,
> > > I would like to kick off the discussion on the idea of collecting Ignite
> > > usage statistics. The basic idea is to gather general, anonymous Ignite
> > > usage information so that we can better calibrate community efforts in
> > > developing new features, improving existing ones, delivering better
> > > documentation - and in every other way make our project a better
> > > software solution.
> > >
> > > Although such instrumentation is standard practice in commercially
> > > developed software, for an ASF project this could be a sensitive issue.
> > > Therefore I would like to initiate a full community discussion on how
> > > best to implement such a practice for the benefit of the project while
> > > protecting the privacy of Ignite users.
> > >
> > > To ignite (pun intended) the discussion I'll outline below some of the
> > > basic thoughts that I have on this subject. They are here only to give
> > > an idea of what such instrumentation may potentially look like so that
> > > we can discuss the merits of this idea in a tangible context.
> > >
> > > Overview
> > > -------------
> > > Upon start and every hour thereafter, each Ignite node will collect,
> > > encrypt and send usage statistics over HTTPS to the ASF-hosted server.
> > > That server will accept such HTTPS packets, decrypt them and store them
> > > in a time-series DB. A web interface will be provided to view the usage
> > > information.
> > >
> > > Opt-In or Opt-out
> > > -------------------------
> > > Opt-out. The Ignite website will offer simple instructions (a system
> > > property) on how to disable this instrumentation.
> > >
> > > Code, Infra, Access
> > > ---------------------------
> > > Ignite instrumentation will be part of the Ignite code base. The
> > > collection server will be a separate module in the Ignite code base
> > > (released separately from Ignite). The collection server will be hosted
> > > by ASF Infra.
> > >
> > > Usage statistics will be publicly accessible by anyone in the community.
> > >
> > > Private, Personal Data
> > > ------------------------------
> > > No private or personal data will ever be transferred. No emails,
> > > usernames, company names, grid names, etc.
> > >
> > > Data Retention
> > > --------------------
> > > All data will be retained for 1 year and deleted permanently thereafter.
> > >
> > > Usage Data
> > > ----------------
> > > The following data will be collected in each packet sent to the
> > > collection server:
> > > - GRID_SIZE (to align our testing environment with the most common
> > > cluster sizes)
> > > - IP_ADDR (for general geo-tracking as well as to know what
> > > documentation language should be a priority)
> > > - SES_ID (to track continuous uptime vs. restarts)
> > > - USERNAME_TYPE (privileged username vs. standard, to track production
> > > vs. dev/testing usage; note - this is not an actual username)
> > > - OS_NAME
> > > - OS_VER
> > > - OS_ARCH
> > > - JAVA_VER
> > > - JAVA_VENDOR
> > > - COMP_SQL (whether or not this feature was used)
> > > - COMP_COMPUTE (whether or not this feature was used)
> > > - COMP_DATAGRID (whether or not this feature was used)
> > > - COMP_STREAMING (whether or not this feature was used)
> > > - COMP_IGFS (whether or not this feature was used)
> > > - COMP_SERVICE (whether or not this feature was used)
> > > - COMP_PERSISTENCE (whether or not this feature was used)
> > >
> > > Please let's discuss this idea. Everyone's comments and suggestions are
> > > *extremely* welcome.
> > >
> > > Thanks,
> > > Nikita Ivanov.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
>
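For concreteness, the proposal quoted above (a single system property that
switches collection off, and a packet made of the listed fields) could be
sketched roughly as below. This is only an illustration under stated
assumptions, not code from Ignite: the class name, the property name
IGNITE_USAGE_STATS_ENABLED and both method names are hypothetical. The
field names come from the "Usage Data" list; IP_ADDR, USERNAME_TYPE and
the remaining COMP_* flags are left out to keep the sketch short. The
enabledByDefault argument is the on/off default being debated in this
thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Illustrative sketch only; none of these names exist in Ignite. */
class UsageStatsSketch {
    /** Hypothetical name; the thread only says "a single system property". */
    public static final String PROP = "IGNITE_USAGE_STATS_ENABLED";

    /**
     * Decides whether collection is on. An explicit property value always
     * wins; otherwise the project-wide default (the point under debate) applies.
     */
    public static boolean isEnabled(Properties props, boolean enabledByDefault) {
        String v = props.getProperty(PROP);
        return v == null ? enabledByDefault : Boolean.parseBoolean(v.trim());
    }

    /** Builds one packet with a subset of the proposed fields. */
    public static Map<String, String> buildPacket(int gridSize, String sessionId, boolean sqlUsed) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("GRID_SIZE", Integer.toString(gridSize));
        p.put("SES_ID", sessionId);
        p.put("OS_NAME", System.getProperty("os.name"));
        p.put("OS_VER", System.getProperty("os.version"));
        p.put("OS_ARCH", System.getProperty("os.arch"));
        p.put("JAVA_VER", System.getProperty("java.version"));
        p.put("JAVA_VENDOR", System.getProperty("java.vendor"));
        p.put("COMP_SQL", Boolean.toString(sqlUsed));
        return p;
    }

    public static void main(String[] args) {
        // Here the default is "off"; pass true to model the opt-out variant.
        boolean on = isEnabled(System.getProperties(), false);
        System.out.println(on ? buildPacket(4, "example-session", true) : "usage stats disabled");
    }
}
```

Run with -DIGNITE_USAGE_STATS_ENABLED=true to see a packet printed; without
the flag this sketch prints "usage stats disabled", since main passes
"off" as the default. Switching between the opt-in and opt-out positions
changes only the enabledByDefault argument.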
