I suppose a course estimate is also a coarse one [;<).

If we need to go beyond analyzing what is revealed in reports to the project, 
there remains the prospect for instrumentation.

I am not certain that we have the resources to do that.  So this is a 
thought-experiment.

INSTRUMENTATION

There is no instrumentation of Apache OpenOffice at present.  There is an 
existing path to doing so, the check for updates described in (2) below.  It 
already provides a crude measurement, if used.  With adjustments to the 
software, more useful data could be obtained.  Producing and capturing the 
information involves development work.  Any collection of such data must also 
be kept anonymous while still recognizing data from the same installation.

Instrumentation can require considerable work and large databases for the 
captured information.  There might not be sufficient capacity to undertake any 
degree of instrumentation in the face of higher-priority needs.

The following note is a bit over-engineered.  It is simpler if we do not need 
to differentiate data sources at all, but that might not get us what we need.

How important is knowing what we could find out about usage patterns this way?

  1. Privacy of Data Collection
     It is possible to instrument the software to collect certain data, such as 
the numbers and formats of files opened and saved-as since the previous 
collection of data from a source.  This requires additions to the software to 
accumulate such information, and additions to the servers that receive and 
capture it.
     Some data might need to be longitudinal, with data captured at different 
times from the same source recognized and combined in some way.  This allows 
quite different patterns of usage to be distinguished and not lumped together 
in a single mass, if that becomes important.
     This means that the source of the data must be anonymized in some manner 
that still allows data from the same copy of Apache OpenOffice to be 
recognized, but without recording anything that allows the captured data to be 
traced back to the originating source. 
     All of this involves substantial, careful development.  The means of 
preventing identification of sources must be carefully managed.  It must also 
be possible to protect the data-collection procedure from exploits and 
denial-of-service attacks.

  2. Update Checking as a Data Source.  When installed copies of Apache 
OpenOffice conduct an automatic or manual check for updates, that is a source 
of information.  By itself, a check indicates that an installed copy of the 
software is being used in some manner.  
     Update checks are only useful, however, if pings that appear to come from 
the same installation can be recognized as such.  The crudest measure is simply 
the date and time of the latest ping from the same (estimated) source, along 
with the version of Apache OpenOffice being used.  This could be captured 
without any modification of the existing software package. 
     To distinguish sources, it may be necessary to keep a database of 50 to 
100 million records, one per source, that holds the information from each 
source without revealing that source.  The same principle applies if 
additional data is provided as part of check-for-update requests from the 
software.
     Information in the currently implemented HTTP request can be used to 
estimate when requests are from the same source.  To preserve the anonymity of 
the source, that information can be transformed into a cryptographic hash that 
cannot be used to determine the original source but can be used to determine a 
match with a previously captured ping.  This is a coarse arrangement.

  3. Specific Instrumentation.  If future releases were modified to collect and 
report usage data (with appropriate opt-in as part of configuration set-up), 
that data could be attached to checks for updates when allowed.  To capture 
patterns over time, the data is best accumulated per user profile.  If a 
statistically unique, cryptographically random identifier is generated as part 
of each user profile when it is initialized, that identifier can be used to 
recognize instrumentation from the same profile.  When the data is collected, 
the identifier is used in making the cryptographic hash in (2) and then 
discarded.  
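For illustration only, here is a minimal Python sketch of how the server side of (2) and (3) might fit together.  The names (SERVER_KEY, source_hash, record_ping, SourceRecord) and the record layout are hypothetical; they are not part of Apache OpenOffice or of any existing update server.  The sketch makes one substitution beyond the note above: it uses a keyed hash (HMAC-SHA256, with a secret known only to the collection server) rather than a plain hash, because a plain hash of something with few possible values, such as an IPv4 address, can be reversed by hashing every candidate value.

```python
import hashlib
import hmac
import secrets
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Iterable, Mapping, Optional

# Hypothetical secret held only by the collection server.  A real deployment
# would load it from protected configuration so it stays stable across
# restarts; it is generated here only to keep the sketch self-contained.
SERVER_KEY = secrets.token_bytes(32)


def new_profile_id() -> str:
    """Client side, item (3): a cryptographically random identifier created
    once when a user profile is initialized (128 bits, so collisions are
    statistically negligible)."""
    return secrets.token_hex(16)


def source_hash(fields: Iterable[str], key: bytes = SERVER_KEY) -> str:
    """Server side: keyed hash (HMAC-SHA256) of whatever identifies a source.

    For item (2) the fields are assumed to be values taken from the HTTP
    request; for item (3) the single field is the profile identifier.
    Only the digest is kept; the inputs are not stored.
    """
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc")
    # produce different messages.
    message = "\x1f".join(fields).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


@dataclass
class SourceRecord:
    """What is retained per (anonymized) source."""
    last_ping: datetime
    version: str
    pings: int = 0
    formats: Counter = field(default_factory=Counter)


# Stand-in for the 50-100 million record database, keyed by digest.
records: Dict[str, SourceRecord] = {}


def record_ping(fields: Iterable[str], version: str,
                format_counts: Optional[Mapping[str, int]] = None,
                when: Optional[datetime] = None) -> str:
    """Record one check for updates and return the source digest."""
    when = when or datetime.now(timezone.utc)
    digest = source_hash(fields)
    rec = records.get(digest)
    if rec is None:
        rec = records[digest] = SourceRecord(last_ping=when, version=version)
    rec.last_ping = when          # crudest measure: latest ping time...
    rec.version = version         # ...and the AOO version reported
    rec.pings += 1
    if format_counts:             # opted-in counts since the previous report
        rec.formats.update(format_counts)
    return digest


# Example: two reports from the same profile land in the same record.
pid = new_profile_id()
d1 = record_ping([pid], "4.1.2", {"odt": 3, "docx": 5})
d2 = record_ping([pid], "4.1.2", {"docx": 2})
print(d1 == d2, records[d1].pings, dict(records[d1].formats))
```

Only the digest, the latest ping time and version, a ping count, and any opted-in format counts are kept; the raw request fields and the profile identifier never go into the store.  For rough sizing, 100 million records, each holding a 32-byte SHA-256 digest in binary form, a timestamp, and a short version string, come to a few gigabytes before index overhead.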



> -----Original Message-----
> From: Dennis E. Hamilton [mailto:orc...@apache.org]
> Sent: Sunday, November 22, 2015 11:50
> To: dev@openoffice.apache.org
> Subject: [QUESTIONS] How Is Apache OpenOffice Used (was Apache
> OpenOffice ODF in the Marketplace ...)
> 
> I have changed the topic because Marketplace is misleading -- the AOO
> Project is not so much a participant in a market system.  Yet it is
> useful to determine who our public community is and what the adopters of
> Apache OpenOffice are doing with it.
> 
> We have the statistics below as a course estimate of the size of the
> active AOO community, our public.
> 
> The original question was, how important is ODF to those adopters?
> 
> That's an answer that is more likely to be found by asking "What are the
> adopters doing with their copies of Apache OpenOffice?  In particular,
> what document formats are they using and to what relative degree?"
> 
> We have no way to know that directly at the moment.
> 
> There is one immediately-available source.
> 
> REPORTS TO US
> 
> Most of what we know about what folks are doing with Apache OpenOffice
> comes from the patterns of complaints.  These can arise in
> questions to lists dev@ and users@, in filing of Bugzilla reports (or
> commenting on existing ones), and in comments on the Community Forums.
> 
> We can use those to determine more narrowly which users, on which
> platforms, are reporting and what they are reporting about.  This
> provides evidence of what is found to be important enough to make the
> effort to report.  That is important all by itself.  It is a clue to
> what others may be experiencing but do not choose or know to report.
> 
> A subset of these reports may hinge on particular document formats and
> interchange/interoperability experiences with document formats.  My
> unqualified impression is that interchange via Microsoft Office formats
> will dominate, just as Microsoft Windows users are predominant among the
> population of AOO adopters.  It will be interesting to identify the ODF-
> related matters that also come up and what the balance is.
> 
> It is not easy to analyze this source mechanically but it is possible to
> do some manual "analytics" of various kinds.
> 
> Is this worth doing?
> 
> Of what value would digging this information out at an initial level of
> detail be?
> 
> We could probably look at a couple of months' data for clues and then
> examine a longer period if it seems profitable.
> 
> 
> > -----Original Message-----
> > From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
> > Sent: Sunday, November 8, 2015 22:19
> > To: dev@openoffice.apache.org
> > Subject: [REPORT] Apache OpenOffice ODF in the Marketplace - AOO 4.1.1
> > downloads
> >
> > Here are updates of the downloads for Apache OpenOffice 4.1.1, now that
> > 4.1.2 is being distributed by the mirror system.
> >
> > From Sourceforge,
> >
> > <http://sourceforge.net/projects/openofficeorg.mirror/files/4.1.1/stats/os?dates=2014-08-01+to+2015-11-08>
> >
> > Just shy of 50,000,000 downloads.  This number will be exceeded as older
> > versions will still continue downloading, although at an ever-decreasing
> > rate.
> >
> >    87.7% for Windows
> >     9.0% for Macintosh (0.1% small drop from end of August)
> >     3.3% for everything else, including Linux
> >
> > For the different countries in the same period (53.6 million for all
> > distributions, not just 4.1.1), the breakdown can be found here:
> >
> > <http://sourceforge.net/projects/openofficeorg.mirror/files/stats/map?dates=2014-08-01+to+2015-11-09>.
> >
> > It is cool that there were 3 to Antarctica: 2 for Windows, 1 for
> > Macintosh.
> >
> >  - Dennis
> >
> >
> >
> > > -----Original Message-----
> > > From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
> > > Sent: Wednesday, September 23, 2015 18:38
> > > To: dev@openoffice.apache.org
> > > Subject: RE: [DISCUSS] Apache OpenOffice ODF in the Marketplace -
> > > Downloading
> > >
> [ ... ]
> > > What is more difficult to determine is what folks are actually doing
> > > with Apache OpenOffice.  There may be ways to learn more.
> > >
> > >  - Dennis
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Dennis E. Hamilton [mailto:dennis.hamil...@acm.org]
> > > Sent: Friday, September 4, 2015 20:01
> > > To: dev@openoffice.apache.org
> > > Subject: [DISCUSS] Apache OpenOffice ODF in the Marketplace
> > >
> > > I had not encountered the topic of "ODF in the market place" with regard
> > > to the status of Apache OpenOffice.  Perhaps I have not been paying
> > > attention.
> > >
> > > I am curious how we might characterize how support for ODF matters to
> > > Apache OpenOffice users and various institutions that value support for
> > > ODF in their reliance on Apache OpenOffice and related software.
> > >
> > > How can we determine what the influence of ODF is with respect to Apache
> > > OpenOffice?
> > >
> > > It strikes me there are two parts to this question.
> > >
> > >  1. Who are the users of Apache OpenOffice?
> > >
> > >  2. What are the ways ODF is (comparatively) significant to those users?
> > >
> > > [ ... ]
> > >
> > > WHO ARE THE USERS?
> > >
> > > Although there are now over 150 million downloads of Apache OpenOffice,
> > > that does not tell us how many individual users are involved.
> > >
> > > Perhaps the download counts just for AOO 4.1.1 would be a representative
> > > sample of a particularly active segment of the user base, even though
> > > that would be underestimated a couple of ways.  But that, and the
> > > average weekly rate, would be useful as "at least" figures.
> > >
> > > The mix of platforms for those downloads is also important, reflecting
> > > the context in which those installed downloads are used by new users and
> > > those who are keeping their configurations current.
> > >
> > >
> > > [ ... ]
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@openoffice.apache.org
> > > For additional commands, e-mail: dev-h...@openoffice.apache.org