If you want to unsubscribe, please find instructions at http://apache.org/foundation/mailinglists.html
And the name of this list is dev@community.apache.org

Cheers
Niclas

On Thu, May 7, 2015 at 7:48 AM, Betty James <bsquar...@gmail.com> wrote:

> Oh my gosh. How do I get off this thread. don't know how I got on, but
> I am just a totally ignorant individual using Open Office and trying to
> donate (which doesn't sound necessary anymore)....so unless you are in
> good shape and in your 70's try to figure out how I can get off the
> list!
>
> Betty B. James
>
> On Tue, May 5, 2015 at 7:33 AM, Boris Baldassari
> <castalia.laborat...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > Sorry for the late answer on this thread. Don't know what has been
> > done since then, but I've some experience to share on this, so here
> > are my 2c..
> >
> > * Parsing dates and time zones:
> > If you are to use Perl, the Date::Parse module handles dates and time
> > zones pretty well. As for Python I don't know -- there probably is a
> > module for that too..
> > I used Date::Parse to parse ASF mboxes (notably for Ant and JMeter,
> > the data sets have been published here [0]), and it worked great. I
> > do have a Perl script to do that, which I can provide -- but I have
> > no access I'm aware of in the dev scm, and not sure if Perl is the
> > most common language here.. so please let me know.
> >
> > * Parsing mboxes for software repository data mining:
> > There is a suite of tools exactly targeted at this kind of duty on
> > github: Metrics Grimoire [1], developed (and used) by Bitergia [2].
> > I don't know how they manage time zones, but the toolsuite is widely
> > used around (see [3] or [4] as examples) so I believe they are quite
> > robust. It includes tools for data retrieval as well as
> > visualisation.
> >
> > * As for the feedback/thoughts about the architecture and formats:
> > I love the REST-API idea proposed by Rob. That's really easy to
> > access and retrieve through scripts on-demand. CSV and JSON are my
> > favourite formats, because they are, again, easy to parse and widely
> > used -- every language and library has some facility to read them
> > natively.
> >
> > Cheers,
> >
> > [0] http://castalia.solutions/datasets/
> > [1] https://metricsgrimoire.github.io/
> > [2] http://bitergia.com
> > [3] Eclipse Dashboard: http://dashboard.eclipse.org/
> > [4] OpenStack Dashboard: http://activity.openstack.org/dash/browser/
> >
> > --
> > Boris Baldassari
> > Castalia Solutions -- Elegant Software Engineering
> > Web: http://castalia.solutions
> > Phone: +33 6 48 03 82 89
> >
> > On 28/04/2015 16:11, Rich Bowen wrote:
> >
> >> On 04/27/2015 09:36 AM, Shane Curcuru wrote:
> >>
> >>> I'm interested in working on some visualizations of mailing list
> >>> activity over time, in particular some simple analyses, like
> >>> thread length/participants and the like. Given that the raw data
> >>> can all be precomputed from mbox archives, is there any
> >>> semi-standard way to distill and save metadata about mboxes?
> >>>
> >>> If we had a generic static database of past mail metadata and
> >>> statistics (i.e. not details of contents, but perhaps overall # of
> >>> lines of text or something), it would be interesting to see what
> >>> kinds of visualizations that different people would come up with.
> >>>
> >>> Anyone have pointers to either a data format or the best parsing
> >>> library for this? I'm trying to think ahead, and work on the
> >>> parsing, storing statistics, and visualizations as separate pieces
> >>> so it's easier for different people to collaborate on something.
> >>>
> >>
> >> Roberto posted something to the list a month or so ago about the
> >> efforts that he's been working on for this kind of thing. You might
> >> ping him.
> >>
> >> --Rich

--
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java
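
For anyone following up on the question in this thread (distilling mbox archives into per-message metadata with time-zone-aware dates, stored as CSV or JSON), here is a minimal Python sketch using only the standard library (mailbox, email.utils, csv, json). It is not the Perl script or the Metrics Grimoire tooling mentioned above. The function names (parse_date, extract_metadata, write_csv, write_json), the field list, and the choice to normalise every date to UTC are assumptions made for illustration.

```python
import csv
import json
import mailbox
from datetime import timezone
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical field list: metadata only, no message contents.
FIELDS = ["message_id", "in_reply_to", "sender", "date_utc", "lines"]


def header(message, name):
    """Return a header as a plain string, or None when it is absent."""
    value = message.get(name)
    return None if value is None else str(value).strip()


def parse_date(value):
    """Return a timezone-aware UTC datetime, or None if unparseable."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treating it as UTC
        # is an assumption of this sketch.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def extract_metadata(mbox_path):
    """Yield one metadata dict per message in the mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            sent = parse_date(header(message, "Date"))
            payload = message.get_payload()
            # Count lines only for simple, non-multipart bodies.
            lines = len(payload.splitlines()) if isinstance(payload, str) else None
            yield {
                "message_id": header(message, "Message-ID"),
                "in_reply_to": header(message, "In-Reply-To"),
                "sender": parseaddr(header(message, "From") or "")[1],
                "date_utc": sent.isoformat() if sent else None,
                "lines": lines,
            }
    finally:
        box.close()


def write_csv(rows, out_path):
    """Write the metadata rows as CSV with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def write_json(rows, out_path):
    """Write the metadata rows as a JSON array."""
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(list(rows), handle, indent=2)
```

A possible use, with "dev.mbox" as a placeholder file name: rows = list(extract_metadata("dev.mbox")), then write_csv(rows, "dev.csv") or write_json(rows, "dev.json"). Keeping extraction separate from output follows the suggestion in the thread to treat parsing, storing statistics, and visualization as separate pieces.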