> On 05 May 2015, at 07:33, Boris Baldassari <castalia.laborat...@gmail.com>
> wrote:
>
> Hi Folks,
>
> Sorry for the late answer on this thread. I don't know what has been done
> since then, but I have some experience to share on this, so here are my 2c.
>
> * Parsing dates and time zones:
> If you use Perl, the Date::Parse module handles dates and time zones
> pretty well. As for Python, I don't know -- there is probably a module for
> that too.
> I used Date::Parse to parse ASF mboxes (notably for Ant and JMeter; the data
> sets have been published here [0]), and it worked great. I do have a Perl
> script for this, which I can provide -- but as far as I'm aware I have no
> access to the dev SCM, and I'm not sure Perl is the most common language
> here, so please let me know.
>
> * Parsing mboxes for software repository data mining:
> There is a suite of tools on GitHub aimed exactly at this kind of task:
> Metrics Grimoire [1], developed (and used) by Bitergia [2]. I don't know how
> they handle time zones, but the tool suite is widely used (see [3] or [4]
> for examples), so I believe it is quite robust. It includes tools for data
> retrieval as well as visualisation.
>
> * Feedback/thoughts on the architecture and formats:
> I love the REST API idea proposed by Rob. It makes the data really easy to
> access and retrieve on demand from scripts. CSV and JSON are my favourite
> formats because, again, they are easy to parse and widely used -- every
> language and library has some facility to read them natively.
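On the Python side of the date question: the standard library's `email.utils.parsedate_to_datetime` (Python 3.3+) parses RFC 2822/5322 `Date:` headers, including numeric offsets such as `+0200` and the old named zones. The sketch below normalises every header to UTC so that messages sent from different time zones sort and bucket correctly. The helper name `to_utc` is hypothetical, and it assumes that unparseable dates should simply be skipped (returned as `None`) rather than guessed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def to_utc(date_header: str) -> Optional[datetime]:
    """Parse an RFC 2822 Date: header and normalise it to UTC.

    Returns None when the header cannot be parsed at all (an assumption:
    callers are expected to skip or count such messages).
    """
    try:
        dt = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, Python 3.10+ raises ValueError,
        # when the header is not a recognisable date.
        return None
    if dt.tzinfo is None:
        # A "-0000" offset yields a naive datetime; RFC 5322 says it is UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

For example, `to_utc("Tue, 05 May 2015 07:33:00 +0200")` gives `2015-05-05 05:33:00+00:00`. Archives can contain malformed `Date:` headers, so counting the `None` results is a cheap sanity check before trusting any time-based statistics.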
I have to endorse Bitergia, too. If they don't immediately have what is wanted, they are likely to be interested in working on it. But you know this, I'm guessing.

louis

>
> Cheers,
>
> [0] http://castalia.solutions/datasets/
> [1] https://metricsgrimoire.github.io/
> [2] http://bitergia.com
> [3] Eclipse Dashboard: http://dashboard.eclipse.org/
> [4] OpenStack Dashboard: http://activity.openstack.org/dash/browser/
>
> --
> Boris Baldassari
> Castalia Solutions -- Elegant Software Engineering
> Web: http://castalia.solutions
> Phone: +33 6 48 03 82 89
>
> On 28/04/2015 16:11, Rich Bowen wrote:
>>
>> On 04/27/2015 09:36 AM, Shane Curcuru wrote:
>>> I'm interested in working on some visualizations of mailing list
>>> activity over time, in particular some simple analyses, like thread
>>> length/participants and the like. Given that the raw data can all be
>>> precomputed from mbox archives, is there any semi-standard way to
>>> distill and save metadata about mboxes?
>>>
>>> If we had a generic static database of past mail metadata and statistics
>>> (i.e. not the contents themselves, but perhaps the overall number of
>>> lines of text or similar), it would be interesting to see what kinds of
>>> visualizations different people would come up with.
>>>
>>> Does anyone have pointers to either a data format or the best parsing
>>> library for this? I'm trying to think ahead, and work on the parsing,
>>> storing statistics, and visualizations as separate pieces so it's easier
>>> for different people to collaborate.
>>
>> Roberto posted something to the list a month or so ago about the work
>> he's been doing on this kind of thing. You might ping him.
>>
>> --Rich
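As one possible answer to Shane's question about distilling metadata, here is a minimal sketch using Python's standard `mailbox` module. It pulls header-level metadata (message ID, sender, date, a thread key) plus a body line count from each message, writes one CSV row per message, and computes per-thread message and participant counts. All function and column names are hypothetical, and the thread key is a crude assumption (first `References` entry, else `In-Reply-To`, else the message's own ID); proper threading would need something like the JWZ threading algorithm.

```python
import csv
import mailbox
from collections import defaultdict
from email.message import Message
from email.utils import parseaddr
from typing import Dict, Iterable, List

# Hypothetical CSV layout: header metadata and counts only, no message bodies.
FIELDS = ["message_id", "thread_key", "sender", "date", "body_lines"]


def body_line_count(msg: Message) -> int:
    """Count lines in all text/plain parts; the text itself is discarded."""
    total = 0
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            total += len(payload.splitlines())
    return total


def message_rows(messages: Iterable[Message]) -> List[dict]:
    """Turn each message into one metadata row."""
    rows = []
    for msg in messages:
        msg_id = str(msg.get("Message-ID", "")).strip()
        refs = str(msg.get("References", "")).split()
        parent = str(msg.get("In-Reply-To", "")).strip()
        # Crude thread key (an assumption, not real threading): the oldest
        # reference, else the direct parent, else the message itself.
        thread_key = refs[0] if refs else (parent or msg_id)
        rows.append({
            "message_id": msg_id,
            "thread_key": thread_key,
            "sender": parseaddr(str(msg.get("From", "")))[1].lower(),
            "date": str(msg.get("Date", "")),
            "body_lines": body_line_count(msg),
        })
    return rows


def thread_stats(rows: List[dict]) -> Dict[str, Dict[str, int]]:
    """Number of messages and distinct senders per thread key."""
    counts: Dict[str, int] = defaultdict(int)
    senders: Dict[str, set] = defaultdict(set)
    for row in rows:
        counts[row["thread_key"]] += 1
        senders[row["thread_key"]].add(row["sender"])
    return {key: {"messages": n, "participants": len(senders[key])}
            for key, n in counts.items()}


def export_csv(mbox_path: str, csv_path: str) -> None:
    """Write one CSV row per message found in the mbox file."""
    rows = message_rows(mailbox.mbox(mbox_path))
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Keeping only headers and counts matches the "metadata, not contents" idea, and a CSV like this can be loaded by any language, in line with Boris's preference for CSV and JSON. Parsing, storing and visualising stay separate: `thread_stats` works on the rows regardless of where they came from.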