> On 05 May 2015, at 07:33, Boris Baldassari <castalia.laborat...@gmail.com> 
> wrote:
> 
> Hi Folks,
> 
> Sorry for the late answer on this thread. I don't know what has been done 
> since then, but I have some experience to share on this, so here are my 2c.
> 
> * Parsing dates and time zones:
> If you use Perl, the Date::Parse module handles dates and time zones 
> pretty well. As for Python, I don't know; there is probably a module for 
> that too.
> I used Date::Parse to parse ASF mboxes (notably for Ant and JMeter; the data 
> sets have been published here [0]), and it worked great. I have a Perl 
> script that does this and can provide it, but as far as I know I have no 
> access to the dev SCM, and I'm not sure Perl is the most common language 
> here, so please let me know.
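
On the Python side, a minimal sketch of the same date and time zone handling, assuming the standard library's email.utils is enough for well-formed Date headers. The function name parse_mail_date and the sample header are made up for illustration; this is not Boris's Perl script.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def parse_mail_date(header_value):
    """Parse an RFC 2822 style Date header into a UTC datetime.

    Hypothetical helper for illustration only.
    """
    parsed = parsedate_to_datetime(header_value)
    if parsed.tzinfo is None:
        # A "-0000" offset yields a naive datetime; treat it as UTC here.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Normalise to UTC so messages from different zones sort consistently.
    return parsed.astimezone(timezone.utc)


sample = "Tue, 05 May 2015 07:33:00 +0200"  # made-up header value
print(parse_mail_date(sample).isoformat())
```

Normalising everything to UTC is one possible choice; keeping the original offset may matter if you want to analyse local posting hours.
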
> 
> * Parsing mboxes for software repository data mining:
> There is a suite of tools targeted at exactly this kind of task on GitHub: 
> Metrics Grimoire [1], developed (and used) by Bitergia [2]. I don't know how 
> they manage time zones, but the tool suite is widely used (see [3] or 
> [4] for examples), so I believe it is quite robust. It includes tools for 
> data retrieval as well as visualisation.
> 
> * As for the feedback/thoughts about the architecture and formats:
> I love the REST API idea proposed by Rob. It makes the data really easy to 
> access and retrieve on demand through scripts. CSV and JSON are my favourite 
> formats because they are, again, easy to parse and widely used: most 
> languages have some facility to read them.
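
To make the "precompute metadata, publish it as JSON" idea concrete, here is a rough sketch using Python's standard mailbox module. The field names and the functions extract_metadata and to_json are assumptions chosen for illustration, not an agreed format and not how Metrics Grimoire works.

```python
import json
import mailbox
from email.utils import parseaddr


def _header(message, name):
    """Return a header as a plain string, or None if it is absent."""
    value = message.get(name)
    return None if value is None else str(value)


def extract_metadata(mbox_path):
    """Return one metadata dict per message in the mbox at mbox_path.

    Hypothetical helper: only headers and a body line count are kept,
    never the message content itself.
    """
    box = mailbox.mbox(mbox_path, create=False)
    rows = []
    try:
        for message in box:
            payload = message.get_payload()
            # Count body lines only for simple, non-multipart messages.
            lines = len(payload.splitlines()) if isinstance(payload, str) else None
            rows.append({
                "message_id": _header(message, "Message-ID"),
                "in_reply_to": _header(message, "In-Reply-To"),
                "from": parseaddr(_header(message, "From") or "")[1],
                "date": _header(message, "Date"),
                "subject": _header(message, "Subject"),
                "lines": lines,
            })
    finally:
        box.close()
    return rows


def to_json(rows):
    """Serialise the metadata rows as a JSON array."""
    return json.dumps(rows, indent=2)
```

Calling to_json(extract_metadata("some-list.mbox")) on a local archive (the file name is a placeholder) would give a static file that visualisation scripts can read without touching the raw mbox. The in_reply_to field is what a thread-length analysis would likely group on.
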

I have to endorse Bitergia, too. If they don’t immediately have what is wanted, 
they are likely to be interested in working on it. But you know this, I’m 
guessing.

louis

> 
> 
> Cheers,
> 
> 
> [0] http://castalia.solutions/datasets/
> [1] https://metricsgrimoire.github.io/
> [2] http://bitergia.com
> [3] Eclipse Dashboard: http://dashboard.eclipse.org/
> [4] OpenStack Dashboard: http://activity.openstack.org/dash/browser/
> 
> 
> 
> --
> Boris Baldassari
> Castalia Solutions -- Elegant Software Engineering
> Web: http://castalia.solutions
> Phone: +33 6 48 03 82 89
> 
> 
> On 28/04/2015 16:11, Rich Bowen wrote:
>> 
>> 
>> On 04/27/2015 09:36 AM, Shane Curcuru wrote:
>>> I'm interested in working on some visualizations of mailing list
>>> activity over time, in particular some simple analyses, like thread
>>> length/participants and the like.  Given that the raw data can all be
>>> precomputed from mbox archives, is there any semi-standard way to
>>> distill and save metadata about mboxes?
>>> 
>>> If we had a generic static database of past mail metadata and statistics
>>> (i.e. not details of contents, but perhaps overall # of lines of text or
>>> something), it would be interesting to see what kinds of visualizations
>>> that different people would come up with.
>>> 
>>> Anyone have pointers to either a data format or the best parsing library
>>> for this?  I'm trying to think ahead, and work on the parsing, storing
>>> statistics, and visualizations as separate pieces so it's easier for
>>> different people to collaborate on something.
>> 
>> Roberto posted something to the list a month or so ago about the efforts 
>> that he's been working on for this kind of thing. You might ping him.
>> 
>> --Rich
>> 
>> 
> 
