I'm interested in working on some visualizations of mailing list activity over time, starting with simple analyses such as thread length and number of participants per thread. Given that the raw data can all be precomputed from mbox archives, is there any semi-standard way to distill and save metadata about mboxes?
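To make the question concrete, here is a rough sketch of the kind of distillation I have in mind. It assumes Python's standard-library mailbox and email modules. The function names (message_metadata, thread_summary) and the record fields are placeholders of my own, not any existing format, and grouping threads by the first References entry is only a crude heuristic.

```python
import mailbox
from collections import defaultdict
from email.utils import parseaddr, parsedate_to_datetime


def message_metadata(msg):
    """Reduce one message to content-free metadata (hypothetical record layout)."""
    msg_id = str(msg.get("Message-ID") or "").strip()
    refs = str(msg.get("References") or "").split()
    reply_to = str(msg.get("In-Reply-To") or "").split()
    # Heuristic: the first References entry is usually the thread root;
    # fall back to In-Reply-To, then to the message's own ID.
    root = refs[0] if refs else (reply_to[0] if reply_to else msg_id)

    try:
        date = parsedate_to_datetime(str(msg.get("Date") or "")).isoformat()
    except (TypeError, ValueError):
        date = None  # missing or unparseable Date header

    # Count lines in text/plain parts only; no body text is stored.
    lines = 0
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            lines += len(payload.splitlines())

    return {
        "id": msg_id,
        "thread": root,
        "from": parseaddr(str(msg.get("From") or ""))[1].lower(),
        "date": date,
        "lines": lines,
    }


def thread_summary(records):
    """Aggregate per-message records into message and participant counts per thread."""
    threads = defaultdict(lambda: {"messages": 0, "participants": set()})
    for rec in records:
        entry = threads[rec["thread"]]
        entry["messages"] += 1
        entry["participants"].add(rec["from"])
    return {
        root: {"messages": t["messages"], "participants": len(t["participants"])}
        for root, t in threads.items()
    }
```

Something like `records = [message_metadata(m) for m in mailbox.mbox("list.mbox")]` (the file name is made up) would then give a list of plain dicts that could be dumped to JSON or CSV, so the parsing step and the visualization step never need to share code.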
If we had a generic static database of past mail metadata and statistics (not the contents themselves, but perhaps things like the total number of lines of text per message), it would be interesting to see what kinds of visualizations different people would come up with. Does anyone have pointers to either a data format or the best parsing library for this? I'm trying to think ahead and treat parsing, statistics storage, and visualization as separate pieces, so that it's easier for different people to collaborate.

Thanks,

- Shane