Yes, 2.6TB would make a nice regression test! Although a lot of the documents will be similar, just with names exchanged, I guess! :)
Dominik

On Wed, Apr 6, 2016 at 7:19 PM, Allison, Timothy B. <talli...@mitre.org> wrote:
> Looks like quite a few MSG files!
>
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
> Sent: Tuesday, April 05, 2016 6:47 PM
> To: d...@tika.apache.org
> Cc: pr...@apache.org
> Subject: Apache Tika used to parse the Panama papers!
>
> FYI:
>
> http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak/?utm_campaign=ForbesTech&utm_source=TWITTER&utm_medium=social&utm_channel=Technology&linkId=23087770#709893771df5
>
> BTW I know Thomas and am in touch..he wrote an article about MEMEX last
> year.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++