Great. Thank you!

-----Original Message-----
From: Chris Mattmann [mailto:mattm...@apache.org]
Sent: Friday, September 22, 2017 1:46 PM
To: dev@tika.apache.org
Subject: Re: TikaIO concerns

[dropping Beam on this]

Tim, another thing is that you can finally download the TREC-DD Polar data either from the NSF Arctic Data Center (70GB zip), or from Amazon S3, as described here:

http://github.com/chrismattmann/trec-dd-polar/

In case we want to use it as part of our regression tests.

Cheers,
Chris

On 9/22/17, 10:43 AM, "Allison, Timothy B." <talli...@mitre.org> wrote:

    >>1) We've gathered a TB of data from CommonCrawl and we run regression tests against this TB (thank you, Rackspace, for hosting our VM!) to try to identify these problems.

    And if anyone with connections at a big company doing open source + cloud would be interested in floating us some storage and cycles, we'd be happy to move off our single VM to increase coverage and improve the speed of our large-scale regression tests. :D

    But seriously, thank you for this discussion and collaboration!

    Cheers,
    Tim