All,
For some recent work on Apache Tika, I used commons-compress to
extract entry names and metadata via a streaming read from roughly
500k zip-based files we have in Tika's regression corpus.
I was happy to see we have some POI-generated files in there. :)
I noticed some areas for improveme
All,
I recently downloaded attachments from the following bug trackers:
COMPRESS, TIKA, PDFBox, POI, OpenOffice, LibreOffice and Ghostscript:
http://162.242.228.174/docs/bugtrackers/
I then unpacked/uncompressed all of the packaged/compressed files so:
COMPRESS-115-1.zip is the second fil
@Compress devs,
We recently transitioned our VM to a new provider, and we're improving
the ASF-itude of this project. We recently started a new email list for
those interested in guiding and using the 2 TB of files that we've gathered
so far.
Please join corpora-...@tika.apache.org if you ha