Nice overview, would be interesting to watch, especially the slides about
the "old days"!

Ad (9): there were a few other bugs that popped up regularly (Unicode
handling, duplicate classes), which forced us to unattic it.
Ad (7): results for the POI regression tests are at
http://people.apache.org/~centic/poi_regression/reports/ if you want to add
a link.

Dominik


On Mon, Oct 15, 2018, 03:50 Dave Fisher <[email protected]> wrote:

> Hi -
>
> I’ve come up with the plan for my POI talk next weekend. I need to finalize
> my slides tomorrow so that some Chinese translation can be done. I have
> some questions that I’ll mark as “—>”. If you can answer you’ll save me
> some research.
>
> I plan to tell the story of POI, including Tika interactions and Common
> Crawl. In the end I want to give people two places to contribute, along
> with motivation.
>
> (1) Title
>         Name of presentation
>         About Dave
> (2) POI
>         When it started in Jakarta: the simple use case.
>         End of Jakarta
> (3) OOXML and the Microsoft Open Specification Promise
>         The OSP
>         The flame war
>         OpenXML4J -
> http://incubator.apache.org/ip-clearance/openxml4j.html
>         XSSF, XSLF, and SS
> (4) Tika and OOXML lite
>         ApacheCon Oakland 2009 - Jukka asked Nick, Yegor and me during
> BarCamp if we could do something about the 13MB ooxml jar. Yegor came up
> with a solution in a day.
>         Unit Test and your Beans are included
>         —> Anyone: anything to add? XMLBeans impacts?
> (5) Graphics2D
>         Discuss output techniques developed.
>         —> Yegor - is there some sample code you might share?
> (6) Tika Text Extraction
>         —> Could use pointers to the basic tutorial.
> (7) Common Crawl - 1TB of samples
>         Common Crawl - commoncrawl.org
>         Common Crawl Download - centic9
>         Regression sets for POI, Tika and PDFBox
>         —> Are there other Apache projects that use these documents?
> (8) The POI Toolbox
>         A table of the various formats with input, output, and remarks.
> (9) XMLBeans 3
>         Bringing the product out of the attic.
>         —> Any reasons besides better control of Entity Expansion attacks?
> (10) Contributing to POI and Tika Will Improve Your Solr Search Results
>         How Solr and similar architectures depend on Tika and Tika depends
> on POI
>         Example: the discussion of header and footer choices for Word
> documents on the Tika list this past week.
>
> Thanks for your help and feedback!
>
> Regards,
> Dave
>
>
>
