+1 to Ken’s earlier point about maintenance. Note Tika wouldn’t even build in Germany, and we only discovered that because of inviting Tilman. :D We have a huge amount of maintenance already...
Check out the incubating Daffodil project, which aims to convert files to XML, validate them, and then serialize them back to the original format.

I do see a use for transform(), and if we could use XHTML as an intermediary, then... maybe, but my inclination is with Ken.

On Wed, Oct 16, 2019 at 11:50 AM Ken Krugler <kkrug...@apache.org> wrote:

> I can see the attraction of one API to convert XHTML to various formats.
>
> Though very quickly that simple API would become complex, as each target
> format has its own conversion options.
>
> And if successful, we’d pull in even more 3rd party jars to handle that
> conversion.
>
> Wonder if there’s a need for a new project called “Akit”, which focuses on
> XHTML -> various formats :)
>
> — Ken
>
> > On Oct 16, 2019, at 5:05 AM, Sergey Beryozkin <sberyoz...@gmail.com> wrote:
> >
> > Ken, thanks for the feedback, I meant to reply to your comments,
> >
> > I suppose I really meant Tika offering a uniform API to create some simple
> > structured PDF/etc files.
> >
> > ContentCreator creator = ContentCreator.get("PDF");
> > creator.addTitle("Introduction to Tika");
> > creator.addText("");
> > creator.addTable("tablename", new LinkedHashMap<String, List<String>>());
> > creator.addAttachment(someImage);
> > creator.complete();
> >
> > It would be consistent with the Tika approach on the read side.
> >
> > Cheers, Sergey
> >
> > On Mon, Oct 14, 2019 at 4:13 PM Ken Krugler <kkrug...@apache.org> wrote:
> >
> >> If you’re suggesting ways to make it easier to use something like
> >> YaHPConverter with Tika, definitely yes.
> >>
> >> If you’re talking about integrating this functionality…my personal view is
> >> no.
> >>
> >> I think Tika should focus on extracting content from documents, versus
> >> format transformations.
> >>
> >> Tika is an attractive location for functionality like this, since it sits
> >> in the middle of a lot of data processing pipelines, but I worry about a
> >> bloated code base, with corresponding challenges in maintenance and support.
> >>
> >> Regards,
> >>
> >> — Ken
> >>
> >>> On Oct 14, 2019, at 4:38 AM, Sergey Beryozkin <sberyoz...@gmail.com> wrote:
> >>>
> >>> Hi All
> >>>
> >>> I've seen a Quarkus user asking how to convert to PDF, and one of my
> >>> colleagues pointed to
> >>> http://www.allcolor.org/YaHPConverter/doc/org/allcolor/yahp/converter/IHtmlToPdfTransformer.html
> >>>
> >>> Does it make sense for Tika to offer something related to the text to PDF
> >>> (for a start, something on top of that transformer), and then maybe even
> >>> for other formats ?
> >>>
> >>> Sergey
> >>
> >> --------------------------
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> custom big data solutions & training
> >> Hadoop, Cascading, Cassandra & Solr
> >>
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr