[ https://issues.apache.org/jira/browse/TIKA-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-2542:
------------------------------
    Fix Version/s:     (was: 2.0.0)
                   2.0.0-BETA

> Support in tika-server for getting plain text and metadata at the same time
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2542
>                 URL: https://issues.apache.org/jira/browse/TIKA-2542
>             Project: Tika
>          Issue Type: Improvement
>          Components: core, server
>    Affects Versions: 1.17
>            Reporter: Manolo Caracuel
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 2.0.0-BETA
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> It would be good to have a way to get a file's extracted plain text together
> with its detected metadata. Currently you can only get the metadata if the
> request has an Accept header of text/xml or text/html, but then the text in
> the body is not plain text, because it contains HTML elements as well.
> I propose that when requesting /tika/plain with an Accept header of text/xml,
> an XHTML document is returned with the metadata in the head's meta elements
> and the plain text in the body.
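
To illustrate what a client could do with the proposed response, here is a minimal sketch in Python. It assumes the response is an XHTML document in the standard XHTML namespace, with one meta element (name and content attributes) per metadata value in the head and the plain text as the direct content of the body. The sample response, the metadata keys in it, and the function name split_metadata_and_text are hypothetical and only serve the example; they are not taken from tika-server.

```python
import xml.etree.ElementTree as ET

# Namespace prefix for XHTML elements, in ElementTree's "{uri}tag" notation.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical response body, shaped as the proposal describes:
# metadata in the head's meta elements, plain text in the body.
SAMPLE_RESPONSE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Type" content="application/pdf"/>
<meta name="dc:title" content="Example"/>
<title>Example</title>
</head>
<body>First line of extracted text.
Second line.</body>
</html>"""


def split_metadata_and_text(xhtml):
    """Return (metadata, text) from an XHTML document of the assumed shape.

    metadata maps each meta name to a list of values, because a metadata
    key may occur more than once.
    """
    root = ET.fromstring(xhtml)

    metadata = {}
    for meta in root.iter(XHTML_NS + "meta"):
        name = meta.get("name")
        if name is not None:
            metadata.setdefault(name, []).append(meta.get("content", ""))

    body = root.find(XHTML_NS + "body")
    # itertext() also collects text nested in child elements, should any exist.
    text = "".join(body.itertext()) if body is not None else ""
    return metadata, text


metadata, text = split_metadata_and_text(SAMPLE_RESPONSE)
print(metadata)
print(text)
```

Keeping the values in lists is a design choice for the sketch: it avoids silently dropping repeated metadata keys, at the cost of a slightly less convenient lookup.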