[ https://issues.apache.org/jira/browse/TIKA-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946836#comment-13946836 ]
David Pilato commented on TIKA-1165:
------------------------------------

Sounds like I never replied to your comment! Shame on me! :(

In TIKA-1123, which is marked as fixed, we say that asciidoc should have the {{text/x-asciidoc}} mimetype. That's one of the reasons I thought that autodetection was working for asciidoc.

As for a library, it sounds like this one could help a lot here: https://github.com/asciidoctor/asciidoctorj#document-header
I did not check any further though.

Thanks!

> Autodetect and parse Asciidoc
> -----------------------------
>
>                 Key: TIKA-1165
>                 URL: https://issues.apache.org/jira/browse/TIKA-1165
>             Project: Tika
>          Issue Type: Wish
>          Components: languageidentifier, parser
>    Affects Versions: 1.4
>            Reporter: David Pilato
>            Priority: Trivial
>
> When parsing asciidoc metadata, we currently get the following:
> {noformat}
> Content-Encoding: ISO-8859-1
> Content-Length: 66363
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: asciidoc.adoc
> {noformat}
> Steps to reproduce:
> {code:title=asciidoc.sh|borderStyle=solid}
> curl https://raw.github.com/asciidoctor/asciidoctor.org/master/docs/asciidoc-syntax-quick-reference.adoc -O -s
> java -jar tika-app-1.4.jar -m asciidoc-syntax-quick-reference.adoc
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
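
For illustration only, here is a minimal sketch of the behaviour this issue expects: a resource name ending in .adoc resolves to {{text/x-asciidoc}} instead of falling back to {{text/plain}}. It is not Tika's implementation and does not use the Tika API. The class name {{AsciidocNameDetector}} is hypothetical, and the ".asciidoc" extension is an assumption, since only ".adoc" appears in the report.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of Tika: resolves a media type from the
// file extension alone, roughly what a name-based glob rule would do.
final class AsciidocNameDetector {

    static final String ASCIIDOC = "text/x-asciidoc"; // type named in TIKA-1123
    static final String FALLBACK = "text/plain";      // type reported in this issue

    private AsciidocNameDetector() {
    }

    static String detect(String resourceName) {
        if (resourceName == null) {
            return FALLBACK;
        }
        int dot = resourceName.lastIndexOf('.');
        // No extension, or a trailing dot: nothing to match on.
        if (dot < 0 || dot == resourceName.length() - 1) {
            return FALLBACK;
        }
        String extension = resourceName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (extension) {
            case "adoc":     // extension used in the steps to reproduce
            case "asciidoc": // assumption: not mentioned in the issue
                return ASCIIDOC;
            default:
                return FALLBACK;
        }
    }
}
```

Calling {{AsciidocNameDetector.detect("asciidoc.adoc")}} returns {{text/x-asciidoc}}, while a name without a recognised extension falls back to {{text/plain}}. Matching on the name alone says nothing about the content, so a real detector would presumably combine it with other signals.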