Marc Ubaldino created TIKA-3984:
-----------------------------------
Summary: Summarize Available Parsers as mapped to file types and
Maven artifacts
Key: TIKA-3984
URL: https://issues.apache.org/jira/browse/TIKA-3984
Project: Tika
Issue Type: Improvement
Components: documentation
Affects Versions: 2.7.0
Reporter: Marc Ubaldino
Documentation needed: a discrete, clear list of the Maven artifacts required to
configure a given Parser to handle a given file type.
User Question - To manipulate an ".odt" file, which Parser do I use and which
Maven artifact should I choose? (Pick any file extension or media category.)
How easy is it for new Tika users or seasoned users to locate the answer?
Inspiration: [https://maven.apache.org/plugins/index.html] – Clear, concise.
Tika Resources:
* Parser listing:
[https://cwiki.apache.org/confluence/display/TIKA/Parsers]
* Migration details for old Parsers:
[https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0]
* File type listing:
[https://tika.apache.org/2.7.0/formats.html#Full_list_of_Supported_Formats_in_standard_artifacts]
Some sort of table would be great for a lookup. 3-5 columns:
* Media type
* File extensions (MIME strings)
* Parser class
* Tika Maven coordinates to get Parser class
* Links to relevant how-tos or examples for the Media type and Parser class
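
To make the requested lookup concrete, here is a minimal Java sketch of one row of such a table and a lookup by file extension. It is illustrative only: the names ParserLookup, Entry and findByExtension are hypothetical and not part of Tika, the how-to link column is omitted, and the single ".odt" row (parser class and Maven coordinates) reflects my understanding of the 2.x module layout and should be verified against the release in use.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical lookup table; not part of the Tika API. */
final class ParserLookup {

    /** One table row: media type, extensions, parser class, Maven coordinates. */
    record Entry(String mediaType, List<String> extensions,
                 String parserClass, String mavenCoordinates) {}

    // Assumed example row; verify against the Tika version you depend on.
    static final List<Entry> TABLE = List.of(
        new Entry(
            "application/vnd.oasis.opendocument.text",
            List.of("odt"),
            "org.apache.tika.parser.odf.OpenDocumentParser",
            "org.apache.tika:tika-parser-miscoffice-module:2.7.0"));

    /** Finds the first row listing the extension, ignoring case and a leading dot. */
    static Optional<Entry> findByExtension(String extension) {
        String ext = extension.toLowerCase(Locale.ROOT).replaceFirst("^\\.", "");
        return TABLE.stream()
                .filter(e -> e.extensions().contains(ext))
                .findFirst();
    }

    public static void main(String[] args) {
        findByExtension(".odt").ifPresentOrElse(
            e -> System.out.println(e.parserClass() + " <- " + e.mavenCoordinates()),
            () -> System.out.println("no entry"));
    }
}
```

A documentation page would carry the same columns as a static table; the code only shows the shape of the data a user wants to query.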
Thank you,
Marc
// Tika user since 1.2
--
This message was sent by Atlassian Jira
(v8.20.10#820010)