[ https://issues.apache.org/jira/browse/TIKA-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893880#comment-17893880 ]
Tim Allison commented on TIKA-4314:
-----------------------------------

Got it. Thank you. I like what you've done.

There are a few challenges with this route. The legacy ExternalParser, whether loaded through TikaConfig or by default, is a CompositeParser. If we change the behavior of the CompositeParser, that will have unintended consequences for other combinations of parsers. The basic design in Tika is one parser per file type.

Another issue is that this relies on the legacy ExternalParser, which wraps a number of external parsers. We're moving towards the more robust and flexible {{o.a.t.p.external2.ExternalParser}}. So, I don't think we want to have such a major change hinging on something that will be deprecated in 3.x and removed by 4.x (maybe? depending on community discussions/feedback).

I think it would be much better to use the SupplementingParser and have it wrap the ExternalParsers that you want.

If we head in this direction, what will it take to get this working for you? Are you able to configure your parsers programmatically, or are you using tika-server or something else where you need to configure the parsers via tika-config.xml?
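To make the "one parser per file type" point concrete, here is a minimal sketch in plain Java with no Tika dependency. The class ParserMapSketch, the FakeParser type, and the parser names and their order are all hypothetical stand-ins chosen for illustration, not Tika code. It shows only that a loop shaped like the one quoted in the issue keeps a single entry per type because Map.put overwrites, and contrasts that with a map that keeps every claimant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParserMapSketch {

    // Hypothetical stand-in for a parser: a name plus the content types it claims.
    static final class FakeParser {
        final String name;
        final List<String> types;

        FakeParser(String name, List<String> types) {
            this.name = name;
            this.types = types;
        }
    }

    // Same shape as the loop quoted in the issue: put() replaces any earlier
    // entry, so the last parser in the list that claims a type wins.
    static Map<String, String> lastWins(List<FakeParser> parsers) {
        Map<String, String> map = new HashMap<>();
        for (FakeParser parser : parsers) {
            for (String type : parser.types) {
                map.put(type, parser.name);
            }
        }
        return map;
    }

    // Alternative for comparison: keep every parser that claims a type.
    static Map<String, List<String>> allPerType(List<FakeParser> parsers) {
        Map<String, List<String>> map = new HashMap<>();
        for (FakeParser parser : parsers) {
            for (String type : parser.types) {
                map.computeIfAbsent(type, k -> new ArrayList<>()).add(parser.name);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Assumed order for illustration only: ffmpeg first, exiftool second.
        List<FakeParser> parsers = List.of(
                new FakeParser("ffmpeg", List.of("video/x-msvideo")),
                new FakeParser("exiftool", List.of("video/x-msvideo", "image/jpeg")));

        System.out.println(lastWins(parsers).get("video/x-msvideo"));   // exiftool
        System.out.println(allPerType(parsers).get("video/x-msvideo")); // [ffmpeg, exiftool]
    }
}
```

The second method changes the return type from one value per key to a list per key, which is why altering this inside a shared class would affect every caller that expects a single parser per type.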
> CompositeParser returns only one parser per content type
> --------------------------------------------------------
>
>                 Key: TIKA-4314
>                 URL: https://issues.apache.org/jira/browse/TIKA-4314
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.9.2
>            Reporter: Leszek Sliwko
>            Priority: Major
>         Attachments: CompositeParser.java, duration-test-2.avi,
>                      geolocation-test-1.jpg, geolocation-test-2.jpg
>
>
> External parsers can have many supported content types, but information is
> lost in CompositeParser:
>
>     public Map<MediaType, Parser> getParsers(ParseContext context) {
>         Map<MediaType, Parser> map = new HashMap<>();
>         for (Parser parser : parsers) {
>             for (MediaType type : parser.getSupportedTypes(context)) {
>                 map.put(registry.normalize(type), parser);
>             }
>         }
>         return map;
>     }
>
> To recreate: parse any avi file (content type: video/x-msvideo). Only the
> exiftool parser will be picked up, and the ffmpeg parser won't be executed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)