[ https://issues.apache.org/jira/browse/TIKA-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883882#comment-17883882 ]
Tim Allison edited comment on TIKA-4314 at 9/23/24 1:26 PM:
------------------------------------------------------------

Tika's algorithm is to pick one parser per file type. I _think_ the above is as designed. We have special sorting so that non-Tika parsers override Tika parsers, and then we sort by class name within Tika parsers if there isn't a custom parser.

We started some initial work on a "MultipleParser" that applies several parsers to a given file type for metadata extraction. The one concrete implementation we have is: https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/parser/multiple/SupplementingParser.java

Note, though, that here, too, there's a single parser (the "MultipleParser") for a given file type.

If the goal is to run both exiftool and ffmpeg on avi (for example), then you might be able to do something with the SupplementingParser (I haven't looked carefully, but I worry that it might not play well with the ExternalParser), or maybe a combination of the SupplementingParser with {{o.a.t.parser.external2.ExternalParsers}}, or you may need to write your own parser.

> CompositeParser returns only one parser per content type
> --------------------------------------------------------
>
>                 Key: TIKA-4314
>                 URL: https://issues.apache.org/jira/browse/TIKA-4314
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>   Affects Versions: 2.9.2
>            Reporter: Leszek Sliwko
>            Priority: Major
>
> External parsers can have many supported content types, but information is
> lost in CompositeParser:
>
> public Map<MediaType, Parser> getParsers(ParseContext context) {
>     Map<MediaType, Parser> map = new HashMap<>();
>     for (Parser parser : parsers) {
>         for (MediaType type : parser.getSupportedTypes(context)) {
>             map.put(registry.normalize(type), parser);
>         }
>     }
>     return map;
> }
>
> To recreate: parse any avi file (content type: video/x-msvideo). Only
> exiftool will be picked up and the ffmpeg parser won't be executed.
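
To make the overwrite in the quoted getParsers() concrete, here is a minimal standalone sketch. It does not use any Tika classes: {{NamedParser}}, {{oneParserPerType}} and {{allParsersPerType}} are hypothetical names, media types are plain strings, and the order of the parser list is an assumption that does not reflect Tika's actual sorting. The first method mirrors the map.put() pattern, where a later parser replaces an earlier one registered for the same type; the second shows one possible way to keep every parser per type, which is not something Tika's API provides as written here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParserMapSketch {

    // Hypothetical stand-in for a Tika Parser: a name plus the media types it supports.
    public static final class NamedParser {
        public final String name;
        public final List<String> supportedTypes;

        public NamedParser(String name, List<String> supportedTypes) {
            this.name = name;
            this.supportedTypes = supportedTypes;
        }
    }

    // Same shape as the quoted getParsers(): one slot per type, so a later put() overwrites.
    public static Map<String, NamedParser> oneParserPerType(List<NamedParser> parsers) {
        Map<String, NamedParser> map = new HashMap<>();
        for (NamedParser parser : parsers) {
            for (String type : parser.supportedTypes) {
                map.put(type, parser);
            }
        }
        return map;
    }

    // Illustrative alternative (an assumption, not Tika's API): collect all parsers per type.
    public static Map<String, List<NamedParser>> allParsersPerType(List<NamedParser> parsers) {
        Map<String, List<NamedParser>> map = new LinkedHashMap<>();
        for (NamedParser parser : parsers) {
            for (String type : parser.supportedTypes) {
                map.computeIfAbsent(type, k -> new ArrayList<>()).add(parser);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Parser names and list order are made up for the example.
        List<NamedParser> parsers = List.of(
                new NamedParser("ffmpeg", List.of("video/x-msvideo", "video/mp4")),
                new NamedParser("exiftool", List.of("video/x-msvideo", "image/jpeg")));

        // Only the parser registered last for the type survives in the single-value map.
        System.out.println(oneParserPerType(parsers).get("video/x-msvideo").name);

        // The list-valued map keeps both parsers for the same type.
        System.out.println(allParsersPerType(parsers).get("video/x-msvideo").size());
    }
}
```

With a single-valued map, whichever parser is visited last for video/x-msvideo is the only one returned, which matches the behaviour described in the issue. A list-valued map keeps both, but a caller would still need a policy for running and merging several parsers, which is roughly the role the SupplementingParser mentioned above is meant to play.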