[ https://issues.apache.org/jira/browse/TIKA-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17884572#comment-17884572 ]
Leszek Sliwko commented on TIKA-4314:
-------------------------------------

Currently, FFmpeg is ignored in favor of ExifTool when analyzing AVI files (video/x-msvideo), which is clearly not the desired result. I also recall that this feature worked fine in 2019, when I first started using Tika.

I'm not knowledgeable enough to say whether selecting only one parser per content type is the correct approach, but there are numerous command-line tools capable of extracting much more metadata. For example, I added the SoX parser to extract the duration of WAV files, which none of the previous parsers could do.

If changing this behavior at the design level, i.e., running all parsers that support a given content type and merging the results, is not possible, I would suggest implementing this at least for external parsers. The {{CompositeExternalParser}} seems like a good starting point.

> CompositeParser returns only one parser per content type
> --------------------------------------------------------
>
>                 Key: TIKA-4314
>                 URL: https://issues.apache.org/jira/browse/TIKA-4314
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.9.2
>            Reporter: Leszek Sliwko
>            Priority: Major
>
> External parsers can have many supported content types, but this information
> is lost in CompositeParser:
>
> public Map<MediaType, Parser> getParsers(ParseContext context) {
>     Map<MediaType, Parser> map = new HashMap<>();
>     for (Parser parser : parsers) {
>         for (MediaType type : parser.getSupportedTypes(context)) {
>             map.put(registry.normalize(type), parser);
>         }
>     }
>     return map;
> }
>
> To reproduce, parse any AVI file (content type: video/x-msvideo). Only the
> exiftool parser will be picked up, and the ffmpeg parser won't be executed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
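
To illustrate why the quoted loop keeps only one parser per content type, here is a minimal, self-contained sketch in plain Java. It does not use Tika's classes: parsers are represented by hypothetical name strings, content types by plain strings, and the names MultiParserSketch, singleParserMap and multiParserMap are invented for this example. The registration order (ffmpeg first, exiftool second) is an assumption chosen to match the reported outcome. The second method shows one possible way to retain every parser for a type; how the results of those parsers would be merged is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiParserSketch {

    /**
     * Mirrors the shape of the loop quoted in the issue: one entry per content type,
     * so a parser registered later replaces any earlier parser for the same type.
     * Input: hypothetical parser name -> content types it supports, in registration order.
     */
    static Map<String, String> singleParserMap(Map<String, List<String>> typesByParser) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : typesByParser.entrySet()) {
            for (String type : entry.getValue()) {
                map.put(type, entry.getKey()); // overwrites the previous parser for this type
            }
        }
        return map;
    }

    /** Alternative sketch: keep every parser that supports a type, in registration order. */
    static Map<String, List<String>> multiParserMap(Map<String, List<String>> typesByParser) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : typesByParser.entrySet()) {
            for (String type : entry.getValue()) {
                map.computeIfAbsent(type, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Assumed registration order; both hypothetical parsers claim video/x-msvideo.
        Map<String, List<String>> typesByParser = new LinkedHashMap<>();
        typesByParser.put("ffmpeg", List.of("video/x-msvideo", "video/mp4"));
        typesByParser.put("exiftool", List.of("video/x-msvideo", "image/jpeg"));

        System.out.println(singleParserMap(typesByParser).get("video/x-msvideo"));
        System.out.println(multiParserMap(typesByParser).get("video/x-msvideo"));
    }
}
```

With a list-valued map, a caller could run each parser registered for the detected type in turn and combine their metadata, which is the behavior the comment proposes at least for external parsers.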