[ https://issues.apache.org/jira/browse/TIKA-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893906#comment-17893906 ]
Leszek Sliwko commented on TIKA-4314:
-------------------------------------

I’m glad you like it. I understand the concept of one parser per file for other parsers: while it makes sense to run several for metadata, there could be issues with scraping content. However, using the {{SupplementingParser}} with {{ExternalParsers}} definitely makes sense, and possibly with {{TesseractOCRParser}} as well.

I’ve created my own side routine in the code to run all parsers after the main Tika parser, so this isn’t an issue.

Also, if you are introducing {{SupplementingParser}} for {{ExternalParsers}}, it would make sense to update {{ParserUtils}} as well:

{code:java}
public static void recordParserDetails(Parser parser, Metadata metadata) {
    List<String> parserClassNames;
    if (parser instanceof AbstractMultipleParser abstractMultipleParser) {
        // Record every parser wrapped by the multiple parser, not just the wrapper.
        parserClassNames = abstractMultipleParser.getAllParsers().stream()
                .map(ParserUtils::getParserClassname)
                .toList();
    } else {
        parserClassNames = List.of(getParserClassname(parser));
    }
    parserClassNames.forEach(className -> recordParserDetails(className, metadata));
}

public static String getParserClassname(Parser parser) {
    if (parser instanceof ExternalParser externalParser) {
        // Include the command so different external parsers can be told apart.
        return externalParser.getClass().getName()
                + "(" + Arrays.toString(externalParser.getCommand()) + ")";
    } else if (parser instanceof ParserDecorator parserDecorator) {
        return parserDecorator.getWrappedParser().getClass().getName();
    } else {
        return parser.getClass().getName();
    }
}
{code}

> CompositeParser returns only one parser per content type
> --------------------------------------------------------
>
>                 Key: TIKA-4314
>                 URL: https://issues.apache.org/jira/browse/TIKA-4314
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.9.2
>            Reporter: Leszek Sliwko
>            Priority: Major
>         Attachments: CompositeParser.java, duration-test-2.avi,
> geolocation-test-1.jpg, geolocation-test-2.jpg
>
>
> External parsers can have many supported content types, but information is
> lost in CompositeParser:
>
> public Map<MediaType, Parser> getParsers(ParseContext context) {
>     Map<MediaType, Parser> map = new HashMap<>();
>     for (Parser parser : parsers) {
>         for (MediaType type : parser.getSupportedTypes(context)) {
>             map.put(registry.normalize(type), parser);
>         }
>     }
>     return map;
> }
>
> To recreate: parse any avi file (content type: video/x-msvideo). Only the
> exiftool will be picked up and the ffmpeg parser won't be executed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
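
For illustration, the overwrite described in the issue can be reproduced without Tika. The following is only a minimal sketch under stated assumptions: {{ParserMapSketch}}, {{NamedParser}}, {{lastWins}} and {{keepAll}} are hypothetical names invented for this example and are not part of Tika, and content types are plain strings rather than {{MediaType}}. It shows that a map keyed by content type keeps only the last parser registered for that type, and one possible alternative that keeps all of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParserMapSketch {
    /** Hypothetical stand-in for a parser: a name plus the content types it claims. */
    record NamedParser(String name, List<String> supportedTypes) {}

    /** Mirrors the quoted loop: a later put() replaces the earlier parser for the same type. */
    static Map<String, String> lastWins(List<NamedParser> parsers) {
        Map<String, String> map = new LinkedHashMap<>();
        for (NamedParser parser : parsers) {
            for (String type : parser.supportedTypes()) {
                map.put(type, parser.name());
            }
        }
        return map;
    }

    /** Possible alternative: keep every parser registered for a type, in registration order. */
    static Map<String, List<String>> keepAll(List<NamedParser> parsers) {
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (NamedParser parser : parsers) {
            for (String type : parser.supportedTypes()) {
                map.computeIfAbsent(type, k -> new ArrayList<>()).add(parser.name());
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Two hypothetical external parsers that both claim video/x-msvideo.
        List<NamedParser> parsers = List.of(
                new NamedParser("ffmpeg", List.of("video/x-msvideo")),
                new NamedParser("exiftool", List.of("video/x-msvideo", "image/jpeg")));
        System.out.println(lastWins(parsers));
        System.out.println(keepAll(parsers));
    }
}
```

With the two stand-in parsers above, {{lastWins}} maps video/x-msvideo to exiftool only, while {{keepAll}} retains both ffmpeg and exiftool for that type. Whether a list-valued map fits the real {{CompositeParser}} API is a separate design question that this sketch does not answer.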