[ 
https://issues.apache.org/jira/browse/TIKA-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17883882#comment-17883882
 ] 

Tim Allison edited comment on TIKA-4314 at 9/23/24 1:26 PM:
------------------------------------------------------------

Tika's algorithm is to pick one parser per file type. I _think_ the behavior 
above is as designed. We have special sorting so that non-Tika parsers override 
Tika parsers, and, when there isn't a custom parser, we sort by class name 
within Tika parsers.
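
To make the "one parser per file type" rule concrete, here is a minimal sketch in plain Java with no Tika classes. It sorts a list of parsers and lets each one claim its types in a map, so a later put() replaces an earlier one. Everything in it is hypothetical: the class names, the ParserInfo holder, and in particular the sort direction, which is an assumption chosen only so that custom parsers end up winning. It is not Tika's actual comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParserSelectionSketch {

    /** Hypothetical stand-in for a parser: a class name plus the types it claims. */
    static final class ParserInfo {
        final String className;
        final List<String> supportedTypes;

        ParserInfo(String className, List<String> supportedTypes) {
            this.className = className;
            this.supportedTypes = supportedTypes;
        }

        boolean isTika() {
            return className.startsWith("org.apache.tika.");
        }
    }

    /** Returns the single winning parser class name for each media type. */
    static Map<String, String> select(List<ParserInfo> parsers) {
        List<ParserInfo> sorted = new ArrayList<>(parsers);
        // Assumed order: Tika parsers first (by class name), custom parsers last,
        // so that a custom parser's put() replaces a Tika parser's entry.
        sorted.sort(Comparator
                .comparing((ParserInfo p) -> !p.isTika())
                .thenComparing(p -> p.className));

        Map<String, String> chosen = new HashMap<>();
        for (ParserInfo parser : sorted) {
            for (String type : parser.supportedTypes) {
                // Last put wins: any earlier parser for this type is dropped.
                chosen.put(type, parser.className);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<ParserInfo> parsers = List.of(
                new ParserInfo("org.apache.tika.example.BetaParser",
                        List.of("video/x-msvideo")),
                new ParserInfo("com.example.CustomParser",
                        List.of("audio/x-wav")),
                new ParserInfo("org.apache.tika.example.AlphaParser",
                        List.of("video/x-msvideo", "audio/x-wav")));
        System.out.println(select(parsers));
    }
}
```

In this sketch two parsers claim video/x-msvideo, but the resulting map holds only one of them, which mirrors the behavior described in the issue.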

We started some initial work on a "MultipleParser" that applies several parsers 
to a given file type for metadata extraction.  The one concrete implementation 
we have is: 
https://github.com/apache/tika/blob/main/tika-core/src/main/java/org/apache/tika/parser/multiple/SupplementingParser.java
Note, though, that here, too, there's a single parser (the "MultipleParser") 
for a given file type.

If the goal is to run both exiftool and ffmpeg on avi (for example), then you 
might be able to do something with the SupplementingParser (I haven't looked 
carefully, but I worry that it might not play well with the ExternalParser), or 
maybe a combination of the SupplementingParser with 
{{o.a.t.parser.external2.ExternalParsers}}, or you may need to write your own 
parser.
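
As a rough illustration of the "write your own parser" route, the sketch below shows only the general shape: one wrapper is registered for the file type, runs several extractors in turn, and merges their metadata. It is plain Java and uses no Tika API. The extractors are hypothetical functions that return fixed values instead of calling exiftool or ffmpeg, and the merge policy (the first value seen for a key is kept) is an assumption made for the example, not a description of what the SupplementingParser does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class MultiToolSketch {

    /**
     * Runs every extractor on the same file path and merges the results.
     * Assumed policy: the first value seen for a key is kept.
     */
    static Map<String, String> runAll(
            String path,
            List<Function<String, Map<String, String>>> extractors) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Function<String, Map<String, String>> extractor : extractors) {
            Map<String, String> result = extractor.apply(path);
            for (Map.Entry<String, String> entry : result.entrySet()) {
                merged.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the two external tools; the values are made up.
        Function<String, Map<String, String>> exiftoolLike =
                path -> Map.of("width", "640");
        Function<String, Map<String, String>> ffmpegLike =
                path -> Map.of("duration", "12.5");

        Map<String, String> metadata =
                runAll("example.avi", List.of(exiftoolLike, ffmpegLike));
        System.out.println(metadata);
    }
}
```

Because the wrapper is the only parser registered for the type, this stays compatible with the one-parser-per-type rule while still collecting output from both tools.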







> CompositeParser returns only one parser per content type
> --------------------------------------------------------
>
>                 Key: TIKA-4314
>                 URL: https://issues.apache.org/jira/browse/TIKA-4314
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.9.2
>            Reporter: Leszek Sliwko
>            Priority: Major
>
> External parsers can have many supported content types, but information is 
> lost in CompositeParser:
>  
> public Map<MediaType, Parser> getParsers(ParseContext context) {
>   Map<MediaType, Parser> map = new HashMap<>();
>   for (Parser parser : parsers) {
>     for (MediaType type : parser.getSupportedTypes(context)) {
>       map.put(registry.normalize(type), parser);
>     }
>   }
>   return map;
> }
>  
> To recreate - parse any avi file (content type: video/x-msvideo). Only the 
> exiftool parser will be picked up and the ffmpeg parser won't be executed.


