[ https://issues.apache.org/jira/browse/TIKA-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17728138#comment-17728138 ]
Tim Allison commented on TIKA-3941:
-----------------------------------
I started a branch to work on this:
https://github.com/apache/tika/tree/TIKA-3941
Need to figure out whether we can bypass (or already are bypassing) double
detection -- we're calling detect and then parse.
Need to add unit tests for digesting.
Need to figure out whether we are bypassing double digesting, or how to make
that the default option.
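
To make the double-digesting question concrete, here is a rough sketch of one way to skip a second digest: compute the digest and length in a single pass, store them in the metadata, and return early if they are already there. This is only an illustration using the JDK; `DigestOnce`, `digestIfAbsent`, and the two metadata keys are hypothetical names, not Tika APIs, and the plain `Map` stands in for whatever metadata object the branch actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Tika.
class DigestOnce {
    static final String DIGEST_KEY = "sketch:digest:SHA256"; // assumed key name
    static final String LENGTH_KEY = "sketch:length";        // assumed key name

    // Returns true if the digest was computed, false if it was already present.
    static boolean digestIfAbsent(byte[] data, Map<String, String> metadata) {
        if (metadata.containsKey(DIGEST_KEY)) {
            return false; // an earlier stage already digested this input
        }
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            long length = 0;
            // One pass over the stream yields both the digest and the length.
            try (InputStream in = new DigestInputStream(new ByteArrayInputStream(data), md)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    length += n;
                }
            }
            metadata.put(DIGEST_KEY, toHex(md.digest()));
            metadata.put(LENGTH_KEY, Long.toString(length));
            return true;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("computed: " + digestIfAbsent(data, metadata));
        System.out.println("computed again: " + digestIfAbsent(data, metadata));
        System.out.println(metadata);
    }
}
```

The same "check the metadata first" guard could in principle apply to detection, if the detected type is recorded before parse is called.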
> Consider having pipesserver return intermediate results
> -------------------------------------------------------
>
> Key: TIKA-3941
> URL: https://issues.apache.org/jira/browse/TIKA-3941
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Minor
>
> If the pipes server crashes, the only information that the pipesclient
> receives is that a crash occurred. At a minimum, it would be useful to have
> the pipes server report an intermediate result after file detection.
> Ideally, the pipesclient could then report file type, content-length
> (if possible) and digest information.
>
> On another ticket (future work), we could extend intermediate results to
> include partial parses/metadata extraction. The challenge here is that the
> underlying metadata objects are not thread safe...so we'll punt on this and
> deal with it later if necessary.
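
As an illustration of the intermediate-result idea in the issue description, the sketch below has a stand-in "server" send a message right after detection and before parsing, so the client still knows the file type and length if the parse never returns. Everything here is an assumption made for the example: the class and method names, the message strings, and the one-prefix `detect` are hypothetical, and a thrown exception stands in for a real process crash, which the actual pipes server/client would have to handle across a process boundary.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: not the Tika pipes protocol or API.
class IntermediateResultSketch {

    // Stand-in for real detection: recognizes a single magic prefix only.
    static String detect(byte[] data) {
        byte[] pdfMagic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length >= pdfMagic.length) {
            boolean match = true;
            for (int i = 0; i < pdfMagic.length; i++) {
                if (data[i] != pdfMagic[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return "application/pdf";
            }
        }
        return "application/octet-stream";
    }

    // "Server" side: report what is known after detection, then parse.
    static void process(byte[] data, Consumer<String> channel, boolean crashInParse) {
        String type = detect(data);
        channel.accept("INTERMEDIATE type=" + type + " length=" + data.length);
        if (crashInParse) {
            // Simulates the parser taking the server down.
            throw new IllegalStateException("simulated parser crash");
        }
        channel.accept("FINAL type=" + type);
    }

    // "Client" side: keeps whatever arrived before the crash.
    static List<String> runClient(byte[] data, boolean crashInParse) {
        List<String> received = new ArrayList<>();
        try {
            process(data, received::add, crashInParse);
        } catch (RuntimeException e) {
            received.add("CRASH " + e.getMessage());
        }
        return received;
    }

    public static void main(String[] args) {
        byte[] data = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        for (String msg : runClient(data, true)) {
            System.out.println(msg);
        }
    }
}
```

With the crash flag set, the client ends up with the intermediate message followed by the crash notice, rather than the crash notice alone. Digest information could be added to the intermediate message in the same way.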