[ 
https://issues.apache.org/jira/browse/TIKA-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17942007#comment-17942007
 ] 

Tim Allison edited comment on TIKA-4395 at 4/8/25 9:03 PM:
-----------------------------------------------------------

As you pointed out, this is tricky.

If the caller passes a TikaInputStream, we can spool it to disk when needed 
and then run both detection and parsing on the tmp file.
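
To illustrate the spool-to-disk idea, here is a minimal sketch in plain Java. 
It is not Tika's implementation: the class name SpoolSketch and the helpers 
spool() and looksLikeZip() are hypothetical, and the "detector" only checks 
for the ZIP magic bytes. The point is just that once the bytes are in a tmp 
file, detection and parsing can each read from the start with no limit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolSketch {

    /** Copies the whole stream to a temp file so it can be read repeatedly. */
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch-", ".bin");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Hypothetical stand-in for a detector: checks for the ZIP magic "PK". */
    static boolean looksLikeZip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read() == 'P' && in.read() == 'K';
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {'P', 'K', 3, 4, 42};
        Path tmp = spool(new ByteArrayInputStream(data));
        try {
            // Detection pass: opens the file and reads from the beginning.
            boolean zip = looksLikeZip(tmp);
            // A later "parse" pass can reopen the same file; here we only size it.
            long size = Files.size(tmp);
            System.out.println("zip=" + zip + ", bytes=" + size);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The cost is a full copy of the input to disk, which is why this is only done 
when needed.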

If the caller passes any other type of InputStream, we have to cap how much of 
the stream detection may read, so that we can reset it before the parse.
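
For a plain InputStream, the constraint looks roughly like the sketch below, 
which uses only the standard java.io mark/reset contract. The class name 
MarkResetSketch, the peek() helper, and the MARK_LIMIT value are assumptions 
for illustration, not Tika code or Tika's actual limit.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetSketch {

    // Hypothetical cap on how many bytes detection may consume.
    static final int MARK_LIMIT = 8;

    /** Reads up to MARK_LIMIT bytes for detection, then rewinds the stream. */
    static byte[] peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(MARK_LIMIT);
        try {
            // Never read past the mark limit, or reset() may fail.
            return in.readNBytes(MARK_LIMIT);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "PK\u0003\u0004 rest of the container"
                .getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] prefix = peek(in);        // detection sees only the prefix
        byte[] all = in.readAllBytes();  // the parse still sees the whole stream
        System.out.println(prefix.length + " bytes peeked, "
                + all.length + " bytes parsed");
    }
}
```

Anything a detector needs beyond the limit is invisible to it in this mode, 
which is the tradeoff compared with spooling to a tmp file.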

In 3.x, should we log a warning if a user uses a non-TikaInputStream, and then 
in 4.x require a TikaInputStream for detectors?

Are there better options?


> cannot get any slide content for pptx file
> ------------------------------------------
>
>                 Key: TIKA-4395
>                 URL: https://issues.apache.org/jira/browse/TIKA-4395
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 2.9.3, 3.1.0
>            Reporter: james
>            Assignee: Tim Allison
>            Priority: Major
>
> i have a reasonably large pptx file from which i don't get any slide content. 
>  i get slide notes, and some ocr from embedded images, but not the slide 
> content itself.  unfortunately, i cannot share the file, but i can answer 
> questions about it if necessary (and can probably share some of the internal 
> structure related files). 
>  
> using poi 5.4.0, not in streaming mode.


