[ https://issues.apache.org/jira/browse/TIKA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407650#comment-16407650 ]
Hudson commented on TIKA-2585:
------------------------------

FAILURE: Integrated in Jenkins build tika-2.x-windows #220 (See [https://builds.apache.org/job/tika-2.x-windows/220/])
TIKA-2585 Support for creating a TikaInputStream from a Factory that (nick: rev 682c38db038df7d3e55189623bdc8efb7eb0d0fd)
* (add) tika-core/src/main/java/org/apache/tika/io/InputStreamFactory.java
* (edit) tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java
* (edit) tika-core/src/test/java/org/apache/tika/io/TikaInputStreamTest.java


> TikaInputStream support for resetting via a factory of InputStreams
> -------------------------------------------------------------------
>
>                 Key: TIKA-2585
>                 URL: https://issues.apache.org/jira/browse/TIKA-2585
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.0, 1.17
>            Reporter: Nick Burch
>            Priority: Major
>             Fix For: 1.18
>
>
> As raised in the 2.0 breaking changes thread, currently the only way that
> Tika has of handling the need to fully read an InputStream multiple times is
> to use {{TikaInputStream.getFile()}} which will spool to a temp file if not
> already file-based. (Reading a few kb is handled via buffering and
> mark/reset, but that doesn't scale for huge full files.)
> In some cases, grabbing a fresh {{InputStream}} is actually cheaper than Tika
> spooling to a temp file, but we've no way of a caller expressing that.
> So, before we make too much extra use of re-processing the whole input
> several times (eg for the augmenting-parsers and fallback-parsers), we should
> provide a way for callers to instead supply new {{InputStream}} instances on
> demand.
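
For illustration only, here is a minimal sketch of the idea in the description: the caller supplies fresh InputStream instances on demand, so the input can be fully read several times without spooling to a temp file. It uses only the JDK. The names StreamSupplier, open() and readFully() are hypothetical and are not taken from the commit; the actual InputStreamFactory and TikaInputStream changes may look different.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReopenableStreamSketch {

    /** Hypothetical stand-in for a factory of fresh streams; not the Tika interface. */
    public interface StreamSupplier {
        InputStream open() throws IOException;
    }

    /**
     * Reads the whole input once per pass, asking the supplier for a
     * fresh stream each time instead of buffering or spooling to disk.
     * Returns the number of bytes seen on each pass.
     */
    public static long[] readFully(StreamSupplier supplier, int passes) throws IOException {
        long[] sizes = new long[passes];
        byte[] buffer = new byte[8192];
        for (int i = 0; i < passes; i++) {
            // Each pass gets its own stream, closed when the pass ends.
            try (InputStream in = supplier.open()) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    sizes[i] += n;
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example document body".getBytes(StandardCharsets.UTF_8);
        // In-memory source for the demo; a real caller might reopen a
        // database blob or a remote object here (assumption).
        StreamSupplier supplier = () -> new ByteArrayInputStream(data);
        long[] sizes = readFully(supplier, 2);
        System.out.println("pass 1: " + sizes[0] + " bytes, pass 2: " + sizes[1] + " bytes");
    }
}
```

Each pass sees the complete input from the start, which is the behaviour that mark/reset cannot offer cheaply for very large inputs.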