[ https://issues.apache.org/jira/browse/TIKA-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085640#comment-16085640 ]

Tim Allison commented on TIKA-2428:
-----------------------------------

Thank you, [~lfcnassif], for reporting this and finding the cause.

From the Javadocs for FileInputStream:
{noformat}
This method may skip more bytes than are remaining in the backing file. This produces no exception and the number of bytes skipped may include some number of bytes that were beyond the EOF of the backing file. Attempting to read from the stream after skipping past the end will result in -1 indicating the end of the file.
{noformat}

From the Javadocs for InputStream:
{noformat}
The skip method may, for a variety of reasons, end up skipping over some smaller number of bytes, possibly 0. This may result from any of a number of conditions; reaching end of file before n bytes have been skipped is only one possibility. The actual number of bytes skipped is returned.
{noformat}

If bytes skipped is more than requested, we've hit EOF. If bytes skipped == 0, we need to test with a read, according to [guava|https://github.com/google/guava/blob/master/guava/src/com/google/common/io/ByteStreams.java#L779]

> EMFParser loops forever with corrupted files
> --------------------------------------------
>
>                 Key: TIKA-2428
>                 URL: https://issues.apache.org/jira/browse/TIKA-2428
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15, 1.16
>            Reporter: Luis Filipe Nassif
>         Attachments: Carved-1285676.emf, Carved-1296288.emf, Carved-912866.emf
>
>
> EMFParser hangs with the attached corrupted EMF files.
> Sorry [~talli...@apache.org]! Just now having time to test against our
> forensic test corpus...

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
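
For illustration only, the skip behaviour described in the comment above can be sketched roughly as follows. This is not the actual EMFParser fix and not guava's code; the class name SkipSketch and the method skipUpTo are hypothetical names chosen for this example. It assumes that a return value of 0 (or less) from skip() is ambiguous and must be resolved with a single-byte read, so the loop either makes progress or stops at EOF.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration; not part of Tika or guava.
public final class SkipSketch {
    private SkipSketch() {}

    /**
     * Tries to skip n bytes. Returns the number of bytes actually skipped,
     * which is less than n only if end of stream was reached first.
     */
    public static long skipUpTo(InputStream in, long n) throws IOException {
        long total = 0;
        while (total < n) {
            long skipped = in.skip(n - total);
            if (skipped > 0) {
                total += skipped;
                continue;
            }
            // skip() made no progress. That alone does not mean EOF,
            // so probe with a read: -1 means end of stream.
            if (in.read() == -1) {
                break;
            }
            total++; // the probe consumed one byte
        }
        return total;
    }
}
```

A caller can compare the returned count with n and stop parsing when it is smaller, instead of retrying the skip. Note that, per the FileInputStream Javadoc quoted above, skip() on a file may count bytes beyond EOF, so for that stream type this sketch can still report a full skip; the following read then returns -1.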