Tim Barrett created TIKA-1464:
---------------------------------

             Summary: Too many open files in system when parsing thousands of files
                 Key: TIKA-1464
                 URL: https://issues.apache.org/jira/browse/TIKA-1464
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.6
          Environment: OS X 10.10, Windows 8.1 (probably all operating systems)
            Reporter: Tim Barrett
            Priority: Blocker


Our big data project parses many thousands of files of different kinds
sequentially. Up to and including Tika 1.5 this has been trouble-free, and Tika
has been a pleasure to use. The files parsed are PDF, MS Office and MSG files in
roughly equal measure.

We switched to Tika 1.6 last week, and this was a good improvement for us: a
number of MS Office files that previously failed to parse now parse correctly
under Tika 1.6.

However, we have seen a "Too many open files in system" exception raised once
somewhere above 10,000 files have been parsed. On a Windows server this
exception is not raised, but the system eventually slows to a crawl.

Watching the Apache Tika temporary files on the file system, we see that they
*are* being deleted, but lsof shows all of them as still held open by the
running process that uses Tika. It appears that the files are being deleted but
the handles to them are never released.
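The lsof symptom above (files deleted, descriptors still open) is what happens
when a stream over a temporary file is never closed: on Unix-like systems,
deleting the file only removes its directory entry, and the descriptor stays
allocated until the stream is closed. The following is a minimal JDK-only sketch
that reproduces the symptom and shows the try-with-resources pattern that avoids
it. It is not Tika's code: the class FdLeakDemo, its methods and the
"apache-tika-demo" file prefix are hypothetical. It reads the descriptor count
through com.sun.management.UnixOperatingSystemMXBean, which is assumed to be
available (Unix-like HotSpot/OpenJDK JVMs); elsewhere the helper returns -1.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.io.IOException;
import java.io.InputStream;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: reproduces "deleted but still open" temp files.
public class FdLeakDemo {

    /** Open descriptors held by this JVM, or -1 if the platform MXBean does not expose it. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Leaky pattern: each temp file is deleted, but its stream is never closed. */
    static List<InputStream> leaky(int n) throws IOException {
        List<InputStream> kept = new ArrayList<>(); // keep references so GC cannot close them
        for (int i = 0; i < n; i++) {
            Path tmp = Files.createTempFile("apache-tika-demo", ".tmp");
            InputStream in = Files.newInputStream(tmp);
            Files.delete(tmp); // directory entry gone; on Unix the open descriptor survives
            kept.add(in);      // lsof would list this file as "(deleted)"
        }
        return kept;
    }

    /** Fixed pattern: the stream is closed before the temp file is deleted. */
    static void closing(int n) throws IOException {
        for (int i = 0; i < n; i++) {
            Path tmp = Files.createTempFile("apache-tika-demo", ".tmp");
            try (InputStream in = Files.newInputStream(tmp)) {
                in.read(); // stand-in for parsing the file
            } finally {
                Files.deleteIfExists(tmp);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long before = openFileDescriptors();
        List<InputStream> leaked = leaky(100);
        long afterLeak = openFileDescriptors();
        for (InputStream in : leaked) {
            in.close(); // releasing the handles frees the descriptors
        }
        long afterClose = openFileDescriptors();
        closing(100);
        long afterClosing = openFileDescriptors();
        System.out.println("leaky: +" + (afterLeak - before)
                + " descriptors, closing: +" + (afterClosing - afterClose));
    }
}
```

On Linux or OS X the leaky loop should raise the descriptor count by roughly
one per file, while the closing loop should leave it essentially unchanged.
Over more than 10,000 files, the leaky pattern would eventually exhaust the
per-process or system-wide descriptor limit.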



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
