Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
The line of code I highlighted in the screenshot is within the Spark source code. Spark uses a sort-based shuffle implementation, and the spilled files are merged using merge sort. Here is the link: https://issues.apache.org/jira/secure/attachment/12655884/Sort-basedshuffledesign.pdf …
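To make the descriptor cost of that merge concrete, here is a minimal k-way merge sketch in Scala (my own illustration, not Spark's actual code; the sorted-lines-per-file format is an assumption): each spill file keeps one reader open for the entire merge, so merging N spills holds N descriptors at once.

    import java.io.{BufferedReader, FileReader, PrintWriter}
    import scala.collection.mutable

    // Illustrative k-way merge over sorted spill files. Every spill
    // contributes one open reader for the whole merge, so the number of
    // simultaneously open files grows with the number of spills.
    object SpillMerge {
      def merge(spillPaths: Seq[String], outPath: String): Unit = {
        val readers = spillPaths.map(p => new BufferedReader(new FileReader(p)))
        val out = new PrintWriter(outPath)
        // Min-heap keyed on each reader's current head line.
        val heap = mutable.PriorityQueue.empty[(String, BufferedReader)](
          Ordering.by[(String, BufferedReader), String](_._1).reverse)
        try {
          readers.foreach { r =>
            val first = r.readLine()
            if (first != null) heap.enqueue((first, r))
          }
          while (heap.nonEmpty) {
            val (line, r) = heap.dequeue()
            out.println(line)
            val next = r.readLine()
            if (next != null) heap.enqueue((next, r))
          }
        } finally {
          readers.foreach(_.close()) // all readers stay open until here
          out.close()
        }
      }
    }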

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Hamel Kothari
If you're filling up the number of open files, odds are there's one code path that's opening most of these files. If that's the case, these files will likely be named similarly and easy to pick out if you just sort the output of "lsof". Once you find the group that is clearly the largest, you can then…
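A minimal sketch of that grouping, assuming you can run lsof against the executor PID (the digit-normalization heuristic is mine, so names like shuffle_0_1_0.data and shuffle_0_2_0.data land in one bucket):

    import scala.sys.process._

    // Group lsof output by a digit-normalized NAME column and print the
    // largest groups first; the dominant pattern usually points at the
    // code path that opened most of the files.
    object LsofGroups {
      def main(args: Array[String]): Unit = {
        val pid = args(0) // executor PID, e.g. from `jps`
        val lines = Seq("lsof", "-p", pid).!!.linesIterator.drop(1) // skip header
        val buckets = lines
          .map(_.split("\\s+").last)          // NAME column (heuristic)
          .map(_.replaceAll("\\d+", "N"))     // e.g. shuffle_N_N_N.data
          .toSeq
          .groupBy(identity)
          .view.mapValues(_.size)
          .toSeq.sortBy(-_._2)
        buckets.take(20).foreach { case (pattern, count) =>
          println(f"$count%6d  $pattern")
        }
      }
    }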

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-06 Thread Priya Ch
Running 'lsof' will list the open files, but how do we come to know the root cause behind opening too many files? Thanks, Padma CH On Wed, Jan 6, 2016 at 8:39 AM, Hamel Kothari wrote: > The "Too Many Files" part of the exception is just indicative of the fact > that when that call was made…
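One concrete way to get at the root cause is to compare the process's descriptor usage with its limit. On Linux a JVM can count its own open descriptors through /proc/self/fd (a Linux-specific assumption, not something Spark exposes):

    import java.io.File

    // Count this JVM's open file descriptors (Linux only). A count that
    // climbs toward the limit during one stage shows where the fan-out
    // or leak happens.
    object FdUsage {
      def openFds(): Int = new File("/proc/self/fd").list().length

      def main(args: Array[String]): Unit = {
        println(s"open file descriptors: ${openFds()}")
        // soft limit: `ulimit -n`; hard limit: `ulimit -Hn` in the shell
      }
    }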

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Hamel Kothari
The "Too Many Files" part of the exception is just indicative of the fact that when that call was made, too many files were already open. It doesn't necessarily mean that that line is the source of all of the open files; that's just the point at which it hit its limit. What I would recommend is to…
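Along those lines, a hedged sketch of watching the count over time rather than trusting the line that throws (Linux-specific, the same /proc/self/fd trick as above): a rising trend in this log shows which phase is actually accumulating descriptors.

    import java.io.File
    import java.util.concurrent.{Executors, TimeUnit}

    // Sample this JVM's descriptor count on a daemon thread. The line that
    // finally throws "Too many open files" is only where the limit was hit;
    // the trend in this log shows where the files were really opened.
    object FdMonitor {
      def start(periodSeconds: Long = 5): Unit = {
        val exec = Executors.newSingleThreadScheduledExecutor { r =>
          val t = new Thread(r, "fd-monitor"); t.setDaemon(true); t
        }
        exec.scheduleAtFixedRate(new Runnable {
          def run(): Unit = {
            val n = new File("/proc/self/fd").list().length
            println(s"[fd-monitor] open descriptors: $n")
          }
        }, 0, periodSeconds, TimeUnit.SECONDS)
      }
    }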

Re: java.io.FileNotFoundException(Too many open files) in Spark streaming

2016-01-05 Thread Priya Ch
Yes, the FileInputStream is closed. Maybe I didn't show it in the screenshot. As Spark implements sort-based shuffle, there is a parameter called maximum merge factor which decides the number of files that can be merged at once, and this avoids too many open files. I am suspecting that it is something…
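On the closing point, a minimal sketch of exception-safe closing with scala.util.Using (Scala 2.13; my example, not the code from the screenshot): the descriptor is released even when reading throws, which a bare close() at the end of a method does not guarantee.

    import java.io.{BufferedReader, FileReader}
    import scala.util.Using

    // Using.resource closes the reader on every path, including exceptions,
    // so the descriptor cannot leak the way a manually closed stream can
    // when an error skips the close() call.
    object SafeRead {
      def firstLine(path: String): Option[String] =
        Using.resource(new BufferedReader(new FileReader(path))) { reader =>
          Option(reader.readLine())
        }
    }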