The line of code which I highlighted in the screenshot is within the Spark
source code. Spark uses a sort-based shuffle implementation, and the spilled
files are merged using merge sort.
Here is the link:
https://issues.apache.org/jira/secure/attachment/12655884/Sort-basedshuffledesign.pdf
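For illustration only (this is a sketch I put together, not the actual
SortShuffleWriter code from that design doc), the merge step can be pictured
as a k-way merge over already-sorted spill files, with one reader open per
spill file:

import java.io.{BufferedReader, FileReader, PrintWriter}
import scala.collection.mutable

// Illustrative sketch only: merge several already-sorted spill files into one
// output file with a k-way merge, keeping exactly one reader open per spill file.
object SpillMergeSketch {
  def mergeSortedSpills(spillPaths: Seq[String], outPath: String): Unit = {
    val readers = spillPaths.map(p => new BufferedReader(new FileReader(p)))
    val out = new PrintWriter(outPath)

    // Min-heap of (current record, reader index), ordered by the record (the "key").
    val heap = mutable.PriorityQueue.empty[(String, Int)](
      Ordering.by[(String, Int), String](_._1).reverse)

    try {
      // Seed the heap with the first record from each spill file.
      readers.zipWithIndex.foreach { case (r, i) =>
        val line = r.readLine()
        if (line != null) heap.enqueue((line, i))
      }
      // Repeatedly emit the smallest record and refill from the file it came from.
      while (heap.nonEmpty) {
        val (line, i) = heap.dequeue()
        out.println(line)
        val next = readers(i).readLine()
        if (next != null) heap.enqueue((next, i))
      }
    } finally {
      readers.foreach(_.close())
      out.close()
    }
  }
}

The point is that every spill file taking part in the merge needs its own open
descriptor, which is where the open-file count during the merge comes from.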
If you're filling up the number of open files, odds are there's one code
path that's opening most of these files. If that's the case, these files
will likely be named similarly and easy to pick out if you just sort the
output of "lsof". Once you find the group that is clearly the largest, you
can then trace those files back to the code path that opens them.
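For example (assuming lsof is installed on the node; the little helper below
is just my own illustration, not anything from Spark), something like this
groups the open files of a process by directory and prints the biggest groups
first:

import scala.sys.process._

// Rough sketch: count open files per directory for a given PID so the largest
// group (the likely leaking code path) stands out.
object OpenFileHistogram {
  def main(args: Array[String]): Unit = {
    val pid = args.headOption.getOrElse(sys.error("usage: OpenFileHistogram <pid>"))

    // Each lsof line describes one open descriptor; the last column is the file name.
    val names = Seq("lsof", "-p", pid).!!
      .linesIterator
      .drop(1)                              // skip the header row
      .map(_.trim.split("\\s+").last)
      .toSeq

    // Group similarly-named files by their parent directory, largest groups first.
    names.groupBy(n => Option(new java.io.File(n).getParent).getOrElse(n))
      .map { case (dir, files) => (dir, files.size) }
      .toSeq
      .sortBy { case (_, count) => -count }
      .take(20)
      .foreach { case (dir, count) => println(f"$count%6d  $dir") }
  }
}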
Running 'lsof' will let us know the open files, but how do we come to know
the root cause behind opening too many files?
Thanks,
Padma CH
On Wed, Jan 6, 2016 at 8:39 AM, Hamel Kothari wrote:
> The "Too Many Files" part of the exception is just indicative of the fact
> that when that call was mad
The "Too Many Files" part of the exception is just indicative of the fact
that when that call was made, too many files were already open. It doesn't
necessarily mean that that line is the source of all of the open files,
that's just the point at which it hit its limit.
What I would recommend is to run 'lsof' while the job is executing and see
which files are actually being held open.
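A toy example (nothing to do with Spark's code, just to illustrate the point
above) of how the line that throws can be completely unrelated to the leak:

import java.io.{File, FileInputStream, IOException}
import scala.collection.mutable.ArrayBuffer

// Descriptors leaked in one place make an unrelated, perfectly correct open
// call fail with "Too many open files".
object FdExhaustionDemo {
  def main(args: Array[String]): Unit = {
    val leakedFile   = File.createTempFile("leaky", ".tmp")
    val innocentFile = File.createTempFile("innocent", ".tmp")
    val leaked = ArrayBuffer.empty[FileInputStream]
    try {
      // The real bug: streams opened here are never closed.
      try { while (true) leaked += new FileInputStream(leakedFile) }
      catch { case _: IOException => () } // stop once the per-process limit is hit

      // This open is blameless, yet it is the line that reports the error.
      val in = new FileInputStream(innocentFile)
      in.close()
    } catch {
      case e: IOException => println(s"Innocent open failed: ${e.getMessage}")
    } finally {
      leaked.foreach(_.close())
    }
  }
}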
Yes, the FileInputStream is closed. Maybe I didn't show it in the screenshot.
As Spark implements sort-based shuffle, there is a parameter called
maximum merge factor which decides the number of files that can be merged
at once, and this avoids too many open files. I am suspecting that it is
something related to this parameter.
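If it helps, here is a rough sketch of that merge-factor idea (the name
mergeFactor below is only illustrative, not an actual Spark configuration key,
and it reuses the mergeSortedSpills sketch from earlier in the thread): merging
in passes of at most mergeFactor files keeps the number of simultaneously open
spill files bounded.

import java.nio.file.Files

// Collapse groups of at most `mergeFactor` spill files per pass until one file
// remains, so no more than `mergeFactor` readers are ever open at the same time.
object MergeFactorSketch {
  def mergeWithFactor(spills: Seq[String], mergeFactor: Int): String = {
    require(spills.nonEmpty && mergeFactor >= 2)
    var remaining = spills
    while (remaining.size > 1) {
      remaining = remaining.grouped(mergeFactor).map { group =>
        val merged = Files.createTempFile("merge-pass", ".spill").toString
        SpillMergeSketch.mergeSortedSpills(group, merged) // at most mergeFactor readers open
        merged
      }.toSeq
    }
    remaining.head
  }
}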