If you're filling up the limit on open files, odds are there's one code
path that's opening most of them. If that's the case, those files will
likely be named similarly and easy to pick out if you just sort the output
of "lsof". Once you find the group that is clearly the largest, you can
backtrack into the source and find the line of code that is creating them
by searching for the file name / folder name in the source.
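
For example, a rough sketch that groups the lsof output by file name (the
executor PID is a placeholder here; you'd look it up with ps or jps on the
worker node):

  lsof -p <executor-pid> | awk '{print $NF}' | sort | uniq -c | sort -rn | head -20

This counts open descriptors per name and prints the biggest groups first,
which usually makes the offending path obvious.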

On Wed, Jan 6, 2016 at 4:00 AM Priya Ch <learnings.chitt...@gmail.com>
wrote:

> Running 'lsof' will let us know the open files, but how do we come to know
> the root cause behind opening too many files?
>
> Thanks,
> Padma CH
>
> On Wed, Jan 6, 2016 at 8:39 AM, Hamel Kothari <hamelkoth...@gmail.com>
> wrote:
>
>> The "Too Many Files" part of the exception just indicates that when that
>> call was made, too many files were already open. It doesn't necessarily
>> mean that that line is the source of all of the open files; it's just the
>> point at which the limit was hit.
>>
>> What I would recommend is to run this code again and use "lsof" on one of
>> the Spark executors (perhaps run it in a loop, writing the output to
>> separate files) until it fails, and then see which files are being opened.
>> If there's anything that seems to be taking up a clear majority, that
>> might key you in on the culprit.
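>>
>> Something along these lines would do it (just a sketch; the PID and the
>> polling interval are placeholders, and it has to run on the executor's
>> host):
>>
>>   PID=<executor-pid>                     # look it up with ps or jps on that node
>>   i=0
>>   while kill -0 "$PID" 2>/dev/null; do   # keep going while the executor is alive
>>     lsof -p "$PID" > "lsof_$i.txt"       # snapshot its open files
>>     i=$((i+1))
>>     sleep 5
>>   done
>>
>> The last few snapshots before the failure should show which files were
>> piling up.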
>>
>> On Tue, Jan 5, 2016 at 9:48 PM Priya Ch <learnings.chitt...@gmail.com>
>> wrote:
>>
>>> Yes, the FileInputStream is closed. Maybe I didn't show it in the
>>> screenshot.
>>>
>>> As Spark implements sort-based shuffle, there is a parameter called the
>>> maximum merge factor which decides the number of files that can be merged
>>> at once, and this avoids having too many open files. I suspect it is
>>> something related to this.
>>>
>>> Can someone confirm this?
>>>
>>> On Tue, Jan 5, 2016 at 11:19 PM, Annabel Melongo <
>>> melongo_anna...@yahoo.com> wrote:
>>>
>>>> Vijay,
>>>>
>>>> Are you closing the FileInputStream at the end of each loop iteration
>>>> (in.close())? My guess is that those streams aren't closed, hence the "too
>>>> many open files" exception.
>>>>
>>>>
>>>> On Tuesday, January 5, 2016 8:03 AM, Priya Ch <
>>>> learnings.chitt...@gmail.com> wrote:
>>>>
>>>>
>>>> Can someone throw light on this?
>>>>
>>>> Regards,
>>>> Padma Ch
>>>>
>>>> On Mon, Dec 28, 2015 at 3:59 PM, Priya Ch <learnings.chitt...@gmail.com
>>>> > wrote:
>>>>
>>>> Chris, we are using Spark version 1.3.0. We have not set the
>>>> spark.streaming.concurrentJobs parameter; it takes the default value.
>>>>
>>>> Vijay,
>>>>
>>>>   From the stack trace it is evident that
>>>> org.apache.spark.util.collection.ExternalSorter$$anonfun$writePartitionedFile$1.apply$mcVI$sp(ExternalSorter.scala:730)
>>>> is throwing the exception. I opened the Spark source code and visited the
>>>> line which is throwing this exception, i.e.
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> The line which is marked in red is throwing the exception. The file is
>>>> ExternalSorter.scala in the org.apache.spark.util.collection package.
>>>>
>>>> I went through the following blog
>>>> http://blog.cloudera.com/blog/2015/01/improving-sort-performance-in-apache-spark-its-a-double/
>>>> and understood that there is a merge factor which decides the number of
>>>> on-disk files that can be merged. Is it in some way related to this?
>>>>
>>>> Regards,
>>>> Padma CH
>>>>
>>>> On Fri, Dec 25, 2015 at 7:51 PM, Chris Fregly <ch...@fregly.com> wrote:
>>>>
>>>> and which version of Spark/Spark Streaming are you using?
>>>>
>>>> are you explicitly setting the spark.streaming.concurrentJobs to
>>>> something larger than the default of 1?
>>>>
>>>> if so, please try setting that back to 1 and see if the problem still
>>>> exists.
>>>>
>>>> this is a dangerous parameter to modify from the default - which is why
>>>> it's not well-documented.
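>>>>
>>>> if it is being set somewhere (spark-defaults.conf, for example), a quick
>>>> way to force it back for a test run is on the submit command line - just a
>>>> sketch, with the application jar and its arguments elided; a value set
>>>> programmatically on the SparkConf would still take precedence:
>>>>
>>>>   spark-submit --conf spark.streaming.concurrentJobs=1 ...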
>>>>
>>>>
>>>> On Wed, Dec 23, 2015 at 8:23 AM, Vijay Gharge <vijay.gha...@gmail.com>
>>>> wrote:
>>>>
>>>> Few indicators -
>>>>
>>>> 1) During execution, check the total number of open files using the lsof
>>>> command. This needs root permissions; if it is a cluster I am not sure how
>>>> easy that is across nodes. A rough per-node sketch is below.
>>>> 2) Which exact line in the code is triggering this error? Can you
>>>> paste that snippet?
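>>>>
>>>> A rough sketch for 1) - a total count plus a per-process breakdown, so you
>>>> can see which process holds most of the files (<spark-user> is a
>>>> placeholder for the user the executors run as):
>>>>
>>>>   sudo lsof | wc -l                                   # rough total of open files, system-wide
>>>>   sudo lsof -u <spark-user> | awk 'NR>1 {print $1, $2}' \
>>>>     | sort | uniq -c | sort -rn | head                # open-file count per command/PID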
>>>>
>>>>
>>>> On Wednesday 23 December 2015, Priya Ch <learnings.chitt...@gmail.com>
>>>> wrote:
>>>>
>>>> ulimit -n 65000
>>>>
>>>> fs.file-max = 65000 (in /etc/sysctl.conf)
>>>>
>>>> Thanks,
>>>> Padma Ch
>>>>
>>>> On Tue, Dec 22, 2015 at 6:47 PM, Yash Sharma <yash...@gmail.com> wrote:
>>>>
>>>> Could you share the ulimit for your setup, please?
>>>> - Thanks, via mobile,  excuse brevity.
>>>> On Dec 22, 2015 6:39 PM, "Priya Ch" <learnings.chitt...@gmail.com>
>>>> wrote:
>>>>
>>>> Jakob,
>>>>
>>>>    Increased the settings like fs.file-max in /etc/sysctl.conf and also
>>>> increased the user limit in /etc/security/limits.conf, but I still see
>>>> the same issue.
>>>>
>>>> On Fri, Dec 18, 2015 at 12:54 AM, Jakob Odersky <joder...@gmail.com>
>>>> wrote:
>>>>
>>>> It might be a good idea to see how many files are open and try
>>>> increasing the open file limit (this is done at the OS level). In some
>>>> application use-cases it is actually a legitimate need.
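>>>>
>>>> A quick way to check what limit a running executor actually got (the PID
>>>> is a placeholder; reading another user's /proc entries may need root).
>>>> Note that limits raised in /etc/security/limits.conf only apply to new
>>>> login sessions, so the Spark daemons need to be restarted to pick them up:
>>>>
>>>>   grep 'open files' /proc/<executor-pid>/limits   # soft/hard limit the process runs with
>>>>   ls /proc/<executor-pid>/fd | wc -l              # descriptors it currently has open
>>>>   ulimit -n                                       # limit of your current shell (may differ)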
>>>>
>>>> If that doesn't help, make sure you close any unused files and streams
>>>> in your code. It will also be easier to help diagnose the issue if you send
>>>> an error-reproducing snippet.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Vijay Gharge
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Chris Fregly*
>>>> Principal Data Solutions Engineer
>>>> IBM Spark Technology Center, San Francisco, CA
>>>> http://spark.tc | http://advancedspark.com
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>
