Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-04-14 Thread Nezih Yigitbasi
> when there are spill attempts. (Note that even if the patch I have for > SPARK-14560 doesn't fix your issue, it might still make those debug logs a > bit more clear, since it'll report memory used by Spillables.) > > Imran > > On Mon, Apr 4, 2016 at 10:52 PM, Nezih

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-04-04 Thread Nezih Yigitbasi
Nope, I didn't have a chance to track the root cause, and IIRC we didn't observe it when dyn. alloc. is off. On Mon, Apr 4, 2016 at 6:16 PM Reynold Xin wrote: > BTW do you still see this when dynamic allocation is off? > > On Mon, Apr 4, 2016 at 6:16 PM, Reynold Xin wrote: > >> Nezih, >> >> Hav
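
(For reference, a minimal sketch of ruling dynamic allocation out by pinning a fixed executor count; the app name and executor count below are placeholders, not values from the thread.)

    import org.apache.spark.{SparkConf, SparkContext}

    // Disable dynamic allocation and request a static set of executors so the
    // failing job can be re-run under otherwise identical settings.
    val conf = new SparkConf()
      .setAppName("oom-repro")                          // placeholder name
      .set("spark.dynamicAllocation.enabled", "false")
      .set("spark.executor.instances", "50")            // illustrative count
    val sc = new SparkContext(conf)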

Re: how about a custom coalesce() policy?

2016-04-02 Thread Nezih Yigitbasi
>> parallelism would be a useful feature for any data store. >> >> Hemant >> >> Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811> >> www.snappydata.io >> >> On Fri, Apr 1, 2016 at 10:33 PM, Nezih Yigitbasi < >>

Re: how about a custom coalesce() policy?

2016-04-01 Thread Nezih Yigitbasi
seful. >> >> The only thing is that we are slowly migrating to the Dataset/DataFrame >> API, leaving RDD mostly as-is as a lower-level API. Maybe we should do >> both? In either case it would be great to discuss the API on a pull >> request. Cheers. >> >>

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-22 Thread Nezih Yigitbasi
Interesting. After experimenting with various parameters, increasing spark.sql.shuffle.partitions and decreasing spark.buffer.pageSize helped my job go through. BTW I will be happy to help get this issue fixed. Nezih On Tue, Mar 22, 2016 at 1:07 AM james wrote: Hi, > I also found 'Unable to
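
(A sketch of the workaround described above; both values are illustrative and job-dependent, and spark.buffer.pageSize has to be set before the executors start, i.e. in the SparkConf rather than at runtime.)

    import org.apache.spark.SparkConf

    // More shuffle partitions shrink each task's working set, and a smaller
    // page size shrinks each memory request made by the shuffle machinery.
    val conf = new SparkConf()
      .set("spark.sql.shuffle.partitions", "2000")  // default in 1.6 is 200
      .set("spark.buffer.pageSize", "2m")           // illustrative; default is auto-sized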

Re: java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-21 Thread Nezih Yigitbasi
> 2016-03-21 10:29 GMT-07:00 Nezih Yigitbasi >: > >> Hi Spark devs, >> I am using 1.6.0 with dynamic allocation on YARN. I am trying to run a >> relatively big application with 10s of jobs and 100K+ tasks, and my app >> fails with the exception below. The closest j

java.lang.OutOfMemoryError: Unable to acquire bytes of memory

2016-03-21 Thread Nezih Yigitbasi
Hi Spark devs, I am using 1.6.0 with dynamic allocation on YARN. I am trying to run a relatively big application with 10s of jobs and 100K+ tasks, and my app fails with the exception below. The closest JIRA issue I could find is SPARK-11293, which
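
(For context, the dynamic-allocation setup being described amounts to configuration along these lines; the executor bounds are placeholders, and the external shuffle service is a prerequisite on YARN.)

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")        // required on YARN
      .set("spark.dynamicAllocation.minExecutors", "1")    // placeholder bound
      .set("spark.dynamicAllocation.maxExecutors", "500")  // placeholder bound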

SparkContext.stop() takes too long to complete

2016-03-19 Thread Nezih Yigitbasi
Hi Spark experts, I am using Spark 1.5.2 on YARN with dynamic allocation enabled. I see in the driver/application master logs that the app is marked as SUCCEEDED and then SparkContext stop is called. However, this stop sequence takes > 10 minutes to complete, and the YARN resource manager kills the app
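
(The stop sequence in question is the ordinary end-of-driver shutdown; a minimal sketch of where it sits, with placeholder job logic.)

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("example"))  // placeholder name
    try {
      // ... run jobs here ...
    } finally {
      sc.stop()  // the call that blocks for >10 minutes in the case above
    }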

how about a custom coalesce() policy?

2016-02-24 Thread Nezih Yigitbasi
Hi Spark devs, I sent an email about my problem some time ago: I want to merge a large number of small files with Spark. Currently I am using Hive with the CombineHiveInputFormat, and I can control the size of the output files with the max split size parameter (which is used for coalescin
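
(To make the gap concrete: coalesce() today only takes a target partition count, so hitting a target output size means deriving that count yourself. A sketch under assumed inputs; the paths and sizes are hypothetical.)

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("coalesce-by-size"))
    val rdd = sc.textFile("hdfs:///path/to/small/files")   // hypothetical input
    val inputBytes  = 10L * 1024 * 1024 * 1024             // assume ~10 GB of input
    val targetBytes = 256L * 1024 * 1024                   // aim for ~256 MB per file
    val numParts = math.max(1, (inputBytes / targetBytes).toInt)
    rdd.coalesce(numParts, shuffle = false)                // no size-based policy available
       .saveAsTextFile("hdfs:///path/to/merged/output")    // hypothetical output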

Re: question about combining small parquet files

2015-11-30 Thread Nezih Yigitbasi
of small files is discussed recently > > http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/ > > > AFAIK Spark supports views too. > > > -- > Ruslan Dautkhanov > > On Thu, Nov 26, 2015 at 10:43 AM, Nezih Yigitbasi <

question about combining small parquet files

2015-11-26 Thread Nezih Yigitbasi
Hi Spark people, I have a Hive table that has a lot of small parquet files, and I am creating a data frame out of it to do some processing, but since I have a large number of splits/files, my job creates a lot of tasks, which I don't want. Basically what I want is the same functionality that Hive pro
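
(A common workaround for this, sketched with hypothetical table and path names on the 1.5-era API: read the table once, coalesce to a small partition count, and write it back as fewer, larger parquet files.)

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)            // assumes an existing SparkContext `sc`
    sqlContext.table("my_hive_table")              // hypothetical table name
      .coalesce(32)                                // target number of output files
      .write.mode("overwrite")
      .parquet("hdfs:///path/to/compacted")        // hypothetical output path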