Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-01-22 Thread StanZhai
I'm using Parallel GC.

rxin wrote:
> Are you using G1 GC? G1 sometimes uses a lot more memory than the size
> allocated.
>
> On Sun, Jan 22, 2017 at 12:58 AM StanZhai wrote:
>> Hi all,
>>
>> We just upgraded our Spark from 1.6.2 to 2.1.0.
>>
>> Our Spark applicatio…
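For context, a minimal sketch of how the executor collector is typically selected (the flag values below are illustrative, not taken from the thread): unless overridden through `spark.executor.extraJavaOptions`, executors run with the JVM default collector, which is Parallel GC on Java 8.

```scala
import org.apache.spark.SparkConf

// Illustrative only: choosing the executor GC explicitly.
// Without this setting, executors use the JVM default collector
// (Parallel GC on Java 8), which matches what StanZhai reports above.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-XX:+UseParallelGC") // or "-XX:+UseG1GC"
```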

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-01-22 Thread Koert Kuipers
Could this be related to SPARK-18787?

On Sun, Jan 22, 2017 at 1:45 PM, Reynold Xin wrote:
> Are you using G1 GC? G1 sometimes uses a lot more memory than the size
> allocated.
>
> On Sun, Jan 22, 2017 at 12:58 AM StanZhai wrote:
>> Hi all,
>>
>> We just upgraded our Spark from 1.6.2 t…

Re: A question about creating persistent table when in-memory catalog is used

2017-01-22 Thread Xiao Li
Agree. :)

2017-01-22 11:20 GMT-08:00 Reynold Xin:
> To be clear, there are two separate "hive"s we are talking about here. One
> is the catalog, and the other is the Hive serde and UDF support. We want to
> get to a point where the choice of catalog does not impact the functionality
> in Spark oth…

Re: A question about creating persistent table when in-memory catalog is used

2017-01-22 Thread Reynold Xin
To be clear, there are two separate "hive"s we are talking about here. One is the catalog, and the other is the Hive serde and UDF support. We want to get to a point where the choice of catalog does not impact the functionality in Spark other than where the catalog is stored.

On Sun, Jan 22, 2017 at…
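For reference, a minimal sketch of how the two are tied together today (Spark 2.x session API; the application names are illustrative): `enableHiveSupport()` switches on both the Hive metastore catalog and the Hive serde/UDF support, while `spark.sql.catalogImplementation` selects the catalog on its own.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: the two "hive" knobs as they exist today.
// (1) Hive support, which enables the Hive metastore catalog plus Hive serde/UDFs:
val sparkWithHive = SparkSession.builder()
  .appName("hive-catalog")
  .enableHiveSupport() // requires Hive classes on the classpath
  .getOrCreate()

// (2) The catalog implementation alone, selectable via configuration, e.g.:
//   spark-shell --conf spark.sql.catalogImplementation=in-memory
```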

Re: A question about creating persistent table when in-memory catalog is used

2017-01-22 Thread Xiao Li
We have a pending PR to block users from creating Hive serde tables when using InMemoryCatalog. See https://github.com/apache/spark/pull/16587, which I believe answers your question.

BTW, we can still create regular data source tables and insert data into them. The major difference is wh…
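For contrast, a minimal sketch of the data source table path that does work with InMemoryCatalog (the table name and `parquet` format are assumptions for illustration; run in a spark-shell started with `--conf spark.sql.catalogImplementation=in-memory`, where `spark` is the predefined SparkSession):

```scala
// Illustrative only: a regular data source table (not a Hive serde table)
// can be created and written to even with the in-memory catalog.
spark.sql("CREATE TABLE src_tbl (id INT, name STRING) USING parquet")
spark.sql("INSERT INTO src_tbl VALUES (1, 'a')")
spark.sql("SELECT * FROM src_tbl").show()
```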

Re: A question about creating persistent table when in-memory catalog is used

2017-01-22 Thread Reynold Xin
I think this is something we are going to change, to completely decouple Hive support from the catalog.

On Sun, Jan 22, 2017 at 4:51 AM Shuai Lin wrote:
> Hi all,
>
> Currently when the in-memory catalog is used, e.g. through `--conf
> spark.sql.catalogImplementation=in-memory`, we can create a p…

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-01-22 Thread Reynold Xin
Are you using G1 GC? G1 sometimes uses a lot more memory than the size allocated.

On Sun, Jan 22, 2017 at 12:58 AM StanZhai wrote:
> Hi all,
>
> We just upgraded our Spark from 1.6.2 to 2.1.0.
>
> Our Spark application is started by spark-submit with config of
> `--executor-memory 35G…

Spark 1.6.3 Driver OOM on createDataFrame

2017-01-22 Thread Asher Krim
Hi all,

There seems to be a bug in Spark 1.6.3 which causes the driver to OOM when creating a DataFrame from a large amount of data held in memory on the driver. Examining a heap dump, it looks like the driver heap is filled with multiple copies of the data. The following Java code reproduces the bug:

public voi…
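The original Java repro is truncated in this archive. As a rough illustration only (not the poster's code), here is a Scala sketch of the kind of call being described: a DataFrame built from a large collection that already lives on the driver, where the source data and Spark's internal row copies sit on the driver heap at the same time.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Illustrative only (Spark 1.6 API): createDataFrame on a large driver-local collection.
// The local collection plus the converted internal representation are all held on the
// driver, which is the kind of duplication the heap dump above suggests.
object CreateDataFrameSketch {
  case class Record(id: Long, payload: String)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("createDataFrame-oom-sketch"))
    val sqlContext = new SQLContext(sc)

    // Size chosen arbitrarily for illustration; large enough to matter on a small driver heap.
    val local = (1L to 1000000L).map(i => Record(i, "x" * 100))

    val df = sqlContext.createDataFrame(local)
    println(df.count())
  }
}
```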

A question about creating persistent table when in-memory catalog is used

2017-01-22 Thread Shuai Lin
Hi all,

Currently when the in-memory catalog is used, e.g. through `--conf spark.sql.catalogImplementation=in-memory`, we can create a persistent table, but inserting into this table fails with the error message "Hive support is required to insert into the following tables..".

sql("create ta…
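A minimal sketch of the behavior being described (the table definition in the original message is truncated above, so the DDL below is illustrative), run with `spark-shell --conf spark.sql.catalogImplementation=in-memory`, where `spark` is the predefined SparkSession:

```scala
// Illustrative only: with the in-memory catalog, creating a persistent
// (Hive serde) table succeeds...
spark.sql("CREATE TABLE t1 (id INT, name STRING)")

// ...but inserting into it fails with
// "Hive support is required to insert into the following tables: ..."
spark.sql("INSERT INTO t1 VALUES (1, 'a')")
```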

Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-01-22 Thread StanZhai
Hi all,

We just upgraded our Spark from 1.6.2 to 2.1.0.

Our Spark application is started by spark-submit with a config of `--executor-memory 35G` in standalone mode, but the actual memory usage reaches 65G even after a full GC (`jmap -histo:live $pid`), as shown below:

test@c6 ~ $ ps aux | grep CoarseGrainedExe…