Spark RC5 - OutOfMemoryError: Requested array size exceeds VM limit

2016-07-25 Thread Ovidiu-Cristian MARCU
Hi, I am running some TPC-DS queries (the data is Parquet stored in HDFS) with Spark 2.0 RC5, and for some queries I get this OOM: java.lang.OutOfMemoryError: Requested array size exceeds VM limit at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)

Re: orc/parquet sql conf

2016-07-25 Thread Ovidiu-Cristian MARCU
75 > <https://github.com/apache/spark/pull/13775> > > Thanks! > > 2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU > mailto:ovidiu-cristian.ma...@inria.fr>>: > Hi, > > Assuming I have some data in both ORC/Parquet formats, and some complex > workflow that eventu

orc/parquet sql conf

2016-07-25 Thread Ovidiu-Cristian MARCU
Hi, Assuming I have some data in both ORC/Parquet formats, and some complex workflow that eventually combines the results of some queries on these datasets, I would like to get the best execution. Looking at the default configs, I noticed: 1) Vectorized query execution possible with Parquet only,

DMTCP and debug a failed stage in spark

2016-06-16 Thread Ovidiu-Cristian MARCU
Hi, I have a TPCDS query that fails in stage 80, which is a ResultStage (SparkSQL). Ideally I would like to ‘checkpoint’ a previous stage that executed successfully and replay the failed stage for debugging purposes. Has anyone managed to do something similar and could give some hints? Maybe s

Re: tpcds q1 - java.lang.NegativeArraySizeException

2016-06-14 Thread Ovidiu-Cristian MARCU
.akka.frameSize 128 spark.shuffle.manager sort > On 14 Jun 2016, at 00:12, Sameer Agarwal wrote: > > I'm unfortunately not able to reproduce this on master. Does the query always > fail deterministically? > > On Mon, Jun 13, 2016 at 12:54 PM,

Re: tpcds q1 - java.lang.NegativeArraySizeException

2016-06-13 Thread Ovidiu-Cristian MARCU
Yes, commit ad102af > On 13 Jun 2016, at 21:25, Reynold Xin wrote: > > Did you try this on master? > > > On Mon, Jun 13, 2016 at 11:26 AM, Ovidiu-Cristian MARCU > mailto:ovidiu-cristian.ma...@inria.fr>> > wrote: > Hi, > > Running the first query

tpcds q1 - java.lang.NegativeArraySizeException

2016-06-13 Thread Ovidiu-Cristian MARCU
Hi, Running the first query of tpcds on a standalone setup (4 nodes, tpcds2 generated for scale 10 and converted to Parquet on HDFS) results in an exception [1]. Close to this problem I found this issue: https://issues.apache.org/jira/browse/SPARK-12089
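
One way an exception of this kind can arise is integer overflow while a byte buffer grows. The following is only a sketch under stated assumptions, not Spark's code: it assumes a growth policy that doubles the required length using Java-style 32-bit signed arithmetic, and the names `to_int32` and `doubled_capacity` are hypothetical helpers introduced for illustration.

```python
# Hypothetical model of a doubling buffer with Java-style 32-bit signed ints.
# This is an illustrative assumption, not the actual Spark implementation.

INT_MAX = 2**31 - 1


def to_int32(value):
    """Wrap a Python int into the range of a Java 32-bit signed int."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT_MAX else value


def doubled_capacity(current_size, needed_size):
    """Capacity that a naive 'double the required length' policy would request."""
    required = to_int32(current_size + needed_size)
    return to_int32(required * 2)


if __name__ == "__main__":
    # Small rows: the doubled capacity is an ordinary positive number.
    print(doubled_capacity(1024, 1024))
    # Once the required length reaches 2**30 bytes, doubling wraps around to a
    # negative value; allocating an array of negative length on the JVM raises
    # java.lang.NegativeArraySizeException.
    print(doubled_capacity(2**30, 0))
```

Under this assumption, a single row whose serialized size approaches 1 GiB is enough to push the requested capacity out of the valid range, so checking for unusually large values in the affected columns may be a reasonable first debugging step.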

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-06 Thread Ovidiu-Cristian MARCU
+1 for moving this discussion to a proactive new (alpha/beta) release of Apache Spark 2.0! > On 06 Jun 2016, at 20:25, Ovidiu Cristian Marcu wrote: > > Any chance to start preparing a new alpha/beta release for 2.0 this month or > the preview will be pushed to maven and conside

Re: Spark 2.0.0-preview artifacts still not available in Maven

2016-06-05 Thread Ovidiu-Cristian MARCU
Hi all, IMHO the preview ‘release’ is good as it is now, so no further changes are required. For me the preview was a trigger to what will be the next Spark 2.0; I really appreciate the effort the guys made to describe it and market it :) I’ll appreciate it if the Apache Spark team will start a vote for a new

Re: Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

2016-05-24 Thread Ovidiu-Cristian MARCU
Do you need more information? > On 23 May 2016, at 19:16, Ovidiu-Cristian MARCU > wrote: > > Yes, > > git log > commit dafcb05c2ef8e09f45edfb7eabf58116c23975a0 > Author: Sameer Agarwal mailto:sam...@databricks.com>> > Date: Sun May 22 23:32:39 2016 -07

Re: Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

2016-05-23 Thread Ovidiu-Cristian MARCU
Yu wrote: > > Can you tell us the commit hash using which the test was run ? > > For #2, if you can give full stack trace, that would be nice. > > Thanks > > On Mon, May 23, 2016 at 8:58 AM, Ovidiu-Cristian MARCU > mailto:ovidiu-cristian.ma...@inria.fr>> >

Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

2016-05-23 Thread Ovidiu-Cristian MARCU
Hi 1) Using the latest Spark 2.0, I managed to run the first 9 queries of TPCDSQueryBenchmark, and then it ended in an OutOfMemoryError [1]. What was the configuration used for running this benchmark? Can you explain the meaning of 4 shuffle partitions? Thanks! On my local system I use: ./bin/spark-subm

Re: Building spark master failed

2016-05-23 Thread Ovidiu-Cristian MARCU
You’re right, I thought the latest would only compile against Java 8. Thanks > On 23 May 2016, at 11:35, Dongjoon Hyun wrote: > > Hi, > > That is not the latest. > > The bug was fixed 5 days ago. > > Regards, > Dongjoon. > > > On Mon, May 23,

Building spark master failed

2016-05-23 Thread Ovidiu-Cristian MARCU
Hi I have the following issue when trying to build the latest spark source code on master: /spark/common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java:147: error: cannot find symbol [error] if (process != null && process.isAlive()) { [error]

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-18 Thread Ovidiu-Cristian MARCU
ilter to target version = 2.0.0. Cheers. > > On Wed, May 18, 2016 at 9:00 AM, Ovidiu-Cristian MARCU > mailto:ovidiu-cristian.ma...@inria.fr>> > wrote: > +1 Great, I see the list of resolved issues, do you have a list of known > issue you plan to stay with this release? >

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-18 Thread Ovidiu-Cristian MARCU
+1 Great, I see the list of resolved issues; do you have a list of known issues that you expect to remain in this release? with build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Phive -Phive-thriftserver -DskipTests clean package mvn -version Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c0747832