Re: java.lang.StackOverflowError when calling count()

2016-06-13 Thread Anuj
We were hitting the same problem. Funny thing: our code worked with a larger data set and failed for a reduced data set. Anyway, we are thinking of passing a stack-size override param to the JVM; maybe that can help you. Please give it a try and let me know. --conf spark.executor.extraJavaOptions=-Xss…
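
A minimal sketch of the stack-size override Anuj suggests, in SparkConf form (the -Xss4m value and app name are illustrative, not from the thread):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("StackSizeOverride")
      // Raise the executor JVM thread stack size; the default is platform-dependent
      // (typically 512k-1m), and 4m here is only an illustrative value.
      .set("spark.executor.extraJavaOptions", "-Xss4m")
      // Task serialization happens on the driver, so its stack may need raising too
      // (use --driver-java-options on spark-submit when running in client mode).
      .set("spark.driver.extraJavaOptions", "-Xss4m")
    val sc = new SparkContext(conf)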

Re: SparkML RandomForest java.lang.StackOverflowError

2016-04-01 Thread Joseph Bradley
… Code I use to train model: int MAX_BINS = 16; int NUM_CLASSES = 0; double MIN_INFO_GAIN = 0.0; int MAX_MEMORY_IN_MB = 256; …

Re: SparkML RandomForest java.lang.StackOverflowError

2016-04-01 Thread Joseph Bradley
… int MAX_MEMORY_IN_MB = 256; double SUBSAMPLING_RATE = 1.0; boolean USE_NODEID_CACHE = true; int CHECKPOINT_INTERVAL = 10; int RANDOM_SEED = 12345; int NODE_SIZE = 5; …

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-31 Thread Eugene Morozov
… 12345; int NODE_SIZE = 5; int maxDepth = 30; int numTrees = 50; Strategy strategy = new Strategy(Algo.Regression(), Variance.instance(), maxDepth, NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(), …

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-30 Thread Eugene Morozov
… MIN_INFO_GAIN = 0.0; int MAX_MEMORY_IN_MB = 256; double SUBSAMPLING_RATE = 1.0; boolean USE_NODEID_CACHE = true; int CHECKPOINT_INTERVAL = 10; int RANDOM_SEED = 12345; int NODE_SIZE = 5; …

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-29 Thread Eugene Morozov
… int numTrees = 50; Strategy strategy = new Strategy(Algo.Regression(), Variance.instance(), maxDepth, NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(), new scala.collection.immutable.HashMap<>(), nodeSize, MIN_INFO_GAIN, …

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-29 Thread Joseph Bradley
… NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(), new scala.collection.immutable.HashMap<>(), nodeSize, MIN_INFO_GAIN, MAX_MEMORY_IN_MB, SUBSAMPLING_RATE, USE_NODEID_CACHE, CHECKPOINT_INTERVAL); RandomForestModel model = RandomForest.trainRegressor(…

Re: SparkML RandomForest java.lang.StackOverflowError

2016-03-29 Thread Eugene Morozov
… CHECKPOINT_INTERVAL); RandomForestModel model = RandomForest.trainRegressor(labeledPoints.rdd(), strategy, numTrees, "auto", RANDOM_SEED); Any advice would be highly appreciated. The exception (~3000 lines long): java.lang.StackOverflowError …

SparkML RandomForest java.lang.StackOverflowError

2016-03-29 Thread Eugene Morozov
… new scala.collection.immutable.HashMap<>(), nodeSize, MIN_INFO_GAIN, MAX_MEMORY_IN_MB, SUBSAMPLING_RATE, USE_NODEID_CACHE, CHECKPOINT_INTERVAL); RandomForestModel model = RandomForest.trainRegressor(labeledPoints.rdd(), strategy, numTrees, "auto", RANDOM_SEED); Any advice would be highly appreciated. The exception (~3000 lines long): …
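
For reference, the training code quoted piecemeal across the replies above reassembles into the following (Java, Spark 1.x MLlib RDD API). The imports and line breaks are added here; labeledPoints is the poster's RDD of LabeledPoint, which no snippet shows; and the fragments declare NODE_SIZE but pass a lowercase nodeSize, so one name is normalized:

    import org.apache.spark.mllib.tree.RandomForest;
    import org.apache.spark.mllib.tree.configuration.Algo;
    import org.apache.spark.mllib.tree.configuration.QuantileStrategy;
    import org.apache.spark.mllib.tree.configuration.Strategy;
    import org.apache.spark.mllib.tree.impurity.Variance;
    import org.apache.spark.mllib.tree.model.RandomForestModel;

    int MAX_BINS = 16;
    int NUM_CLASSES = 0;              // regression, so the class count is unused
    double MIN_INFO_GAIN = 0.0;
    int MAX_MEMORY_IN_MB = 256;
    double SUBSAMPLING_RATE = 1.0;
    boolean USE_NODEID_CACHE = true;
    int CHECKPOINT_INTERVAL = 10;
    int RANDOM_SEED = 12345;
    int NODE_SIZE = 5;
    int maxDepth = 30;                // very deep trees; deep recursion over tree nodes
    int numTrees = 50;                //   is a plausible source of the reported overflow

    Strategy strategy = new Strategy(Algo.Regression(), Variance.instance(),
        maxDepth, NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(),
        new scala.collection.immutable.HashMap<>(), NODE_SIZE, MIN_INFO_GAIN,
        MAX_MEMORY_IN_MB, SUBSAMPLING_RATE, USE_NODEID_CACHE, CHECKPOINT_INTERVAL);
    RandomForestModel model = RandomForest.trainRegressor(
        labeledPoints.rdd(), strategy, numTrees, "auto", RANDOM_SEED);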

Re: Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Mohamed Nadjib MAMI
… wrote: Hi all, I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing a problem with table caching (sqlContext.cacheTable()), using the spark-shell of Spark 1.5.1. After I run sqlContext.cacheTable(table), sqlContext.sql(query) takes longer the first time (well, for …

Re: Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Ted Yu
… problem with table caching (sqlContext.cacheTable()), using the spark-shell of Spark 1.5.1. After I run sqlContext.cacheTable(table), sqlContext.sql(query) takes longer the first time (well, for the lazy execution reason) but it finishes and returns results. However …

Re: Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Mohamed Nadjib MAMI
… spark-shell of Spark 1.5.1. After I run sqlContext.cacheTable(table), sqlContext.sql(query) takes longer the first time (well, for the lazy execution reason) but it finishes and returns results. However, the weird thing is that after I run the same query again, I get the error: "java.lang.StackOverflowError" …

Re: Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Ted Yu
… the weird thing is that after I run the same query again, I get the error: "java.lang.StackOverflowError". I Googled it but didn't find the error appearing with table caching and querying. Any hint is appreciated. …

Spark SQL - java.lang.StackOverflowError after caching table

2016-03-24 Thread Mohamed Nadjib MAMI
… (well, for the lazy execution reason) but it finishes and returns results. However, the weird thing is that after I run the same query again, I get the error: "java.lang.StackOverflowError". I Googled it but didn't find the error appearing with table caching and querying. Any hint is appreciated.
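
A minimal sketch of the reported repro (Scala, Spark 1.5-era SQLContext API; the Parquet path, table name, and query are illustrative stand-ins):

    // In spark-shell, sqlContext is predefined.
    val df = sqlContext.read.parquet("/data/events.parquet")  // illustrative path
    df.registerTempTable("events")                            // illustrative table name
    sqlContext.cacheTable("events")  // marks the table for in-memory caching (lazy)
    sqlContext.sql("SELECT count(*) FROM events").show()  // 1st run: scans Parquet, fills cache
    sqlContext.sql("SELECT count(*) FROM events").show()  // 2nd run: served from the cache;
                                                          //   this is where the poster hit
                                                          //   java.lang.StackOverflowError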

Re: Spark Streaming: java.lang.StackOverflowError

2016-03-01 Thread Cody Koeninger
What code is triggering the stack overflow? On Mon, Feb 29, 2016 at 11:13 PM, Vinti Maheshwari wrote: Hi All, I am getting the below error in a spark-streaming application; I am using Kafka for the input stream. When I was using a socket it worked fine, but when I changed to Kafka it's …

Spark Streaming: java.lang.StackOverflowError

2016-02-29 Thread Vinti Maheshwari
Hi All, I am getting the below error in a spark-streaming application; I am using Kafka for the input stream. When I was using a socket it worked fine, but when I changed to Kafka it started throwing this error. Does anyone have an idea why? Do I need to change my batch interval or checkpointing interval?
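
Vinti's code isn't in the snippet, but in streaming jobs this error is usually the long-lineage problem discussed further down in this digest, and enabling checkpointing is the standard cure. A minimal sketch (Scala, Spark 1.x direct Kafka API; the broker, topic, batch interval, and checkpoint path are all illustrative):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaCounts"), Seconds(10))
    ssc.checkpoint("hdfs:///tmp/checkpoints")  // required for stateful ops; also truncates
                                               //   the lineage of the state RDDs each batch
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")  // illustrative broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))        // illustrative topic
    // Stateful op: without the checkpoint above, each batch's state RDD extends the
    // previous one's lineage until task serialization overflows the stack.
    val counts = stream.map { case (_, v) => (v, 1L) }
      .updateStateByKey[Long]((vals, state) => Some(state.getOrElse(0L) + vals.sum))
    counts.print()
    ssc.start()
    ssc.awaitTermination()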

Spark Accumulator Issue - java.io.IOException: java.lang.StackOverflowError

2015-07-24 Thread Jadhav Shweta
… h(u => { uA ++= u }) var uRDD = sparkContext.parallelize(uA.value) It's failing on a large dataset with the following error: java.io.IOException: java.lang.StackOverflowError at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)

Spark Accumulator Issue - java.io.IOException: java.lang.StackOverflowError

2015-07-15 Thread Jadhav Shweta
… h(u => { uA ++= u }) var uRDD = sparkContext.parallelize(uA.value) It's failing on a large dataset with the following error: java.io.IOException: java.lang.StackOverflowError at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)
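
The snippet starts mid-call, but the shape is clear: a collection accumulator is grown inside an action and its value is re-parallelized on the driver. A minimal sketch of that pattern (Scala, Spark 1.x accumulableCollection API; the element type, the foreach, and all names besides uA/uRDD are assumptions). Note that it funnels every partition's output through driver memory and driver-side serialization, which is where the quoted java.io.IOException: java.lang.StackOverflowError surfaced:

    import scala.collection.mutable.ArrayBuffer
    import org.apache.spark.{SparkConf, SparkContext}

    val sparkContext = new SparkContext(new SparkConf().setAppName("AccumulatorRoundTrip"))
    val uA = sparkContext.accumulableCollection(ArrayBuffer[Int]())  // collection accumulator
    sparkContext.parallelize(1 to 1000000)
      .map(x => ArrayBuffer(x, x + 1))  // stand-in for the real per-record transformation
      .foreach(u => { uA ++= u })       // merge each record's output into the accumulator
    var uRDD = sparkContext.parallelize(uA.value)  // redistribute the driver-side buffer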

java.lang.StackOverflowError when doing spark sql

2015-02-19 Thread bit1...@163.com
… Exception in thread "main" java.lang.StackOverflowError at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254) at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254) at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:…

java.lang.stackoverflowerror when running Spark shell

2014-09-23 Thread mrshen
I tested the examples according to the docs in the Spark SQL programming guide, but a java.lang.StackOverflowError occurred every time I called sqlContext.sql("..."). Meanwhile, it worked fine in a HiveContext. The Hadoop version is 2.2.0, the Spark version is 1.1.0, built with YARN and Hive …

Re: java.lang.StackOverflowError when calling count()

2014-08-12 Thread randylu
hi, TD. Thanks very much! I got it.

Re: java.lang.StackOverflowError when calling count()

2014-08-12 Thread Tathagata Das
The long lineage causes a long/deep Java object tree (a DAG of RDD objects), which needs to be serialized as part of task creation. When serializing, the whole object DAG needs to be traversed, leading to the stack overflow error. TD On Mon, Aug 11, 2014 at 7:14 PM, randylu wrote: hi, TD. I …
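
A minimal sketch of the fix implied by TD's explanation: periodically materialize and checkpoint the RDD so serialized tasks only carry lineage back to the last checkpoint (the transformation, the every-50 cadence, and the checkpoint directory are illustrative; the cadence echoes lalit1303's message later in this digest):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("LineageTruncation"))
    sc.setCheckpointDir("hdfs:///tmp/checkpoints")  // illustrative path

    var rdd = sc.parallelize(1 to 1000000)
    for (i <- 1 to 1000) {            // illustrative iterative computation
      rdd = rdd.map(_ + 1)            // each iteration appends a stage to the lineage
      if (i % 50 == 0) {
        rdd.cache()                   // cache first so checkpointing does not recompute
        rdd.checkpoint()              //   the whole chain from scratch
        rdd.count()                   // force materialization; lineage is now truncated
      }
    }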

Re: java.lang.StackOverflowError when calling count()

2014-08-11 Thread randylu
hi, TD. I also fell into the trap of long lineage, and your suggestions do work well. But I don't understand why the long lineage causes the stack overflow, and where it takes effect.

Re: java.lang.StackOverflowError

2014-08-05 Thread Davies Liu
… java_gateway.py", line 538, in __call__ File "/Users/ping/Desktop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o9564.saveAsTextFile. : org.apache.spark.SparkException…

Re: java.lang.StackOverflowError

2014-08-05 Thread Chengi Liu
… py4j.protocol.Py4JJavaError: An error occurred while calling o9564.saveAsTextFile. : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError java.io.Bits.putInt(Bits.java:93) java.io.ObjectOutputStream$BlockDataOutputStream.writeInt(ObjectOutputStream.java:1927) …

java.lang.StackOverflowError

2014-08-05 Thread Chengi Liu
… py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o9564.saveAsTextFile. : org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError java.io.Bits.putInt(Bits.java:93) …

Re: java.lang.StackOverflowError when calling count()

2014-07-26 Thread Tathagata Das
Responses inline. On Wed, Jul 23, 2014 at 4:13 AM, lalit1303 wrote: Hi, Thanks TD for your reply. I am still not able to resolve the problem for my use case. I have, let's say, 1000 different RDDs; I am applying a transformation function to each RDD and I want the output of all the RDDs …

Re: java.lang.StackOverflowError when calling count()

2014-07-23 Thread lalit1303
Hi, Thanks TD for your reply. I am still not able to resolve the problem for my use case. I have, let's say, 1000 different RDDs; I am applying a transformation function to each RDD and I want the output of all the RDDs combined into a single output RDD. For this I am doing the following: tempRD…
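
lalit's code cuts off at tempRD…, but the described pattern (unioning ~1000 transformed RDDs into one) grows the lineage linearly and hits exactly the serialization overflow TD explains above. A sketch with periodic truncation (Scala; all names are illustrative, and the every-50 cadence is borrowed from lalit's later message):

    import org.apache.spark.rdd.RDD

    def transform(x: Int): Int = x * 2  // stand-in for the real per-RDD transformation

    // Union many transformed RDDs, checkpointing periodically so the combined
    // RDD's lineage never grows deep enough to overflow the stack.
    def combine(inputs: Seq[RDD[Int]]): RDD[Int] = {
      var combined: RDD[Int] = inputs.head.map(transform)
      for ((rdd, i) <- inputs.tail.zipWithIndex) {
        combined = combined.union(rdd.map(transform))
        if (i % 50 == 0) {
          combined.cache()
          combined.checkpoint()  // requires sc.setCheckpointDir(...) beforehand
          combined.count()       // force materialization; lineage is truncated here
        }
      }
      combined
    }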

Re: java.lang.StackOverflowError when calling count()

2014-05-15 Thread Tathagata Das
Just to add some more clarity to the discussion: there is a difference between caching to memory and checkpointing, when considered from the lineage point of view. When an RDD is checkpointed, the data of the RDD is saved to HDFS (or any Hadoop-API-compatible fault-tolerant storage) and the lineage…
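
In code form, the distinction TD draws looks roughly like this (a sketch assuming rdd and f exist and sc.setCheckpointDir(...) has been called):

    val cached = rdd.map(f).cache()  // data may sit in memory, but the full lineage is
    cached.count()                   //   kept: tasks still serialize the whole RDD DAG

    val checkpointed = rdd.map(f)
    checkpointed.checkpoint()        // after the next action, the data lives in fault-
    checkpointed.count()             //   tolerant storage and the lineage is cut here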

Re: java.lang.StackOverflowError when calling count()

2014-05-14 Thread lalit1303
… duplicated on a Mac desktop and a Linux workstation, both running the same version of Spark. The same line of code works correctly after quite some iterations. At the …

Re: java.lang.StackOverflowError when calling count()

2014-05-14 Thread Nicholas Chammas
… problem). Any thoughts on this? Thank you very much, - Guanhua CODE: print "round", r…

Re: java.lang.StackOverflowError when calling count()

2014-05-14 Thread lalit1303
If we do cache() + count() after, say, every 50 iterations, the whole process becomes very slow. I have tried checkpoint(), cache() + count(), and saveAsObjectFiles(). Nothing works: materializing RDDs leads to a drastic decrease in performance, and if we don't materialize, we face the StackOverflowError.

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Mayur Rustagi
… was also 0 without any problem). Any thoughts on this? Thank you very much, - Guanhua …

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Guanhua Yan
… File "/home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py", line 542, in count 14/05/12 16:20:28 INFO TaskSetManager: Loss was due to java.lang.StackOverflowError [duplicate 1] …

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Xiangrui Meng
… CODE: print "round", round, rdd__new.count() File "/home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py", line 542 …

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Mayur Rustagi
… CODE: print "round", round, rdd__new.count() File "/home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py", line 542, in count 14/05…

Re: java.lang.StackOverflowError when calling count()

2014-05-13 Thread Xiangrui Meng
… CODE: print "round", round, rdd__new.count() File "/home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py", line 542, in count 14/05/12 16:20:28 INFO TaskSetManager…

java.lang.StackOverflowError when calling count()

2014-05-12 Thread Guanhua Yan
… Guanhua CODE: print "round", round, rdd__new.count() File "/home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py", line 542, in count 14/05/12 16:20:28 INFO TaskSetManager: Loss was due to java.lang.StackOverflowError …