We were getting the same problem too. Funny thing: our code worked with the
larger data set and failed for a reduced data set. Anyway, we are thinking of
passing a stack-size override parameter to the JVM; maybe that can help you.
Please give it a try and let me know.
--conf spark.executor.extraJavaOptions=-Xss...
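For what it's worth, a minimal sketch of how that option could be wired in; the 4m value and the app name are only illustrative, not from this thread:

import org.apache.spark.{SparkConf, SparkContext}

// -Xss raises the JVM thread stack size for the executor JVMs. The value is a
// placeholder; tune it for the job.
val conf = new SparkConf()
  .setAppName("random-forest-training")
  .set("spark.executor.extraJavaOptions", "-Xss4m")
val sc = new SparkContext(conf)

// The driver JVM is already running by this point, so its stack size has to be
// set on the command line instead, e.g. spark-submit --driver-java-options "-Xss4m".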
> Code I use to train the model:
>
> int MAX_BINS = 16;
> int NUM_CLASSES = 0;
> double MIN_INFO_GAIN = 0.0;
> int MAX_MEMORY_IN_MB = 256;
> double SUBSAMPLING_RATE = 1.0;
> boolean USE_NODEID_CACHE = true;
> int CHECKPOINT_INTERVAL = 10;
> int RANDOM_SEED = 12345;
> int NODE_SIZE = 5;
> int maxDepth = 30;
> int numTrees = 50;
>
> Strategy strategy = new Strategy(Algo.Regression(), Variance.instance(),
>     maxDepth, NUM_CLASSES, MAX_BINS, QuantileStrategy.Sort(),
>     new scala.collection.immutable.HashMap<>(), NODE_SIZE, MIN_INFO_GAIN,
>     MAX_MEMORY_IN_MB, SUBSAMPLING_RATE, USE_NODEID_CACHE,
>     CHECKPOINT_INTERVAL);
> RandomForestModel model = RandomForest.trainRegressor(labeledPoints.rdd(),
>     strategy, numTrees, "auto", RANDOM_SEED);
>
> Any advice would be highly appreciated.
>
> The exception (~3000 lines long):
> java.lang.StackOverflowError
Hi all,

I'm running SQL queries (sqlContext.sql()) on Parquet tables and facing a
problem with table caching (sqlContext.cacheTable()), using the spark-shell of
Spark 1.5.1.

After I run sqlContext.cacheTable(table), the sqlContext.sql(query) takes
longer the first time (well, for the lazy-execution reason) but it finishes
and returns results. However, the weird thing is that after I run the same
query again, I get the error: "java.lang.StackOverflowError".

I Googled it but didn't find the error appearing with table caching and
querying.
Any hint is appreciated.
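The message doesn't include the actual table or query, so this is only a minimal sketch of the described pattern; the path, table name, and query below are placeholders:

// spark-shell, Spark 1.5.x: cache a Parquet-backed table, then run the same
// query twice. Path, table name, and query are placeholders.
sqlContext.read.parquet("/path/to/events").registerTempTable("events")
sqlContext.cacheTable("events")
sqlContext.sql("SELECT COUNT(*) FROM events").show()  // first run: slower, but returns results
sqlContext.sql("SELECT COUNT(*) FROM events").show()  // second run: where the StackOverflowError was reported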
What code is triggering the stack overflow?
On Mon, Feb 29, 2016 at 11:13 PM, Vinti Maheshwari wrote:
> Hi All,
>
> I am getting the below error in a spark-streaming application; I am using
> Kafka for the input stream. When I was doing it with a socket, it was working
> fine, but when I changed to Kafka it's giving the error. Does anyone have an
> idea why it's throwing the error? Do I need to change my batch time and
> checkpointing time?
h(u => {
  uA ++= u
})
var uRDD = sparkContext.parallelize(uA.value)

It's failing on a large dataset with the following error:

java.io.IOException: java.lang.StackOverflowError
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1140)
;main" java.lang.StackOverflowError
at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at
scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scal
I tested the examples according to the docs in the Spark SQL programming guide,
but a java.lang.StackOverflowError occurred every time I called
sqlContext.sql("...").
Meanwhile, it worked fine in a HiveContext. The Hadoop version is 2.2.0, the
Spark version is 1.1.0, built with YARN, Hive
hi, TD. Thanks very much! I got it.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-StackOverflowError-when-calling-count-tp5649p11980.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
--
The long lineage causes a long/deep Java object tree (the DAG of RDD objects),
which needs to be serialized as part of task creation. When serializing, the
whole object DAG needs to be traversed, leading to the StackOverflowError.
TD
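A minimal sketch of the kind of iterative job that builds up such a lineage, assuming the usual spark-shell SparkContext sc; the transformation and iteration count are only illustrative:

// Each pass reassigns rdd to a new RDD whose parent is the previous one, so
// the dependency chain (and the object graph serialized with every task)
// grows by one level per iteration.
var rdd = sc.parallelize(1 to 1000)
for (i <- 1 to 2000) {
  rdd = rdd.map(_ + 1)
}
println(rdd.toDebugString)   // a ~2000-level-deep dependency chain
rdd.count()                  // task serialization has to walk that whole chain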
On Mon, Aug 11, 2014 at 7:14 PM, randylu wrote:

> hi, TD. I also fell into the trap of long lineage, and your suggestions do
> work well. But I don't understand why the long lineage can cause a stack
> overflow, and where it takes effect?
_gateway.py",
> line 538, in __call__
> File
> "/Users/ping/Desktop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
> line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling
> o9564.saveAsTextFile.
> : org.apache.s
l.Py4JJavaError: An error occurred while calling
> o9564.saveAsTextFile.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> serialization failed: java.lang.StackOverflowError
> java.io.Bits.putInt(Bits.java:93)
>
> java.io.ObjectOutputStream$BlockDataOutputStream.writeInt(ObjectOutputStream.java:1927)
>
rc.zip/py4j/protocol.py",
line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling
o9564.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task
serialization failed: java.lang.StackOverflowError
java.io.Bits.putInt(Bits.java:9
Responses inline.

On Wed, Jul 23, 2014 at 4:13 AM, lalit1303 wrote:

> Hi,
> Thanks TD for your reply. I am still not able to resolve the problem for my
> use case.
> I have, let's say, 1000 different RDDs, and I am applying a transformation
> function on each RDD, and I want the output of all the RDDs combined into a
> single output RDD. For this, I am doing the following:
>
> tempRD
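The quoted code is cut off right after "tempRD"; what follows is only a guess at the pattern being described, assuming the per-RDD outputs are combined by unioning them one at a time (all names and the dummy data are assumptions, not from the original message):

import org.apache.spark.rdd.RDD

// Assumed shape of the described job: ~1000 source RDDs, each transformed and
// unioned into a single output RDD. Every union() adds another layer to the
// lineage of tempRDD.
val inputRDDs = (1 to 1000).map(i => sc.parallelize(Seq(s"record-$i")))
var tempRDD: RDD[String] = sc.emptyRDD[String]
for (rdd <- inputRDDs) {
  tempRDD = tempRDD.union(rdd.map(_.toUpperCase))
}
tempRDD.count()   // the action has to serialize the ~1000-deep union chain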
Just to add some more clarity to the discussion: there is a difference
between caching to memory and checkpointing, when considered from the
lineage point of view.
When an RDD is checkpointed, the data of the RDD is saved to HDFS (or any
Hadoop-API-compatible fault-tolerant storage) and the lineage is truncated.
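A small sketch of that difference, as seen from the RDD's lineage; the checkpoint directory and the toy transformations are placeholders, not anything from this thread:

// cache(): materializes the data in memory on the next action; the lineage of
// the RDD is left untouched.
val cached = sc.parallelize(1 to 1000).map(_ * 2).filter(_ % 3 == 0).cache()
cached.count()
println(cached.toDebugString)        // still shows the full map/filter chain

// checkpoint(): must be marked before an action; the next action writes the
// data to the checkpoint directory and the lineage is cut at that point.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")   // placeholder path
val checkpointed = sc.parallelize(1 to 1000).map(_ * 2).filter(_ % 3 == 0)
checkpointed.checkpoint()
checkpointed.count()
println(checkpointed.toDebugString)  // now rooted at the checkpointed data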
> duplicated on a mac desktop and a linux workstation, both running the
> same version of Spark.
>
> The same line of code works correctly after quite some iterations. At the
> [...] was also 0 without any problem).
>
> Any thoughts on this?
>
> Thank you very much,
> - Guanhua
>
> CODE: print "round", round, rdd__new.count()
If we do cache() + count() after, say, every 50 iterations, the whole process
becomes very slow.
I have tried checkpoint(), cache() + count(), and saveAsObjectFiles().
Nothing works.
Materializing RDDs leads to a drastic decrease in performance, and if we don't
materialize, we face a StackOverflowError.
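For concreteness, a sketch of the kind of periodic materialization being described; the loop body and the 50-iteration period are illustrative only:

// The pattern described above: materialize every 50 iterations. Each
// cache() + count() forces a full evaluation of the RDD at that point.
var rdd = sc.parallelize(1 to 1000)
for (i <- 1 to 1000) {
  rdd = rdd.map(_ + 1)
  if (i % 50 == 0) {
    rdd.cache()    // keep the intermediate result in memory
    rdd.count()    // action that actually materializes it
  }
}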
--
CODE: print "round", round, rdd__new.count()

  File
"/home1/ghyan/Software/spark-0.9.0-incubating-bin-hadoop2/python/pyspark/rdd.py",
line 542, in count

14/05/12 16:20:28 INFO TaskSetManager: Loss was due to
java.lang.StackOverflowError [duplicate 1]