I have some suggestions you may try:
1) For the input RDD, use the persist method; this may save a lot of running time.
2) From the UI you can see that the cluster spends much of its time in the shuffle stage; this can be adjusted through configuration parameters such as
"spark.shuffle.memoryFraction" and "spark.memory.fraction" (see the sketch below).
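A minimal Scala sketch of both suggestions (the fraction values and the input path are only illustrative assumptions, not tuned settings):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Illustrative values only; tune them against what the UI shows for your job.
val conf = new SparkConf()
  .setAppName("example")                              // hypothetical app name
  .set("spark.shuffle.memoryFraction", "0.4")         // more room for shuffle buffers
  .set("spark.storage.memoryFraction", "0.4")         // correspondingly less room for cached blocks
val sc = new SparkContext(conf)

// Persist the input RDD so it is not recomputed by every later action or iteration.
val input = sc.textFile("hdfs:///path/to/input")      // hypothetical path
input.persist(StorageLevel.MEMORY_AND_DISK)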
Good luck.
Hi:
There is a small error in the source code of LDA.scala at line 180, as follows:
def setBeta(beta: Double): this.type = setBeta(beta)
which causes a "java.lang.StackOverflowError". It's easy to see the error: the setter calls itself instead of setting the value.
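For reference, a hedged sketch of why this overflows the stack and what the intended delegation presumably looks like (assuming setBeta is meant to be an alias for the topic-word concentration, the way setAlpha aliases the document-topic concentration):

// Current line 180: the setter calls itself, so every call recurses until the stack overflows.
def setBeta(beta: Double): this.type = setBeta(beta)

// Presumed fix (an assumption, not the committed patch): delegate to the real setter.
def setBeta(beta: Double): this.type = setTopicConcentration(beta)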
I am not sure this can help you. I have 57 million ratings, about 4 million users, and 4k items. I used 7-14 total executor cores and 13g of executor memory; the cluster has 4 nodes, each with 4 cores and at most 16g of memory.
I found that the following setting may help avoid this problem:
conf.set("spark.shuffle.memoryFraction","0.
I got the key point. The problem is in sc.sequenceFile: from the API description, "the RDD will create many references to the same object", so I revised the code from "sessions.getBytes" to "sessions.getBytes.clone", and it seems to work.
Thanks.
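A minimal sketch of that workaround, assuming the data is read with sc.sequenceFile as (Text, BytesWritable) pairs (the path and the key/value types here are assumptions for illustration):

import org.apache.hadoop.io.{BytesWritable, Text}

// sc.sequenceFile reuses the same Writable instances for every record in a partition,
// so copy the bytes out of the reused buffer before caching or collecting them.
val raw = sc.sequenceFile("/spark/sessions", classOf[Text], classOf[BytesWritable])   // hypothetical path
val safe = raw.map { case (key, sessions) =>
  (key.toString, sessions.getBytes.clone())   // clone() detaches the array from the reused buffer
}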
Hi
Recently I have had some problems with RDD behavior, concerning the "RDD.first" and "RDD.toArray" methods when the RDD has only one element. I get different results from the different methods on a one-element RDD, where I should get the same result. I will give more detail after the code.
You can try to decrease the rank value.
Hi
Recently I have had some problems with RDD behavior, concerning the "RDD.first" and "RDD.toArray" methods when the RDD has only one element. I can't get the correct element from the RDD. I will give more detail after the code.
My code was as follows:
//get an rdd with just one row RDD[(Long,A
I think you can try to set a lower spark.storage.memoryFraction, for example 0.4:
conf.set("spark.storage.memoryFraction","0.4") //default 0.6
You should supply more information about your input data.
For example, I generate an IndexedRowMatrix from data in the ALS algorithm's input format; my code looks like this:
val inputData = sc.textFile(fname).map { line =>
  val parts = line.trim.split(' ')
  (parts(0).toLong, parts(1).toInt, parts(2).
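The last line of the snippet is truncated in the archive; a self-contained sketch of the same idea, completed under the assumption that each line is "row column value" and that the number of columns is known up front:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Assumed format: "rowIndex colIndex value" per line, as in the ALS input described above.
val triples = sc.textFile(fname).map { line =>
  val parts = line.trim.split(' ')
  (parts(0).toLong, (parts(1).toInt, parts(2).toDouble))
}

val numCols = 3   // assumption: the number of columns is known
val rows = triples.groupByKey().map { case (rowIdx, cols) =>
  val (indices, values) = cols.toSeq.sortBy(_._1).unzip
  IndexedRow(rowIdx, Vectors.sparse(numCols, indices.toArray, values.toArray))
}
val matrix = new IndexedRowMatrix(rows)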
Hi
Recently I wanted to save a big RDD[(k,v)] in the form of index and data, so I decided to use a Hadoop MapFile. I tried some examples like this one: https://gist.github.com/airawat/6538748
The code runs well and generates an index and a data file. I can use the command
"hadoop fs -text /spark/out2
Thanks. After adding the line
spark.io.compression.codec org.apache.spark.io.LZ4CompressionCodec
to spark-defaults.conf, it runs well.
Yes, I use standalone mode. I have set "spark.io.compression.codec" with the code:
conf.set("spark.io.compression.codec","org.apache.spark.io.LZ4CompressionCodec")
It seems to have no influence on saveAsSequenceFile, which still uses snappy compression internally. Thanks.
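For what it's worth, spark.io.compression.codec only governs Spark's internal data (shuffle, broadcast, cached blocks); the compression of the sequence file itself comes from the Hadoop output side. If the goal is to avoid snappy on the output file, saveAsSequenceFile also accepts an explicit Hadoop codec; a hedged sketch (GzipCodec, the path, and the sample data are just examples):

import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.SparkContext._

val rdd = sc.parallelize(Seq(("k1", "v1"), ("k2", "v2")))   // placeholder data
// Pass the codec for the sequence file explicitly instead of relying on the cluster default.
rdd.saveAsSequenceFile("/spark/out2", Some(classOf[GzipCodec]))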
Here is the error log; I abstract it as follows:
INFO [binaryTest---main]: before first
WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver
thread-0]: Lost task 0.0 in stage 0.0 (TID 0, spark-dev136):
org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
org.xeri
Hi:
After updating Spark to version 1.1.0, I experienced a snappy error, which was posted here:
http://apache-spark-user-list.1001560.n3.nabble.com/Update-gcc-version-Still-snappy-error-tt15137.html
I avoided this problem with the code:
conf.set("spark.io.compression.codec","org.apache.spark.io.LZ4C
Hi:
I want to use SVD in my work. I tried some examples and have some confusions. The input is the 4*3 matrix as follows:
2 0 0
0 3 2
0 3 1
2 0 3
My input text file, which corresponds to the matrix in "row column value" form, is as follows:
0 0 2
1 1 3
1 2 2
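A hedged sketch of running the SVD on coordinate-format input like this with MLlib (the file name and the choice of k are assumptions for illustration):

import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

// Each input line is "rowIndex colIndex value", matching the listing above.
val entries = sc.textFile("svd_input.txt").map { line =>   // hypothetical file name
  val parts = line.trim.split(' ')
  MatrixEntry(parts(0).toLong, parts(1).toLong, parts(2).toDouble)
}
val mat = new CoordinateMatrix(entries).toRowMatrix()

// Keep at most 3 singular values (the matrix has 3 columns) and also compute U.
val svd = mat.computeSVD(3, computeU = true)
println(svd.s)   // the singular values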
I updated the Spark version from 1.0.2 to 1.1.0 and experienced a snappy version issue with the new Spark 1.1.0. After updating the glibc version, another issue occurred. I abstract the log as follows:
14/09/25 11:29:18 WARN [org.apache.hadoop.util.NativeCodeLoader---main]:
Unable to load native-hadoo