Re: How can number of partitions be set in "spark-env.sh"?

2014-10-28 Thread Wanda Hawk
Is this what you are looking for? In Shark, the default reducer number is 1 and is controlled by the property mapred.reduce.tasks. Spark SQL deprecates this property in favor of spark.sql.shuffle.partitions, whose default value is 200. Users may customize this property via SET: SET spark.sql.shuffl
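In practice that looks like the fragment below (the value 10 is only an illustration; pick a count suited to your data size):

```sql
-- Default is 200; lower it for small shuffles
SET spark.sql.shuffle.partitions=10;
```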

Re: Spark SQL reduce number of java threads

2014-10-28 Thread Wanda Hawk
I am trying to get a software trace, and I need to get the number of active threads as low as I can in order to inspect the "active" part of the workload. From: Prashant Sharma To: Wanda Hawk Cc: "user@spark.apache.org" Sent: Tuesday, Oc

Spark SQL reduce number of java threads

2014-10-28 Thread Wanda Hawk
Hello, I am trying to reduce the number of Java threads (about 80 on my system) to as few as possible. What settings can be made in spark-1.1.0/conf/spark-env.sh (or elsewhere)? I am also using Hadoop for storing data on HDFS. Thank you, Wanda
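For the record, the knobs that usually matter here live in conf/spark-env.sh; a hedged sketch (the variable names come from the standard spark-env.sh template, the values are only illustrative, and JVM-internal threads such as GC and JIT workers remain regardless):

```shell
# conf/spark-env.sh -- illustrative values only
SPARK_WORKER_CORES=1       # task threads offered per worker
SPARK_WORKER_INSTANCES=1   # worker JVMs per node
MASTER=local[1]            # local mode with a single task thread
```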

shared object between threads

2014-07-15 Thread Wanda Hawk
How can I declare in Spark an object shared by all threads that does not block execution by locking the entire array? The threads are supposed to access different rows of a two-dimensional array. For example, I would like to declare a 2 dimensional array. Each thread should write on its corre
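Outside of Spark's execution model (where closures are serialized to executors and mutable state is not shared across them), the pattern described needs no lock at all, because the rows are disjoint memory regions. A minimal plain-Python sketch, purely illustrative and unrelated to Spark itself:

```python
import threading

ROWS, COLS = 4, 3
matrix = [[0] * COLS for _ in range(ROWS)]  # shared 2-D array

def fill_row(i):
    # Each thread writes only row i; since no two threads touch
    # the same row, no locking is required.
    for j in range(COLS):
        matrix[i][j] = i * COLS + j

threads = [threading.Thread(target=fill_row, args=(i,)) for i in range(ROWS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(matrix)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
```

Within Spark the idiomatic equivalent is to let each partition produce its own rows and collect them, rather than sharing a mutable array.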

Re: KMeans code is rubbish

2014-07-13 Thread Wanda Hawk
kmeans multiple times and choose the best answer.  You can do this by changing the runs parameter from the default value (1) to something larger (say 10). -Ameet On Fri, Jul 11, 2014 at 1:20 AM, Wanda Hawk wrote: I also took a look at  spark-1.0.0/examples/src/main/scala/org/apache/spark

Re: KMeans code is rubbish

2014-07-11 Thread Wanda Hawk
iteration (delta = 0.0) > Final centers: > DenseVector(5.0, 2.0) > DenseVector(2.0, 2.0) > > > > On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk wrote: >> >> so this is what I am running: >> "./bin/run-example SparkKMeans ~/Documents/2dim2.txt 2 0.001"

Re: KMeans code is rubbish

2014-07-10 Thread Wanda Hawk
BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS Finished iteration (delta = 3.0) Finished iteration (delta = 0.0) Final centers: DenseVector(5.0, 2.0) DenseVector(2.0, 2.0) On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk wrote: so this is what I am running:  >"./bin/run-example SparkKMe

Re: KMeans code is rubbish

2014-07-10 Thread Wanda Hawk
.setRuns(10) or something to try 10 times instead of once. On Thu, Jul 10, 2014 at 9:44 AM, Wanda Hawk wrote: > Can someone please run the standard kMeans code on this input with 2 centers > ?: > 2 1 > 1 2 > 3 2 > 2 3 > 4 1 > 5 1 > 6 1 > 4 2 > 6 2 > 4 3 > 5 3

Re: KMeans code is rubbish

2014-07-10 Thread Wanda Hawk
Bertrand Dechoux wrote: A picture is worth a thousand... Well, a picture with this dataset, what you are expecting and what you get, would help answering your initial question. Bertrand On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk wrote: Can someone please run the standard kMeans code on this i

KMeans code is rubbish

2014-07-10 Thread Wanda Hawk
Can someone please run the standard kMeans code on this input with 2 centers ?: 2 1 1 2 3 2 2 3 4 1 5 1 6 1 4 2 6 2 4 3 5 3 6 3 The obvious result should be (2,2) and (5,2) ... (you can draw them if you don't believe me ...) Thanks,  Wanda

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-03 Thread Wanda Hawk
the KMeans implemented in MLlib directly: http://spark.apache.org/docs/latest/mllib-clustering.html -Xiangrui On Wed, Jul 2, 2014 at 9:50 AM, Wanda Hawk wrote: > I can run it now with the suggested method. However, I have encountered a > new problem that I have not faced before (sent another ema

Re: java options for spark-1.0.0

2014-07-03 Thread Wanda Hawk
whether any options are different. It seems like in both cases, your young generation is quite large (11 GB), which doesn’t make a lot of sense with a heap of 15 GB. But maybe I’m misreading something. Matei On Jul 2, 2014, at 4:50 AM, Wanda Hawk wrote: I ran SparkKMeans with a big file (~ 7 GB of d

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-02 Thread Wanda Hawk
xamples put all the example jars in the cp, which you won't need) As you said the error you see is indicative of the class not being available/seen at runtime but it's hard to tell why. On Wed, Jul 2, 2014 at 2:13 AM, Wanda Hawk wrote: > I want to make some minor modifications in

java options for spark-1.0.0

2014-07-02 Thread Wanda Hawk
I ran SparkKMeans with a big file (~ 7 GB of data) for one iteration with spark-0.8.0, with this line in .bashrc: " export _JAVA_OPTIONS="-Xmx15g -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails" ". It finished in a decent time, ~50 seconds, and I had only a few "Full GC" message
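Broken out, those flags do the following (a commented restatement of the same options, not new ones):

```shell
# The same flags, annotated:
#   -Xmx15g                  maximum heap size (15 GB)
#   -Xms15g                  initial heap size (15 GB, avoids heap resizing)
#   -verbose:gc              log each garbage collection
#   -XX:+PrintGCTimeStamps   timestamp each GC log line
#   -XX:+PrintGCDetails      per-generation detail for each collection
# (an explicit -Xmn setting would additionally bound the young generation)
export _JAVA_OPTIONS="-Xmx15g -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"
```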

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-01 Thread Wanda Hawk
Got it! Ran the jar with spark-submit. Thanks! On Wednesday, July 2, 2014 9:16 AM, Wanda Hawk wrote: I want to make some minor modifications in SparkKMeans.scala, so running the basic example won't do. I have also packaged my code into a jar file with sbt. It complete
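For the archives, the shape of such a spark-submit invocation (the class name, jar path, and arguments here are illustrative placeholders, not the poster's actual values):

```shell
# Submit the sbt-built jar; --class and --master are standard spark-submit flags
./bin/spark-submit \
  --class SparkKMeans \
  --master local[4] \
  target/scala-2.10/my-kmeans_2.10-1.0.jar \
  /path/to/input.txt 2 0.001
```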

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-01 Thread Wanda Hawk
Xiangrui Meng wrote: You can use either bin/run-example or bin/spark-submit to run example code. "scalac -d classes/ SparkKMeans.scala" doesn't recognize the Spark classpath. There are examples in the official doc: http://spark.apache.org/docs/latest/quick-start.html#where-to-go-from-he

SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-01 Thread Wanda Hawk
Hello, I have installed spark-1.0.0 with Scala 2.10.3. I have built Spark with "sbt/sbt assembly" and added "/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar" to my CLASSPATH variable. Then I went here "../spark-1.0.0/examples/src/main/scala/org/apache/sp