Is this what you are looking for?
In Shark, the default reducer number is 1 and is controlled by the property
mapred.reduce.tasks. Spark SQL deprecates this property in favor of
spark.sql.shuffle.partitions, whose default value is 200. Users may customize
this property via SET:
SET spark.sql.shuffle.partitions=...
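Here is a minimal sketch of the programmatic equivalent, assuming a Spark 1.x
SQLContext; the value 10 is only an illustration, not a recommendation:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ShufflePartitionsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ShufflePartitionsDemo"))
    val sqlContext = new SQLContext(sc)

    // Same effect as running "SET spark.sql.shuffle.partitions=10" in SQL;
    // 10 is an arbitrary example value.
    sqlContext.setConf("spark.sql.shuffle.partitions", "10")
    // or: sqlContext.sql("SET spark.sql.shuffle.partitions=10")

    sc.stop()
  }
}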
I am trying to get a software trace and I need to get the number of active
threads as low as I can in order to inspect the "active" part of the workload
From: Prashant Sharma
To: Wanda Hawk
Cc: "user@spark.apache.org"
Sent: Tuesday, Oc
Hello
I am trying to reduce the number of Java threads (about 80 on my system) to as
few as possible.
What settings can be changed in spark-1.1.0/conf/spark-env.sh (or in other
places as well)?
I am also using Hadoop for storing data on HDFS.
Thank you,
Wanda
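Not a full answer to the spark-env.sh question, but one knob I'm fairly sure
about is the master string: running with local[1] keeps Spark down to a single
task-execution thread (Spark and the JVM still create their own internal
threads, so this reduces rather than eliminates the count). A minimal sketch,
with the app name as a placeholder:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object SingleTaskThread {
  def main(args: Array[String]): Unit = {
    // local[1] = run the driver and a single task thread in one JVM.
    val conf = new SparkConf().setMaster("local[1]").setAppName("SingleTaskThread")
    val sc = new SparkContext(conf)

    println(sc.parallelize(1 to 10).sum())  // trivial job to show it still works
    sc.stop()
  }
}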
How can I declare in Spark an object shared by all the threads that does not
block execution by locking the entire array (the threads are supposed to access
different rows of a two-dimensional array)?
For example, I would like to declare a two-dimensional array. Each thread should
write on its corresponding row.
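Outside of Spark's RDD model (which does not give you shared mutable state
across tasks), here is a minimal plain-JVM sketch of what I think is being
described: one row of a shared two-dimensional array per thread, so no lock on
the whole array is needed because the writes never overlap. Sizes and values
are placeholders:

object RowPerThread {
  def main(args: Array[String]): Unit = {
    val rows = 4
    val cols = 8
    // Shared 2-D array; row r is written only by worker r, so plain writes
    // are safe and no global lock is required. join() below makes the writes
    // visible to the main thread before it reads them.
    val shared = Array.ofDim[Double](rows, cols)

    val workers = (0 until rows).map { r =>
      new Thread(new Runnable {
        def run(): Unit = {
          for (c <- 0 until cols) shared(r)(c) = r * 10.0 + c
        }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())

    shared.foreach(row => println(row.mkString(" ")))
  }
}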
kmeans multiple times and choose the best answer. You can do this by
changing the runs parameter from the default value (1) to something larger (say
10).
-Ameet
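A minimal sketch of that with the MLlib API (Spark 1.x; the input path and
parameter values are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansWithRuns {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KMeansWithRuns"))

    // One "x y" point per line, e.g. the 12-point file from this thread.
    val data = sc.textFile("2dim2.txt")
      .map(line => Vectors.dense(line.trim.split("\\s+").map(_.toDouble)))
      .cache()

    val model = new KMeans()
      .setK(2)
      .setMaxIterations(20)
      .setRuns(10)   // 10 random initializations instead of the default 1
      .run(data)

    model.clusterCenters.foreach(println)
    sc.stop()
  }
}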
On Fri, Jul 11, 2014 at 1:20 AM, Wanda Hawk wrote:
I also took a look at
spark-1.0.0/examples/src/main/scala/org/apache/spark
> Finished iteration (delta = 0.0)
> Final centers:
> DenseVector(5.0, 2.0)
> DenseVector(2.0, 2.0)
>
>
>
> On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk wrote:
>>
>> so this is what I am running:
>> "./bin/run-example SparkKMeans
~/Documents/2dim2.txt 2 0.001"
BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeRefBLAS
Finished iteration (delta = 3.0)
Finished iteration (delta = 0.0)
Final centers:
DenseVector(5.0, 2.0)
DenseVector(2.0, 2.0)
On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk wrote:
so this is what I am running:
"./bin/run-example SparkKMeans ~/Documents/2dim2.txt 2 0.001"
.setRuns(10) or
something to try 10 times instead of once.
On Thu, Jul 10, 2014 at 9:44 AM, Wanda Hawk wrote:
> Can someone please run the standard kMeans code on this input with 2 centers
> ?:
> 2 1
> 1 2
> 3 2
> 2 3
> 4 1
> 5 1
> 6 1
> 4 2
> 6 2
> 4 3
> 5 3
Bertrand Dechoux wrote:
A picture is worth a thousand... Well, a picture with this dataset, what you
are expecting, and what you get, would help answer your initial question.
Bertrand
On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk wrote:
Can someone please run the standard kMeans code on this i
Can someone please run the standard kMeans code on this input with 2 centers ?:
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
6 3
The obvious result should be (2,2) and (5,2) ... (you can draw them if you
don't believe me ...)
Thanks,
Wanda
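For what it's worth, the claimed centers check out if you split the points into
the left group of 4 and the right group of 8 (a quick scratch computation, not
Spark code):

val left  = Seq((2, 1), (1, 2), (3, 2), (2, 3))
val right = Seq((4, 1), (5, 1), (6, 1), (4, 2), (6, 2), (4, 3), (5, 3), (6, 3))

def mean(ps: Seq[(Int, Int)]): (Double, Double) =
  (ps.map(_._1).sum.toDouble / ps.size, ps.map(_._2).sum.toDouble / ps.size)

println(mean(left))   // (2.0, 2.0)
println(mean(right))  // (5.0, 2.0)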
the KMeans implemented in MLlib directly:
http://spark.apache.org/docs/latest/mllib-clustering.html
-Xiangrui
On Wed, Jul 2, 2014 at 9:50 AM, Wanda Hawk wrote:
> I can run it now with the suggested method. However, I have encountered a
> new problem that I have not faced before (sent another ema
whether any options are different. It seems like in both cases, your
young generation is quite large (11 GB), which doesn't make a lot of sense with
a heap of 15 GB. But maybe I'm misreading something.
Matei
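In case it helps, a hedged sketch of capping the young generation explicitly so
it stops claiming most of a 15 GB heap. The flag values are illustrative only,
and spark.executor.extraJavaOptions assumes Spark 1.0+ and a cluster deployment
(in local mode the driver JVM's own options would have to carry these flags
instead):

import org.apache.spark.SparkConf

// -XX:NewRatio=3 keeps the young generation at roughly 1/4 of the heap;
// alternatively -Xmn<size> pins it to an absolute size. Values are examples,
// not tuned recommendations.
val conf = new SparkConf()
  .setAppName("SparkKMeans")
  .set("spark.executor.extraJavaOptions",
    "-XX:NewRatio=3 -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails")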
On Jul 2, 2014, at 4:50 AM, Wanda Hawk wrote:
I ran SparkKMeans with a big file (~ 7 GB of d
examples put all the example jars in the cp, which you won't
need)
As you said, the error you see is indicative of the class not being
available/seen at runtime, but it's hard to tell why.
On Wed, Jul 2, 2014 at 2:13 AM, Wanda Hawk wrote:
> I want to make some minor modifications in
I ran SparkKMeans with a big file (~7 GB of data) for one iteration on
spark-0.8.0, with this line in bash.rc: "export _JAVA_OPTIONS="-Xmx15g -Xms15g
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"". It finished in a
decent time, ~50 seconds, and I had only a few "Full GC" messages
Got it! Ran the jar with spark-submit. Thanks!
On Wednesday, July 2, 2014 9:16 AM, Wanda Hawk wrote:
I want to make some minor modifications in SparkKMeans.scala, so running the
basic example won't do.
I have also packaged my code into a jar file with sbt. It complete
Xiangrui Meng wrote:
You can use either bin/run-example or bin/spark-submit to run example
code. "scalac -d classes/ SparkKMeans.scala" doesn't recognize the Spark
classpath. There are examples in the official doc:
http://spark.apache.org/docs/latest/quick-start.html#where-to-go-from-here
Hello,
I have installed spark-1.0.0 with Scala 2.10.3. I have built Spark with
"sbt/sbt assembly" and added
"/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar"
to my CLASSPATH variable.
Then I went here
"../spark-1.0.0/examples/src/main/scala/org/apache/sp