Re: Executing spark jobs with predefined Hadoop user

2014-04-12 Thread Asaf Lahav
Thank you all very much for your responses. We are going to test these recommendations. Adnan, in regards to the HDFS URI, this is actually the manner in which we are accessing the file system already. It was simply removed from the post. Thank you, Asaf On Thu, Apr 10, 2014 at 5:33 PM, Sha

Re: Master registers itself at startup?

2014-04-12 Thread Mark Baker
On Sat, Apr 12, 2014 at 9:19 AM, ge ko wrote: > Hi, > > I'm wondering why the master is registering itself at startup, exactly 3 > times (same number as the number of workers). Log excerpt: > "" > 2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger > started > 2014-04-11 21:08:1

Re: Huge matrix

2014-04-12 Thread Xiaoli Li
Hi Tom, Thank you very much for your detailed explanation. I think it is very helpful to me. On Sat, Apr 12, 2014 at 1:06 PM, Tom V wrote: > The last writer is suggesting using the triangle inequality to cut down > the search space. If c is the centroid of cluster C, then the closest any > p

Re: Huge matrix

2014-04-12 Thread Tom V
The last writer is suggesting using the triangle inequality to cut down the search space. If c is the centroid of cluster C, then the closest any point in C can be to x is ||x-c|| - r(C), where r(C) is the (precomputed) radius of the cluster, i.e. the distance from c to the farthest point in C. Whether you
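To illustrate the bound described above, here is a minimal sketch in plain Python (not Spark code, and not necessarily how anyone in this thread implemented it). The helper names `dist`, `make_cluster` and `nearest_with_pruning` are hypothetical, and Euclidean distance is an assumption. A cluster is skipped when its lower bound ||x-c|| - r(C) is already no better than the best distance found so far.

```python
import math

def dist(a, b):
    # Euclidean distance between two equal-length tuples (assumed metric).
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def make_cluster(points):
    # Precompute the centroid c and the radius r(C): the distance from c
    # to the farthest point of the cluster.
    dim = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    radius = max(dist(centroid, p) for p in points)
    return centroid, radius, points

def nearest_with_pruning(x, clusters):
    # clusters: list of (centroid, radius, points) as built by make_cluster.
    # Lower bound on the distance from x to any point of a cluster:
    # ||x - c|| - r(C), clamped at 0 when x lies inside the cluster ball.
    bounds = sorted(
        ((max(0.0, dist(x, c) - r), pts) for c, r, pts in clusters),
        key=lambda t: t[0],
    )
    best_point, best_dist, scanned = None, math.inf, 0
    for lower, pts in bounds:
        if lower >= best_dist:
            break  # bounds are sorted, so no remaining cluster can do better
        scanned += 1
        for p in pts:
            d = dist(x, p)
            if d < best_dist:
                best_point, best_dist = p, d
    return best_point, best_dist, scanned

if __name__ == "__main__":
    near = make_cluster([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
    far = make_cluster([(10.0, 10.0), (11.0, 10.0), (10.0, 11.0)])
    print(nearest_with_pruning((0.9, 0.1), [near, far]))
```

The sketch returns how many clusters were actually scanned, which makes it easy to see how much work the bound saves on a given data set.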

Re: Huge matrix

2014-04-12 Thread Xiaoli Li
Hi Guillaume, This sounds like a good idea to me. I am a newbie here. Could you further explain how you will determine which clusters to keep? Is it according to the distance between each element and each cluster center? Will you keep several clusters for each element for searching nearest neighbours? Thank

Re: Changing number of workers for benchmarking purposes

2014-04-12 Thread Kalpit Shah
In Spark release 0.7.1, I added support for running multiple worker processes on a single slave machine. I built it for performance testing multiple workers on a single machine in standalone mode. Set the following in conf/spark-env.sh and restart your cluster: export SPARK_WORKER_INSTANCES=3 Th

Re: Compile SimpleApp.scala encountered error, please can any one help?

2014-04-12 Thread jni2000
Thanks, Prabeesh. I figured it out. The Java file did conflict with the Scala file. Thanks for the hint. James -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Compile-SimpleApp-scala-encountered-error-please-can-any-one-help-tp4160p4168.html Sent from the A

Re: Compile SimpleApp.scala encountered error, please can any one help?

2014-04-12 Thread jni2000
Prabeesh, thanks for the reply. By one copy of SimpleApp.scala, do you mean one copy of this .scala file? I only have one in a newly created test project. I do have one copy of SimpleApp.java, but in a different directory (src/main/java); the .scala file is in the src/main/scala directory. Will java and scal

Re: Huge matrix

2014-04-12 Thread Guillaume Pitel
Hi, I'm doing this here for multiple tens of millions of elements (and the goal is to reach multiple billions), on a relatively small cluster (7 nodes, 4 cores, 32GB RAM). We use multiprobe KLSH. All you have to do is run k-means on your data, then compute the distanc

Re: Huge matrix

2014-04-12 Thread Xiaoli Li
Hi Reza, Thank you for your information. I will try it. On Fri, Apr 11, 2014 at 11:21 PM, Reza Zadeh wrote: > Hi Xiaoli, > > There is a PR currently in progress to allow this, via the sampling scheme > described in this paper: stanford.edu/~rezab/papers/dimsum.pdf > > The PR is at https://git

cannot exec. job: "TaskSchedulerImpl: Initial job has not accepted any resources"

2014-04-12 Thread ge ko
Hi, I'm getting started with Spark and have installed Spark within CDH5 using Cloudera Manager. I set up one master (hadoop-pg-5) and 3 workers (hadoop-pg-7[-8,-9]). Master WebUI looks good, all workers seem to be registered. If I open "spark-shell" and try to execute the wordcount example, the executio

Master registers itself at startup?

2014-04-12 Thread ge ko
Hi, I'm wondering why the master is registering itself at startup, exactly 3 times (same number as the number of workers). Log excerpt: "" 2014-04-11 21:08:15,363 INFO akka.event.slf4j.Slf4jLogger: Slf4jLogger started 2014-04-11 21:08:15,478 INFO Remoting: Starting remoting 2014-04-11 21:08:15,838

cannot exec. job: "TaskSchedulerImpl: Initial job has not accepted any resources"

2014-04-12 Thread Gerd Koenig
Hi, I'm getting started with Spark and have installed Spark within CDH5 using Cloudera Manager. I set up one master (hadoop-pg-5) and 3 workers (hadoop-pg-7[-8,-9]). Master WebUI looks good, all workers seem to be registered. If I open "spark-shell" and try to execute the wordcount example, the executio