Re: hadoopRDD stalls reading entire directory

2014-06-02 Thread Russell Jurney
Nothing appears to be running on hivecluster2:8080. 'sudo jps' does show:

[hivedata@hivecluster2 ~]$ sudo jps
9953 PepAgent
13797 JournalNode
7618 NameNode
6574 Jps
12716 Worker
16671 RunJar
18675 Main
18177 JobTracker
10918 Master
18139 TaskTracker
7674 DataNode

I kill all processes listed. I r…

Re: hadoopRDD stalls reading entire directory

2014-06-02 Thread Aaron Davidson
You may have to do "sudo jps", because it should definitely list your processes. What does hivecluster2:8080 look like? My guess is it says there are 2 applications registered, and one has taken all the executors. There must be two applications running, as those are the only things that keep open…
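If that diagnosis holds, the usual standalone-mode fix is to cap how many cores each application may claim, so a second shell can still be granted executors. A minimal Scala sketch, assuming the default standalone master port 7077 on hivecluster2 and an arbitrary cap of 4 cores (neither value appears in the thread):

    import org.apache.spark.{SparkConf, SparkContext}

    // Cap this application at 4 cores so another shell or job
    // can still be granted executors on the same standalone master.
    val conf = new SparkConf()
      .setMaster("spark://hivecluster2:7077") // assumes the default master port
      .setAppName("hadoopRDD-test")           // hypothetical app name
      .set("spark.cores.max", "4")            // leave the remaining cores free
    val sc = new SparkContext(conf)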

Re: hadoopRDD stalls reading entire directory

2014-06-02 Thread Russell Jurney
If it matters, I have servers running at http://hivecluster2:4040/stages/ and http://hivecluster2:4041/stages/. When I run rdd.first, I see an item at http://hivecluster2:4041/stages/ but no tasks are running: Stage ID 1, "first at <console>:46", Tasks: Succeeded/Total 0/16.

On Mon, Jun 2, 2014 at 10:09 AM, …

Re: hadoopRDD stalls reading entire directory

2014-06-02 Thread Russell Jurney
Looks like just worker and master processes are running:

[hivedata@hivecluster2 ~]$ jps
10425 Jps
[hivedata@hivecluster2 ~]$ ps aux|grep spark
hivedata 10424 0.0 0.0 103248 820 pts/3 S+ 10:05 0:00 grep spark
root 10918 0.5 1.4 4752880 230512 ? Sl May27 41:43 java -cp :…

Re: hadoopRDD stalls reading entire directory

2014-06-01 Thread Aaron Davidson
Sounds like you have two shells running, and the first one is taking all your resources. Do a "jps" and kill the other guy, then try again. By the way, you can look at http://localhost:8080 (replace localhost with the server your Spark Master is running on) to see what applications are currently…

Re: hadoopRDD stalls reading entire directory

2014-06-01 Thread Russell Jurney
Thanks again. Run results here: https://gist.github.com/rjurney/dc0efae486ba7d55b7d5 This time I get a "port already in use" exception on 4040, but it isn't fatal. Then when I run rdd.first, I get this over and over:

14/06/01 18:35:40 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
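That repeated warning is the scheduler saying the job has registered but been granted no executors, consistent with another application holding all the cores. The non-fatal 4040 message is just the second driver's web UI falling back to 4041, matching the two UIs seen above. If the fallback itself is undesirable, the UI port can be pinned explicitly; a sketch, with 4042 as an arbitrary choice:

    import org.apache.spark.SparkConf

    // Pin this driver's web UI to a known free port instead of
    // colliding on 4040 and falling back automatically.
    val conf = new SparkConf().set("spark.ui.port", "4042")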

Re: hadoopRDD stalls reading entire directory

2014-06-01 Thread Aaron Davidson
You can avoid that by using the constructor that takes a SparkConf, a la:

val conf = new SparkConf()
conf.setJars(Seq("avro.jar", ...))
val sc = new SparkContext(conf)

On Sun, Jun 1, 2014 at 2:32 PM, Russell Jurney wrote:
> Followup question: the docs to make a new SparkContext require that I know
> …

Re: hadoopRDD stalls reading entire directory

2014-06-01 Thread Russell Jurney
Followup question: the docs to make a new SparkContext require that I know where $SPARK_HOME is. However, I have no idea. Any idea where that might be?

On Sun, Jun 1, 2014 at 10:28 AM, Aaron Davidson wrote:
> Gotcha. The easiest way to get your dependencies to your Executors would
> probably be…

Re: hadoopRDD stalls reading entire directory

2014-06-01 Thread Aaron Davidson
Gotcha. The easiest way to get your dependencies to your Executors would probably be to construct your SparkContext with all necessary jars passed in (as the "jars" parameter), or inside a SparkConf with setJars(). Avro is a "necessary jar", but it's possible your application also needs to distribu…
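A sketch of both routes against the Spark 0.9 API; the jar path and app name are placeholders, and only one SparkContext would actually be created per JVM:

    import org.apache.spark.{SparkConf, SparkContext}

    // Route 1: pass the jars through the SparkContext constructor.
    // The sparkHome argument (the follow-up question above) is
    // optional and may be left null when the workers don't need it.
    val sc1 = new SparkContext(
      "spark://hivecluster2:7077",        // assumes the default master port
      "avro-reader",                      // hypothetical app name
      null,                               // sparkHome, optional here
      Seq("/path/to/avro-mapred.jar"))    // placeholder jar path

    // Route 2: the same thing via SparkConf.setJars.
    val conf = new SparkConf()
      .setMaster("spark://hivecluster2:7077")
      .setAppName("avro-reader")
      .setJars(Seq("/path/to/avro-mapred.jar"))
    val sc2 = new SparkContext(conf)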

Re: hadoopRDD stalls reading entire directory

2014-05-31 Thread Russell Jurney
Thanks for the fast reply. I am running CDH 4.4 with the Cloudera Parcel of Spark 0.9.0, in standalone mode.

On Saturday, May 31, 2014, Aaron Davidson wrote:
> First issue was because your cluster was configured incorrectly. You could
> probably read 1 file because that was done on the driver node…
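For context, a minimal sketch of the kind of read being discussed, run from the spark-shell (where sc is predefined). The HDFS path, and the use of AvroKeyInputFormat specifically, are assumptions; the thread never shows the original read code:

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    // Read a whole directory of Avro files; each element is an
    // (AvroKey[GenericRecord], NullWritable) pair. Even rdd.first,
    // which touches one partition, stalls until an executor is granted.
    val rdd = sc.newAPIHadoopFile(
      "hdfs://hivecluster2/data/events",  // placeholder path
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable])
    rdd.first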