wordcount across several files

2014-12-03 Thread BC
I'm trying to run wordcount on several files, but I'm stuck: I can't pass the output from one file on to the next. Any help would be appreciated.
    sc = SparkContext()
    for datafile in inputfiles:
        lines = sc.textFile(indir + "/" + datafile, 1)
        counts = lines.flatMap(lambda x: x.split(' ')) \
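One possible way to avoid carrying counts from one file to the next is to read all the files into a single RDD and count once. The sketch below is not taken from the thread. It assumes `indir` and `inputfiles` have the meaning they have in the snippet above and relies on `textFile` accepting a comma-separated list of paths. The helper names `build_input_paths` and `count_words` are hypothetical.

```python
from operator import add


def build_input_paths(indir, inputfiles):
    """Join the file names into one comma-separated path string."""
    return ",".join(indir + "/" + name for name in inputfiles)


def count_words(sc, indir, inputfiles):
    """Return an RDD of (word, count) pairs computed over all input files."""
    # A single textFile call reads every listed file into one RDD of lines.
    lines = sc.textFile(build_input_paths(indir, inputfiles))
    return (lines.flatMap(lambda line: line.split(' '))
                 .map(lambda word: (word, 1))
                 .reduceByKey(add))
```

With an existing SparkContext `sc`, `count_words(sc, indir, inputfiles).collect()` would return the combined counts. File names that themselves contain commas would not work with this approach.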

Re: problem starting the history server on EC2

2014-06-10 Thread bc Wong
What's the permission on /root itself?
On Jun 10, 2014 6:29 PM, "zhen" wrote:
> I created a Spark 1.0 cluster on EC2 using the provided scripts. However, I
> do not seem to be able to start the history server on the master node. I
> used the following command:
>
> ./start-history-server.sh /root/

Re: spark on yarn is trying to use file:// instead of hdfs://

2014-06-20 Thread bc Wong
Koert, is there any chance that your fs.defaultFS isn't set up right?
On Fri, Jun 20, 2014 at 9:57 AM, Koert Kuipers wrote:
> yeah sure see below. i strongly suspect its something i misconfigured
> causing yarn to try to use local filesystem mistakenly.
>
> *
>
> [koert@cdh5