Re: continuing processing when errors occur

2014-07-24 Thread Art Peel
Sorry, I sent this to the dev list instead of user. Please ignore. I'll re-post to the correct list.

Regards,
Art

On Thu, Jul 24, 2014 at 11:09 AM, Art Peel wrote:
> Our system works with RDDs generated from Hadoop files. It processes each
> record in a Hadoop file and for

continuing processing when errors occur

2014-07-24 Thread Art Peel
Our system works with RDDs generated from Hadoop files. It processes each record in a Hadoop file and for a subset of those records generates output that is written to an external system via RDD.foreach. There are no dependencies between the records that are processed. If writing to the external s

Re: thoughts on spark_ec2.py?

2014-04-28 Thread Art Peel
ts within Spark, at least for now.
>
> Cheers,
> Andrew
>
> On Friday, April 25, 2014, Art Peel wrote:
>
> > I've been setting up Spark cluster on EC2 using the provided
> > ec2/spark_ec2.py script and am very happy I didn't have to write it from
> > scrat

thoughts on spark_ec2.py?

2014-04-25 Thread Art Peel
I've been setting up Spark cluster on EC2 using the provided ec2/spark_ec2.py script and am very happy I didn't have to write it from scratch. Thanks for providing it. There have been some issues, though, and I have had to make some additions. So far, they are all additions of command-line option