Sorry, I sent this to the dev list instead of user. Please ignore. I'll
re-post to the correct list.
Regards,
Art
On Thu, Jul 24, 2014 at 11:09 AM, Art Peel wrote:
Our system works with RDDs generated from Hadoop files. It processes each
record in a Hadoop file and for a subset of those records generates output
that is written to an external system via RDD.foreach. There are no
dependencies between the records that are processed.
If writing to the external s[…]ts within Spark, at least for now.
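The per-record write pattern described above can be sketched as follows. This is a plain-Python sketch, not real Spark code: `ExternalClient` is a hypothetical stand-in for the real external system, and `write_partition` plays the role of the function you would pass to `RDD.foreachPartition` (so one client is created per partition rather than per record):

```python
# Sketch of the pattern above: process each record, and for a
# qualifying subset write output to an external system.
# ExternalClient is a hypothetical stand-in; a real client would
# open a connection to the external service.

class ExternalClient:
    def __init__(self):
        self.written = []

    def write(self, record):
        self.written.append(record)


def write_partition(records, client, should_emit):
    """Body that would be passed to RDD.foreachPartition in Spark.

    Records are independent, so each one is checked and written
    (or skipped) on its own, matching the no-dependency assumption.
    """
    for record in records:
        if should_emit(record):
            client.write(record.upper())  # placeholder transformation


client = ExternalClient()
write_partition(
    ["keep-a", "drop-b", "keep-c"],
    client,
    should_emit=lambda r: r.startswith("keep"),
)
print(client.written)  # -> ['KEEP-A', 'KEEP-C']
```

Using `foreachPartition` rather than `foreach` is the usual choice here, so the (typically expensive) client setup happens once per partition instead of once per record.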
>
> Cheers,
> Andrew
>
> On Friday, April 25, 2014, Art Peel wrote:
>
I've been setting up a Spark cluster on EC2 using the provided
ec2/spark_ec2.py script, and I'm very glad I didn't have to write it from
scratch. Thanks for providing it.
There have been some issues, though, and I have had to make some additions.
So far, they are all additions of command-line options.