Re: Spark Streaming in Production

2014-12-12 Thread twizansk
Thanks for the reply. I might be misunderstanding something basic. As far as I can tell, the cluster manager (e.g. Mesos) manages the master and worker nodes but not the drivers or receivers; those are external to the Spark cluster: http://spark.apache.org/docs/latest/cluster-overview.html I

Spark Streaming in Production

2014-12-11 Thread twizansk
Hi, I'm looking for resources and examples for the deployment of Spark Streaming in production. Specifically, I would like to know how high availability and fault tolerance of receivers are typically achieved. The workers are managed by the Spark framework and are therefore fault tolerant out

Re: Python, Spark and HBase

2014-05-28 Thread twizansk
The code which causes the error is: sc = SparkContext("local", "My App") rdd = sc.newAPIHadoopFile( name, 'org.apache.hadoop.hbase.mapreduce.TableInputFormat', 'org.apache.hadoop.hbase.io.ImmutableBytesWritable', 'org.apache.hadoop.hbase.client

Re: Python, Spark and HBase

2014-05-28 Thread twizansk
In my code I am not referencing PythonRDD or PythonRDDnewAPIHadoopFile at all. I am calling SparkContext.newAPIHadoopFile with: inputformat_class='org.apache.hadoop.hbase.mapreduce.TableInputFormat', key_class='org.apache.hadoop.hbase.io.ImmutableBytesWritable', value_class='org.apache.hadoop.hba

Re: Python, Spark and HBase

2014-05-28 Thread twizansk
Hi Nick, I finally got around to downloading and building the patch. I pulled the code from https://github.com/MLnick/spark-1/tree/pyspark-inputformats and am running on a CDH5 node. While the code in the CDH branch is different from Spark master, I do believe that I have resolved any inconsist

Re: Python, Spark and HBase

2014-05-21 Thread twizansk
Thanks Nick and Matei. I'll take a look at the patch and keep you updated. Tommer

Python, Spark and HBase

2014-05-20 Thread twizansk
Hello, This seems like a basic question but I have been unable to find an answer in the archives or other online sources. I would like to know if there is any way to load an RDD from HBase in Python. In Java/Scala I can do this by initializing a NewAPIHadoopRDD with a TableInputFormat class. Is