Thanks for the reply. I might be misunderstanding something basic. As far
as I can tell, the cluster manager (e.g. Mesos) manages the master and
worker nodes but not the drivers or receivers; those are external to the
Spark cluster:
http://spark.apache.org/docs/latest/cluster-overview.html
Hi,
I'm looking for resources and examples for deploying Spark Streaming in
production. Specifically, I would like to know how high availability and
fault tolerance of receivers are typically achieved.
The workers are managed by the Spark framework and are therefore fault
tolerant out of the box.
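For concreteness, the usual recovery pattern in PySpark Streaming is
checkpoint-based: the driver writes its DStream graph and metadata to a
reliable filesystem, and a restarted driver rebuilds the context from
there, while failed receivers are restarted on another executor by Spark
itself. A minimal sketch follows; the checkpoint path and the socket
source are placeholders, not anything from the original messages.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

CHECKPOINT_DIR = "hdfs:///tmp/streaming-checkpoint"  # placeholder path

def create_context():
    # Runs only on the first start; on recovery the checkpoint is used.
    sc = SparkContext(appName="RecoverableApp")
    ssc = StreamingContext(sc, 5)  # 5-second batches
    ssc.checkpoint(CHECKPOINT_DIR)
    lines = ssc.socketTextStream("localhost", 9999)  # placeholder source
    lines.count().pprint()
    return ssc

# After a driver restart, getOrCreate rebuilds the context from the
# checkpoint instead of calling create_context again.
ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
ssc.start()
ssc.awaitTermination()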
The code which causes the error is:
from pyspark import SparkContext

sc = SparkContext("local", "My App")
rdd = sc.newAPIHadoopFile(
    name,  # path/table argument, defined earlier in the original message
    'org.apache.hadoop.hbase.mapreduce.TableInputFormat',
    'org.apache.hadoop.hbase.io.ImmutableBytesWritable',
    'org.apache.hadoop.hbase.client.Result')  # TableInputFormat's value class
In my code I am not referencing PythonRDD or PythonRDD.newAPIHadoopFile at
all. I am calling SparkContext.newAPIHadoopFile with:
inputformat_class='org.apache.hadoop.hbase.mapreduce.TableInputFormat',
key_class='org.apache.hadoop.hbase.io.ImmutableBytesWritable',
value_class='org.apache.hadoop.hbase.client.Result'
Hi Nick,
I finally got around to downloading and building the patch.
I pulled the code from
https://github.com/MLnick/spark-1/tree/pyspark-inputformats
I am running on a CDH5 node. While the code in the CDH branch is different
from Spark master, I do believe that I have resolved any inconsistencies.
Thanks Nick and Matei. I'll take a look at the patch and keep you updated.
Tommer
Hello,
This seems like a basic question, but I have been unable to find an answer
in the archives or other online sources.
I would like to know if there is any way to load an RDD from HBase in
Python. In Java/Scala I can do this by initializing a NewAPIHadoopRDD with
a TableInputFormat class. Is there an equivalent in PySpark?
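For anyone finding this thread in the archives: with the inputformats
patch applied, the PySpark counterpart to NewAPIHadoopRDD looks roughly
like the sketch below. This assumes the API shape that eventually shipped
(parameter naming in the patch branch discussed above may differ), the
converter classes from the Spark examples jar, and placeholder values for
the ZooKeeper quorum and table name.

from pyspark import SparkContext

sc = SparkContext(appName="HBaseRead")
conf = {
    "hbase.zookeeper.quorum": "zk-host",       # placeholder
    "hbase.mapreduce.inputtable": "my_table",  # placeholder
}
rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=conf)
print(rdd.take(1))  # each element is a (row key, result) string pair

The converters turn HBase's ImmutableBytesWritable and Result objects into
strings that Python can deserialize; without them, the raw Writables cannot
cross the JVM/Python boundary.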