Hi,
We are running HBase in fully distributed mode. I tried to connect to HBase from
PySpark and write to it using saveAsNewAPIHadoopDataset, but the call failed
with this error:
Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: java.lang.ClassNotFoundException:
org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
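For reference, the write looked roughly like this (a minimal sketch; the
ZooKeeper host, table name, and row contents are placeholders, and the converter
class names are the ones from the spark-examples pythonconverters package named
in the stack trace):

```python
# Converter classes from the spark-examples jar; the key converter is the
# class the ClassNotFoundException complains about.
key_conv = ("org.apache.spark.examples.pythonconverters."
            "StringToImmutableBytesWritableConverter")
value_conv = ("org.apache.spark.examples.pythonconverters."
              "StringListToPutConverter")

# Job configuration for writing to HBase via TableOutputFormat.
# Host and table names below are placeholders.
conf = {
    "hbase.zookeeper.quorum": "zk-host",
    "hbase.mapred.outputtable": "test_table",
    "mapreduce.outputformat.class":
        "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
    "mapreduce.job.output.key.class":
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "mapreduce.job.output.value.class":
        "org.apache.hadoop.io.Writable",
}

# On a live cluster this is the failing call (sc is the SparkContext):
# rdd = sc.parallelize([("row1", ["row1", "cf", "col", "value"])])
# rdd.saveAsNewAPIHadoopDataset(conf=conf, keyConverter=key_conv,
#                               valueConverter=value_conv)
```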
I was able to build pythonconverters.jar and then did the following:
1. Copied the jar to a location on HDFS; /sparkjars/ seemed as good a
directory to create as any. I believe the file has to be world-readable.
2. Set the spark_jar_hdfs_path property in Cloudera Manager, e.g.
hdfs:///sparkjars
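Sketched as shell commands, the steps above were roughly (the local jar path is
a placeholder from my setup):

```shell
# Copy the converters jar to HDFS and make it world-readable.
# /sparkjars and the local jar path are placeholders.
hdfs dfs -mkdir -p /sparkjars
hdfs dfs -put pythonconverters.jar /sparkjars/
hdfs dfs -chmod 644 /sparkjars/pythonconverters.jar
```

I also wondered whether passing the jar directly with --jars when launching
pyspark would sidestep the spark_jar_hdfs_path mechanism, but I have not
confirmed that.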
It still doesn't seem to work. Could someone please help me with this?
Regards,
Puneet