Re: saveAsNewAPIHadoopDataset against hbase hanging in pyspark 1.2.0

Antony Mayi Wed, 24 Dec 2014 16:13:04 -0800

I just ran it by hand from the pyspark shell. Here are the steps:
pyspark --jars /usr/lib/spark/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar
>>> conf = {"hbase.zookeeper.quorum": "localhost",
...         "hbase.mapred.outputtable": "test",
...         "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
...         "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
...         "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}
>>> keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
>>> valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"
>>> sc.parallelize([['testkey', 'f1', 'testqual', 'testval']], 1).map(lambda x: (x[0], x)).saveAsNewAPIHadoopDataset(
...         conf=conf,
...         keyConverter=keyConv,
...         valueConverter=valueConv)
It then prints a few INFO-level messages about submitting a task and so on, but after that it just hangs. The very same code runs fine on Spark 1.1.0, where the records get stored in HBase.
Thanks,
Antony.
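For reference, the configuration and record shape from the session in this message can be factored into two small plain-Python helpers. This is a sketch, not the stock example: the names `hbase_output_conf`, `to_put_pair`, `KEY_CONVERTER` and `VALUE_CONVERTER` are hypothetical, and it assumes, as in the session, that each record is `[rowkey, family, qualifier, value]` and that StringListToPutConverter builds an HBase Put from that four-element list. It only restates the setup; it does not address the hang.

```python
# Hypothetical helpers restating the pyspark session above; plain Python, no Spark needed.

# Converter classes from the Spark examples jar passed to pyspark via --jars.
KEY_CONVERTER = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
VALUE_CONVERTER = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"


def hbase_output_conf(zk_quorum, table):
    """Return the Hadoop job configuration used to write into an HBase table."""
    return {
        "hbase.zookeeper.quorum": zk_quorum,
        "hbase.mapred.outputtable": table,
        "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
        "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable",
    }


def to_put_pair(row):
    """Turn [rowkey, family, qualifier, value] into a (key, value) pair.

    The key is the row key string (handled by the key converter); the value is
    the whole four-element list (handled by the value converter), copied as a
    list just like the lambda in the session above.
    """
    if len(row) != 4:
        raise ValueError("expected [rowkey, family, qualifier, value], got %r" % (row,))
    return (row[0], list(row))
```

In a pyspark shell started with the examples jar, they would be used roughly as:

    rows = [['testkey', 'f1', 'testqual', 'testval']]
    sc.parallelize(rows, 1).map(to_put_pair).saveAsNewAPIHadoopDataset(
        conf=hbase_output_conf("localhost", "test"),
        keyConverter=KEY_CONVERTER,
        valueConverter=VALUE_CONVERTER)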


 

     On Thursday, 25 December 2014, 0:37, Ted Yu <yuzhih...@gmail.com> wrote:
   
 

I went over the jstack but didn't find any call related to HBase or ZooKeeper. Did you find anything important in the logs?
It looks like the container launcher was waiting for the script to return a result:
   
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:715)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:524)
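The check described above (looking through a thread dump for any HBase or ZooKeeper call) can be sketched as a small scan over the text of a dump saved from `jstack <pid>`. This is a minimal sketch: the function name `frames_matching` and its default package prefixes are hypothetical choices, and it only looks at `at ...(...)` frame lines, ignoring thread headers and lock information.

```python
import re

# Matches a jstack frame line such as "\tat org.foo.Bar.method(Bar.java:42)"
# and captures the fully qualified method name before the "(".
FRAME_RE = re.compile(r"^\s*at\s+([\w$.<>]+)\(")


def frames_matching(dump_text, prefixes=("org.apache.hadoop.hbase", "org.apache.zookeeper")):
    """Return fully qualified method names of frames whose class starts with any prefix."""
    hits = []
    for line in dump_text.splitlines():
        m = FRAME_RE.match(line)
        if m and m.group(1).startswith(tuple(prefixes)):
            hits.append(m.group(1))
    return hits
```

An empty result for the default prefixes corresponds to "no call related to HBase or ZooKeeper"; passing `prefixes=("org.apache.hadoop.util",)` would instead pick out the `Shell` frames quoted above.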

On Wed, Dec 24, 2014 at 3:11 PM, Antony Mayi <antonym...@yahoo.com> wrote:

Here it is (jstack of the particular YARN container): http://pastebin.com/eAdiUYKK
Thanks,
Antony.

     On Wednesday, 24 December 2014, 16:34, Ted Yu <yuzhih...@gmail.com> wrote:
   
 

 bq. even when testing with the example from the stock hbase_outputformat.py
Can you take a jstack of the above and pastebin it?
Thanks
On Wed, Dec 24, 2014 at 4:49 AM, Antony Mayi <antonym...@yahoo.com.invalid> wrote:

Hi,
I have been using this without any issues on Spark 1.1.0, but after upgrading to 1.2.0, saving an RDD from pyspark into HBase using saveAsNewAPIHadoopDataset just hangs, even when testing with the example from the stock hbase_outputformat.py.
Is anyone else having the same issue (and able to solve it)?
I am using HBase 0.98.6 in yarn-client mode.
Thanks,
Antony.
