Re: Spark 1.5.2 memory error

2016-02-02 Thread Jim Green
Look at part #3 in the blog below: http://www.openkb.info/2015/06/resource-allocation-configurations-for.html You may want to increase the executor memory, not just spark.yarn.executor.memoryOverhead. On Tue, Feb 2, 2016 at 2:14 PM, Stefan Panayotov wrote: > For the memoryOverhead I have the def
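
For illustration, a minimal sketch of the two settings being discussed, expressed as SparkConf entries (the values are placeholders, not recommendations from the thread); the same knobs can also be passed via spark-submit flags or spark-defaults.conf.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: raise both the executor JVM heap and the off-heap
    // overhead that YARN adds on top of it (placeholder values).
    val conf = new SparkConf()
      .setAppName("memory-tuning-sketch")
      .set("spark.executor.memory", "4g")                // executor JVM heap
      .set("spark.yarn.executor.memoryOverhead", "1024") // MB of YARN overhead
    val sc = new SparkContext(conf)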

Array column stored as “.bag” in parquet file instead of “REPEATED INT64"

2015-08-27 Thread Jim Green
Hi Team, Say I have a test.json file: {"c1":[1,2,3]} I can create a parquet file like: var df = sqlContext.load("/tmp/test.json","json") var df_c = df.repartition(1) df_c.select("*").save("/tmp/testjson_spark","parquet") The output parquet file's schema is like: c1: OPTIONAL F:1 .bag:
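
As a minimal sketch of the steps described above (assuming a spark-shell session where /tmp/test.json contains the one-line document shown), the written file can be re-read so Spark prints its logical schema; the Parquet-level layout with the ".bag" wrapper would be inspected with a tool such as parquet-tools.

    // Sketch using the same Spark 1.3/1.4-era DataFrame calls as the message above.
    val df = sqlContext.load("/tmp/test.json", "json")   // {"c1":[1,2,3]}
    val dfOne = df.repartition(1)
    dfOne.select("*").save("/tmp/testjson_spark", "parquet")

    // Re-read what was written and print the Spark-side schema; the physical
    // Parquet schema (OPTIONAL group ".bag" vs REPEATED INT64) is what the
    // original question is about.
    val written = sqlContext.load("/tmp/testjson_spark", "parquet")
    written.printSchema()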

Re: Is SPARK-3322 fixed in latest version of Spark?

2015-08-05 Thread Jim Green
: > ConnectionManager has been deprecated and is no longer used by default > (NettyBlockTransferService is the replacement). Hopefully you would no > longer see these messages unless you have explicitly flipped it back on. > > On Tue, Aug 4, 2015 at 6:14 PM, Jim Green wrote: >
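
As context for "flipped it back on", a hedged sketch from memory of the Spark 1.x docs (not stated in the thread): the transfer service was selected by the spark.shuffle.blockTransferService setting, with "netty" as the default and "nio" re-enabling the legacy ConnectionManager path.

    // Assumed 1.x-era setting; "netty" was the default,
    // "nio" selected the legacy ConnectionManager-based service.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.shuffle.blockTransferService", "netty")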

Re: Is SPARK-3322 fixed in latest version of Spark?

2015-08-04 Thread Jim Green
See also https://issues.apache.org/jira/browse/SPARK-3106, which is still open. On Tue, Aug 4, 2015 at 6:12 PM, Jim Green wrote: > *Symptom:* > Even a sample job fails: > $ MASTER=spark://xxx:7077 run-example org.apache.spark.examples.SparkPi 10 > Pi is roughly 3.14

Is SPARK-3322 fixed in latest version of Spark?

2015-08-04 Thread Jim Green
*Symptom:* Even a sample job fails: $ MASTER=spark://xxx:7077 run-example org.apache.spark.examples.SparkPi 10 Pi is roughly 3.140636 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(xxx,) not found WARN ConnectionManager: All connections not cleaned up Found https

Resource allocation configurations for Spark on Yarn

2015-06-12 Thread Jim Green
Hi Team, Sharing an article which summarizes the resource allocation configurations for Spark on YARN: Resource allocation configurations for Spark on Yarn -- Thanks, www.openkb.info (Open KnowledgeBase for Hadoop/Datab
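
As a minimal sketch of the knobs such an article typically covers (placeholder values, not recommendations): per-executor heap, core count, executor count, and driver memory, settable via SparkConf or the equivalent spark-submit flags.

    // Placeholder values; tune per cluster and workload.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.executor.instances", "10") // --num-executors
      .set("spark.executor.cores", "2")      // --executor-cores
      .set("spark.executor.memory", "4g")    // --executor-memory
      .set("spark.driver.memory", "2g")      // normally passed as --driver-memory at submit time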

Re: New combination-like RDD based on two RDDs

2015-02-04 Thread Jim Green
You should use join: val rdd1 = sc.parallelize(List((1,(3)), (2,(5)), (3,(6)))) val rdd2 = sc.parallelize(List((2,(1)), (2,(3)), (3,(9)))) rdd1.join(rdd2).collect res0: Array[(Int, (Int, Int))] = Array((2,(5,1)), (2,(5,3)), (3,(6,9))) Please see my cheat sheet at * 3.14 join(otherDataset, [numTas

Scala on Spark functions examples cheatsheet.

2015-02-02 Thread Jim Green
Hi Team, I spent some time over the past two weeks on Scala and tried all of the Scala-on-Spark functions in the Spark Programming Guide. If you need example code for Scala-on-Spark functions, I created this cheat sheet
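
To give a flavor of the kind of one-liners such a cheat sheet collects (not an excerpt from it), here are a couple of standard transformations from the programming guide, as run in spark-shell:

    // Basic word-count style transformations.
    val words = sc.parallelize(Seq("spark", "scala", "spark"))
    words.map(w => (w, 1)).reduceByKey(_ + _).collect()
    // e.g. Array((scala,1), (spark,2))  (ordering may vary)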

Spark impersonation

2015-02-02 Thread Jim Green
Hi Team, Does Spark support impersonation? For example, when Spark runs on YARN/Hive/HBase/etc., which user is used by default? The user who starts the Spark job? Any suggestions related to impersonation? -- Thanks, www.openkb.info (Open KnowledgeBase for Hadoop/Database/OS/Network/Tool)

Re: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-28 Thread Jim Green
Thanks to all who responded. I finally figured out how to bulk load into HBase using Scala on Spark. The sample code is here, which others can refer to in the future: http://www.openkb.info/2015/01/how-to-use-scala-on-spark-to-load-data.html Thanks! On Tue, Jan 27, 2015 at 6:27 PM, Jim Green wrote
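
A minimal sketch of the pattern the thread converges on (HBase 0.94-era API: KeyValue rather than Put, written through HFileOutputFormat via saveAsNewAPIHadoopFile); the table name "t1", column family, qualifier, and output path are placeholders, and the full working version is in the linked post. Assumes a spark-shell session on Spark 1.0.x with sc available.

    import org.apache.spark.SparkContext._   // pair-RDD functions on Spark 1.0.x
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.KeyValue
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
    import org.apache.hadoop.hbase.util.Bytes

    val conf = HBaseConfiguration.create()

    // Build (row key, KeyValue) pairs; HFileOutputFormat expects KeyValue, not Put,
    // and rows must arrive in key order, so sort the keys up front.
    val rowKeys = Seq("row1", "row2").sorted
    val rdd = sc.parallelize(rowKeys, 1).map { rowKey =>
      val kv = new KeyValue(
        Bytes.toBytes(rowKey),            // row key
        Bytes.toBytes("cf1"),             // column family (placeholder)
        Bytes.toBytes("c1"),              // qualifier
        Bytes.toBytes("value_" + rowKey)) // cell value
      (new ImmutableBytesWritable(Bytes.toBytes(rowKey)), kv)
    }

    // Write HFiles to a staging directory; they can then be handed to
    // HBase's completebulkload tool to move them into table "t1".
    rdd.saveAsNewAPIHadoopFile(
      "/tmp/hfiles_out",                  // staging path (placeholder)
      classOf[ImmutableBytesWritable],
      classOf[KeyValue],
      classOf[HFileOutputFormat],
      conf)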

Re: Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
= Bytes.toBytes("val_xxx") > >val kv = new KeyValue(rowkeyBytes,colfam,qual,value) >List(kv) > } > > > Thanks, > Sun > -- > fightf...@163.com > > > *From:* Jim Green > *Date:

Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
tBytes(), "c1".getBytes(), ("value_xxx").getBytes()) (new ImmutableBytesWritable(Bytes.toBytes(x)), put) }) rdd.saveAsNewAPIHadoopFile("/tmp/13", classOf[ImmutableBytesWritable], classOf[KeyValue], classOf[HFileOutputFormat], conf) On Tue, Jan 27, 2015 at 12:17 PM,

Re: Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
public void write(ImmutableBytesWritable row, KeyValue kv) > > Meaning, KeyValue is expected, not Put. > > On Tue, Jan 27, 2015 at 10:54 AM, Jim Green wrote: > >> Hi Team, >> >> I need some help on writing a scala to bulk load some data into hbase. >> *Env:* &

Bulk loading into hbase using saveAsNewAPIHadoopFile

2015-01-27 Thread Jim Green
Hi Team, I need some help with writing Scala code to bulk load some data into HBase. *Env:* hbase 0.94 spark-1.0.2 I am trying the code below to bulk load some data into HBase table "t1". import org.apache.spark._ import org.apache.spark.rdd.NewHadoopRDD import org.apache.hadoop.hbase.{HBaseConfigur