Re: HBase connector does not read ZK configuration from Spark session

2018-02-23 Thread Deepak Sharma
Hi Dharmin, With the 1st approach, you will have to read the properties from --files using the below: SparkFiles.get('file.txt') Or else, you can copy the file to HDFS, read it using sc.textFile, and use the property within it. If you add files using --files, it gets copied to executor's w
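A minimal sketch of the first approach, assuming the job was submitted with --files file.txt (the file name and property key below are hypothetical):

    import java.io.FileInputStream
    import java.util.Properties
    import org.apache.spark.SparkFiles

    // SparkFiles.get resolves the local path of a file shipped via --files
    val props = new Properties()
    val in = new FileInputStream(SparkFiles.get("file.txt"))
    try props.load(in) finally in.close()
    val zkQuorum = props.getProperty("hbase.zookeeper.quorum") // hypothetical key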

Re: HBase connector does not read ZK configuration from Spark session

2018-02-22 Thread Jorge Machado
Can it be that you are missing the HBASE_HOME var? Jorge Machado > On 23 Feb 2018, at 04:55, Dharmin Siddesh J wrote: > > I am trying to write a Spark program that reads data from HBase and stores it > in a DataFrame. > > I am able to run it perfectly with hbase-site.xml in the $SPARK_HOM
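If hbase-site.xml is not being picked up, the ZooKeeper quorum can also be set programmatically; a hedged sketch (host names and port are placeholders for what hbase-site.xml would normally provide):

    import org.apache.hadoop.hbase.HBaseConfiguration

    val hconf = HBaseConfiguration.create()
    // Placeholder values; these normally come from hbase-site.xml on the classpath
    hconf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com")
    hconf.set("hbase.zookeeper.property.clientPort", "2181")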

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher, I found a profile for Scala 2.11 and removed it. Now, it brings in 2.10. I ran some code and got further. Now, I get this error below when I do a “df.show”. java.lang.AbstractMethodError at org.apache.spark.Logging$class.log(Logging.scala:50) at org.apache.spark.sql.execu

Re: HBase Spark

2017-02-03 Thread Asher Krim
You can see in the tree what's pulling in 2.11. Your options then will be either to shade them and add an explicit dependency on 2.10.5 in your pom, or to explore upgrading your project to 2.11 (which will require using a 2.11 build of spark) On Fri, Feb 3, 2017 at 2:03 PM, Benjam

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher, You’re right. I don’t see anything but 2.11 being pulled in. Do you know where I can change this? Cheers, Ben > On Feb 3, 2017, at 10:50 AM, Asher Krim wrote: > > Sorry for my persistence, but did you actually run "mvn dependency:tree > -Dverbose=true"? And did you see only scala 2.1

Re: HBase Spark

2017-02-03 Thread Asher Krim
Sorry for my persistence, but did you actually run "mvn dependency:tree -Dverbose=true"? And did you see only scala 2.10.5 being pulled in? On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim wrote: > Asher, > > It’s still the same. Do you have any other ideas? > > Cheers, > Ben > > > On Feb 3, 2017,

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher, It’s still the same. Do you have any other ideas? Cheers, Ben > On Feb 3, 2017, at 8:16 AM, Asher Krim wrote: > > Did you check the actual maven dep tree? Something might be pulling in a > different version. Also, if you're seeing this locally, you might want to > check which version

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
I'll clean up any .m2 or .ivy directories and try again. I ran this on our lab cluster for testing. Cheers, Ben On Fri, Feb 3, 2017 at 8:16 AM Asher Krim wrote: > Did you check the actual maven dep tree? Something might be pulling in a > different version. Also, if you're seeing this locally

Re: HBase Spark

2017-02-03 Thread Asher Krim
Did you check the actual maven dep tree? Something might be pulling in a different version. Also, if you're seeing this locally, you might want to check which version of the scala sdk your IDE is using Asher Krim Senior Software Engineer On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim wrote: > Hi
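A quick runtime check of which Scala library is actually loaded (runnable from spark-shell or the driver):

    // Prints e.g. "version 2.10.5"; a 2.11.x value here would confirm the mismatch
    println(scala.util.Properties.versionString)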

Re: HBase Spark

2017-02-02 Thread Benjamin Kim
Hi Asher, I modified the pom to use the same Spark (1.6.0), HBase (1.2.0), and Java (1.8) versions as our installation. The Scala (2.10.5) version is already the same as ours. But I’m still getting the same error. Can you think of anything else? Cheers, Ben > On Feb 2, 2017, at 11:06 AM, Asher

Re: HBase Spark

2017-02-02 Thread Asher Krim
Ben, That looks like a scala version mismatch. Have you checked your dep tree? Asher Krim Senior Software Engineer On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim wrote: > Elek, > > Can you give me some sample code? I can’t get mine to work. > > import org.apache.spark.sql.{SQLContext, _} > impor

Re: HBase Spark

2017-02-02 Thread Benjamin Kim
Elek, Can you give me some sample code? I can’t get mine to work. import org.apache.spark.sql.{SQLContext, _} import org.apache.spark.sql.execution.datasources.hbase._ import org.apache.spark.{SparkConf, SparkContext} def cat = s"""{ |"table":{"namespace":"ben", "name":"dmp_test", "tableCod
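For reference, the usual shc read pattern looks roughly like this; a sketch assuming the full catalog string (truncated above) and an shc build on the classpath:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    // Builds a DataFrame from the JSON catalog defined above
    def withCatalog(catalog: String): DataFrame =
      sqlContext.read
        .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
        .format("org.apache.spark.sql.execution.datasources.hbase")
        .load()

    val df = withCatalog(cat)
    df.show()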

Re: HBase Spark

2017-01-31 Thread Benjamin Kim
Elek, If I cannot use the HBase Spark module, then I’ll give it a try. Thanks, Ben > On Jan 31, 2017, at 1:02 PM, Marton, Elek wrote: > > > I tested this one with hbase 1.2.4: > > https://github.com/hortonworks-spark/shc > > Marton > > On 01/31/2017 09:17 PM, Benjamin Kim wrote: >> Does a

Re: HBase Spark

2017-01-31 Thread Marton, Elek
I tested this one with hbase 1.2.4: https://github.com/hortonworks-spark/shc Marton On 01/31/2017 09:17 PM, Benjamin Kim wrote: Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I tried to build it from source, but I cannot get it to work. Thanks, Ben -

Re: Hbase and Spark

2017-01-29 Thread Sudev A C
Hi Masf, Do try the official HBase Spark module. https://hbase.apache.org/book.html#spark I think you will have to build the jar from source and run your spark program with --packages . https://spark-packages.org/package/hortonworks-spark/shc says it's not yet published to Spark packages or Maven Repo.

Re: Hbase Connection not serializable in Spark -> foreachrdd

2016-09-22 Thread KhajaAsmath Mohammed
Thanks Das and Ayan. Do you have any references on how to create a connection pool for hbase inside foreachPartition as mentioned in the guide? In my case, I have to use a kerberos hbase cluster. On Wed, Sep 21, 2016 at 6:39 PM, Tathagata Das wrote: > http://spark.apache.org/docs/latest/streaming-progr

Re: Hbase Connection not serializable in Spark -> foreachrdd

2016-09-21 Thread Tathagata Das
http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd On Wed, Sep 21, 2016 at 4:26 PM, ayan guha wrote: > Connection object is not serialisable. You need to implement a getorcreate > function which would run on each executors to create hbase co

Re: Hbase Connection not serializable in Spark -> foreachrdd

2016-09-21 Thread ayan guha
Connection object is not serialisable. You need to implement a getOrCreate function which would run on each executor to create the hbase connection locally. On 22 Sep 2016 08:34, "KhajaAsmath Mohammed" wrote: > Hello Everyone, > > I am running a spark application to push data from kafka. I am able to
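A hedged sketch of the pattern both replies point to, assuming an HBase 1.x client and a DStream[(String, String)] named dstream (table and column names are hypothetical):

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Created on the executor, so the non-serializable connection never crosses the wire
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("my_table"))
        try {
          records.foreach { case (key, value) =>
            val put = new Put(Bytes.toBytes(key))
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
            table.put(put)
          }
        } finally {
          table.close()
          conn.close()
        }
      }
    }

A real pool (e.g. a lazily initialized singleton per executor JVM) avoids reopening the connection for every partition, which is what the "getOrCreate" suggestion above amounts to.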

RE: HBase-Spark Module

2016-07-29 Thread David Newberger
Hi Ben, This seems more like a question for community.cloudera.com. However, I believe it would be in hbase, not spark. https://repository.cloudera.com/artifactory/webapp/#/artifacts/browse/tree/General/cloudera-release-repo/org/apache/hbase/hbase-spark David Newberger -Original Message--

Re: HBase / Spark Kerberos problem

2016-05-19 Thread Arun Natva
.scala > [2] > https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala > > > From: John Trengrove [mailto:john.trengr...@servian.com.au] > Sent: 19 May 2016 08:09 > To: philipp.meyerhoe...@thomsonreuters.com &g

RE: HBase / Spark Kerberos problem

2016-05-19 Thread philipp.meyerhoefer
edentials” and the .count() on my HBase RDD works fine. From: Ellis, Tom (Financial Markets IT) [mailto:tom.el...@lloydsbanking.com] Sent: 19 May 2016 09:51 To: 'John Trengrove'; Meyerhoefer, Philipp (TR Technology & Ops) Cc: user Subject: RE: HBase / Spark Kerberos problem Yeah

RE: HBase / Spark Kerberos problem

2016-05-19 Thread Ellis, Tom (Financial Markets IT)
ngrove [mailto:john.trengr...@servian.com.au] Sent: 19 May 2016 08:09 To: philipp.meyerhoe...@thomsonreuters.com Cc: user Subject: Re: HBase / Spark Kerberos problem -- This email has reached the Bank via an external source -- Have you had a look at this issue? https://issues.apache.org/jira/browse/SPARK-12279

Re: HBase / Spark Kerberos problem

2016-05-19 Thread John Trengrove
Have you had a look at this issue? https://issues.apache.org/jira/browse/SPARK-12279 There is a comment by Y Bodnar on how they successfully got Kerberos and HBase working. 2016-05-18 18:13 GMT+10:00 : > Hi all, > > I have been puzzling over a Kerberos problem for a while now and wondered > if

Re: HBASE

2016-03-09 Thread Mich Talebzadeh
I agree with Ted's assessment. The Big Data space is getting crowded with an amazing array of tools and utilities, some disappearing like meteors. Hadoop is definitely a keeper. So are Hive and Spark. Hive is the most stable Data Warehouse on Big Data and Spark is offering an array of impressive tools.

Re: HBASE

2016-03-09 Thread Ted Yu
bq. it is kind of columnar NoSQL database. The storage format in HBase is not columnar. I would suggest you build upon what you already know (Spark and Hive) and expand on that. Also, if your work uses Big Data technologies, those would be the first to consider getting to know better. On Wed, Ma

Re: Hbase in spark

2016-02-26 Thread Ted Yu
I know little about your use case. Did you mean that your data is relatively evenly distributed in Spark domain but showed skew in the bulk load phase ? On Fri, Feb 26, 2016 at 9:02 AM, Renu Yadav wrote: > Hi Ted, > > Thanks for the reply. I am using spark hbase module only but the problem > is

Re: Hbase in spark

2016-02-26 Thread Ted Yu
In hbase, there is the hbase-spark module which supports bulk load. This module is to be backported in the upcoming 1.3.0 release. There is some pending work, such as HBASE-15271. FYI On Fri, Feb 26, 2016 at 8:50 AM, Renu Yadav wrote: > Has anybody implemented bulk load into hbase using spark? >

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-22 Thread Ajinkya Kale
I tried --jars which supposedly does that but that did not work. On Fri, Jan 22, 2016 at 4:33 PM Ajinkya Kale wrote: > Hi Ted, > Is there a way for the executors to have the hbase-protocol jar on their > classpath ? > > On Fri, Jan 22, 2016 at 4:00 PM Ted Yu wrote: > >> The class path formation

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-22 Thread Ajinkya Kale
Hi Ted, Is there a way for the executors to have the hbase-protocol jar on their classpath ? On Fri, Jan 22, 2016 at 4:00 PM Ted Yu wrote: > The class path formations on driver and executors are different. > > Cheers > > On Fri, Jan 22, 2016 at 3:25 PM, Ajinkya Kale > wrote: > >> Is this issue

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-22 Thread Ted Yu
The class path formations on driver and executors are different. Cheers On Fri, Jan 22, 2016 at 3:25 PM, Ajinkya Kale wrote: > Is this issue only when the computations are in distributed mode ? > If I do (pseudo code) : > rdd.collect.call_to_hbase I dont get this error, > > but if I do : > rdd
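One common way to line the two up is to make the jar explicit on both sides; a sketch (the jar path is hypothetical, and in practice extraClassPath is usually passed to spark-submit via --conf, since it must be known before each JVM starts):

    import org.apache.spark.SparkConf

    // Make hbase-protocol visible to the driver and every executor; in
    // yarn-cluster mode the driver also runs on the cluster, so both matter
    val conf = new SparkConf()
      .set("spark.driver.extraClassPath", "/opt/hbase/lib/hbase-protocol-0.98.0.jar")
      .set("spark.executor.extraClassPath", "/opt/hbase/lib/hbase-protocol-0.98.0.jar")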

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-22 Thread Ajinkya Kale
Is this issue only when the computations are in distributed mode? If I do (pseudo code): rdd.collect.call_to_hbase I don't get this error, but if I do: rdd.call_to_hbase.collect it throws this error. On Wed, Jan 20, 2016 at 6:50 PM Ajinkya Kale wrote: > Unfortunately I cannot at this moment

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ajinkya Kale
Unfortunately I cannot at this moment (not a decision I can make) :( On Wed, Jan 20, 2016 at 6:46 PM Ted Yu wrote: > I am not aware of a workaround. > > Can you upgrade to 0.98.4+ release ? > > Cheers > > On Wed, Jan 20, 2016 at 6:26 PM, Ajinkya Kale > wrote: > >> Hi Ted, >> >> Thanks for respo

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ted Yu
I am not aware of a workaround. Can you upgrade to 0.98.4+ release ? Cheers On Wed, Jan 20, 2016 at 6:26 PM, Ajinkya Kale wrote: > Hi Ted, > > Thanks for responding. > Is there a work around for 0.98.0 ? Adding the hbase-protocol jar to > HADOOP_CLASSPATH didnt work for me. > > On Wed, Jan 20,

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ajinkya Kale
Hi Ted, Thanks for responding. Is there a workaround for 0.98.0? Adding the hbase-protocol jar to HADOOP_CLASSPATH didn't work for me. On Wed, Jan 20,

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ted Yu
0.98.0 didn't have the fix from HBASE-8 Please upgrade your hbase version and try again. If there is still a problem, please pastebin the stack trace. Thanks On Wed, Jan 20, 2016 at 5:41 PM, Ajinkya Kale wrote: > > I have posted this on hbase user list but I thought it makes more sense on > spark

Re: HBase Spark Streaming giving error after restore

2015-10-16 Thread Aniket Bhatnagar
Can you try changing classOf[OutputFormat[String, BoxedUnit]] to classOf[OutputFormat[String, Put]] while configuring hconf? On Sat, Oct 17, 2015, 11:44 AM Amit Hora wrote: > Hi, > > Regrets for the delayed response > please find the full stack trace below > > java.lang.ClassCastException: scala.runtime
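A sketch of what the corrected job setup might look like with the HBase mapreduce API (the table name and the pairs RDD are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.mapreduce.Job

    val hconf = HBaseConfiguration.create()
    hconf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")

    val job = Job.getInstance(hconf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])
    // The value class must be a Mutation subclass such as Put; BoxedUnit is what
    // leaks in when a Scala expression returns Unit where a value was expected
    job.setOutputValueClass(classOf[Put])

    // pairs: RDD[(ImmutableBytesWritable, Put)]
    pairs.saveAsNewAPIHadoopDataset(job.getConfiguration)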

Re: HBase Spark Streaming giving error after restore

2015-10-16 Thread Amit Hora
Hi, Regrets for the delayed response. Please find the full stack trace below: java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to org.apache.hadoop.hbase.client.Mutation at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:85) at

Re: HBase Spark Streaming giving error after restore

2015-10-16 Thread Ted Yu
Can you show the complete stack trace? A subclass of Mutation is expected; Put is such a subclass. Have you tried replacing BoxedUnit with Put in your code? Cheers On Fri, Oct 16, 2015 at 6:02 AM, Amit Singh Hora wrote: > Hi All, > > I am using below code to stream data from kafka to hbase ,everyt

Re: Hbase Spark streaming issue.

2015-09-24 Thread Shixiong Zhu
Looks like you have an incompatible hbase-default.xml somewhere. You can use the following code to find the location of "hbase-default.xml" println(Thread.currentThread().getContextClassLoader().getResource("hbase-default.xml")) Best Regards, Shixiong Zhu 2015-09-21 15:46 GMT+08:00 Siva : >

Re: Hbase Lookup

2015-09-03 Thread Ted Yu
Ayan: Please read this: http://hbase.apache.org/book.html#cp Cheers On Thu, Sep 3, 2015 at 2:13 PM, ayan guha wrote: > Hi > > Thanks for your comments. My driving point is instead of loading Hbase > data entirely I want to process record by record lookup and that is best > done in UDF or map fu

Re: Hbase Lookup

2015-09-03 Thread ayan guha
Hi, Thanks for your comments. My driving point is that instead of loading the Hbase data entirely, I want to process record-by-record lookups, and that is best done in a UDF or map function. I would also have loved to do it in Spark, but no production cluster here yet :( @Franke: I do not have enough competency on co

Re: Hbase Lookup

2015-09-03 Thread Tao Lu
But I don't see how it works here with phoenix or an hbase coprocessor. Remember we are joining 2 big data sets here: one is the big file in HDFS, the other the records in HBase. The driving force comes from the Hadoop cluster. On Thu, Sep 3, 2015 at 11:37 AM, Jörn Franke wrote: > If you use pig or spark you

Re: Hbase Lookup

2015-09-03 Thread Jörn Franke
If you use pig or spark you increase the complexity from an operations management perspective significantly. Spark should be seen from a platform perspective, if it makes sense. If you can do it directly with hbase/phoenix or only an hbase coprocessor then this should be preferred. Otherwise you pay mor

Re: Hbase Lookup

2015-09-03 Thread Tao Lu
Yes, Ayan, your approach will work. Or alternatively, use Spark and write a Scala/Java function which implements logic similar to your Pig UDF. Both approaches look similar. Personally, I would go with the Spark solution; it will be slightly faster, and easier if you already have a Spark cluster setup

Re: Hbase Lookup

2015-09-02 Thread ayan guha
Thanks for your info. I am planning to implement a pig udf to do record lookups. Kindly let me know if this is a good idea. Best Ayan On Thu, Sep 3, 2015 at 2:55 PM, Jörn Franke wrote: > > You may check if it makes sense to write a coprocessor doing an upsert for > you, if it does not exist al

Re: Hbase Lookup

2015-09-02 Thread Jörn Franke
You may check if it makes sense to write a coprocessor doing an upsert for you, if it does not exist already. Maybe phoenix for Hbase supports this already. Another alternative, if the records do not have an unique Id, is to put them into a text index engine, such as Solr or Elasticsearch, which d

Re: HBase HTable constructor hangs

2015-04-30 Thread Ted Yu
Thanks for your confirmation. On Thu, Apr 30, 2015 at 10:17 AM, Tridib Samanta wrote: > You are right. After I moved from HBase 0.98.1 to 1.0.0 this problem got > solved. Thanks all! > > -- > Date: Wed, 29 Apr 2015 06:58:59 -0700 > Subject

RE: HBase HTable constructor hangs

2015-04-30 Thread Tridib Samanta
You are right. After I moved from HBase 0.98.1 to 1.0.0 this problem got solved. Thanks all! Date: Wed, 29 Apr 2015 06:58:59 -0700 Subject: Re: HBase HTable constructor hangs From: yuzhih...@gmail.com To: tridib.sama...@live.com CC: d...@ocirs.com; user@spark.apache.org Can you verify whether

Re: HBase HTable constructor hangs

2015-04-29 Thread Ted Yu
Table.java:192) > > Thanks > Tridib > > -- > From: d...@ocirs.com > Date: Tue, 28 Apr 2015 22:24:39 -0700 > Subject: Re: HBase HTable constructor hangs > To: tridib.sama...@live.com > > In that case, something else is failing and the r

RE: HBase HTable constructor hangs

2015-04-29 Thread Tridib Samanta
...@ocirs.com Date: Tue, 28 Apr 2015 22:24:39 -0700 Subject: Re: HBase HTable constructor hangs To: tridib.sama...@live.com In that case, something else is failing and the reason HBase looks like it hangs is that the hbase timeout or retry count is too high. Try setting the following conf and hbase will only

RE: HBase HTable constructor hangs

2015-04-28 Thread Tridib Samanta
When I run the spark-job jar standalone and execute the HBase client from a main method, it works fine. The same client is unable to connect/hangs when the jar is distributed in spark. Thanks Tridib Date: Tue, 28 Apr 2015 21:25:41 -0700 Subject: Re: HBase HTable constructor hangs From: yuzhih...@gmai

Re: HBase HTable constructor hangs

2015-04-28 Thread Ted Yu
; org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.ut

RE: HBase HTable constructor hangs

2015-04-28 Thread Tridib Samanta
oncurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Date: Tue, 28 Apr 2015 19:35:26 -0700 Subject: Re: HBase HTable constructor hangs From: yuzhih...@gmail.co

Re: HBase HTable constructor hangs

2015-04-28 Thread Ted Yu
Can you give us more information ? Such as hbase release, Spark release. If you can pastebin jstack of the hanging HTable process, that would help. BTW I used http://search-hadoop.com/?q=spark+HBase+HTable+constructor+hangs and saw a very old thread with this subject. Cheers On Tue, Apr 28, 201

Re: HBase HTable constructor hangs

2015-04-28 Thread tridib
I am having exactly the same issue. I am running hbase and spark in docker containers. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/HBase-HTable-constructor-hangs-tp4926p22696.html Sent from the Apache Spark User List mailing list archive at Nabble.com. -

RE: hbase sql query

2015-03-12 Thread Udbhav Agarwal
JavaSchemaRdds and sqlContext.sql only. Isn't it? Thanks, Udbhav Agarwal From: Todd Nist [mailto:tsind...@gmail.com] Sent: 12 March, 2015 6:19 PM To: Udbhav Agarwal Cc: Akhil Das; user@spark.apache.org Subject: Re: hbase sql query Ah, missed that java was a requirement. What distribution of

Re: hbase sql query

2015-03-12 Thread Todd Nist
scala, I was looking for some help with > Java APIs. > > > > Thanks, > > Udbhav Agarwal > > > > From: Todd Nist [mailto:tsind...@gmail.com] > Sent: 12 March, 2015 5:28 PM > To: Udbhav Agarwal > Cc: Akhil Das; user@spark.apache.org > Subject:

RE: hbase sql query

2015-03-12 Thread Udbhav Agarwal
Thanks Todd, but this link is also based on scala; I was looking for some help with Java APIs. Thanks, Udbhav Agarwal From: Todd Nist [mailto:tsind...@gmail.com] Sent: 12 March, 2015 5:28 PM To: Udbhav Agarwal Cc: Akhil Das; user@spark.apache.org Subject: Re: hbase sql query Have you

Re: hbase sql query

2015-03-12 Thread Todd Nist
avaSchemaRdd and then sqlContext.sql(sql query). Right? > > > > > > > > > > Thanks, > > Udbhav Agarwal > > > > From: Akhil Das [mailto:ak...@sigmoidanalytics.com] > Sent: 12 March, 2015 11:43 AM > To: Udbhav Agarwal > Cc: use

RE: hbase sql query

2015-03-12 Thread Udbhav Agarwal
@spark.apache.org Subject: Re: hbase sql query Like this? val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result]).cache() Here's a complete example&

Re: hbase sql query

2015-03-11 Thread Akhil Das
Like this? val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable], classOf[org.apache.hadoop.hbase.client.Result]).cache() Here's a complete example
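Fleshed out into a self-contained read, as a sketch (the table name is hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("hbase-read"))

    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "my_table")

    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result]).cache()

    println(s"rows: ${hBaseRDD.count()}")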

Re: HBase 0.96+ with Spark 1.0+

2014-09-18 Thread Ted Yu
[2]: HBase 0.96+ with Spark 1.0+ >>>> >>>> >>>> Dependency hell... My fav problem :). >>>> >>>> I had run into a similar issue with hbase and jetty. I can't remember the >>>> exact fix, but here are excerpts from my dependenc

Re: HBase 0.96+ with Spark 1.0+

2014-09-18 Thread Reinis Vicups
% "3.0.20100224" ) On Sep 11, 2014 8:05 PM, mailto:sp...@orbit-x.de>> wrote: Hi guys, any luck with this issue, anyone? I too tried all the possible exclusion combos, to no avail. thanks for your ideas

RE: HBase and non-existent TableInputFormat

2014-09-16 Thread abraham.jacob
To: Jacob, Abraham (Financial&Risk) Cc: tq00...@gmail.com; user Subject: Re: HBase and non-existent TableInputFormat Btw, there are some examples in the Spark GitHub repo that you may find helpful. Here's one: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/

Re: HBase and non-existent TableInputFormat

2014-09-16 Thread Nicholas Chammas
Btw, there are some examples in the Spark GitHub repo that you may find helpful. Here's one related to HBase. On Tue, Sep 16, 2014 at 1:22 PM, wrote: > Hi, > > > > I had a similar

RE: HBase and non-existent TableInputFormat

2014-09-16 Thread abraham.jacob
Hi, I had a similar situation in which I needed to read data from HBase and work with the data inside of a spark context. After much googling, I finally got mine to work. There are a bunch of steps that you need to do to get this working - The problem is that the spark context does not know anyt

Re: HBase and non-existent TableInputFormat

2014-09-16 Thread Ted Yu
hbase-client module serves client facing APIs. hbase-server module is supposed to host classes used on server side. There is still some work to be done so that the above goal is achieved. On Tue, Sep 16, 2014 at 9:06 AM, Y. Dong wrote: > Thanks Ted. It is indeed in hbase-server. Just curious, w

Re: HBase and non-existent TableInputFormat

2014-09-16 Thread Ted Yu
bq. TableInputFormat does not even exist in hbase-client API It is in hbase-server module. Take a look at http://hbase.apache.org/book.html#mapreduce.example.read On Tue, Sep 16, 2014 at 8:18 AM, Y. Dong wrote: > Hello, > > I’m currently using spark-core 1.1 and hbase 0.98.5 and I want to simp
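In sbt terms that means depending on hbase-server as well as hbase-client; a sketch using the versions mentioned in the thread:

    // TableInputFormat ships in hbase-server, so a client-side Spark job needs both modules
    libraryDependencies ++= Seq(
      "org.apache.hbase" % "hbase-client" % "0.98.5-hadoop2",
      "org.apache.hbase" % "hbase-server" % "0.98.5-hadoop2"
    )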

Re: HBase 0.96+ with Spark 1.0+

2014-09-14 Thread Reinis Vicups
hadoop2MapRedClient, hadoop2Common, "org.mortbay.jetty" % "servlet-api" % "3.0.20100224" ) On Sep 11, 2014 8:05 PM, mailto:sp...@orbit-x.de>> wrote: Hi guys, any luck with this issue, an

Re: Hbase

2014-08-01 Thread Madabhattula Rajesh Kumar
Hi Akhil, Thank you very much for your help and support. Regards, Rajesh On Fri, Aug 1, 2014 at 7:57 PM, Akhil Das wrote: > Here's a piece of code. In your case, you are missing the call() method > inside the map function. > > > import java.util.Iterator; > > import java.util.List; > > import

Re: Hbase

2014-08-01 Thread Akhil Das
Here's a piece of code. In your case, you are missing the call() method inside the map function. import java.util.Iterator; import java.util.List; import org.apache.commons.configuration.Configuration; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apache.hadoop.hbase.KeyValue;

Re: Hbase

2014-08-01 Thread Madabhattula Rajesh Kumar
Hi Akhil, Thank you for your response. I'm facing the issues below. I'm not able to print the values. Am I missing anything? Could you please look into this issue? JavaPairRDD hBaseRDD = sc.newAPIHadoopRDD( conf, TableInputFormat.class, ImmutableBytesWritable.class, Resu

Re: Hbase

2014-07-31 Thread Akhil Das
You can use a map function like the following and do whatever you want with the Result. Function<Tuple2<ImmutableBytesWritable, Result>, Iterator> { > public Iterator call(Tuple2<ImmutableBytesWritable, Result> test) { > Result tmp = (Result) test._2; > List<KeyValue> kvl = tmp.getColumn("post".getBytes(), > "title".getBytes()); > for(KeyValue

Re: HBase 0.96+ with Spark 1.0+

2014-06-28 Thread Stephen Boesch
Hi Siyuan, Thanks for the input. We prefer to use SparkBuild.scala instead of maven. I did not see any protobuf.version related settings in that file. But - as noted by Sean Owen - in any case the issue we are presently facing is about the duplicate incompatible javax.servlet entries

Re: HBase 0.96+ with Spark 1.0+

2014-06-28 Thread Siyuan he
Hi Stephen, I am using spark1.0+ HBase0.96.2. This is what I did: 1) rebuild spark using: mvn -Dhadoop.version=2.3.0 -Dprotobuf.version=2.5.0 -DskipTests clean package 2) In spark-env.sh, set SPARK_CLASSPATH = /path-to/hbase-protocol-0.96.2-hadoop2.jar Hopefully it can help. Siyuan On Sat, Jun

Re: HBase 0.96+ with Spark 1.0+

2014-06-28 Thread Stephen Boesch
Thanks Sean. I had actually already added exclusion rule for org.mortbay.jetty - and that had not resolved it. Just in case I used your precise formulation: val excludeMortbayJetty = ExclusionRule(organization = "org.mortbay.jetty") .. ,("org.apache.spark" % "spark-core_2.10" % sparkVersion w

Re: HBase 0.96+ with Spark 1.0+

2014-06-28 Thread Sean Owen
This sounds like an instance of roughly the same item as in https://issues.apache.org/jira/browse/SPARK-1949 Have a look at adding that exclude to see if it works. On Fri, Jun 27, 2014 at 10:21 PM, Stephen Boesch wrote: > The present trunk is built and tested against HBase 0.94. > > > I have tri

Re: hbase scan performance

2014-04-10 Thread Patrick Wendell
This job might still be faster... in MapReduce there will be other overheads in addition to the fact that doing sequential reads from HBase is slow. But it's possible the bottleneck is the HBase scan performance. - Patrick On Wed, Apr 9, 2014 at 10:10 AM, Jerry Lam wrote: > Hi Dave, > > This i

Re: hbase scan performance

2014-04-09 Thread Jerry Lam
Hi Dave, This is the HBase solution to the poor scan performance issue: https://issues.apache.org/jira/browse/HBASE-8369 I encountered the same issue before. To the best of my knowledge, this is not a mapreduce issue. It is an hbase issue. If you are planning to swap out mapreduce and replace it with sp

Re: HBase row count

2014-02-26 Thread Nick Pentreath
Currently, no, there is no way to save the web UI details. There was some discussion around adding this on the mailing list but no change as yet — Sent from Mailbox for iPhone On Tue, Feb 25, 2014 at 7:23 PM, Soumitra Kumar wrote: > Found the issue, actually splits in HBase was not uniform, so on

Re: HBase row count

2014-02-25 Thread Soumitra Kumar
Found the issue: the splits in HBase were not uniform, so one job was taking 90% of the time. BTW, is there a way to save the details available at port 4040 after the job is finished? On Tue, Feb 25, 2014 at 7:26 AM, Nick Pentreath wrote: > It's tricky really since you may not know upfront how much da

Re: HBase row count

2014-02-25 Thread Nick Pentreath
It's tricky really since you may not know upfront how much data is in there. You could possibly take a look at how much data is in the HBase tables to get an idea. It may take a bit of trial and error, like running out of memory trying to cache the dataset, and checking the Spark UI on port 4040 t

Re: HBase row count

2014-02-25 Thread Soumitra Kumar
Thanks Nick. How do I figure out if the RDD fits in memory? On Tue, Feb 25, 2014 at 1:04 AM, Nick Pentreath wrote: > cache only caches the data on the first action (count) - the first time it > still needs to read the data from the source. So the first time you call > count it will take the sam

Re: HBase row count

2014-02-25 Thread Koert Kuipers
I find them both somewhat confusing actually. * RDD.cache is lazy, and mutates the RDD in place * RDD.unpersist has a direct effect of unloading, and also mutates the RDD in place to disable future lazy caching I have found that if I need to unload an RDD from memory, but still want it to be cache
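The asymmetry in one small sketch (input path hypothetical):

    val rdd = sc.textFile("/tmp/data.txt")

    rdd.cache()      // lazy: only marks the RDD; nothing is materialized yet
    rdd.count()      // the first action populates the cache as a side effect
    rdd.unpersist()  // eager: blocks until the cached blocks are dropped
    rdd.count()      // reads from the source again, and no longer re-caches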

Re: HBase row count

2014-02-25 Thread Cheng Lian
BTW, unlike RDD.cache(), the reverse operation RDD.unpersist() is not lazy, which is somewhat confusing... On Tue, Feb 25, 2014 at 7:48 PM, Cheng Lian wrote: > RDD.cache() is a lazy operation, the method itself doesn't perform the > cache operation, it just asks Spark runtime to cache the conte

Re: HBase row count

2014-02-25 Thread Cheng Lian
RDD.cache() is a lazy operation, the method itself doesn't perform the cache operation, it just asks Spark runtime to cache the content of the RDD when the first action is invoked. In your case, the first action is the first count() call, which conceptually does 3 things: 1. Performs the HBase

Re: HBase row count

2014-02-25 Thread Nick Pentreath
cache only caches the data on the first action (count) - the first time it still needs to read the data from the source. So the first time you call count it will take the same amount of time whether cache is enabled or not. The second time you call count on a cached RDD, you should see that it take
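A rough way to observe the effect described above, assuming hBaseRDD fits in memory (check the Storage tab on port 4040):

    hBaseRDD.cache()

    def time[T](body: => T): Long = {
      val t0 = System.nanoTime(); body; (System.nanoTime() - t0) / 1000000
    }

    val first = time(hBaseRDD.count())  // full HBase scan + cache population
    val second = time(hBaseRDD.count()) // served from the block store
    println(s"first: $first ms, second: $second ms")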

Re: HBase row count

2014-02-24 Thread Soumitra Kumar
I did try with 'hBaseRDD.cache()', but I don't see any improvement. My expectation is that with cache enabled, there should not be any penalty for the 'hBaseRDD.count' call. On Mon, Feb 24, 2014 at 11:29 PM, Nick Pentreath wrote: > Yes, you're initiating a scan for each count call. The normal way to

Re: HBase row count

2014-02-24 Thread Nick Pentreath
Yes, you're initiating a scan for each count call. The normal way to improve this would be to use cache(), which is what you have in your commented-out line: // hBaseRDD.cache() If you uncomment that line, you should see an improvement overall. If caching is not an option for some reason (maybe