On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> The same question can be asked w.r.t. examples for other projects, such as
> flume and kafka.

The main difference being that flume and kafka integration are part of
Spark itself. HBase integration is not.

> On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mtus...@handybook.com>
> wrote:
>
>> Let's posit that the spark example is much better than what is available
>> in HBase. Why is that a reason to keep it within Spark?
>>
>> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> bq. HBase's current support, even if there are bugs or things that
>>> still need to be done, is much better than the Spark example
>>>
>>> In my opinion, a simple example that works is better than a buggy
>>> package.
>>>
>>> I hope before long the hbase-spark module in HBase can arrive at a state
>>> which we can advertise as mature - but we're not there yet.
>>>
>>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <van...@cloudera.com>
>>> wrote:
>>>
>>>> You're completely missing my point. I'm saying that HBase's current
>>>> support, even if there are bugs or things that still need to be done,
>>>> is much better than the Spark example, which is basically a call to
>>>> "SparkContext.hadoopRDD".
>>>>
>>>> Spark's example is not helpful in learning how to build an HBase
>>>> application on Spark, and clashes head on with how the HBase
>>>> developers think it should be done. That, and because it brings too
>>>> many dependencies for something that is not really useful, is why I'm
>>>> suggesting removing it.
>>>>
>>>>
>>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>>>> >
>>>> > I would say the refguide link you provided should not be considered as
>>>> > complete.
>>>> >
>>>> > Note it is marked as Blocker by Sean B.
>>>> >
>>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <van...@cloudera.com>
>>>> > wrote:
>>>> >>
>>>> >> You're entitled to your own opinions.
>>>> >>
>>>> >> While you're at it, here's some much better documentation, from the
>>>> >> HBase project themselves, than what the Spark example provides:
>>>> >> http://hbase.apache.org/book.html#spark
>>>> >>
>>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> >> > bq. it's actually in use right now in spite of not being in any
>>>> >> > upstream HBase release
>>>> >> >
>>>> >> > If it is not in upstream, then it is not relevant for discussion on
>>>> >> > Apache mailing list.
>>>> >> >
>>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <van...@cloudera.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
>>>> >> >> spite of not being in any upstream HBase release", and it's more
>>>> >> >> useful than a single example file in the Spark repo for those who
>>>> >> >> really want to integrate with HBase.
>>>> >> >>
>>>> >> >> Spark's example is really very trivial (just uses one of HBase's
>>>> >> >> input formats), which makes it not very useful as a blueprint for
>>>> >> >> developing HBase apps with Spark.
>>>> >> >>
>>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>> >> >> > bq. I wouldn't call it "incomplete".
>>>> >> >> >
>>>> >> >> > I would call it incomplete.
>>>> >> >> >
>>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
>>>> >> >> > long, float and double' which is a bug fix.
>>>> >> >> >
>>>> >> >> > Please exclude presence of related module in vendor distro from
>>>> >> >> > this discussion.
>>>> >> >> >
>>>> >> >> > Thanks
>>>> >> >> >
>>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>>> >> >> > <van...@cloudera.com> wrote:
>>>> >> >> >>
>>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yuzhih...@gmail.com>
>>>> >> >> >> wrote:
>>>> >> >> >> > I want to note that the hbase-spark module in HBase is
>>>> >> >> >> > incomplete. Zhan has several patches pending review.
>>>> >> >> >>
>>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
>>>> >> >> >> which doesn't mean new ones, or more efficient implementations of
>>>> >> >> >> existing ones, can't be added.
>>>> >> >> >>
>>>> >> >> >> > hbase-spark module is currently only in master branch which
>>>> >> >> >> > would be released as 2.0
>>>> >> >> >>
>>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
>>>> >> >> >> much for upstream HBase.
>>>> >> >> >>
>>>> >> >> >> --
>>>> >> >> >> Marcelo
>>>> >> >> >
>>>> >> >>
>>>> >> >> --
>>>> >> >> Marcelo
>>>> >> >
>>>> >>
>>>> >> --
>>>> >> Marcelo
>>>> >
>>>>
>>>> --
>>>> Marcelo

--
Marcelo