Re: Spark RDD + HBase: adoption trend

2021-01-20 Thread Sean Owen
RDDs are still relevant in a few ways - there is no Dataset in Python for example, so RDD is still the 'typed' API. They still underpin DataFrames. And of course it's still there because there's probably still a lot of code out there that uses it. Occasionally it's still useful to drop into that AP

Re: Spark RDD + HBase: adoption trend

2021-01-20 Thread Jacek Laskowski
Hi Marco, IMHO RDD is only for very sophisticated use cases that very few Spark devs would be capable of. I consider the RDD API a sort of Spark assembler, and most Spark devs should stick to the Dataset API. Speaking of HBase, see https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/bigta

Spark RDD + HBase: adoption trend

2021-01-20 Thread Marco Firrincieli
Hi, my name is Marco and I'm one of the developers behind https://github.com/unicredit/hbase-rdd, a project we are currently reviewing for various reasons. We were basically wondering if RDD "is still a thing" nowadays (we see lots of usage for DataFrames or Datasets) and we're not sure how much