RDDs are still relevant in a few ways. There is no Dataset API in Python, for
example, so the RDD is still the 'typed' API there. RDDs also still underpin
DataFrames. And of course the API is still around because there is probably
a lot of code out there that uses it. Occasionally it's still useful to drop
into that API.
Hi Marco,
IMHO the RDD API is only for very sophisticated use cases that very few Spark
devs would be capable of handling. I consider the RDD API a sort of Spark
assembler, and most Spark devs should stick to the Dataset API.
Speaking of HBase, see
https://github.com/GoogleCloudPlatform/java-docs-samples/tree/master/bigta
Hi, my name is Marco and I'm one of the developers behind
https://github.com/unicredit/hbase-rdd
a project we are currently reviewing for various reasons.
We were basically wondering whether RDD "is still a thing" nowadays (we see lots
of usage of DataFrames and Datasets), and we're not sure how much