RDDs are still relevant in a few ways. There is no Dataset API in Python, for example, so the RDD is still the 'typed' API there. RDDs also still underpin DataFrames. And of course the API is still there because there's probably a lot of existing code that uses it. Occasionally it's still useful to drop into that API for certain operations.
If that's a connector to read data from HBase - you probably do want to return DataFrames ideally. Unless you're relying on very specific APIs from very specific versions, I wouldn't think a distro's Spark or HBase is much different?

On Wed, Jan 20, 2021 at 7:44 AM Marco Firrincieli <mfi...@hotmail.com> wrote:

> Hi, my name is Marco and I'm one of the developers behind
> https://github.com/unicredit/hbase-rdd
> a project we are currently reviewing for various reasons.
>
> We were basically wondering if RDD "is still a thing" nowadays (we see
> lots of usage for DataFrames or Datasets) and we're not sure how much of
> the community still works/uses RDDs.
>
> Also, for lack of time, we always mainly worked using Cloudera-flavored
> Hadoop/HBase & Spark versions. We were thinking the community would then
> help us organize the project in a more "generic" way, but that didn't
> happen.
>
> So I figured I would ask here what is the gut feeling of the Spark
> community so to better define the future of our little library.
>
> Thanks
>
> -Marco