RDDs are still relevant in a few ways - there is no Dataset in Python for
example, so RDD is still the 'typed' API. They still underpin DataFrames.
And of course it's still there because there's probably still a lot of code
out there that uses it. Occasionally it's still useful to drop into that
API for certain operations.

If that's a connector to read data from HBase - you probably do want to
return DataFrames ideally.
Unless you're relying on very specific APIs from very specific versions, I
wouldn't think a distro's Spark or HBase is much different?

On Wed, Jan 20, 2021 at 7:44 AM Marco Firrincieli <mfi...@hotmail.com>
wrote:

> Hi, my name is Marco and I'm one of the developers behind
> https://github.com/unicredit/hbase-rdd
> a project we are currently reviewing for various reasons.
>
> We were basically wondering if RDD "is still a thing" nowadays (we see
> lots of usage for DataFrames or Datasets) and we're not sure how much of
> the community still works/uses RDDs.
>
> Also, for lack of time, we always mainly worked using Cloudera-flavored
> Hadoop/HBase & Spark versions. We were thinking the community would then
> help us organize the project in a more "generic" way, but that didn't
> happen.
>
> So I figured I would ask here what is the gut feeling of the Spark
> community so to better define the future of our little library.
>
> Thanks
>
> -Marco
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>

Reply via email to