Hi,
I would recommend reading the book by Matei Zaharia. One of the main
differentiating factors between Spark 1.x and subsequent releases has been
optimization, and hence DataFrames, but RDDs are in no way going away,
because DataFrames are built on top of RDDs. The use of RDDs is still
allowed and recommended.
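To illustrate (not from the original mail, just a minimal sketch with toy
data in local mode): a DataFrame still exposes the RDD underneath it, and
you can move between the two layers in either direction.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rdd-under-df").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

// every DataFrame is backed by an RDD[Row]; dropping down is a single call
val rowRdd = df.rdd
println(rowRdd.map(_.getInt(0)).sum())   // 3.0

// and an RDD can be lifted back into a DataFrame when you want the optimizer
val df2 = rowRdd.map(r => (r.getInt(0), r.getString(1))).toDF("id", "label")
df2.show()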
Just my 2 cents as one of the contributors: while SQL semantics can express
the various use cases data scientists encounter, I also agree that end users
who are more familiar with code than with SQL can feel it is not flexible
enough.
But countless efforts have been incorporated into Spark SQL (an
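For what it's worth, here is a small sketch (the Event case class and the
data are made up) of how the two styles can sit side by side: relational
operators where SQL fits, plain Scala on the typed Dataset API where it
doesn't.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

case class Event(user: String, amount: Double)
val events = Seq(Event("a", 10.0), Event("a", 5.0), Event("b", 2.0)).toDS()

// relational / SQL-style, fully visible to the optimizer
events.groupBy($"user").sum("amount").show()

// code-style, for logic that is awkward to phrase in SQL
val adjusted = events.map { e =>
  val bonus = if (e.amount > 4.0) e.amount * 0.1 else 0.0  // arbitrary imperative rule
  e.copy(amount = e.amount + bonus)
}
adjusted.show()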
Thanks for bringing this issue to the mailing list.
In addition, I would ask the same questions about the DStreams and
Structured Streaming APIs.
Structured Streaming is high level, and that makes it difficult to express
all business logic in it, although Databricks is pushing it and recommending
it.
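On the "hard to express all business logic" point, one escape hatch worth
mentioning is foreachBatch, which hands every micro-batch to ordinary
DataFrame code. A rough sketch only; the rate source and the /tmp output
path are placeholders, not anything from this thread.

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val stream = spark.readStream
  .format("rate")                  // built-in test source: (timestamp, value)
  .option("rowsPerSecond", "10")
  .load()

// arbitrary per-micro-batch logic: joins, custom sinks, imperative rules, ...
def processBatch(batch: DataFrame, batchId: Long): Unit = {
  batch.filter("value % 2 = 0").write.mode("append").parquet("/tmp/even-values")
}

val query = stream.writeStream.foreachBatch(processBatch _).start()
query.awaitTermination()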
Hi,
There are functions like map, flatMap, reduce and so on that form the basic
data processing operations in big data (and in Apache Spark). But in newer
versions Spark introduces the high-level DataFrame API and recommends using
it, even though there are no such functions in the DataFrame API.
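For what it's worth, in the Scala API those operators have not disappeared:
a DataFrame is just Dataset[Row], and the typed Dataset API (or a drop to
.rdd) still gives you map/flatMap/reduce-style code. A quick word-count-like
sketch with made-up input:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val lines = Seq("spark streams data", "rdd dataframe dataset").toDS()

val counts = lines
  .flatMap(_.split(" "))       // functional operators on a typed Dataset
  .map(w => (w, 1))
  .groupByKey(_._1)
  .mapValues(_._2)
  .reduceGroups(_ + _)         // reduce within each key

counts.show()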