Hi,
I would recommend reading the book by Matei Zaharia. One of the main
differentiating factors between Spark 1.x and subsequent releases has been
optimization, and hence DataFrames, but RDDs are in no way going away,
because DataFrames are built on top of RDDs. The use of RDDs is still
allowed and recommended.
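To illustrate (not from the original mail, just a minimal sketch with toy
data in local mode): a DataFrame still exposes the RDD underneath it, and
you can move between the two layers in either direction.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rdd-under-df").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

// every DataFrame is backed by an RDD[Row]; dropping down is a single call
val rowRdd = df.rdd
println(rowRdd.map(_.getInt(0)).sum())   // 3.0

// and an RDD can be lifted back into a DataFrame when you want the optimizer
val df2 = rowRdd.map(r => (r.getInt(0), r.getString(1))).toDF("id", "label")
df2.show()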
Just my 2 cents as one of the contributors: while SQL semantics can express
the various use cases data scientists encounter, I also agree that end users
who are more familiar with code than with SQL can feel it is not flexible
enough.
But countless efforts have been incorporated into Spark SQL (an
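For what it's worth, here is a small sketch (the Event case class and the
data are made up) of how the two styles can sit side by side: relational
operators where SQL fits, plain Scala on the typed Dataset API where it
doesn't.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

case class Event(user: String, amount: Double)
val events = Seq(Event("a", 10.0), Event("a", 5.0), Event("b", 2.0)).toDS()

// relational / SQL-style, fully visible to the optimizer
events.groupBy($"user").sum("amount").show()

// code-style, for logic that is awkward to phrase in SQL
val adjusted = events.map { e =>
  val bonus = if (e.amount > 4.0) e.amount * 0.1 else 0.0  // arbitrary imperative rule
  e.copy(amount = e.amount + bonus)
}
adjusted.show()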
Thanks for bringing this issue to the mailing list.
In addition, I would ask the same questions about the DStreams and
Structured Streaming APIs.
Structured Streaming is high level, and that makes it difficult to express
all business logic in it, although Databricks is pushing it and recommending
it.
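On the "hard to express all business logic" point, one escape hatch worth
mentioning is foreachBatch, which hands every micro-batch to ordinary
DataFrame code. A rough sketch only; the rate source and the /tmp output
path are placeholders, not anything from this thread.

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val stream = spark.readStream
  .format("rate")                  // built-in test source: (timestamp, value)
  .option("rowsPerSecond", "10")
  .load()

// arbitrary per-micro-batch logic: joins, custom sinks, imperative rules, ...
def processBatch(batch: DataFrame, batchId: Long): Unit = {
  batch.filter("value % 2 = 0").write.mode("append").parquet("/tmp/even-values")
}

val query = stream.writeStream.foreachBatch(processBatch _).start()
query.awaitTermination()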
Hi,
There are functions like map, flatMap, reduce and so on that form the basic
data processing operations in big data (and in Apache Spark). But in newer
versions Spark introduces the high-level DataFrame API and recommends using
it, even though there are no such functions in the DataFrame API.
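For what it's worth, in the Scala API those operators have not disappeared:
a DataFrame is just Dataset[Row], and the typed Dataset API (or a drop to
.rdd) still gives you map/flatMap/reduce-style code. A quick word-count-like
sketch with made-up input:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val lines = Seq("spark streams data", "rdd dataframe dataset").toDS()

val counts = lines
  .flatMap(_.split(" "))       // functional operators on a typed Dataset
  .map(w => (w, 1))
  .groupByKey(_._1)
  .mapValues(_._2)
  .reduceGroups(_ + _)         // reduce within each key

counts.show()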