Xinh,
Thanks for the clarification. I'm new to Spark and trying to navigate the
different APIs. I was just following some examples and retrofitting them, but
I see now I should stick with plain RDDs until my schema is known (at the end
of the data pipeline).
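In case it helps anyone searching the archives later, the shape I have in
mind is roughly this (an untested sketch against the Spark 1.6 Scala API;
the input path and field names are made up):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{Row, SQLContext}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val sc = new SparkContext(new SparkConf().setAppName("dynamic-schema"))
    val sqlContext = new SQLContext(sc)

    // Stay on plain RDDs while the schema is still unknown.
    val rows: RDD[Row] = sc.textFile("hdfs:///input") // made-up path
      .map { line =>
        val parts = line.split(",")
        Row(parts(0), parts(1))
      }

    // At the end of the pipeline, once the fields are known, attach a schema.
    val schema = StructType(Seq(
      StructField("id", StringType, nullable = true),
      StructField("value", StringType, nullable = true)))
    val df = sqlContext.createDataFrame(rows, schema)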
Thanks again!
On 06/24/2016, Xinh wrote:
Hi Martin,
Since your schema is dynamic, how would you use Datasets? Would you know
ahead of time the row type T in a Dataset[T]?
One option is to start with DataFrames at the beginning of your data
pipeline, figure out the field types, and then switch completely over to
RDDs or Datasets in the next stage of the pipeline.
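Concretely, something along these lines (a quick sketch against the 1.6
Scala API, assuming an existing SQLContext; the JSON path is just a
placeholder):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    // Let Spark infer the schema at the start of the pipeline...
    val df = sqlContext.read.json("hdfs:///input.json") // placeholder path
    df.printSchema()      // inspect the inferred fields
    val types = df.dtypes // Array[(fieldName, typeName)] for programmatic use

    // ...then switch over to an RDD[Row] for the later stages.
    val rows: RDD[Row] = df.rdd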
Indeed. But I'm dealing with 1.6 for now unfortunately.
On 06/24/2016 02:30 PM, Ted Yu wrote:
In Spark 2.0, Dataset and DataFrame are unified.
Would this simplify your use case?
On Fri, Jun 24, 2016 at 7:27 AM, Martin Serrano wrote:
> Hi,
>
> I'm exposing a custom source to the Spark environment. I have a question
> about the best way to approach this problem.
>
> I created a custom r...
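For reference, the unification means that in Spark 2.0 DataFrame becomes a
type alias for Dataset[Row], so untyped and typed data share one API. A
quick illustration (the path and case class here are made up):

    import org.apache.spark.sql.{Dataset, Row, SparkSession}

    case class Record(id: String, value: String)

    val spark = SparkSession.builder().appName("unified").getOrCreate()
    import spark.implicits._

    // In 2.0, type DataFrame = Dataset[Row], so a read gives a Dataset directly.
    val df: Dataset[Row] = spark.read.json("hdfs:///input.json") // made-up path

    // Once the schema is known, bind a concrete type to the same data
    // (field names must line up with the inferred schema).
    val ds: Dataset[Record] = df.as[Record]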