>
> 1) Is there any difference in terms of performance when we use Datasets over
> DataFrames? Is it significant enough to choose one over the other? I do realise
> there would be some overhead due to case classes, but how significant is that?
> Are there any other implications?
As long as you use the DataFrame func
"sum(id)")
step4.collect()
step4._jdf.queryExecution().debug().codegen()
You will see the generated code.
Regards,
Dhaval
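
Since the start of the snippet above is cut off, here is a minimal PySpark sketch of the same idea, under stated assumptions. Only the final `"sum(id)"` expression, `collect()`, and the `_jdf.queryExecution().debug().codegen()` call come from the message; the earlier steps, the helper names `build_pipeline` and `main`, and the local session settings are hypothetical stand-ins.

```python
def build_pipeline(spark):
    """Build a small DataFrame chain ending in sum(id).

    The intermediate steps are hypothetical; the original message only
    shows the final "sum(id)" expression.
    """
    step1 = spark.range(1000)                    # single column named "id"
    step2 = step1.selectExpr("id * 2 as id")     # hypothetical transformation
    step3 = step2.filter("id % 3 = 0")           # hypothetical filter
    step4 = step3.selectExpr("sum(id)")          # aggregation from the message
    return step4


def main():
    # Imported here so the module loads even where pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")          # assumption: a local single-threaded session
        .appName("codegen-demo")     # hypothetical application name
        .getOrCreate()
    )
    try:
        step4 = build_pipeline(spark)
        print(step4.collect())       # run the query once
        step4.explain(True)          # print the logical and physical plans
        # As in the message: dump the generated code via the JVM-side
        # QueryExecution object (output goes to the driver's stdout).
        step4._jdf.queryExecution().debug().codegen()
    finally:
        spark.stop()
```

Calling `main()` from a Python session with pyspark installed should print the result, the plans, and the generated Java source for the whole-stage codegen subtrees.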
From: [External] Akhilanand
Sent: Tuesday, February 19, 2019 10:29 AM
To: Koert Kuipers
Cc: user
Subject: Re: Difference between dataset and dataframe
Thanks for the reply. But can you please tell me why DataFrames are more
performant than Datasets? Any specifics would be helpful.
Also, could you comment on the Tungsten code gen part of my question?
> On Feb 18, 2019, at 10:47 PM, Koert Kuipers wrote:
>
in the api DataFrame is just Dataset[Row]. so this makes you think Dataset
is the generic api. interestingly enough under the hood everything is
really Dataset[Row], so DataFrame is really the "native" language for spark
sql, not Dataset.
i find DataFrame to be significantly more performant. in ge