Hi Weichen,
Thank you very much for the explanation.
On Fri, Oct 13, 2017 at 6:56 PM, Weichen Xu
wrote:
> Hi Supun,
>
> Dataframe API is NOT using the old RDD implementation under the covers,
> dataframe has its own implementation. (Dataframe use binary row format and
> columnar storage when ca
Hi Supun,
The DataFrame API does NOT use the old RDD implementation under the covers;
DataFrame has its own implementation. (DataFrames use a binary row format and
columnar storage when cached.) So a DataFrame has no relationship with the
`RDD[Row]` you want to get.
When calling `df.rdd`, and then cache, it
@Vadim Would it be true to say that `.rdd` *may* create a new job,
depending on whether the DataFrame/Dataset had already been materialized
via an action or checkpoint? If the only prior operations on the
DataFrame had been transformations, then the DataFrame would still not have
been calcu
When you do `Dataset.rdd` you actually create a new job.
Here you can see what it does internally:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L2816-L2828
On Fri, Oct 13, 2017 at 5:24 PM, Supun Nakandala
wrote:
> Hi Weichen,
>
> Thank
Hi Weichen,
Thank you for the reply.
My understanding was that the DataFrame API uses the old RDD implementation
under the covers, though it presents a different API, and that calling
`df.rdd` simply gives access to the underlying RDD. Is this assumption
wrong? I would appreciate if you can shed more insi
You should use `df.cache()`.
`df.rdd.cache()` won't work, because `df.rdd` generates a new RDD from the
original `df` and then caches that new RDD, not `df` itself.
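
For anyone who wants to check this locally, here is a minimal sketch (not taken from the thread) that compares the two calls by reading back the storage levels. It assumes a local `SparkSession` and Spark 2.1 or later, where `Dataset.storageLevel` is available; the object name `CacheCheck`, the helper `levelsAfterRddCache`, and the sample data are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object, for illustration only.
object CacheCheck {

  // Caches the RDD obtained from df.rdd and returns
  // (storage level of the DataFrame, storage level of that RDD).
  def levelsAfterRddCache(spark: SparkSession): (StorageLevel, StorageLevel) = {
    val df = spark.range(1000).toDF("id") // sample data, an assumption
    val rowRdd = df.rdd                   // RDD[Row] derived from df
    rowRdd.cache()                        // marks only rowRdd for caching
    rowRdd.count()                        // action that materializes rowRdd
    (df.storageLevel, rowRdd.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-check")
      .getOrCreate()

    val (dfLevel, rddLevel) = levelsAfterRddCache(spark)
    // The DataFrame should still report StorageLevel.NONE here,
    // while the derived RDD reports the level set by cache().
    println(s"df.rdd.cache(): df=$dfLevel rdd=$rddLevel")

    // Caching the DataFrame itself, as suggested above.
    val df2 = spark.range(1000).toDF("id")
    df2.cache()
    df2.count() // action that populates the DataFrame cache
    println(s"df.cache(): df=${df2.storageLevel}")

    spark.stop()
  }
}
```

If the first line shows `NONE` for the DataFrame and the second shows a non-`NONE` level, that matches the explanation above. The Storage tab of the Spark UI should show the same picture.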
On Fri, Oct 13, 2017 at 3:35 PM, Supun Nakandala
wrote:
> Hi all,
>
> I have been experimenting with cache/persist/unpersist methods with
> respect to
Hi all,
I have been experimenting with the cache/persist/unpersist methods with respect
to both the DataFrame and RDD APIs. However, I am seeing different
behavior in the DataFrame API compared to the RDD API, such as DataFrames not getting
cached when `count()` is called.
Is there a difference between how thes