Hello,
The question arose because DataFrames benefit from the Tungsten improvements
together with the Catalyst optimizer. So Spark has to do some additional
work to convert an RDD to a DataFrame before the optimizations/improvements
available to DataFrames can be used.
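The following is a toy illustration of Jörn's point about converting back and forth, written in plain Python. It does not involve Spark, and it is not how Tungsten actually lays out rows. The 16-byte (id, value) record layout and the helpers `encode_rows`, `decode_rows`, `total_from_binary` and `run` are all hypothetical. The data is converted to a compact binary form once. Each step then either reads a field in place from the buffer or round-trips through Python objects, so you can compare the two timings on your own machine.

```python
import struct
import time

# Hypothetical fixed-width row: little-endian 64-bit int id + 64-bit float value.
# Only loosely analogous to a binary row format; NOT Spark's UnsafeRow layout.
ROW = struct.Struct("<qd")      # 16 bytes per row, no padding
VALUE = struct.Struct("<d")     # the value field, stored at byte offset 8 of each row


def encode_rows(rows):
    """Serialize (id, value) tuples into one contiguous binary buffer."""
    buf = bytearray(ROW.size * len(rows))
    for i, (row_id, value) in enumerate(rows):
        ROW.pack_into(buf, i * ROW.size, row_id, value)
    return bytes(buf)


def decode_rows(buf):
    """Deserialize the whole buffer back into a list of (id, value) tuples."""
    return list(ROW.iter_unpack(buf))


def total_from_binary(buf):
    """Sum the value field by reading it in place, without rebuilding row tuples."""
    return sum(
        VALUE.unpack_from(buf, offset + 8)[0]
        for offset in range(0, len(buf), ROW.size)
    )


def run(rows, steps, round_trip_each_step):
    """Do `steps` aggregations, optionally converting binary -> objects -> binary each time."""
    buf = encode_rows(rows)  # one-off conversion to the binary form
    totals = []
    for _ in range(steps):
        if round_trip_each_step:
            objs = decode_rows(buf)                 # back to Python objects...
            totals.append(sum(v for _, v in objs))
            buf = encode_rows(objs)                 # ...and into binary again
        else:
            totals.append(total_from_binary(buf))   # stay in the binary form
    return totals


if __name__ == "__main__":
    rows = [(i, i * 0.5) for i in range(50_000)]
    for flag in (False, True):
        start = time.perf_counter()
        run(rows, steps=5, round_trip_each_step=flag)
        elapsed = time.perf_counter() - start
        print(f"round_trip_each_step={flag}: {elapsed:.3f}s")
```

Both modes compute identical totals. The only difference is how much conversion work happens per step. The absolute numbers are meaningless for Spark. For real measurements, use the web UI or a SparkListener, as Jacek suggests.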

Regards,
Pranav

On Fri, Jun 24, 2016 at 2:05 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Jörn,
>
> You can measure the time for ser/deser yourself using web UI or
> SparkListeners.
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Jun 24, 2016 at 10:14 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> > I would push the Spark people to provide equivalent functionality. In
> > the end it is a deserialization/serialization process, which should not
> > be done back and forth because it is one of the more costly aspects of
> > processing: Java objects have to be converted to a binary representation.
> > It is OK to do this once, because afterwards access in binary form is much
> > more efficient, but that benefit disappears if you convert back and forth
> > all the time.
> >
> > I have heard somewhere the figure that serialization/deserialization
> > takes 80% of the time in the big data world, but I would be happy to see
> > this figure confirmed empirically for different scenarios. Unfortunately,
> > I do not have a source for this figure, so do not take it for granted.
> >
> >> On 24 Jun 2016, at 08:00, pan <pranav.na...@gmail.com> wrote:
> >>
> >> Hello,
> >> I am trying to understand the cost of converting an RDD to a DataFrame
> >> and back. Would converting back and forth very frequently hurt
> >> performance?
> >>
> >> I do observe that some operations, like join, are implemented very
> >> differently for (pair) RDDs and DataFrames, so I am trying to figure out
> >> the cost of converting one to the other.
> >>
> >> Regards,
> >> Pranav
> >>
> >>
> >>
> >> --
> >> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Cost-of-converting-RDD-s-to-dataframe-and-back-tp27222.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
> >
>
