Yes, true. I just wonder whether anybody has taken measurements across the
different types of problems in the Big Data area and produced a scientific
analysis of how much time is wasted on serialization/deserialization, to
support the 80% figure ;)
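
For anyone who wants a feel for the numbers, here is a minimal sketch of that
kind of measurement. It is plain Python, not Spark: pickle stands in for the
object-to-binary conversion discussed in this thread, and the function name
time_round_trips, the record shape, and the "+ 1" workload are all made up for
illustration. The share it prints depends entirely on the data and the
workload, so it says nothing about the 80% figure.

```python
import pickle
import time


def time_round_trips(records, round_trips):
    """Return (ser/deser seconds, work seconds, final records).

    Hypothetical helper: each iteration converts the records to bytes and
    back (the "back and forth" case), then does a small piece of work.
    """
    serde_seconds = 0.0
    work_seconds = 0.0
    for _ in range(round_trips):
        start = time.perf_counter()
        blob = pickle.dumps(records)   # objects -> binary representation
        records = pickle.loads(blob)   # binary representation -> objects
        serde_seconds += time.perf_counter() - start

        start = time.perf_counter()
        # Stand-in for the real processing step (an assumption, not Spark code).
        records = [(key, value + 1) for key, value in records]
        work_seconds += time.perf_counter() - start
    return serde_seconds, work_seconds, records


if __name__ == "__main__":
    data = [(i, float(i)) for i in range(100_000)]
    serde, work, _ = time_round_trips(data, round_trips=10)
    share = serde / (serde + work)
    print(f"ser/deser: {serde:.3f}s, work: {work:.3f}s, share: {share:.0%}")
```

Changing the record shape or the amount of work per round trip moves the share
a lot, which is why a single figure for all scenarios is hard to defend.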



> On 24 Jun 2016, at 10:35, Jacek Laskowski <ja...@japila.pl> wrote:
> 
> Hi Jorn,
> 
> You can measure the time for ser/deser yourself using the web UI or
> SparkListeners.
> 
> Regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
> 
> 
>> On Fri, Jun 24, 2016 at 10:14 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>> I would push the Spark people to provide equivalent functionality. In the
>> end it is a serialization/deserialization process, which should not be done
>> back and forth because it is one of the more costly aspects of processing:
>> Java objects have to be converted to a binary representation. It is OK to
>> do it once, because afterwards access in binary form is much more
>> efficient, but that benefit becomes irrelevant if you convert back and
>> forth all the time.
>> 
>> I have heard somewhere the figure that serialization/deserialization takes
>> 80% of the time in the big data world, but I would be happy to see this
>> figure confirmed empirically for different scenarios. Unfortunately I do
>> not have a source for this figure, so do not take it for granted.
>> 
>>> On 24 Jun 2016, at 08:00, pan <pranav.na...@gmail.com> wrote:
>>> 
>>> Hello,
>>>  I am trying to understand the cost of converting an RDD to a DataFrame and
>>> back. Would converting back and forth very frequently hurt performance?
>>>
>>> I do observe that some operations, like join, are implemented very
>>> differently for (pair) RDDs and DataFrames, so I am trying to figure out
>>> the cost of converting one to the other.
>>> 
>>> Regards,
>>> Pranav
>>> 
>>> 
>>> 
>>> --
>>> View this message in context: 
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Cost-of-converting-RDD-s-to-dataframe-and-back-tp27222.html
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org