Dataframes - sole data structure for parallel computations?

Tracewski, Lukasz Thu, 08 Oct 2015 07:43:01 -0700

Hi,

Many people interpret this slide from Databricks
https://ogirardot.files.wordpress.com/2015/05/future-of-spark.png
as an indication that the DataFrames API is going to be the main processing 
unit of Spark and the sole access point to MLlib, Streaming and such. Is that 
true? My impression was that DataFrames are an additional abstraction layer 
with some promising optimisations coming from the Tungsten project, but that's 
all, and that RDDs are here to stay. RDDs are a natural choice when it comes 
to e.g. processing images.


Here is one article that advertises DataFrames as a "sole data structure for 
parallel computations" (paragraph 4):
https://ogirardot.wordpress.com/2015/05/29/rdds-are-the-new-bytecode-of-apache-spark/

Cheers,
Lucas




