Hi,

Many people interpret this slide from Databricks
https://ogirardot.files.wordpress.com/2015/05/future-of-spark.png
as an indication that the DataFrame API is going to be the main processing unit of
Spark and the sole access point to MLlib, Streaming and such. Is that true? My
impression was that DataFrames are an additional abstraction layer with some
promising optimisations coming from the Tungsten project, but that's all. RDDs are
here to stay. They are the natural choice when it comes to, e.g., processing
images.

Here is one article that advertises DataFrames as a "sole data structure for
parallel computations" (see paragraph 4):
https://ogirardot.wordpress.com/2015/05/29/rdds-are-the-new-bytecode-of-apache-spark/

Cheers,
Lucas



