Hi,

Many people interpret this slide from Databricks https://ogirardot.files.wordpress.com/2015/05/future-of-spark.png as an indication that the DataFrame API is going to be the main processing unit of Spark and the sole access point to MLlib, Streaming and such. Is that true?

My impression was that DataFrames are an additional abstraction layer, with some promising optimisations coming from Project Tungsten, but that's all. RDDs are here to stay. They are the natural choice when it comes to, e.g., processing images.
Here is one article that advertises DataFrames as a "sole data structure for parallel computations": https://ogirardot.wordpress.com/2015/05/29/rdds-are-the-new-bytecode-of-apache-spark/ (paragraph 4)

Cheers,
Lucas