With DataFrames you lose type safety. Depending on the language you are using, this can also be considered a drawback.
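To make that concrete, here is a minimal Scala sketch. It assumes Spark 2.0 or later (for `SparkSession`) running in local mode. The `Person` case class, the object name and the misspelled column `nmae` are made up for illustration. The DataFrame version compiles even with the typo and only fails when Spark analyzes the query, while the equivalent typo on a typed Dataset is rejected by the compiler.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object TypeSafetySketch {

  // DataFrame: columns are referenced by string, so a typo still compiles.
  // Returns true if the misspelled column is rejected at run time.
  def untypedTypoFailsAtRuntime(spark: SparkSession): Boolean = {
    import spark.implicits._
    val df = Seq(Person("Ann", 31), Person("Bob", 25)).toDF()
    try {
      df.select("nmae") // compiles fine; Spark rejects it during analysis
      false
    } catch {
      case _: AnalysisException => true
    }
  }

  // Dataset[Person]: fields are checked by the Scala compiler.
  def typedNames(spark: SparkSession): Seq[String] = {
    import spark.implicits._
    val ds = Seq(Person("Ann", 31), Person("Bob", 25)).toDS()
    // ds.map(_.nmae) // would not compile: nmae is not a member of Person
    ds.map(_.name).collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for this sketch.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("type-safety-sketch")
      .getOrCreate()
    try {
      println(untypedTypoFailsAtRuntime(spark))
      println(typedNames(spark))
    } finally {
      spark.stop()
    }
  }
}
```

In Python or R there is no compile-time check to lose in the first place, which is why the language matters here.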
On 15 December 2015 at 15:08, Jakob Odersky <joder...@gmail.com> wrote:

> By using DataFrames you will not need to specify RDD operations explicitly;
> instead, the operations are built and optimized using the information
> available in the DataFrame's schema.
> The only drawback I can think of is some loss of generality: given a
> DataFrame containing types A, you will not be able to include types B even
> if B is a sub-type of A. However, in real use cases I have never run into
> this problem.
>
> I once had a related question on RDDs and DataFrames; here is the answer I
> got from Michael Armbrust:
>
>> Here is how I view the relationship between the various components of
>> Spark:
>>
>> - *RDDs* - a low-level API for expressing DAGs that will be executed in
>> parallel by Spark workers
>> - *Catalyst* - an internal library for expressing trees that we use to
>> build relational algebra and expression evaluation. There's also an
>> optimizer and query planner that turns these logical concepts into RDD
>> actions.
>> - *Tungsten* - an internal optimized execution engine that can compile
>> Catalyst expressions into efficient Java bytecode that operates directly
>> on serialized binary data. It also has nice low-level data structures /
>> algorithms like hash tables and sorting that operate directly on
>> serialized data. These are used by the physical nodes that are produced
>> by the query planner (and run inside RDD operations on workers).
>> - *DataFrames* - a user-facing API that is similar to SQL/LINQ for
>> constructing dataflows that are backed by Catalyst logical plans
>> - *Datasets* - a user-facing API that is similar to the RDD API for
>> constructing dataflows that are backed by Catalyst logical plans
>>
>> So everything is still operating on RDDs, but I anticipate most users will
>> eventually migrate to the higher-level APIs for convenience and automatic
>> optimization.
>
> Hope that also helps you get an idea of the different concepts and their
> potential advantages/drawbacks.