With DataFrames you lose type-safety. Depending on the language you are
using this can also be considered a drawback.
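
To make that concrete, here is a minimal Scala sketch. It assumes a Spark
version that provides SparkSession and Datasets; the Person case class, the
object name and the misspelled column "nmae" are made up for illustration.
It contrasts a typed Dataset, where the compiler checks field names, with a
DataFrame, where column names are plain strings checked only at runtime.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import scala.util.{Failure, Try}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object TypeSafetySketch {
  // Returns true when selecting a misspelled column fails at runtime
  // with an AnalysisException.
  def misspelledColumnFails(spark: SparkSession): Boolean = {
    import spark.implicits._
    val ds = Seq(Person("Ann", 30), Person("Bob", 25)).toDS() // Dataset[Person]
    val df = ds.toDF()                                        // DataFrame (untyped rows)

    // Typed API: field names and types are checked by the compiler.
    // Writing ds.map(_.nmae) here would be rejected at compile time.
    val agesNextYear = ds.map(_.age + 1)

    // Untyped API: the column name is just a string, so this typo compiles
    // and is only reported when Spark analyzes the query.
    Try(df.select("nmae")) match {
      case Failure(_: AnalysisException) => true
      case _                             => false
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("type-safety-sketch")
      .getOrCreate()
    try println(misspelledColumnFails(spark))
    finally spark.stop()
  }
}
```

In a dynamically typed language such as Python neither variant is checked
before the job runs, which is why the loss matters less there.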

On 15 December 2015 at 15:08, Jakob Odersky <joder...@gmail.com> wrote:

> By using DataFrames you will not need to specify RDD operations explicitly;
> instead, the operations are built and optimized for you using the information
> available in the DataFrame's schema.
> The only drawback I can think of is some loss of generality: given a
> DataFrame containing types A, you will not be able to include types B even
> if B is a sub-type of A. However, in real use-cases I have never run into
> this problem.
>
> I once had a related question on RDDs and DataFrames, here is the answer I
> got from Michael Armbrust:
>
>> Here is how I view the relationship between the various components of
>> Spark:
>>
>>  - *RDDs* - a low-level API for expressing DAGs that will be executed in
>> parallel by Spark workers
>>  - *Catalyst* - an internal library for expressing trees that we use to
>> build relational algebra and expression evaluation.  There's also an
>> optimizer and query planner that turns these logical concepts into RDD
>> actions.
>>  - *Tungsten* - an internal optimized execution engine that can compile
>> Catalyst expressions into efficient Java bytecode that operates directly on
>> serialized binary data.  It also has nice low-level data structures /
>> algorithms like hash tables and sorting that operate directly on serialized
>> data.  These are used by the physical nodes that are produced by the
>> query planner (and run inside of RDD operations on workers).
>>  - *DataFrames* - a user-facing API that is similar to SQL/LINQ for
>> constructing dataflows that are backed by Catalyst logical plans
>>  - *Datasets* - a user-facing API that is similar to the RDD API for
>> constructing dataflows that are backed by Catalyst logical plans
>>
>> So everything is still operating on RDDs, but I anticipate most users will
>> eventually migrate to the higher-level APIs for convenience and automatic
>> optimization.
>>
>
> Hope that also helps you get an idea of the different concepts and their
> potential advantages/drawbacks.
>
