I don't understand why one would think an RDD of case classes doesn't have
types (a schema). If Spark can convert an RDD to a DataFrame, that means it
understood the schema. So from that point on, why does one have to use SQL
features to do further processing? If all Spark needs for its optimizations is
the schema, what do these additional SQL features buy? If there is a way to
avoid the SQL-style API while using DataFrames, I don't mind it. But it looks
like I would have to convert all my existing transformations to things like
df1.join(df2, df1("abc") === df2("abc"), "left_outer") ... which is plain ugly
and error prone in my opinion.
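
To make the comparison concrete, here is a minimal sketch of the kind of code
I mean (Spark 1.6-era Scala API; the Customer and Order case classes and the
key column "abc" are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Customer(abc: Int, name: String)
case class Order(abc: Int, amount: Double)

object JoinSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("join-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // RDDs of case classes; the schema is inferred by reflection in toDF().
    val customers = sc.parallelize(Seq(Customer(1, "a"), Customer(2, "b"))).toDF()
    val orders    = sc.parallelize(Seq(Order(1, 10.0))).toDF()

    // DataFrame left outer join; note === (Column equality), not ==.
    val joined = customers.join(orders, customers("abc") === orders("abc"), "left_outer")
    joined.show()
  }
}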

On Tue, Feb 2, 2016 at 5:49 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Michael,
>
> Is there a section in the Spark documentation that demonstrates how to
> serialize arbitrary objects in a DataFrame? The last time I did it, I used a
> User Defined Type (copied from VectorUDT).
>
> Best Regards,
>
> Jerry
>
> On Tue, Feb 2, 2016 at 8:46 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>>> A principal difference between RDDs and DataFrames/Datasets is that the
>>> latter have a schema associated with them. This means that they support only
>>> certain types (primitives, case classes and more) and that they are
>>> uniform, whereas RDDs can contain any serializable object and need not
>>> necessarily be uniform. These properties make it possible to generate very
>>> efficient serialization and other optimizations that cannot be achieved
>>> with plain RDDs.
>>>
>>
>> You can use Encoders.kryo() as well to serialize arbitrary objects, just
>> like with RDDs.
>>
>
>
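
As a side note on the Encoders.kryo() suggestion above, a minimal sketch of
how it might be used (Spark 1.6-era API; the Blob class is made up for
illustration, any Kryo-serializable class should work):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Encoder, Encoders, SQLContext}

// Not a case class and not covered by the built-in encoders.
class Blob(val id: Int, val payload: Array[Byte]) extends Serializable

object KryoEncoderSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("kryo-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // encoders for primitives used in map() below

    // Encoders.kryo serializes the whole object into a single binary column,
    // so a Dataset can hold arbitrary (Kryo-serializable) objects, much like an RDD.
    implicit val blobEncoder: Encoder[Blob] = Encoders.kryo[Blob]

    val ds = sqlContext.createDataset(
      Seq(new Blob(1, Array[Byte](1, 2)), new Blob(2, Array[Byte](3))))
    ds.map(_.id).collect().foreach(println)
  }
}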
