I don't understand why one would think an RDD of case classes doesn't have types (a schema). If Spark can convert the RDD to a DataFrame, that means it understood the schema. So from that point on, why does one have to use SQL features to do further processing? If all Spark needs for its optimizations is the schema, then what do the additional SQL features buy? If there is a way to avoid the SQL features while using DataFrames, I don't mind it. But it looks like I have to convert all my existing transformations to things like df1.join(df2, df1("abc") === df2("abc"), "left_outer") ... which is plain ugly and error prone in my opinion.
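For reference, a minimal, self-contained sketch of the pattern described above (the Record case class, the column name abc, and the df1/df2 names are hypothetical, borrowed only from the example in this message; it assumes a Spark 2.x SparkSession):

    import org.apache.spark.sql.SparkSession

    // Hypothetical case class and column name, borrowed from the example above.
    case class Record(abc: String, value: Int)

    object JoinExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("df-join").master("local[*]").getOrCreate()
        import spark.implicits._

        // The schema is inferred from the case class when converting the RDDs to DataFrames.
        val df1 = spark.sparkContext.parallelize(Seq(Record("a", 1), Record("b", 2))).toDF()
        val df2 = spark.sparkContext.parallelize(Seq(Record("a", 10))).toDF()

        // The column-expression join referred to above.
        val joined = df1.join(df2, df1("abc") === df2("abc"), "left_outer")
        joined.show()

        spark.stop()
      }
    }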
On Tue, Feb 2, 2016 at 5:49 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Michael,
>
> Is there a section in the Spark documentation that demonstrates how to serialize
> arbitrary objects in a DataFrame? The last time I did it was using some User
> Defined Type (copied from VectorUDT).
>
> Best Regards,
>
> Jerry
>
> On Tue, Feb 2, 2016 at 8:46 PM, Michael Armbrust <mich...@databricks.com> wrote:
>
>>> A principal difference between RDDs and DataFrames/Datasets is that the
>>> latter have a schema associated with them. This means that they support only
>>> certain types (primitives, case classes and more) and that they are
>>> uniform, whereas RDDs can contain any serializable object and need not
>>> be uniform. These properties make it possible to generate very
>>> efficient serialization and other optimizations that cannot be achieved
>>> with plain RDDs.
>>
>> You can use Encoders.kryo() as well to serialize arbitrary objects, just
>> like with RDDs.
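For anyone following this thread later, a minimal sketch of what the Encoders.kryo approach mentioned above might look like (the LegacyRecord class is hypothetical, and the example assumes a Spark 2.x SparkSession):

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

    // Hypothetical arbitrary class: not a case class, so no built-in encoder exists for it.
    class LegacyRecord(val id: Long, val payload: Array[Byte]) extends Serializable

    object KryoEncoderExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kryo-encoder").master("local[*]").getOrCreate()
        import spark.implicits._

        // Kryo-based encoder: the whole object is serialized into a single binary column,
        // so Catalyst cannot optimize on its fields, but any serializable type works.
        implicit val legacyEncoder: Encoder[LegacyRecord] = Encoders.kryo[LegacyRecord]

        val ds = spark.createDataset(Seq(new LegacyRecord(1L, Array[Byte](1, 2)),
                                         new LegacyRecord(2L, Array[Byte](3))))

        // Typed operations still work on the deserialized objects.
        ds.map(_.id).show()

        spark.stop()
      }
    }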