how to merge two dataframes

Yana Kadiyska Fri, 30 Oct 2015 12:12:01 -0700

Hi folks,

I have a need to "append" two dataframes -- I was hoping to use UnionAll
but it seems that this operation treats the underlying dataframes as
sequence of columns, rather than a map.


In particular, my problem is that the columns in the two DFs are not in the
same order --notice that my customer_id somehow comes out a string:

This is Spark 1.4.1

case class Test(epoch: Long,browser:String,customer_id:Int,uri:String)
val test = Test(1234l,"firefox",999,"http://foobar";)

case class Test1( customer_id :Int,    uri:String,    browser:String,
 epoch :Long)
val test1 = Test1(888,"http://foobar","ie",12343)
val df=sc.parallelize(Seq(test)).toDF
val df1=sc.parallelize(Seq(test1)).toDF
df.unionAll(df1)

//res2: org.apache.spark.sql.DataFrame = [epoch: bigint, browser:
string, customer_id: string, uri: string]



Is unionAll the wrong operation? Any special incantations? Or advice on how
to otherwise get this to succeeed?

how to merge two dataframes

Reply via email to