Re: how to merge two dataframes

Ted Yu Fri, 30 Oct 2015 12:34:32 -0700

How about the following?

scala> df.registerTempTable("df")
scala> df1.registerTempTable("df1")
scala> sql("select customer_id, uri, browser, epoch from df union select customer_id, uri, browser, epoch from df1").show()
+-----------+-------------+-------+-----+
|customer_id|          uri|browser|epoch|
+-----------+-------------+-------+-----+
|        999|http://foobar|firefox| 1234|
|        888|http://foobar|     ie|12343|
+-----------+-------------+-------+-----+
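
If you would rather stay in the DataFrame API, something along these lines might also work. This is only a sketch: it assumes a spark-shell session on Spark 1.4.x (so `sc` and `sqlContext` already exist) and reuses the case classes from your mail; the names `aligned` and `appended` are just illustrative. The idea is that unionAll pairs columns by position, so df1 is first reordered by name into df's column order.

```scala
// Sketch for spark-shell on Spark 1.4.x, where `sc` and `sqlContext` are predefined.
import sqlContext.implicits._

case class Test(epoch: Long, browser: String, customer_id: Int, uri: String)
case class Test1(customer_id: Int, uri: String, browser: String, epoch: Long)

val df  = sc.parallelize(Seq(Test(1234L, "firefox", 999, "http://foobar"))).toDF()
val df1 = sc.parallelize(Seq(Test1(888, "http://foobar", "ie", 12343L))).toDF()

// unionAll matches columns by position, not by name,
// so select df1's columns in the same order as df's before appending.
val aligned = df1.select(df.columns.map(c => df1.col(c)): _*)

val appended = df.unionAll(aligned)
appended.printSchema()
```

As far as I know, SQL `union` also drops duplicate rows while unionAll keeps them, so the two approaches only give the same result when the inputs contain no duplicates.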


Cheers

On Fri, Oct 30, 2015 at 12:11 PM, Yana Kadiyska <[email protected]>
wrote:

> Hi folks,
>
> I have a need to "append" two dataframes -- I was hoping to use unionAll
> but it seems that this operation treats the underlying dataframes as a
> sequence of columns, rather than a map.
>
> In particular, my problem is that the columns in the two DFs are not in
> the same order -- notice that my customer_id somehow comes out a string:
>
> This is Spark 1.4.1
>
> case class Test(epoch: Long,browser:String,customer_id:Int,uri:String)
> val test = Test(1234l,"firefox",999,"http://foobar")
>
> case class Test1(customer_id: Int, uri: String, browser: String, epoch: Long)
> val test1 = Test1(888,"http://foobar","ie",12343)
> val df=sc.parallelize(Seq(test)).toDF
> val df1=sc.parallelize(Seq(test1)).toDF
> df.unionAll(df1)
>
> //res2: org.apache.spark.sql.DataFrame = [epoch: bigint, browser: string,
> //      customer_id: string, uri: string]
>
> 
>
> Is unionAll the wrong operation? Any special incantations? Or advice on
> how to otherwise get this to succeed?
>
