Re: Why does dataset.union fails but dataset.rdd.union execute correctly?

Shixiong(Ryan) Zhu Mon, 08 May 2017 12:26:12 -0700

This is because RDD.union doesn't check the schema, so you won't see the
problem until you run a job on the RDD that actually reads the incompatible
column. If nothing ever uses that column, you may not see any error at all.


Dataset.union requires compatible schemas. You can print ds.schema and
ds1.schema and check whether they are the same.
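
With a lot of fields, comparing the two printed schemas by eye can be tedious.
Below is a minimal sketch of a helper that lists the positions where the column
types differ. SchemaDiff and incompatibleColumns are hypothetical names, not
part of Spark, and the two schemas in main are made up for illustration. The
comparison is strict equality on data types, so it may flag more than
Dataset.union itself rejects (for example nullability differences inside nested
types, or types Spark would widen automatically).

```scala
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark: compares two schemas position by
// position, since Dataset.union matches columns by position, not by name.
object SchemaDiff {
  def incompatibleColumns(left: StructType, right: StructType): Seq[String] = {
    // A differing column count is always a mismatch.
    val countNote =
      if (left.length != right.length)
        Seq(s"column count differs: ${left.length} vs ${right.length}")
      else Seq.empty[String]

    // Strict data type comparison for every position present in both schemas.
    val typeNotes = left.fields.zip(right.fields).zipWithIndex.collect {
      case ((l, r), i) if l.dataType != r.dataType =>
        s"column $i: ${l.name} is ${l.dataType.simpleString}, " +
          s"${r.name} is ${r.dataType.simpleString}"
    }.toSeq

    countNote ++ typeNotes
  }

  def main(args: Array[String]): Unit = {
    // Made-up schemas standing in for ds.schema and ds1.schema.
    val a = StructType(Seq(
      StructField("id", LongType),
      StructField("tags", ArrayType(StringType))))
    val b = StructType(Seq(
      StructField("id", LongType),
      StructField("tags", StringType)))

    incompatibleColumns(a, b).foreach(println)
  }
}
```

In a Spark shell the same check would be
SchemaDiff.incompatibleColumns(ds.schema, ds1.schema).foreach(println), and
ds.printSchema() gives a readable tree of the nested fields.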

On Mon, May 8, 2017 at 11:07 AM, Dirceu Semighini Filho <
[email protected]> wrote:

> Hello,
> I have a very complex case class structure, with a lot of fields.
> When I try to union two datasets of this class, it fails with the
> following error:
> ds.union(ds1)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Union
> can only be performed on tables with the compatible column types
>
> But when I use its rdd, the union works:
> ds.rdd.union(ds1.rdd)
> res8: org.apache.spark.rdd.RDD[
>
> Is there any reason for this to happen (besides a bug ;) )?
>
>
>
