This is because RDD.union doesn't check the schema, so you won't see the problem unless you actually run the RDD and hit the incompatible column. With the RDD, you may not see any error at all if you never use the incompatible column.
Dataset.union requires compatible schemas. You can print ds.schema and ds1.schema and check whether they are the same.

On Mon, May 8, 2017 at 11:07 AM, Dirceu Semighini Filho <[email protected]> wrote:

> Hello,
> I've a very complex case class structure, with a lot of fields.
> When I try to union two datasets of this class, it doesn't work with the
> following error :
> ds.union(ds1)
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Union
> can only be performed on tables with the compatible column types
>
> But when use it's rdd, the union goes right:
> ds.rdd.union(ds1.rdd)
> res8: org.apache.spark.rdd.RDD[
>
> Is there any reason for this to happen (besides a bug ;) )
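
With a lot of fields, comparing two printed schemas by eye gets tedious, so here is a minimal sketch of doing the comparison in code. It assumes Spark SQL is on the classpath; the object name SchemaDiff, the helper mismatches, and the two tiny sample DataFrames are made up for illustration and stand in for your ds and ds1. Fields are paired by position, since union resolves columns by position rather than by name. The comparison is deliberately strict, so it may also report differences (for example nullability inside nested structs) that union itself might tolerate, depending on the Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not part of Spark: lists where two schemas disagree.
object SchemaDiff {
  def mismatches(a: StructType, b: StructType): Seq[String] = {
    // A different number of columns is reported first.
    val countIssue =
      if (a.length != b.length) Seq(s"column count: ${a.length} vs ${b.length}")
      else Seq.empty[String]

    // Pair fields by position and keep the pairs whose data types differ.
    val typeIssues = a.fields.zip(b.fields).collect {
      case (fa, fb) if fa.dataType != fb.dataType =>
        s"${fa.name}: ${fa.dataType.simpleString} vs ${fb.name}: ${fb.dataType.simpleString}"
    }.toSeq

    countIssue ++ typeIssues
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("schema-diff")
      .getOrCreate()
    import spark.implicits._

    // Sample data only; replace with the real ds and ds1.
    val ds  = Seq((1, "a")).toDF("id", "value")
    val ds1 = Seq((1, 2.0)).toDF("id", "value")

    ds.printSchema()
    ds1.printSchema()
    mismatches(ds.schema, ds1.schema).foreach(println)

    spark.stop()
  }
}
```

In your case you would call SchemaDiff.mismatches(ds.schema, ds1.schema) and look at the fields it reports; an empty result means the two schemas are identical under this strict check.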
