I think it's simply because `as[T]` is lazy: it does not project away the columns that are not in T. You will see the schema derived from T if you do `df.as[T].map(identity)`.
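
A minimal sketch of what I mean, assuming a local SparkSession. The case class `Person`, the column names, and the `AsSchemaDemo` object are made up for illustration and are not taken from the PR. It compares the columns reported after a plain `as[T]`, after `as[T].map(identity)`, and after the explicit `select` workaround from the proposal:

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical target type; the DataFrame below carries one extra column.
case class Person(id: Long, name: String)

object AsSchemaDemo {
  // Returns the column names seen after each of the three approaches.
  def columnNames(spark: SparkSession): (Seq[String], Seq[String], Seq[String]) = {
    import spark.implicits._

    val df = Seq((1L, "a", 10), (2L, "b", 20)).toDF("id", "name", "extra")

    // Plain as[T]: only the typed view changes, the columns are left alone.
    val viaAs = df.as[Person]

    // map(identity) round-trips every row through Person, so the result
    // is encoded from T alone.
    val viaMap = viaAs.map(identity)

    // The workaround from the proposal: select exactly the fields of T first.
    val schemaOfT = Encoders.product[Person].schema
    val viaSelect = df.select(schemaOfT.fields.map(f => col(f.name)): _*).as[Person]

    (viaAs.columns.toSeq, viaMap.columns.toSeq, viaSelect.columns.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("as-schema-demo")
      .getOrCreate()
    try {
      val (afterAs, afterMap, afterSelect) = columnNames(spark)
      println(s"as[T]:               ${afterAs.mkString(", ")}")
      println(s"as[T].map(identity): ${afterMap.mkString(", ")}")
      println(s"select + as[T]:      ${afterSelect.mkString(", ")}")
    } finally {
      spark.stop()
    }
  }
}
```

Note that `map(identity)` deserializes and re-serializes every row, whereas the `select` variant stays a plain column projection, so the two are not equivalent in cost.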

On Tue, Jan 7, 2020 at 4:42 PM Enrico Minack <m...@enrico.minack.dev> wrote:

> Hi Devs,
>
> I'd like to propose a stricter version of as[T]. Given the interface def
> as[T](): Dataset[T], it is counter-intuitive that the schema of the
> returned Dataset[T] is not agnostic to the schema of the originating
> Dataset. The schema should always be derived only from T.
>
> I am proposing a stricter version so that user code does not need to pair
> an .as[T] with a select(schemaOfT.fields.map(col(_.name)): _*) whenever
> your code expects Dataset[T] to really contain only columns of T.
>
> https://github.com/apache/spark/pull/26969
>
> Regards,
> Enrico