I think it's simply because as[T] is lazy: it does not project the columns
by itself. You will see the right schema once a typed operation runs, e.g.
`df.as[T].map(identity)`.
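
To make that concrete, here is a minimal sketch. It assumes a local
SparkSession and a hypothetical case class `Person`; the object name, the
helper `columnNames` and the column names are made up for illustration and
are not taken from the PR. It compares the schema right after as[T], after
map(identity), and after the explicit select mentioned in the quoted mail.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical target type, used only for illustration.
case class Person(id: Long, name: String)

object AsSchemaSketch {
  // Returns the column names seen (1) right after as[T], (2) after
  // map(identity), and (3) after an explicit select on T's fields.
  def columnNames(spark: SparkSession): (Seq[String], Seq[String], Seq[String]) = {
    import spark.implicits._

    // Source DataFrame with one column more than Person declares.
    val df = Seq((1L, "a", 10), (2L, "b", 20)).toDF("id", "name", "extra")

    // as[T] only attaches the encoder; the extra column stays in the schema.
    val typed: Dataset[Person] = df.as[Person]

    // A typed operation round-trips each row through Person,
    // so the result carries only Person's fields.
    val mapped: Dataset[Person] = typed.map(identity)

    // Workaround from the quoted mail: project onto T's fields explicitly.
    val schemaOfT = Encoders.product[Person].schema
    val projected = df.select(schemaOfT.fields.map(f => col(f.name)): _*).as[Person]

    (typed.schema.fieldNames.toSeq,
     mapped.schema.fieldNames.toSeq,
     projected.schema.fieldNames.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("as-schema-sketch")
      .getOrCreate()
    val (afterAs, afterMap, afterSelect) = columnNames(spark)
    println(s"after as[T]:         ${afterAs.mkString(", ")}")
    println(s"after map(identity): ${afterMap.mkString(", ")}")
    println(s"after select + as:   ${afterSelect.mkString(", ")}")
    spark.stop()
  }
}
```

Note that map(identity) deserializes and re-serializes every row, while the
select is a plain projection, so the two are not equivalent in cost.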



On Tue, Jan 7, 2020 at 4:42 PM Enrico Minack <m...@enrico.minack.dev> wrote:

> Hi Devs,
>
> I'd like to propose a stricter version of as[T]. Given the interface def
> as[T](): Dataset[T], it is counter-intuitive that the schema of the
> returned Dataset[T] still depends on the schema of the originating
> Dataset. The schema should always be derived only from T.
>
> I am proposing a stricter version so that user code does not need to pair
> an .as[T] with a select(schemaOfT.fields.map(f => col(f.name)): _*) whenever
> it expects Dataset[T] to really contain only columns of T.
>
> https://github.com/apache/spark/pull/26969
>
> Regards,
> Enrico
>
