Hi Devs,

I'd like to propose a stricter version of as[T]. Given the interface def as[T](): Dataset[T], it is counter-intuitive that the schema of the returned Dataset[T] still depends on the schema of the originating Dataset. The schema should always be derived only from T.

I am proposing a stricter version so that user code does not need to pair every .as[T] with a select(schemaOfT.fields.map(f => col(f.name)): _*) whenever it expects the Dataset[T] to really contain only the columns of T.
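To make the problem concrete, here is a minimal sketch of the current behaviour and of the workaround described above. The case class Person, the helper strictAs and the column name "extra" are hypothetical names chosen for illustration; the helper is not necessarily what the pull request implements, only the select-plus-as pairing written out once.

```scala
import org.apache.spark.sql.{Dataset, Encoder, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical case class standing in for T.
case class Person(id: Long, name: String)

object StrictAsSketch {
  // Hypothetical helper (not part of Spark): project onto the columns of T,
  // in the order given by T's encoder schema, then apply as[T].
  def strictAs[T: Encoder](ds: Dataset[_]): Dataset[T] = {
    val schemaOfT = implicitly[Encoder[T]].schema
    ds.select(schemaOfT.fields.map(f => col(f.name)): _*).as[T]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("strict-as-sketch")
      .getOrCreate()
    import spark.implicits._

    // The originating Dataset has one column that Person does not declare.
    val df = Seq((1L, "a", 3.0)).toDF("id", "name", "extra")

    // Plain as[T]: the static type is Dataset[Person], but the schema
    // still carries the "extra" column of the originating Dataset.
    val loose: Dataset[Person] = df.as[Person]
    println(loose.columns.mkString(", "))

    // Workaround: the schema is now derived only from Person.
    val strict: Dataset[Person] = strictAs[Person](df)
    println(strict.columns.mkString(", "))

    spark.stop()
  }
}
```

With a stricter as[T], the second form would be what as[T] returns by itself, so callers would not have to remember the extra select.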

https://github.com/apache/spark/pull/26969

Regards,
Enrico
