There are now a few different pull requests, each attempting to address the need for Typed Dataset support for Avro objects. These PRs and their respective JIRA tickets are:
- https://github.com/apache/spark/pull/22878 : https://issues.apache.org/jira/browse/SPARK-25789 (originally in Databricks/spark-avro, https://github.com/databricks/spark-avro/pull/217 : https://github.com/databricks/spark-avro/issues/169)
- https://github.com/apache/spark/pull/24299 : https://issues.apache.org/jira/browse/SPARK-27388
- https://github.com/apache/spark/pull/24367 : https://issues.apache.org/jira/browse/SPARK-27457

The approaches differ considerably, and they may not all cover the same cases. Some analysis of the tradeoffs, and perhaps a deeper look at workarounds, would be necessary.

Full disclosure: I contributed significantly to Spark#22878/Spark-Avro#217, so I don't think I'll say more about the topics in this thread, but I would look to Spark committers for more direction, either here or in the PR threads. I'd be happy to respond to questions from the community.

The request for Typed Datasets of Avro goes back to Spark-Avro#169 <https://github.com/databricks/spark-avro/issues/169>. I saw relatively recently that the project was folded into Spark proper, but the need for statically typed Dataset support (as opposed to dynamically typed DataFrame support) continues.

Hoping a resolution can come out of this visibility.

Aleksander Eskilson
https://github.com/bdrillard