I've seen a number of visuals showing the processing-time benefits of using Datasets and DataFrames over RDDs, but I'd assume that there are also performance benefits to using a Dataset with a defined case class instead of a generic Dataset[Row]. The tale of three Spark APIs post mentions "If you want higher degree of type-safety at compile time, want typed JVM objects, *take advantage of Catalyst optimization, and benefit from Tungsten’s efficient code generation, use Dataset.*"
Are there any comparisons showing the performance differences between Datasets and DataFrames? Or more information about how Catalyst/Tungsten handle them differently?