I am working on building a custom ML pipeline model / estimator to impute missing values, e.g. I want to fill with the last known good value. Using a window function is slow and will put the data into a single partition. I built some sample code using the RDD API; however, it has some None / null problems with empty partitions.
How should this be implemented properly to handle such empty partitions?

http://stackoverflow.com/questions/41474175/spark-mappartitionswithindex-handling-empty-partitions

Kind regards,
Georg

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/handling-of-empty-partitions-tp20496.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
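
For reference, one way the "fill with last known good value" idea could be made robust to empty partitions is a two-pass scheme: first compute the last non-null value of every partition, then give each partition the most recent non-null value from the partitions before it as a starting point. The following is only a minimal sketch in plain Python, with partitions modeled as ordinary lists; it is not the code from the linked question, and all function names (`last_known`, `carry_in_values`, `fill_partition`, `forward_fill`) are hypothetical. In Spark, the two per-partition steps would presumably run inside `mapPartitionsWithIndex`, with the small list of per-partition values collected to the driver in between.

```python
from typing import List, Optional, Sequence


def last_known(partition: Sequence[Optional[float]]) -> Optional[float]:
    """Pass 1: last non-null value of a partition.

    Returns None for an empty or all-null partition instead of failing.
    """
    last = None
    for value in partition:
        if value is not None:
            last = value
    return last


def carry_in_values(lasts: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Driver step: for each partition, the most recent non-null value
    seen in any earlier partition. Partitions that contributed None
    (empty or all-null) are simply skipped over.
    """
    carries = []
    current = None
    for last in lasts:
        carries.append(current)
        if last is not None:
            current = last
    return carries


def fill_partition(partition: Sequence[Optional[float]],
                   carry: Optional[float]) -> List[Optional[float]]:
    """Pass 2: forward-fill one partition, starting from its carry-in value."""
    filled = []
    current = carry
    for value in partition:
        if value is not None:
            current = value
        filled.append(current)
    return filled


def forward_fill(partitions: Sequence[Sequence[Optional[float]]]
                 ) -> List[List[Optional[float]]]:
    """Run both passes over a list of partitions (hypothetical stand-in for an RDD)."""
    lasts = [last_known(p) for p in partitions]
    carries = carry_in_values(lasts)
    return [fill_partition(p, c) for p, c in zip(partitions, carries)]


if __name__ == "__main__":
    # Includes empty and all-null partitions on purpose.
    data = [[1.0, None], [], [None, None], [None, 4.0, None], []]
    print(forward_fill(data))
```

Empty partitions stay empty in the output, and nulls that appear before the first non-null value in the whole data set remain null, since there is nothing to fill them with. The sketch assumes the partitions are already in the desired order.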