Hi, I am trying to create a DataFrame from a Python dictionary and ran into an issue where some of the nested fields come back as None on collect(). I have created a sample, including its output, here: https://gist.github.com/sachdevm/04c27ec91adbe2fdbe5969f4af723642
The sample contains two snippets: one that exhibits the issue and one that works correctly. My suspicion is that when nested dictionary objects are parsed in the Row class, the data type for all values is incorrectly set to that of the first key encountered ("duration" in the example above), and when the conversion of a value fails, that value is set to None. In the second snippet in the gist, all values in the nested dictionary are strings, and all data is preserved correctly.

I am using version 2.1.0:

    >>> print pyspark.__version__
    2.1.0

Please let me know if I am missing something or if there is an issue in the code sample itself.

Thanks,
Manish
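P.S. To make the suspicion above concrete, here is a minimal pure-Python sketch of the behaviour I think I am seeing. It does not import pyspark and is not the actual PySpark code; the function names and the "label" key are hypothetical and exist only for illustration. It assumes Python 3.7+ so that dictionaries keep insertion order and "duration" is the first key encountered.

```python
# Pure-Python sketch of the suspected behaviour; no pyspark involved.
# All function names below are hypothetical.

def infer_map_value_type(d):
    """Pick one value type for the whole dict from its first non-None value."""
    for value in d.values():
        if value is not None:
            return type(value)
    return type(None)


def coerce_to_map(d):
    """Keep values matching the inferred type; replace the rest with None."""
    value_type = infer_map_value_type(d)
    return {k: (v if isinstance(v, value_type) else None) for k, v in d.items()}


# Hypothetical nested records; only the "duration" key comes from my example.
mixed = {"duration": 10, "label": "abc"}
all_strings = {"duration": "10", "label": "abc"}

print(coerce_to_map(mixed))        # {'duration': 10, 'label': None}
print(coerce_to_map(all_strings))  # {'duration': '10', 'label': 'abc'}
```

If this is roughly what happens, it would explain why the snippet with mixed value types loses data while the all-strings snippet does not.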
