[Spark SQL] [pyspark.sql]: Potential bug in toDF using nested structures

msachdev Wed, 26 Jul 2017 04:34:30 -0700

Hi,

I am trying to create a DataFrame from a Python dictionary and have
encountered an issue where some of the nested fields are returned as None
(on collect). I have created a sample here with the output:
https://gist.github.com/sachdevm/04c27ec91adbe2fdbe5969f4af723642.


The sample contains two snippets: one which exhibits the stated issue and
another which works correctly. My suspicion is that when nested dictionary
objects are parsed in the Row class, the datatype for all values is
incorrectly set to that of the first key encountered ("duration" in the
gist example), and when the conversion of a value fails, that value is set
to None. In the second example in the gist, all values in the nested
dictionary are strings and all data is preserved correctly.
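
To make the suspicion concrete, here is a rough plain-Python model of the
behaviour I think I am seeing. It does not use Spark at all and is not
PySpark's actual code; the helper names (infer_value_type, coerce_values)
and the "unit" key are made up for illustration, and only "duration" comes
from the gist. It assumes Python 3.7+ so that dictionaries keep insertion
order.

```python
# Plain-Python model of the SUSPECTED behaviour; this is not PySpark code.

def infer_value_type(nested):
    """Return the type of the first non-None value in the nested dict."""
    for value in nested.values():
        if value is not None:
            return type(value)
    return type(None)


def coerce_values(nested):
    """Keep values that match the inferred type; replace the rest with None."""
    expected = infer_value_type(nested)
    return {key: (value if type(value) is expected else None)
            for key, value in nested.items()}


# "duration" is the key from the gist; "unit" is a hypothetical second key.
mixed = {"duration": 120, "unit": "seconds"}
all_strings = {"duration": "120", "unit": "seconds"}

print(coerce_values(mixed))        # the string value is dropped in this model
print(coerce_values(all_strings))  # every value survives
```

If this model is right, it would explain why only the snippet with mixed
value types loses data, while the all-strings snippet comes back intact.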

I am using version 2.1.0:

> >>> print pyspark.__version__
> 2.1.0



Please let me know if I am missing something or if there is some issue in
the code sample itself.

Thanks,
Manish




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-pyspark-sql-Potential-bug-in-toDF-using-nested-structures-tp29000.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
