Having pyspark.sql.types.StructType implement __iter__()

Nicholas Chammas Fri, 08 May 2015 14:43:41 -0700

StructType looks an awful lot like a Python dictionary.

However, it doesn’t implement __iter__()
<https://docs.python.org/3/library/stdtypes.html#iterator-types>, so doing
a quick conversion like this doesn’t work:


>>> df = sqlContext.jsonRDD(sc.parallelize(['{"name": "El Magnifico"}']))
>>> df.schema
StructType(List(StructField(name,StringType,true)))
>>> dict(df.schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'StructType' object is not iterable

This would be super helpful for doing any custom schema manipulations
without having to go through the whole .json() -> json.loads() ->
manipulate() -> json.dumps() -> .fromJson() charade.
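
For illustration, here is a rough sketch of that round trip using only the json module. The schema_json literal is an assumption about roughly what df.schema.json() returns for the example above, and add_field is a hypothetical helper, not part of PySpark.

```python
import json

# Assumed shape of df.schema.json() for the one-column example above.
schema_json = (
    '{"type": "struct", "fields": ['
    '{"name": "name", "type": "string", "nullable": true, "metadata": {}}]}'
)


def add_field(schema, name, type_name):
    """Hypothetical helper: append a nullable field to a parsed schema dict."""
    schema["fields"].append(
        {"name": name, "type": type_name, "nullable": True, "metadata": {}}
    )
    return schema


parsed = json.loads(schema_json)            # .json() -> json.loads()
parsed = add_field(parsed, "age", "long")   # manipulate()
round_tripped = json.dumps(parsed)          # json.dumps()
field_names = [f["name"] for f in json.loads(round_tripped)["fields"]]
# The final step, handing the result back to Spark via .fromJson(),
# is omitted here because it needs a PySpark installation.
```

Every manipulation has to go through plain dicts and strings that know nothing about the schema types they describe.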

Same goes for Row, which offers an asDict()
<https://spark.apache.org/docs/1.3.1/api/python/pyspark.sql.html#pyspark.sql.Row.asDict>
method but doesn’t support the more Pythonic dict(Row).
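
A minimal sketch of what I have in mind, using hypothetical stand-in classes (FieldSketch and StructSketch are not PySpark classes). It assumes __iter__() yields (name, field) pairs, which is only one possible design; yielding bare fields would make the object iterable but would not make dict() work on it.

```python
class FieldSketch:
    """Hypothetical stand-in for StructField: a name plus a type label."""

    def __init__(self, name, data_type, nullable=True):
        self.name = name
        self.data_type = data_type
        self.nullable = nullable


class StructSketch:
    """Hypothetical stand-in for StructType that adds __iter__()."""

    def __init__(self, fields):
        self.fields = list(fields)

    def __iter__(self):
        # Yield (name, field) pairs so dict() can consume them directly.
        for field in self.fields:
            yield field.name, field


schema = StructSketch([FieldSketch("name", "string")])
as_dict = dict(schema)  # no TypeError, because StructSketch is iterable
```

The same pattern would apply to a Row-like class: yield (column, value) pairs from __iter__() and dict(row) works without a separate asDict() call.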

Does this make sense?

Nick
