This is really strange.
>>> # Spark 1.3.1
>>> import pyspark.sql.types
>>> print type(results)
<class 'pyspark.sql.dataframe.DataFrame'>
>>> a = results.take(1)[0]
>>> print type(a)
<class 'pyspark.sql.types.Row'>
>>> print pyspark.sql.types.Row
<class 'pyspark.sql.types.Row'>
>>> print type(a) == pyspark.sql.types.Row
False
>>> print isinstance(a, pyspark.sql.types.Row)
False
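The two classes print identically, so one thing worth checking is whether they are literally the same class object. A rough sketch, assuming the same session as above (so a is still the row from take(1)):

import pyspark.sql.types

row_cls = type(a)
module_cls = pyspark.sql.types.Row

# Two distinct classes can share the same repr; compare identity directly.
print row_cls is module_cls
print id(row_cls), id(module_cls)

# See where each class claims to live.
print row_cls.__module__, row_cls.__name__
print module_cls.__module__, module_cls.__name__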
If I instead construct a manually as follows, then both type checks pass fine:
a = pyspark.sql.types.Row('name')('Nick')
Is this a bug? What can I do to narrow down the source?
results is a massive DataFrame of spark-perf results.
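In the meantime, here is another check I could run to see where the runtime class actually comes from, and whether it at least subclasses the Row exported by pyspark.sql.types (again just a sketch against the same session):

import inspect
import pyspark.sql.types

# Walk the MRO of the row's runtime class; a dynamically generated
# class may report a module of None here.
for base in type(a).__mro__:
    print base, inspect.getmodule(base)

# Even if the class objects differ, the runtime class might still
# be a subclass of the exported Row.
print issubclass(type(a), pyspark.sql.types.Row)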
Nick