Felix Cheung created ZEPPELIN-185:
-------------------------------------
Summary: z.show does not work on DataFrame in pyspark
Key: ZEPPELIN-185
URL: https://issues.apache.org/jira/browse/ZEPPELIN-185
Project: Zeppelin
Issue Type: Bug
Components: Core, Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Assignee: Felix Cheung
I’ve tested this out and found these issues. Firstly,
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
# Code should be changed to this – it does not work in pyspark CLI otherwise
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
Secondly,
z.show() doesn’t seem to work properly in Python – I see the same error below:
“AttributeError: 'DataFrame' object has no attribute '_get_object_id'"
#Python/PySpark – doesn’t work
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
print df
print df.collect()
z.show(df)
AttributeError: 'DataFrame' object has no attribute ‘_get_object_id'
#Scala – this works
val a = sc.parallelize(List("1", "2", "3"))
val df = a.toDF()
z.show(df)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)