Felix Cheung created ZEPPELIN-185:
-------------------------------------

             Summary: z.show does not work on DataFrame in pyspark
                 Key: ZEPPELIN-185
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-185
             Project: Zeppelin
          Issue Type: Bug
          Components: Core, Interpreters
    Affects Versions: 0.6.0
            Reporter: Felix Cheung
            Assignee: Felix Cheung


I’ve tested this out and found two issues. Firstly,

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
# Code should be changed to this – it does not work in pyspark CLI otherwise
from pyspark.sql import Row  # Row is not imported by default in the pyspark CLI
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))

Secondly,
z.show() doesn’t seem to work properly in Python – I see the same error below:
"AttributeError: 'DataFrame' object has no attribute '_get_object_id'"
#Python/PySpark – doesn’t work
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
print df
print df.collect()
z.show(df)
        AttributeError: 'DataFrame' object has no attribute '_get_object_id'

#Scala – this works
val a = sc.parallelize(List("1", "2", "3"))
val df = a.toDF()
z.show(df)
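
A possible explanation for the difference, offered as a sketch and not as a confirmed fix: py4j can only pass Python objects that expose `_get_object_id` across to the JVM, and the PySpark DataFrame is a Python wrapper that keeps its JVM-side handle in the `_jdf` attribute. The helper names `unwrap_java_object` and `show_dataframe` below are hypothetical, and it is an assumption that z.show accepts the unwrapped object.

```python
def unwrap_java_object(obj):
    """Return the JVM-side handle behind a PySpark wrapper, or obj unchanged."""
    # A PySpark DataFrame stores its py4j handle to the JVM DataFrame in `_jdf`.
    # That handle exposes `_get_object_id`; the Python wrapper itself does not.
    jdf = getattr(obj, "_jdf", None)
    return obj if jdf is None else jdf


def show_dataframe(z, df):
    """Hypothetical helper: call z.show with the unwrapped object.

    Assumes `z` is the ZeppelinContext handle available in a pyspark paragraph.
    """
    return z.show(unwrap_java_object(df))
```

In a pyspark paragraph this would be called as show_dataframe(z, df) in place of z.show(df). I have not verified that the result renders as a table.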



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
