Could you elaborate? Are you referring to working around this issue? The fix for this has been merged.
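
If a workaround is still needed on a build that predates the fix, here is a minimal sketch. It assumes that Zeppelin's display system renders paragraph output beginning with %table as a table, with tab-separated cells and newline-separated rows. The helper name to_zeppelin_table and the 100-row limit are hypothetical, chosen only for illustration; they are not part of Zeppelin or of the merged fix.

```python
def to_zeppelin_table(columns, rows):
    """Format column names and row tuples as Zeppelin %table text.

    Hypothetical helper for illustration; not part of Zeppelin's API.
    """
    def clean(value):
        # Tabs and newlines delimit cells and rows, so replace them inside values.
        return str(value).replace("\t", " ").replace("\n", " ")

    header = "\t".join(clean(c) for c in columns)
    body = "\n".join("\t".join(clean(v) for v in row) for row in rows)
    return "%table " + header + "\n" + body


# In a %pyspark paragraph, with df being the DataFrame from the report,
# one might call (the limit of 100 is an arbitrary assumption):
#   print(to_zeppelin_table(df.columns, df.limit(100).collect()))
```

Limiting before collect() keeps the number of rows pulled to the driver small, which is probably what you want for a quick look at the data.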
> From: [email protected]
> Date: Mon, 10 Aug 2015 11:48:13 +0000
> Subject: Re: [jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark
> To: [email protected]
>
> Does anyone know how to solve this one? My users are using Python, and
> iterating through the DF each time is not useful.
> Eran
>
> On Sat, Jul 25, 2015 at 10:06 PM Felix Cheung (JIRA) <[email protected]>
> wrote:
>
> > Felix Cheung created ZEPPELIN-185:
> > -------------------------------------
> >
> >              Summary: z.show does not work on DataFrame in pyspark
> >                  Key: ZEPPELIN-185
> >                  URL: https://issues.apache.org/jira/browse/ZEPPELIN-185
> >              Project: Zeppelin
> >           Issue Type: Bug
> >           Components: Core, Interpreters
> >     Affects Versions: 0.6.0
> >             Reporter: Felix Cheung
> >             Assignee: Felix Cheung
> >
> >
> > I’ve tested this out and found these issues. Firstly,
> >
> > http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame
> > # Code should be changed to this – it does not work in pyspark CLI otherwise
> > rdd = sc.parallelize(["1","2","3"])
> > Data = Row('first')
> > df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
> >
> > Secondly,
> > z.show() doesn’t seem to work properly in Python – I see the same error
> > below: "AttributeError: 'DataFrame' object has no attribute '_get_object_id'"
> > #Python/PySpark – doesn’t work
> > rdd = sc.parallelize(["1","2","3"])
> > Data = Row('first')
> > df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
> > print df
> > print df.collect()
> > z.show(df)
> > AttributeError: 'DataFrame' object has no attribute '_get_object_id'
> >
> > #Scala – this works
> > val a = sc.parallelize(List("1", "2", "3"))
> > val df = a.toDF()
> > z.show(df)
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
