This should be fixed in 1.2, could you try it?

On Mon, Dec 29, 2014 at 8:04 PM, guoxu1231 <guoxu1...@gmail.com> wrote:
> Hi pyspark guys,
>
> I have a JSON file whose records are structured like below:
>
> {"NAME":"George", "AGE":35, "ADD_ID":1212, "POSTAL_AREA":1,
> "TIME_ZONE_ID":1, "INTEREST":[{"INTEREST_NO":1, "INFO":"x"},
> {"INTEREST_NO":2, "INFO":"y"}]}
> {"NAME":"John", "AGE":45, "ADD_ID":1213, "POSTAL_AREA":1, "TIME_ZONE_ID":1,
> "INTEREST":[{"INTEREST_NO":2, "INFO":"x"}, {"INTEREST_NO":3, "INFO":"y"}]}
>
> I'm using the Spark SQL API to manipulate the JSON data in the pyspark shell:
>
> sqlContext = SQLContext(sc)
> A400 = sqlContext.jsonFile('jason_file_path')
>
> Row(ADD_ID=1212, AGE=35, INTEREST=[Row(INFO=u'x', INTEREST_NO=1),
> Row(INFO=u'y', INTEREST_NO=2)], NAME=u'George', POSTAL_AREA=1,
> TIME_ZONE_ID=1)
> Row(ADD_ID=1213, AGE=45, INTEREST=[Row(INFO=u'x', INTEREST_NO=2),
> Row(INFO=u'y', INTEREST_NO=3)], NAME=u'John', POSTAL_AREA=1,
> TIME_ZONE_ID=1)
>
> X = A400.flatMap(lambda i: i.INTEREST)
>
> The flatMap results are shown below: each element of the JSON array was
> flattened to a plain tuple, not the pyspark.sql.Row I expected. I can only
> access the flattened results by index, but they should be flattened to
> Row (a namedtuple) and support access by name.
>
> (u'x', 1)
> (u'y', 2)
> (u'x', 2)
> (u'y', 3)
>
> My Spark version is 1.1.
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Help-pyspark-sql-List-flatMap-results-become-tuple-tp9961.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
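Until you can upgrade, one possible workaround on 1.1 is to re-wrap the flattened tuples in named tuples yourself. This is just a sketch: it uses Python's collections.namedtuple (the `Interest` name is mine) to stand in for pyspark.sql.Row, and assumes the field order (INFO, INTEREST_NO) matches the Rows printed in your output above:

```python
from collections import namedtuple

# Mirror the schema of the nested Rows shown above; the field order
# must match the tuples flatMap produced.
Interest = namedtuple("Interest", ["INFO", "INTEREST_NO"])

# What A400.flatMap(lambda i: i.INTEREST).collect() gives you on 1.1:
# plain tuples, with access by name lost.
flat = [(u'x', 1), (u'y', 2), (u'x', 2), (u'y', 3)]

# Re-wrapping restores access by field name.
rows = [Interest(*t) for t in flat]
print(rows[0].INFO)         # -> x
print(rows[1].INTEREST_NO)  # -> 2
```

Inside Spark you would apply the same re-wrap distributedly, e.g. `X = A400.flatMap(lambda i: i.INTEREST).map(lambda t: Interest(*t))`.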