Re: Join operation on DStreams

2015-09-21 Thread guoxu1231
Thanks for the prompt reply. May I ask why keyBy(f) is not supported in DStreams? Is there any particular reason? Or would it be possible to add it in a future release, since "stream.map(record => (keyFunction(record), record))" looks tedious? I checked the Python source code, and keyBy looks like a "shortcu

Join operation on DStreams

2015-09-20 Thread guoxu1231
Hi Spark Experts, I'm trying to use join(otherStream, [numTasks]) on DStreams, and it must be called on two DStreams of (K, V) and (K, W) pairs. Usually, with an ordinary RDD, we could use keyBy(f) to build the (K, V) pairs; however, I could not find it in DStream. My question is: What is the expected
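
For readers following the thread, here is a minimal plain-Python sketch of what is being asked: keying each record with a function (the effect of keyBy(f), or equivalently of mapping each record to a (key, record) pair) and then inner-joining two keyed collections. It models one batch as an ordinary list and does not use Spark at all; the helper names key_by and inner_join and the sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def key_by(records, key_function):
    """Pair each record with its key, like map(record -> (key, record))."""
    return [(key_function(record), record) for record in records]


def inner_join(left_pairs, right_pairs):
    """Inner-join two lists of (key, value) pairs.

    Returns (key, (left_value, right_value)) for every matching key,
    which is the (K, (V, W)) shape a pair join produces.
    """
    right_by_key = defaultdict(list)
    for key, value in right_pairs:
        right_by_key[key].append(value)
    return [
        (key, (left_value, right_value))
        for key, left_value in left_pairs
        for right_value in right_by_key.get(key, [])
    ]


# Hypothetical contents of one batch from each stream.
orders = [{"user": "a", "item": "book"}, {"user": "b", "item": "pen"}]
users = [{"user": "a", "city": "Oslo"}]

keyed_orders = key_by(orders, lambda record: record["user"])
keyed_users = key_by(users, lambda record: record["user"])
joined = inner_join(keyed_orders, keyed_users)
print(joined)
```

In PySpark itself, the same keying step could probably be written with DStream.map, or by applying RDD.keyBy to each batch through DStream.transform; treat that as a suggestion to check against the version in use rather than a confirmed answer from this thread.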

Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-30 Thread guoxu1231
Thanks Davies, it works in 1.2.

Re: Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
The named tuple degenerates to a plain tuple. A400.map(lambda i: map(None,i.INTEREST)) === [(u'x', 1), (u'y', 2)] [(u'x', 2), (u'y', 3)]

Help, pyspark.sql.List flatMap results become tuple

2014-12-29 Thread guoxu1231
Hi pyspark guys, I have a JSON file, and its structure is like below: {"NAME":"George", "AGE":35, "ADD_ID":1212, "POSTAL_AREA":1, "TIME_ZONE_ID":1, "INTEREST":[{"INTEREST_NO":1, "INFO":"x"}, {"INTEREST_NO":2, "INFO":"y"}]} {"NAME":"John", "AGE":45, "ADD_ID":1213, "POSTAL_AREA":1, "TIME_ZONE_ID":1, "IN