Re: Join operation on DStreams

2015-09-20 Thread Reynold Xin
stream.map(record => (keyFunction(record), record)) For future reference, this question should go to the user list, not the dev list. On Sun, Sep 20, 2015 at 11:47 PM, guoxu1231 wrote: > Hi Spark Experts, > > I'm trying to use join(otherStream, [numTasks]) on DStreams, and it > requires called o
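The keyBy-equivalent pairing suggested above can be sketched on a plain Scala collection; on a DStream the identical `map` produces the (K, V) pair DStream that `join()` expects. `Record` and `keyFunction` here are illustrative stand-ins, not names from the original thread:

```scala
// Sketch of the keyBy-style pairing, shown on a plain Scala collection.
// On a DStream the same map yields a pair DStream usable with join().
case class Record(id: Int, payload: String)
val keyFunction: Record => Int = _.id

val records = List(Record(1, "a"), Record(2, "b"))
val keyed   = records.map(record => (keyFunction(record), record))

// A toy inner join over the keyed pairs, mirroring what DStream.join
// does per batch: only keys present on both sides survive.
val other  = List((1, "x"), (3, "y"))
val joined = for {
  (k, v)  <- keyed
  (k2, w) <- other
  if k == k2
} yield (k, (v, w))

println(joined)  // List((1,(Record(1,a),x)))
```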

Join operation on DStreams

2015-09-20 Thread guoxu1231
Hi Spark Experts, I'm trying to use join(otherStream, [numTasks]) on DStreams, and it must be called on two DStreams of (K, V) and (K, W) pairs. Usually, on a plain RDD, we could use keyBy(f) to build the (K, V) pair; however, I could not find it in DStream. My question is: What is the expected

Re: RDD: Execution and Scheduling

2015-09-20 Thread Reynold Xin
On Sun, Sep 20, 2015 at 3:58 PM, gsvic wrote: > Concerning answers 1 and 2: > > 1) How Spark determines a node as a "slow node" and how slow is that? > There are two cases here: 1. If a node is busy (e.g. all slots are already occupied), the scheduler cannot schedule anything on it. See "Delay
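The delay-scheduling behaviour mentioned above ("the scheduler cannot schedule anything on a busy node") is governed by the `spark.locality.wait` family of settings. A sketch of a `spark-defaults.conf` fragment, with the keys and their documented defaults:

```properties
# Delay-scheduling knobs (defaults shown). These control how long the
# scheduler waits for a data-local slot on a busy node before falling
# back to a less-local one.
spark.locality.wait          3s
spark.locality.wait.process  3s
spark.locality.wait.node     3s
spark.locality.wait.rack     3s
```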

Re: RDD: Execution and Scheduling

2015-09-20 Thread gsvic
Concerning answers 1 and 2: 1) How does Spark determine a node as a "slow node", and how slow is that? 2) How does an RDD choose a location as a preferred location, and with which criteria? Could you please also include links to the source files for the two questions above?
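On question 2, the mechanism can be sketched as a toy model (the names below are illustrative only, not Spark's API): each partition records the hosts holding its data, and the scheduler prefers a free executor on one of those hosts, falling back to any free slot otherwise.

```scala
// Toy model of preferred-location scheduling (illustrative names, not
// Spark's API): a partition remembers the hosts holding its data, and
// the scheduler picks a free executor on one of those hosts if it can.
case class Partition(id: Int, preferredHosts: Seq[String])

def schedule(p: Partition, freeExecutors: Set[String]): String =
  p.preferredHosts.find(freeExecutors.contains)  // data-local if possible
    .getOrElse(freeExecutors.head)               // otherwise any free slot

val part = Partition(0, Seq("host-a", "host-b"))
println(schedule(part, Set("host-b", "host-c")))  // host-b (node-local)
println(schedule(part, Set("host-c")))            // host-c (fallback)
```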

Re: Using scala-2.11 when making changes to spark source

2015-09-20 Thread Ted Yu
Maybe the following can be used for changing Scala version: http://maven.apache.org/archetype/maven-archetype-plugin/ I played with it a little bit but didn't get far. FYI On Sun, Sep 20, 2015 at 6:18 AM, Stephen Boesch wrote: > > The dev/change-scala-version.sh [2.11] script modifies in-plac

Using scala-2.11 when making changes to spark source

2015-09-20 Thread Stephen Boesch
The dev/change-scala-version.sh [2.11] script modifies in-place the pom.xml files across all of the modules. This is a git-visible change. So if we wish to make changes to Spark source in our own forks, while developing with Scala 2.11, we would end up conflating those updates with our own.
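One way to keep the version churn out of a fork's commits is to revert the pom edits with git before committing. A sketch in a throwaway repo (the `sed -i` syntax assumes GNU sed; in a real Spark checkout the in-place edit is performed by dev/change-scala-version.sh across every module's pom.xml):

```shell
# Demonstrated in a throwaway repo, not a real Spark checkout.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
printf '<scala.version>2.10.4</scala.version>\n' > pom.xml
git add pom.xml
git -c user.email=dev@example.com -c user.name=dev commit -qm init

# The script effectively rewrites the Scala version in place:
sed -i 's/2\.10\.4/2.11.7/' pom.xml
grep -q '2.11.7' pom.xml && echo "pom switched"

# Before committing your fork's own changes, drop the version-only churn:
git checkout -- pom.xml
grep -q '2.10.4' pom.xml && echo "pom restored"
```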