stream.map(record => (keyFunction(record), record))
For future reference, this question should go to the user list, not the dev
list.
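The one-liner above can be fleshed out into a minimal, self-contained sketch. This uses plain Scala collections (not Spark) to illustrate the pattern: build (K, record) pairs with map -- the DStream equivalent of RDD.keyBy -- and then join on the key, yielding the (K, (V, W)) shape that DStream.join produces. The record data and key functions here are hypothetical, not code from the thread:

```scala
object KeyByJoinSketch {
  // keyBy replacement: pair each record with a derived key, exactly
  // what stream.map(record => (keyFunction(record), record)) does.
  def keyBy[K, V](records: Seq[V])(keyFunction: V => K): Seq[(K, V)] =
    records.map(record => (keyFunction(record), record))

  // Inner join on the key, mirroring the (K, (V, W)) result of DStream.join.
  def joinByKey[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v)  <- left
      (k2, w) <- right
      if k == k2
    } yield (k, (v, w))

  def main(args: Array[String]): Unit = {
    val orders = Seq(("alice", 9.99), ("bob", 5.00))
    val clicks = Seq(("alice", "home"), ("bob", "cart"))
    val keyedOrders = keyBy(orders)(_._1) // key each record by user name
    val keyedClicks = keyBy(clicks)(_._1)
    println(joinByKey(keyedOrders, keyedClicks))
  }
}
```

With real DStreams the same two calls apply: map each stream into pairs, then call join on the keyed streams.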
On Sun, Sep 20, 2015 at 11:47 PM, guoxu1231 wrote:
Hi Spark Experts,
I'm trying to use join(otherStream, [numTasks]) on DStreams; it must be
called on two DStreams of (K, V) and (K, W) pairs.
Usually on a plain RDD we could use keyBy(f) to build the (K, V) pairs;
however, I could not find keyBy on DStream.
My question is:
What is the expected way to build a keyed (K, V) DStream?
On Sun, Sep 20, 2015 at 3:58 PM, gsvic wrote:
There are two cases here:
1. If a node is busy (e.g. all of its slots are already occupied), the
scheduler cannot schedule anything on it. See "Delay Scheduling": the
scheduler waits up to spark.locality.wait for a free slot on a preferred
node before falling back to a less-local one.
2. Speculative execution: when spark.speculation is enabled, a running
task is treated as slow once its runtime exceeds
spark.speculation.multiplier times the median runtime of the completed
tasks in its stage, and a speculative copy is launched on another node.
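As a hedged sketch (my addition, not from the message above): both scheduling behaviors -- the locality wait behind delay scheduling, and speculative re-execution of slow tasks -- are controlled by real SparkConf settings. The values below are illustrative, not recommendations:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("locality-and-speculation-sketch")
  // How long the scheduler waits for a slot on a data-local node
  // before scheduling the task at a worse locality level.
  .set("spark.locality.wait", "3s")
  // Re-launch tasks that run far longer than their peers in the stage.
  .set("spark.speculation", "true")
  .set("spark.speculation.multiplier", "1.5")
```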
Concerning answers 1 and 2:
1) How does Spark determine that a node is a "slow node", and how slow is
that?
2) How does an RDD choose a location as a preferred location, and by which
criteria?
Could you please also include links to the source files relevant to the
two questions above?
Maybe the following can be used for changing Scala version:
http://maven.apache.org/archetype/maven-archetype-plugin/
I played with it a little bit but didn't get far.
FYI
On Sun, Sep 20, 2015 at 6:18 AM, Stephen Boesch wrote:
The dev/change-scala-version.sh 2.11 script modifies in place the
pom.xml files across all of the modules. This is a git-visible change, so
if we wish to make changes to Spark source in our own forks - while
developing with Scala 2.11 - we would end up conflating those pom.xml
updates with our own.
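One possible workflow to keep the script's pom.xml churn out of your commits (this is my suggestion, not something proposed in the thread) is to stash the pom.xml edits before committing:

```shell
# Switch the build to Scala 2.11 (rewrites every module's pom.xml in place).
./dev/change-scala-version.sh 2.11

# ... develop and test against 2.11 ...

# Before committing, set the pom.xml edits aside so they are not
# conflated with your own changes.
git stash push -- '*pom.xml'
git commit -am "my change"

# Restore the 2.11 poms to keep working.
git stash pop
```

This relies only on standard git pathspec globbing; `git checkout -- '*pom.xml'` would discard the script's edits outright instead of stashing them.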