Hi Matei,

Thank you for answering.

According to what you said, am I mistaken in saying that tuples with the
same key might eventually end up spread across more than one node if an
overloaded worker can no longer accept tuples?
In other words, suppose a worker (currently processing key K) cannot accept
more tuples: how does Spark Streaming handle the remaining K-keyed tuples?
Systems like Storm do not provide any mechanism to handle such a situation.
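
To make the scenario concrete, this is the kind of keyed, stateful job I have
in mind (just a minimal sketch in spark-shell style; the socket source, the
checkpoint path and the running count are placeholders of my own):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("KeyedCounts")
    val ssc  = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/spark-checkpoint")  // updateStateByKey needs a checkpoint dir

    // Placeholder source: lines of the form "<key> <value>" from a socket.
    val pairs = ssc.socketTextStream("localhost", 9999)
      .map(_.split(" "))
      .map(fields => (fields(0), 1L))

    // The running count for key K is kept in a single partition, so every
    // new K-keyed tuple must reach the node that holds that partition.
    val counts = pairs.updateStateByKey[Long] { (newValues, state) =>
      Some(state.getOrElse(0L) + newValues.sum)
    }

    counts.print()
    ssc.start()
    ssc.awaitTermination()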

I am pretty new to Spark, so I apologize if the question sounds too naive,
but I am having some trouble understanding Spark's internals!

Thank you again!
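
P.S. If I understood your answer correctly, either the byKey operations or an
explicit HashPartitioner keep same-key tuples in one partition. A minimal
sketch of the explicit variant (pairRdd and the partition count 8 are just
placeholders):

    import org.apache.spark.HashPartitioner

    // pairRdd: a placeholder RDD[(String, Int)] of (key, value) tuples.
    // HashPartitioner puts every tuple with a given key into the same
    // partition, and a partition is processed on a single worker at a time.
    val grouped = pairRdd.partitionBy(new HashPartitioner(8))

    // A later reduceByKey reuses that partitioner, so same-key values are
    // combined without an extra shuffle.
    val totals = grouped.reduceByKey(_ + _)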



2015-06-03 19:34 GMT+02:00 Matei Zaharia <matei.zaha...@gmail.com>:

> This happens automatically when you use the byKey operations, e.g.
> reduceByKey, updateStateByKey, etc. Spark Streaming keeps the state for a
> given set of keys on a specific node and sends new tuples with that key to
> that node.
>
> Matei
>
> > On Jun 3, 2015, at 6:31 AM, allonsy <luke1...@gmail.com> wrote:
> >
> > Hi everybody,
> > Is there anything in Spark that shares the philosophy of Storm's field
> > grouping?
> >
> > I'd like to manage data partitioning across the workers by sending tuples
> > sharing the same key to the very same worker in the cluster, but I did not
> > find any method to do that.
> >
> > Suggestions?
> >
> > :)
> >
> >
> >
>
>
