Hi Matei, thank you for answering.
According to what you said, am I mistaken in thinking that tuples with the same key might eventually be spread across more than one node if an overloaded worker can no longer accept tuples? In other words, suppose the worker processing key K cannot accept more tuples: how does Spark Streaming handle the remaining K-keyed tuples? Systems like Storm do not provide any mechanism to handle such a situation.

I am pretty new to Spark, and I apologize if the question sounds too naive, but I am having some trouble understanding Spark internals!

Thank you again!

2015-06-03 19:34 GMT+02:00 Matei Zaharia <matei.zaha...@gmail.com>:
> This happens automatically when you use the byKey operations, e.g.
> reduceByKey, updateStateByKey, etc. Spark Streaming keeps the state for a
> given set of keys on a specific node and sends new tuples with that key to
> that node.
>
> Matei
>
> > On Jun 3, 2015, at 6:31 AM, allonsy <luke1...@gmail.com> wrote:
> >
> > Hi everybody,
> > is there anything in Spark sharing the philosophy of Storm's field grouping?
> >
> > I'd like to manage data partitioning across the workers by sending tuples
> > sharing the same key to the very same worker in the cluster, but I did not
> > find any method to do that.
> >
> > Suggestions?
> >
> > :)
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Equivalent-to-Storm-s-field-grouping-in-Spark-tp23135.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
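For anyone following along, here is a minimal Scala sketch of the keyed-state behaviour Matei describes, using updateStateByKey over a DStream of (key, count) pairs. The socket source, host/port, checkpoint directory, and input format are illustrative assumptions, not details from this thread:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object KeyedStateExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KeyedStateExample")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/spark-checkpoint") // required by updateStateByKey

    // Hypothetical source: lines of the form "key ..." arriving on a socket.
    val pairs = ssc.socketTextStream("localhost", 9999)
      .map(line => (line.split(" ")(0), 1L))

    // updateStateByKey keeps the running state for each key on a specific
    // partition, so all tuples sharing a key are merged into the same state,
    // which is the Spark Streaming analogue of Storm's field grouping.
    val counts = pairs.updateStateByKey[Long] { (newValues: Seq[Long], state: Option[Long]) =>
      Some(state.getOrElse(0L) + newValues.sum)
    }

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}

The same per-key placement applies to reduceByKey and the other byKey operations; you can also call partitionBy with a HashPartitioner on a pair RDD/DStream if you want to control the partitioning explicitly.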