You can use "transform", which hands you each RDD of the DStream so that you can apply partitionBy to it. Unlike foreach, transform also returns another DStream.
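A minimal PySpark sketch of the transform approach, assuming a DStream of (key, value) pairs. NUM_PARTITIONS, customer_partitioner, partition_each_batch, process_each_batch, handle_partition and run_local_demo are hypothetical names used only for illustration; DStream.transform, DStream.foreachRDD (what this thread calls foreach), RDD.partitionBy and RDD.foreachPartition are the real PySpark calls. The second function does the same partitioning inside foreachRDD: the lambda runs on the driver once per batch, but partitionBy and foreachPartition still run as tasks on the workers. The DStream API is legacy and deprecated in recent Spark releases, so treat this as a sketch rather than a recommended setup.

```python
import zlib

# Hypothetical partition count, chosen only for illustration.
NUM_PARTITIONS = 4


def customer_partitioner(key):
    """Hypothetical custom partition function.

    PySpark's RDD.partitionBy(numPartitions, partitionFunc) puts each
    (key, value) pair into partition partitionFunc(key) % numPartitions,
    so this only needs to return a non-negative int that is the same on
    every worker. zlib.crc32 is used instead of the built-in hash(),
    because hash() of a str differs between Python processes unless
    PYTHONHASHSEED is fixed.
    """
    return zlib.crc32(str(key).encode("utf-8"))


def partition_each_batch(pair_dstream, num_partitions=NUM_PARTITIONS):
    """Repartition every batch RDD with customer_partitioner.

    transform() hands each batch RDD to the lambda and returns a new
    DStream, so the result can be chained with further DStream operations.
    """
    return pair_dstream.transform(
        lambda rdd: rdd.partitionBy(num_partitions, customer_partitioner)
    )


def process_each_batch(pair_dstream, handle_partition, num_partitions=NUM_PARTITIONS):
    """Same partitioning inside foreachRDD, which returns nothing (unlike transform).

    The lambda runs on the driver once per batch, but partitionBy and
    foreachPartition launch tasks, so handle_partition (a hypothetical
    callback taking an iterator of pairs) runs on the executors.
    """
    pair_dstream.foreachRDD(
        lambda rdd: rdd.partitionBy(num_partitions, customer_partitioner)
        .foreachPartition(handle_partition)
    )


def run_local_demo(seconds=5):
    """Local demo; needs a PySpark release that still ships pyspark.streaming."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "dstream-partitioning-sketch")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    # queueStream turns a list of RDDs into a DStream, one RDD per batch.
    batches = [sc.parallelize([("a", 1), ("b", 2), ("c", 3), ("a", 4)])]
    partitioned = partition_each_batch(ssc.queueStream(batches))

    # glom() turns each partition into a list, so pprint shows the layout.
    partitioned.glom().pprint()

    ssc.start()
    ssc.awaitTerminationOrTimeout(seconds)
    ssc.stop(stopSparkContext=True, stopGraceFully=False)
```

With PySpark installed, calling run_local_demo() should print the partition contents of the single queued batch; which keys land in which of the four partitions depends on their CRC32 values. The transform variant is the one to use when later DStream operations need the partitioned stream, since foreachRDD ends the chain.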
By the way, what do you mean by "foreach killing the performance by not distributing the workload"? Every function (provided it is not an action) that you apply to an RDD within foreach is distributed across the cluster, because it is applied to an RDD.

From: davidkl [via Apache Spark User List] [mailto:[email protected]]
Sent: Thursday, April 23, 2015 10:13 AM
To: Evo Eftimov
Subject: Re: Custom paritioning of DSTream

Hello Evo, Ranjitiyer,

I am also looking for the same thing. Using foreach is not useful for me, as processing the RDD as a whole won't be distributed across workers and that would kill performance in my application :-/

Let me know if you find a solution for this.

Regards

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Custom-paritioning-of-DSTream-tp22574p22631.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
