Thanks, everyone. Evo, could you provide a link to the Lookup RDD project? I can't seem to locate it on GitHub. (Yes, to your point, our project is Spark Streaming based.) Thank you.
On Fri, Jun 5, 2015 at 6:04 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:

> Oops, @Yiannis, sorry to be a party pooper, but the Job Server is for Spark
> Batch jobs (besides, anyone can put something like that together in 5
> minutes), while I am under the impression that Dmitry is working on a Spark
> Streaming app.
>
> Besides, the Job Server is essentially for sharing the Spark Context
> between multiple threads.
>
> Re Dmitry's initial question: you can load large data sets as a Batch
> (Static) RDD from any Spark Streaming app and then join DStream RDDs
> against it to emulate "lookups". You can also try the "Lookup RDD"; there
> is a GitHub project for it.
>
> From: Dmitry Goldenberg [mailto:dgoldenberg...@gmail.com]
> Sent: Friday, June 5, 2015 12:12 AM
> To: Yiannis Gkoufas
> Cc: Olivier Girardot; user@spark.apache.org
> Subject: Re: How to share large resources like dictionaries while
> processing data with Spark?
>
> Thanks so much, Yiannis, Olivier, Huang!
>
> On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas <johngou...@gmail.com>
> wrote:
>
> Hi there,
>
> I would recommend checking out
> https://github.com/spark-jobserver/spark-jobserver which I think gives
> the functionality you are looking for. I haven't tested it, though.
>
> BR
>
> On 5 June 2015 at 01:35, Olivier Girardot <ssab...@gmail.com> wrote:
>
> You can use it as a broadcast variable, but if it's "too" large (more than
> 1 GB, I guess), you may need to share it by joining it, via some kind of
> key, to the other RDDs. But this is the kind of thing broadcast variables
> were designed for.
>
> Regards,
>
> Olivier.
>
> On Thu, Jun 4, 2015 at 11:50 PM, dgoldenberg <dgoldenberg...@gmail.com>
> wrote:
>
> We have some pipelines defined where sometimes we need to load potentially
> large resources such as dictionaries.
>
> What would be the best strategy for sharing such resources among the
> transformations/actions within a consumer? Can they be shared somehow
> across the RDDs?
>
> I'm looking for a way to load such a resource once into the cluster memory
> and have it be available throughout the lifecycle of a consumer...
>
> Thanks.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-share-large-resources-like-dictionaries-while-processing-data-with-Spark-tp23162.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
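For anyone finding this thread later, here is a minimal Scala sketch of what Evo describes: load the dictionary once as a static (batch) RDD, cache it, and join each micro-batch against it via DStream.transform(). The HDFS paths, the socket source, and the tab-separated dictionary format are hypothetical stand-ins, not anything from the original posts:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StaticLookupJoin {
  def main(args: Array[String]): Unit = {
    val sc  = new SparkContext(new SparkConf().setAppName("stream-lookup-join"))
    val ssc = new StreamingContext(sc, Seconds(10))

    // Static lookup RDD: loaded once and cached in cluster memory for the
    // lifetime of the streaming app. (Path and format are illustrative.)
    val lookup = sc.textFile("hdfs:///data/dictionary.tsv")
      .map { line => val Array(k, v) = line.split("\t", 2); (k, v) }
      .cache()

    // A keyed DStream; the socket source is just a stand-in for the real input.
    val words = ssc.socketTextStream("localhost", 9999).map(w => (w, 1))

    // transform() exposes each micro-batch as an RDD, so we can join it
    // against the static RDD to emulate a per-record dictionary lookup.
    val enriched = words.transform(batch => batch.join(lookup))

    enriched.print()
    ssc.start()
    ssc.awaitTermination()
  }
}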
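And a sketch of the broadcast-variable route Olivier mentions, which fits when the dictionary comfortably fits in executor memory. sc.broadcast() ships the map to each executor once, and tasks then read it locally through .value. Again, the path and format below are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastDict {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-dict"))

    // Build the dictionary once on the driver. (Illustrative path/format.)
    val dict: Map[String, String] = sc.textFile("hdfs:///data/dictionary.tsv")
      .map { line => val Array(k, v) = line.split("\t", 2); (k, v) }
      .collect()
      .toMap

    // One copy per executor, reused by every task, instead of being
    // re-serialized into each task's closure.
    val dictBc = sc.broadcast(dict)

    val enriched = sc.textFile("hdfs:///data/input")
      .map(w => (w, dictBc.value.getOrElse(w, "UNKNOWN")))

    enriched.take(10).foreach(println)
    sc.stop()
  }
}

The same broadcast handle can be referenced inside Spark Streaming closures as well, which matches the "load once, use for the consumer's lifetime" requirement from the original question.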