Thanks so much, Yiannis, Olivier, Huang!

On Thu, Jun 4, 2015 at 6:44 PM, Yiannis Gkoufas <johngou...@gmail.com> wrote:
> Hi there,
>
> I would recommend checking out
> https://github.com/spark-jobserver/spark-jobserver which I think gives
> the functionality you are looking for.
> I haven't tested it myself, though.
>
> BR
>
> On 5 June 2015 at 01:35, Olivier Girardot <ssab...@gmail.com> wrote:
>
>> You can use it as a broadcast variable, but if it's "too" large (more
>> than 1 GB, I'd guess), you may need to share it instead by joining it
>> to the other RDDs on some kind of key.
>> But this is exactly the kind of thing broadcast variables were
>> designed for.
>>
>> Regards,
>>
>> Olivier.
>>
>> On Thu, Jun 4, 2015 at 11:50 PM, dgoldenberg <dgoldenberg...@gmail.com>
>> wrote:
>>
>>> We have some pipelines defined where we sometimes need to load
>>> potentially large resources such as dictionaries.
>>>
>>> What would be the best strategy for sharing such resources among the
>>> transformations/actions within a consumer? Can they be shared somehow
>>> across the RDDs?
>>>
>>> I'm looking for a way to load such a resource once into the cluster
>>> memory and have it be available throughout the lifecycle of a
>>> consumer...
>>>
>>> Thanks.
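
For what it's worth, here is a minimal sketch of the broadcast-variable approach Olivier describes. The file paths, the tab-separated dictionary format, and the token-lookup logic are just hypothetical placeholders for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import scala.io.Source

object BroadcastDictExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-dict"))

    // Load the dictionary once on the driver.
    // Assumes hypothetical key<TAB>value lines.
    val dict: Map[String, String] =
      Source.fromFile("/path/to/dictionary.tsv")
        .getLines()
        .map { line =>
          val Array(k, v) = line.split("\t", 2)
          k -> v
        }
        .toMap

    // Ship the dictionary to the executors once as a broadcast variable.
    val bcDict = sc.broadcast(dict)

    // Any transformation can now read bcDict.value locally on the executor.
    val input = sc.textFile("hdfs:///path/to/input")
    val resolved = input.map { token =>
      bcDict.value.getOrElse(token, token)
    }
    resolved.take(10).foreach(println)

    sc.stop()
  }
}

Each executor deserializes the broadcast value once and every task on that executor reads bcDict.value locally, so the dictionary crosses the network once per node rather than once per task, which is what makes this workable for large lookup tables.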