One option could be to store them as blobs in a cache like Redis and then read + broadcast them from the driver. Or you could store them in HDFS and read + broadcast from the driver.
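For the HDFS route, here is a minimal sketch of reading a lookup dictionary on the driver and broadcasting it to executors. The HDFS paths and the tab-separated file format are just assumptions for illustration, not something from your setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastResourceExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-resource"))

    // Read the resource file from HDFS and collect it on the driver.
    // Path and key\tvalue format are hypothetical; adjust to your file.
    val dictionary: Map[String, String] = sc
      .textFile("hdfs:///shared/resources/lookup-dictionary.tsv")
      .map { line =>
        val Array(key, value) = line.split("\t", 2)
        key -> value
      }
      .collect()
      .toMap

    // Broadcast once; each executor gets a single read-only copy instead of
    // the map being shipped inside every task closure.
    val dictBc = sc.broadcast(dictionary)

    // Example usage: enrich records via the broadcast dictionary.
    val enriched = sc
      .textFile("hdfs:///input/records.txt")
      .map(record => dictBc.value.getOrElse(record, "UNKNOWN"))

    enriched.take(10).foreach(println)
    sc.stop()
  }
}
```

The same pattern works with Redis: pull the blob with a Redis client on the driver, deserialize it, and broadcast the resulting object. Either way, the resource lives outside the job jar, so the jar stays small and the file can be updated independently of deployments.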
Regards,
Sab

On Tue, Jan 12, 2016 at 1:44 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote:

> We have a bunch of Spark jobs deployed and a few large resource files, such
> as e.g. a dictionary for lookups or a statistical model.
>
> Right now, these are deployed as part of the Spark jobs, which will
> eventually make the mongo-jars too bloated for deployments.
>
> What are some of the best practices to consider for maintaining and
> sharing large resource files like these?
>
> Thanks.

--
Architect - Big Data
Ph: +91 99805 99458
Manthan Systems | *Company of the year - Analytics (2014 Frost and Sullivan India ICT)*