One option could be to store them as blobs in a cache like Redis and then read + broadcast them from the driver. Or you could store them in HDFS and read + broadcast from the driver.
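For the HDFS route, here is a minimal sketch of reading a lookup dictionary on the driver and broadcasting it to executors. The HDFS paths and the tab-separated file format are just assumptions for illustration, not something from your setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastResourceExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-resource"))

    // Read the resource file from HDFS and collect it on the driver.
    // Path and key\tvalue format are hypothetical; adjust to your file.
    val dictionary: Map[String, String] = sc
      .textFile("hdfs:///shared/resources/lookup-dictionary.tsv")
      .map { line =>
        val Array(key, value) = line.split("\t", 2)
        key -> value
      }
      .collect()
      .toMap

    // Broadcast once; each executor gets a single read-only copy instead of
    // the map being shipped inside every task closure.
    val dictBc = sc.broadcast(dictionary)

    // Example usage: enrich records via the broadcast dictionary.
    val enriched = sc
      .textFile("hdfs:///input/records.txt")
      .map(record => dictBc.value.getOrElse(record, "UNKNOWN"))

    enriched.take(10).foreach(println)
    sc.stop()
  }
}
```

The same pattern works with Redis: pull the blob with a Redis client on the driver, deserialize it, and broadcast the resulting object. Either way, the resource lives outside the job jar, so the jar stays small and the file can be updated independently of deployments.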
Regards,
Sab

On Tue, Jan 12, 2016 at 1:44 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote:

> We have a bunch of Spark jobs deployed and a few large resource files, such
> as e.g. a dictionary for lookups or a statistical model.
>
> Right now, these are deployed as part of the Spark jobs, which will
> eventually make the mongo-jars too bloated for deployments.
>
> What are some of the best practices to consider for maintaining and
> sharing large resource files like these?
>
> Thanks.

--
Architect - Big Data
Ph: +91 99805 99458
Manthan Systems | *Company of the year - Analytics (2014 Frost and Sullivan India ICT)*