Re: Avoid broacasting huge variables

2015-01-18 Thread Sean Owen
Why do you say it does not work? The singleton pattern works the same as ever. It is not a pattern that involves Spark. On Jan 18, 2015 12:57 PM, "octavian.ganea" wrote: > The singleton hack works very different in spark 1.2.0 (it does not work if > the program has multiple map-reduce jobs in the

Re: Avoid broacasting huge variables

2015-01-18 Thread octavian.ganea
The singleton hack works very differently in Spark 1.2.0 (it does not work if the program has multiple map-reduce jobs in the same program). I think there should be official documentation on how to have each machine/node do an init step locally before executing any other instructions (e.g. loading
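The per-node init step asked for here is what the JVM singleton idiom provides: a lazily initialized object that each executor process builds once. A minimal sketch of the idiom, not taken from the thread (the `BigIndex` name and the inline map are illustrative stand-ins for the real large structure):

```scala
// Per-JVM holder for a large read-only structure. On a Spark executor, the
// first task that touches BigIndex.instance pays the load cost; every later
// task in the same JVM reuses it. Nothing here is Spark-specific: it is the
// ordinary Scala lazy-singleton idiom.
object BigIndex {
  // `lazy val` initialization is thread-safe: the JVM runs the body exactly
  // once, even if several tasks race to dereference it.
  lazy val instance: Map[String, Int] = {
    // A real job would load from HDFS or a shared folder here; a small
    // inline map stands in for the big index.
    Map("alpha" -> 1, "beta" -> 2)
  }
}
```

Task closures then call `BigIndex.instance` instead of capturing the object, so nothing large is serialized and shipped with the closure.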

Re: Avoid broacasting huge variables

2014-09-21 Thread octavian.ganea
Using mapPartitions and passing the big index object as a parameter to it was not the best option, given the size of the big object and my RAM. The workers died before starting the actual computation. Anyway, creating a singleton object worked for me: http://apache-spark-user-list.1001560.n3.na

Re: Avoid broacasting huge variables

2014-09-20 Thread Sean Owen
Joining in a side conversation - yes, this is the way to go. The data is immutable, so it can be shared across all executors in one JVM in a singleton. How to load it depends on where it is, but there is nothing special to Spark here. For instance, if the file is on HDFS then you use HDFS APIs in some cl
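The "how to load it depends on where it is" point can be sketched as a per-JVM cache keyed by path, so each executor reads the shared file once. This sketch uses `scala.io.Source` for a shared local/NFS folder (as in the case described later in the thread); for HDFS you would swap in the Hadoop `FileSystem` API instead. The names `SharedData` and `linesOf` are hypothetical, not from the thread:

```scala
import scala.io.Source

// Per-JVM cache of file contents, keyed by path. Each executor process loads
// a given file at most once; tasks call SharedData.linesOf(path) instead of
// capturing the data in a closure.
object SharedData {
  private var cache = Map.empty[String, Vector[String]]

  def linesOf(path: String): Vector[String] = synchronized {
    // getOrElse's default is by-name, so the file is only read on a miss.
    cache.getOrElse(path, {
      val src = Source.fromFile(path)
      val lines = try src.getLines().toVector finally src.close()
      cache += path -> lines
      lines
    })
  }
}
```

The `synchronized` block keeps concurrent tasks in one executor from loading the same file twice; for very large data a `ConcurrentHashMap` with `computeIfAbsent` would avoid serializing unrelated loads.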

Re: Avoid broacasting huge variables

2014-09-20 Thread octavian.ganea
Hi Martin, Thanks. That might be really useful. Can you give me a reference or an example so that I understand how to do it? In my case, the nodes have access to the same shared folder, so I wouldn't have to copy the file multiple times.

Re: Avoid broacasting huge variables

2014-09-20 Thread Martin Goodson
We normally copy a file to the nodes and then explicitly load it in a function passed to mapPartitions. On 9/20/14, octavian.ganea wrote: > Anyone ? > Is there any option to load data in each node before starting any > computation like it is the initialization of mappers in Hadoop ?
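The copy-then-load approach works because `mapPartitions` hands the function a whole iterator, so the expensive load runs once per partition rather than once per record. A plain-Scala stand-in for the partition function (no Spark dependency; the small lookup map is a placeholder for the file copied to each node):

```scala
// Mimics the body you would pass to rdd.mapPartitions: per-partition setup
// happens once, then all of the partition's records stream through it.
def processPartition(records: Iterator[String]): Iterator[Int] = {
  // Expensive setup, once per partition. In the real job this would load
  // the file that was copied to (or shared with) every node.
  val lookup = Map("a" -> 1, "b" -> 2)
  records.map(r => lookup.getOrElse(r, 0))
}

// With Spark this would be invoked as: rdd.mapPartitions(processPartition)
```

Because the result is itself an iterator, records are still processed lazily and the partition is never materialized in memory all at once.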

Re: Avoid broacasting huge variables

2014-09-20 Thread octavian.ganea
Anyone? Is there any option to load data in each node before starting any computation, like the initialization of mappers in Hadoop? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Avoid-broacasting-huge-variables-tp14696p14726.html