Yes, but once again, I'm not using Julia workers. I'm using completely
independent Julia processes, running on different machines and managed by
Spark, not by Julia's ClusterManager. I.e., the workflow looks like this:

1. Julia process 1 starts a JVM and connects to the Spark master node.
2. Julia process 1 sends the serialized function to the Spark master node.
3. The Spark master node notifies the Spark worker nodes (say, there are N of
them) about the upcoming computations.
4. Each Spark worker node creates its own Julia process, independent of
Julia process 1.
5. Each Spark worker node receives the serialized function and passes it to its
local Julia process.

So with N workers in the Spark cluster, there are N+1 Julia processes in total,
and when the function in question is created, Julia processes 2 to N+1
don't even exist yet.
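
To make steps 2 and 5 concrete, here is a minimal sketch of what I have in
mind, assuming the function is shipped as a quoted definition (an `Expr`)
rather than as a function object. It is written against the `Serialization`
standard library of Julia 1.x (in the Julia version discussed in this thread,
`serialize` and `deserialize` live in Base). The helper names
`pack_definition` and `unpack_definition` are hypothetical, and the sketch
ignores closures and calls to other user-defined functions:

```julia
using Serialization  # stdlib in Julia 1.x; assumption: not the 2015-era Base API

# Hypothetical helper, driver side (step 2):
# turn a quoted function definition into a byte array.
function pack_definition(def::Expr)
    io = IOBuffer()
    serialize(io, def)
    return take!(io)
end

# Hypothetical helper, worker side (step 5):
# rebuild the function from bytes by evaluating the AST in a fresh module,
# so nothing has to be defined on the receiving process beforehand.
function unpack_definition(bytes::Vector{UInt8})
    def = deserialize(IOBuffer(bytes))
    sandbox = Module(:Received)
    return Core.eval(sandbox, def)
end

# Driver: the function travels as an AST, not as a reference to a name.
payload = pack_definition(:(x -> x + 1))

# Worker: in the real setup `payload` would arrive through Spark.
g = unpack_definition(payload)
result = Base.invokelatest(g, 5)
```

Here both halves run in one process only to keep the sketch short. The point
is that `unpack_definition` needs nothing from the process that created the
function, as long as the body refers only to names available on the worker.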


On Friday, August 14, 2015 at 12:35:18 PM UTC+3, Tim Holy wrote:
>
> If you define the function with @everywhere, it will be defined on all
> existing workers. Likewise, `using MyPackage` loads the package on all
> workers.
>
> --Tim 
>
> On Thursday, August 13, 2015 03:10:54 PM Andrei Zh wrote: 
> > Ok, after going through serialization code, it's clear that default 
> > implementation doesn't support serializing function code, but only its 
> > name. For example, here's relevant section from 
> > `deserialize(::SerializationState, ::Function)`: 
> > mod = deserialize(s)::Module 
> > name = deserialize(s)::Symbol 
> > if !isdefined(mod,name) 
> >     return (args...)->error("function $name not defined on process $(myid())")
> > end 
> > 
> > 
> > 
> > This doesn't fit my needs (essentially, semantics of Spark), and I guess 
> > there's no existing solution for full function serialization. Thus I'm 
> > going to write new solution for this. 
> > 
> > So far the best idea I have is to get the function's AST and recursively
> > serialize it, catching calls to other non-Base functions and any bound
> > variables. But this looks quite complicated. Is there a better / easier
> > way to get a portable representation of a function?
> > 
> > On Monday, August 10, 2015 at 11:48:55 PM UTC+3, Andrei Zh wrote: 
> > > Yes, I incorrectly assumed `serialize` / `deserialize` use JLD format.
> > > But anyway, even when I saved the function into "example.jls" or even
> > > plain byte array (using IOBuffer and `takebuf_array`), nothing changed.
> > > Am I missing something obvious?
> > > 
> > > On Monday, August 10, 2015 at 11:40:03 PM UTC+3, Tim Holy wrote: 
> > >> On Monday, August 10, 2015 01:13:15 PM Tony Kelman wrote:
> > >> > Should probably use some different extension for that, .jls or
> > >> > something, to avoid confusion.
> > >> 
> > >> Yes. That has been sufficiently confusing in the past, we even cover
> > >> this here:
> > >>
> > >> https://github.com/JuliaLang/JLD.jl#saving-and-loading-variables-in-julia-data-format-jld
> > >> 
> > >> --Tim 
> > >> 
> > >> > On Monday, August 10, 2015 at 12:45:35 PM UTC-7, Stefan Karpinski wrote:
> > >> > > JLD doesn't support serializing functions but Julia itself does. 
> > >> > > 
> > >> > > On Mon, Aug 10, 2015 at 3:43 PM, Andrei Zh wrote:
> > >> > >> I'm afraid it's not quite true, and I found a simple way to show it.
> > >> > >> In the next code snippet I define function `f` and serialize it to a
> > >> > >> file:
> > >> > >> 
> > >> > >> julia> f(x) = x + 1 
> > >> > >> f (generic function with 1 method) 
> > >> > >> 
> > >> > >> julia> f(5) 
> > >> > >> 6 
> > >> > >> 
> > >> > >> julia> open("example.jld", "w") do io serialize(io, f) end 
> > >> > >> 
> > >> > >> 
> > >> > >> Then I close the Julia REPL and in a new session try to load and use
> > >> > >> this function:
> > >> > >> 
> > >> > >> julia> f2 = open("example.jld") do io deserialize(io) end 
> > >> > >> (anonymous function) 
> > >> > >> 
> > >> > >> julia> f2(5) 
> > >> > >> ERROR: function f not defined on process 1 
> > >> > >> 
> > >> > >>  in error at error.jl:21 
> > >> > >>  in anonymous at serialize.jl:398 
> > >> > >> 
> > >> > >> So the deserialized function still refers to the old definition,
> > >> > >> which is not available in this new session.
> > >> > >> 
> > >> > >> Is there any better way to serialize a function and run it on an 
> > >> > >> unrelated Julia process? 
> > >> > >> 
> > >> > >> On Monday, August 10, 2015 at 2:33:11 PM UTC+3, Jeff Waller wrote:
> > >> > >>>> My question is: does Julia's serialization produce completely
> > >> > >>>> self-contained code that can be run on workers? In other words,
> > >> > >>>> is it possible to send a serialized function over the network to
> > >> > >>>> another host / Julia process and apply it there without any
> > >> > >>>> additional information from the first process?
> > >> > >>>>
> > >> > >>>> I made some tests on a single machine, and when I defined the
> > >> > >>>> function without `@everywhere`, the worker failed with the message
> > >> > >>>> "function myfunc not defined on process 1". With `@everywhere`, my
> > >> > >>>> code worked, but will it work on multiple hosts with essentially
> > >> > >>>> independent Julia processes?
> > >> > >>> 
> > >> > >>> According to Jey here
> > >> > >>> <https://groups.google.com/forum/#!searchin/julia-users/jey/julia-users/bolLGcSCrs0/fGGVLgNhI2YJ>,
> > >> > >>> Base.serialize does what we want; it's contained in serialize.jl
> > >> > >>> <https://github.com/JuliaLang/julia/blob/master/base/serialize.jl>
>
>
