Andrei Zh
I'm confused. Have you actually tried?
julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true,
append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
julia> foo(x) = x + 1
foo (generic function with 1 method)
julia> serialize(io, foo)
julia> seekstart(io)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true,
append=false, size=9, maxsize=Inf, ptr=1, mark=-1)
julia> baz = deserialize(io)
foo (generic function with 1 method)
julia> baz(1)
2
The serialization code won't recursively serialize all of the function's
dependencies, so you will have to send/serialize the code that defines the
environment (types, constants, packages, etc.).
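
A minimal sketch of what sending the defining code could look like: serialize the quoted definition rather than the function object, then evaluate it on the receiving side. This assumes Julia 1.x, where `serialize`/`deserialize` live in the `Serialization` standard library, and it uses a single process with an IOBuffer standing in for the network. The names `defn`, `received`, and `baz` are only illustrative, and any types or helper functions the definition depends on would still have to be shipped the same way.

```julia
using Serialization  # assumption: Julia 1.x; older releases exported these from Base

# Sender: quote the definition so the code itself travels, not just the name.
defn = :(foo(x) = x + 1)

io = IOBuffer()
serialize(io, defn)
seekstart(io)

# Receiver: rebuild the expression and evaluate it to define the function.
received = deserialize(io)
baz = Core.eval(Main, received)  # evaluating a definition returns the function

println(baz(1))
```

Because only an `Expr` crosses the wire, the receiving process does not need `foo` to exist beforehand.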
On Friday, August 14, 2015 at 6:23:23 AM UTC-4, Andrei Zh wrote:
>
> Yes, but once again, I'm not using Julia workers, but instead completely
> independent Julia processes, running on different machines and ruled by
> Spark, not by Julia's ClusterManager. I.e. workflow looks like this:
>
> 1. Julia process 1 starts JVM and connects to Spark master node.
> 2. Julia process 1 sends serialized function to Spark master node.
> 3. Spark master node notifies Spark worker nodes (say, there are N of
> them) about upcoming computations.
> 4. Each Spark worker node creates its own Julia process, independent from
> Julia process 1.
> 5. Each Spark worker node receives serialized function and passes it to
> its local Julia process.
>
> So with N workers in Spark cluster, there's in total N+1 Julia processes,
> and when function in question is created, Julia processes from 2 to N+1
> don't even exist yet.
>
>
> On Friday, August 14, 2015 at 12:35:18 PM UTC+3, Tim Holy wrote:
>>
>> If you define the function with @everywhere, it will be defined on all
>> existing workers. Likewise, `using MyPackage` loads the package on all
>> workers.
>>
>> --Tim
>>
>> On Thursday, August 13, 2015 03:10:54 PM Andrei Zh wrote:
>> > Ok, after going through serialization code, it's clear that default
>> > implementation doesn't support serializing function code, but only its
>> > name. For example, here's relevant section from
>> > `deserialize(::SerializationState, ::Function)`:
>> >     mod = deserialize(s)::Module
>> >     name = deserialize(s)::Symbol
>> >     if !isdefined(mod,name)
>> >         return (args...)->error("function $name not defined on process $(myid())")
>> >     end
>> >
>> >
>> >
>> > This doesn't fit my needs (essentially, the semantics of Spark), and I
>> > guess there's no existing solution for full function serialization. Thus
>> > I'm going to write a new solution for this.
>> >
>> > So far the best idea I have is to get the function's AST and recursively
>> > serialize it, catching calls to other non-Base functions and any bound
>> > variables. But this looks quite complicated. Is there a better / easier
>> > way to get a portable representation of a function?
>> >
>> > On Monday, August 10, 2015 at 11:48:55 PM UTC+3, Andrei Zh wrote:
>> > > Yes, I incorrectly assumed `serialize` / `deserialize` use JLD format.
>> > > But anyway, even when I saved the function into "example.jls" or even
>> > > plain byte array (using IOBuffer and `takebuf_array`), nothing changed.
>> > > Am I missing something obvious?
>> > >
>> > > On Monday, August 10, 2015 at 11:40:03 PM UTC+3, Tim Holy wrote:
>> > >> On Monday, August 10, 2015 01:13:15 PM Tony Kelman wrote:
>> > >> > Should probably use some different extension for that, .jls or
>> > >> > something, to avoid confusion.
>> > >>
>> > >> Yes. That has been sufficiently confusing in the past, we even cover
>> > >> this here:
>> > >>
>> > >>
>> > >> https://github.com/JuliaLang/JLD.jl#saving-and-loading-variables-in-julia-data-format-jld
>> > >>
>> > >> --Tim
>> > >>
>> > >> > On Monday, August 10, 2015 at 12:45:35 PM UTC-7, Stefan Karpinski wrote:
>> > >> > > JLD doesn't support serializing functions but Julia itself does.
>> > >> > >
>> > >> > > On Mon, Aug 10, 2015 at 3:43 PM, Andrei Zh wrote:
>> > >> > >> I'm afraid it's not quite true, and I found simple way to show it.
>> > >> > >> In the next code snippet I define function `f` and serialize it to a
>> > >> > >> file:
>> > >> > >>
>> > >> > >> julia> f(x) = x + 1
>> > >> > >> f (generic function with 1 method)
>> > >> > >>
>> > >> > >> julia> f(5)
>> > >> > >> 6
>> > >> > >>
>> > >> > >> julia> open("example.jld", "w") do io serialize(io, f) end
>> > >> > >>
>> > >> > >>
>> > >> > >> Then I close Julia REPL and in a new session try to load and use
>> > >> > >> this function:
>> > >> > >>
>> > >> > >> julia> f2 = open("example.jld") do io deserialize(io) end
>> > >> > >> (anonymous function)
>> > >> > >>
>> > >> > >> julia> f2(5)
>> > >> > >> ERROR: function f not defined on process 1
>> > >> > >>
>> > >> > >> in error at error.jl:21
>> > >> > >> in anonymous at serialize.jl:398
>> > >> > >>
>> > >> > >> So deserialized function still refers to the old definition, which
>> > >> > >> is not available in this new session.
>> > >> > >>
>> > >> > >> Is there any better way to serialize a function and run it on an
>> > >> > >> unrelated Julia process?
>> > >> > >>
>> > >> > >> On Monday, August 10, 2015 at 2:33:11 PM UTC+3, Jeff Waller wrote:
>> > >> > >>>> My question is: does Julia's serialization produce completely
>> > >> > >>>> self-containing code that can be run on workers? In other words,
>> > >> > >>>> is it possible to send serialized function over network to another
>> > >> > >>>> host / Julia process and applied there without any additional
>> > >> > >>>> information from the first process?
>> > >> > >>>>
>> > >> > >>>> I made some tests on a single machine, and when I defined function
>> > >> > >>>> without `@everywhere`, worker failed with a message "function
>> > >> > >>>> myfunc not defined on process 1". With `@everywhere`, my code
>> > >> > >>>> worked, but will it work on multiple hosts with essentially
>> > >> > >>>> independent Julia processes?
>> > >> > >>>
>> > >> > >>> According to Jey here
>> > >> > >>> <https://groups.google.com/forum/#!searchin/julia-users/jey/julia-users/bolLGcSCrs0/fGGVLgNhI2YJ>,
>> > >> > >>> Base.serialize does what we want; it's contained in serialize.jl
>> > >> > >>> <https://github.com/JuliaLang/julia/blob/master/base/serialize.jl>
>>
>>