+1 for program analysis from me too...

Should be doable also on a lower level (e.g. analysis of compiled *.class
files) with some off-the-shelf libraries, right?

2015-01-20 11:39 GMT+01:00 Till Rohrmann <till.rohrm...@gmail.com>:

> I like the idea to automatically figure out which types are used by a
> program and to register them at Kryo. Thus, +1 for this idea.
>
> On Tue, Jan 20, 2015 at 11:34 AM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
> > @Stephan: Yes, you are summarizing it correctly.
> > I'll assign FLINK-1417 to myself and implement it as discussed here
> (once I
> > have resolved the other issues assigned to me)
> >
> > There is one additional point we forgot in the discussion so far: We are
> > initializing Kryo with twitter/chill's "ScalaKryoInstantiator()". I just
> > checked and its registering 51 default serializers and 81 registered
> types.
> >
> > I just talked to Till (who implemented the KryoSerializer originally) and
> > he also suggests to pool kryo instances using thread-local variables.
> I'll
> > look into that as well once FLINK-1417 has been resolved. I think that
> > helps to mitigate the heavy initialization of Kryo (which is inevitable)
> >
> >
> > @Arvid: Our POJO/Types support does explicitly not require our users to
> > implement any interfaces, so that option is not feasible.
> > The bytecode analysis is not part of the main flink distribution because
> it
> > is using a library with an apache incompatible license.
> >
> >
> >
> > On Tue, Jan 20, 2015 at 9:58 AM, Arvid Heise <arvid.he...@gmail.com>
> > wrote:
> >
> > > An alternative way that may not be applicable in your case:
> > >
> > > For Sopremo, all types implemented a common interface. When a package
> is
> > > loaded, the Sopremo package manager scans the jar and looks for classes
> > > implementing the interfaces (quite fast, because not the entire class
> > must
> > > be loaded). All types implementing the interface are automatically
> added
> > to
> > > Kryo with their respective annotated serializers.
> > >
> > > If you still have bytecode analysis of jobs, you can also statically
> > > determine all types that are used, check for default serializers, and
> > > maintain only a minimal set of serializers used for this specific job.
> > You
> > > could already warn for unregistered types before submitting jobs.
> > >
> > > On Tue, Jan 20, 2015 at 6:38 AM, Stephan Ewen <se...@apache.org>
> wrote:
> > >
> > > > Yes, I agree that the Avro serializer should be available by default.
> > > That
> > > > is one case of a typical type that should work out of the box, given
> > that
> > > > we support Avro file formats.
> > > >
> > > > Let me summarize how I understood that suggestion:
> > > >
> > > >  - We make Avro available by default by registering a default
> > serializer
> > > > for the SpecificBase
> > > >
> > > >  - We create a library of serializers. We do not register them by
> > > default.
> > > >
> > > >  - Via FLINK-1417, we analyze the types. For any (nested) type that
> we
> > > > encounter for which we have a serializer in the library, we register
> > that
> > > > serializer as the default serializer. Also, for every (nested) type
> we
> > > > encounter, we register a tag at Kryo.
> > > >
> > > > I like that, it should give a nice and smooth user experience.
> > > >
> > > > Greetings,
> > > > Stephan
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Jan 19, 2015 at 12:32 PM, Robert Metzger <
> rmetz...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > thank you for putting our discussion to the mailing list. This is
> > > indeed
> > > > > where such discussions belong. For the others, we started
> discussing
> > > > here:
> > > > > https://github.com/apache/flink/pull/304
> > > > >
> > > > > I think there is one additional approach, which is probably close
> to
> > > (1):
> > > > > We only register those serializers by default which we don't see in
> > the
> > > > > pre-flight phase (I think right now thats only GenericData.Array
> from
> > > > > Avro).
> > > > > We would come across all the other classes (Jodatime, Protobuf,
> Avro,
> > > > > Thrift, ...) when traversing the class hierarchy, as proposed in
> > > > > FLINK-1417. With this approach, users get the best out-of-the box
> > > > > experience and the number of registered classes / serializers is
> kept
> > > at
> > > > a
> > > > > minimum.
> > > > > We can still offer means to register additional serializers (I
> think
> > > > thats
> > > > > already merged to master).
> > > > >
> > > > > My main concern with this particular issue is a good out of the box
> > > user
> > > > > experience. If there is an issue with type serialization, users
> will
> > > > notice
> > > > > it very early. (In my experience people often have their existing
> > > > datatypes
> > > > > they use with other systems, and they want to continue using them)
> > > > > Therefore, I want to put some effort into making it as good as
> > > possible.
> > > > I
> > > > > would actually sacrifice performance over stability/usability here.
> > Our
> > > > > system is flexible enough to replace it later with a more efficient
> > > > > serialization if that becomes an issue. But maybe my suggestion
> above
> > > is
> > > > > already sufficient.
> > > > >
> > > > > We could also think about introducing a configuration variable
> which
> > > > allows
> > > > > users to disable the default serializers.
> > > > >
> > > > >
> > > > > Regarding the second question: Is there a downside registering all
> > > types
> > > > > for tagging? We reduce the overall I/O which is good for
> performance.
> > > > >
> > > > > Best,
> > > > > Robert
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Jan 19, 2015 at 8:24 PM, Stephan Ewen <se...@apache.org>
> > > wrote:
> > > > >
> > > > > > Hi all!
> > > > > >
> > > > > > We have various pending pull requests that add support for
> certain
> > > > types
> > > > > by
> > > > > > adding extra kryo serializers.
> > > > > >
> > > > > > I think we need to decide how we want to handle the support for
> > extra
> > > > > > types, because more are certainly to come.
> > > > > >
> > > > > > As I understand it, we have three broad options:
> > > > > >
> > > > > > (1)
> > > > > > Add as many serializers to Kryo by default as possible.
> > > > > >  Pro:
> > > > > >     - Many types work out of the box
> > > > > >  Contra:
> > > > > >     - We may eventually overload the kryo registry with
> serializers
> > > > > >       that are not needed for most cases and suffer in
> performance
> > > > > >     - It is hard to guess which types work out of the box
> > > > (intransparent)
> > > > > >
> > > > > >
> > > > > > (2)
> > > > > > We create a collection of serializers and a registration util.
> > > > > > --------
> > > > > > val env = ExecutionEnvironemnt.getExecutionEnviroment()
> > > > > >
> > > > > > Serializers.registerProtoBufSerializers(env);
> > > > > > Serializers.registerJavaUtilSerializers(env);
> > > > > > ---------
> > > > > > Pro:
> > > > > >   - Easy for users
> > > > > >   - We can grow the set of supported types very large without
> > > > overloading
> > > > > > Kryo
> > > > > >   - It is transparent what gets registered
> > > > > >
> > > > > > Contra:
> > > > > >   - Not quite as convenient as if things just run
> > > > > >
> > > > > >
> > > > > > (3)
> > > > > > We do nothing and let the user create and register whatever is
> > > needed.
> > > > > >
> > > > > > We could have a library and utility for serializers for certain
> > > > > libraries.
> > > > > > Users could use this to conveniently add serializers for the
> > > libraries
> > > > > they
> > > > > > use.
> > > > > > Pro:
> > > > > >   - Simple for us ;-)
> > > > > > Contra:
> > > > > >   - More repeated work for users
> > > > > >
> > > > > >
> > > > > > ========================
> > > > > >
> > > > > > For approach (1) and (2), there is an orthogonal question of
> > whether
> > > we
> > > > > > want to simply register default serializers (that enable that
> types
> > > > work)
> > > > > > or also register types for tags, to speed up the serialization of
> > > those
> > > > > > types.
> > > > > >
> > > > > >
> > > > > > Greetings,
> > > > > > Stephan
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to