Yes, that sounds very reasonable.
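
A rough, self-contained model of the flow in Stephan's summary below, written in plain Java. It is a sketch under assumptions and does not use Flink's or Kryo's API. AvroRecordBase, JodaDateTime, UserEvent and Payload are hypothetical stand-in classes. Serializers are represented by placeholder names, and tags are just encounter-order integers. The model registers one Avro default up front and keeps the serializer library unregistered. A pre-flight walk over the (nested) field types then registers a tag for every type it meets, plus the library serializer for any type that matches (subclasses included).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for user and third-party types (not the real Avro/Joda classes).
class AvroRecordBase {}
class JodaDateTime {}
class UserEvent extends AvroRecordBase {
    JodaDateTime timestamp;
    Payload[] payloads;
}
class Payload {
    String name;
    long value;
}

class SerializerRegistrySketch {
    // Library of serializers that is NOT registered up front (names are placeholders).
    static final Map<Class<?>, String> LIBRARY = new LinkedHashMap<>();
    static {
        LIBRARY.put(JodaDateTime.class, "JodaDateTimeSerializer");
    }

    // Default serializers actually registered, keyed by the (base) type they apply to.
    final Map<Class<?>, String> defaultSerializers = new LinkedHashMap<>();
    // Types registered for tagging: class -> small integer tag, in encounter order.
    final Map<Class<?>, Integer> tags = new LinkedHashMap<>();

    SerializerRegistrySketch() {
        // Always-on default, analogous to registering a serializer for Avro's base class.
        defaultSerializers.put(AvroRecordBase.class, "AvroSerializer");
    }

    // Pre-flight analysis: walk the (nested) field types of the given class.
    void analyze(Class<?> type) {
        analyze(type, new HashSet<>());
    }

    private void analyze(Class<?> type, Set<Class<?>> visited) {
        if (type.isArray()) {
            analyze(type.getComponentType(), visited); // look through arrays
            return;
        }
        // Simplification: skip primitives and JDK types, assuming they work without registration.
        if (type.isPrimitive() || type.getName().startsWith("java.")) return;
        if (!visited.add(type)) return; // guard against cyclic types
        tags.putIfAbsent(type, tags.size()); // register a tag for every encountered type
        for (Map.Entry<Class<?>, String> e : LIBRARY.entrySet()) {
            // Register a library serializer only if a matching type is actually encountered.
            if (e.getKey().isAssignableFrom(type)) {
                defaultSerializers.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        for (Field f : type.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) analyze(f.getType(), visited);
        }
        Class<?> superType = type.getSuperclass();
        if (superType != null) analyze(superType, visited);
    }

    public static void main(String[] args) {
        SerializerRegistrySketch registry = new SerializerRegistrySketch();
        registry.analyze(UserEvent.class);
        System.out.println("default serializers: " + registry.defaultSerializers);
        System.out.println("tags: " + registry.tags);
    }
}
```

Analyzing UserEvent registers both default serializers and tags four types. Analyzing Payload alone registers no Joda serializer, which is the point of keeping the library lazy.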
On Jan 20, 2015 6:40 AM, "Stephan Ewen" <se...@apache.org> wrote:

> Yes, I agree that the Avro serializer should be available by default. That
> is one case of a typical type that should work out of the box, given that
> we support Avro file formats.
>
> Let me summarize how I understood that suggestion:
>
>  - We make Avro available by default by registering a default serializer
> for Avro's SpecificRecordBase
>
>  - We create a library of serializers. We do not register them by default.
>
>  - Via FLINK-1417, we analyze the types. For any (nested) type that we
> encounter for which we have a serializer in the library, we register that
> serializer as the default serializer. Also, for every (nested) type we
> encounter, we register a tag at Kryo.
>
> I like that, it should give a nice and smooth user experience.
>
> Greetings,
> Stephan
>
>
>
>
> On Mon, Jan 19, 2015 at 12:32 PM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
> > Hi,
> >
> > thank you for bringing our discussion to the mailing list. This is indeed
> > where such discussions belong. For the others: we started discussing it
> > here:
> > https://github.com/apache/flink/pull/304
> >
> > I think there is one additional approach, which is probably close to (1):
> > we register by default only the serializers for types that we don't see
> > in the pre-flight phase (I think right now that's only GenericData.Array
> > from Avro).
> > We would come across all the other classes (Joda-Time, Protobuf, Avro,
> > Thrift, ...) when traversing the class hierarchy, as proposed in
> > FLINK-1417. With this approach, users get the best out-of-the-box
> > experience and the number of registered classes / serializers is kept at
> > a minimum.
> > We can still offer means to register additional serializers (I think
> > that's already merged to master).
> >
> > My main concern with this particular issue is a good out-of-the-box user
> > experience. If there is an issue with type serialization, users will
> > notice it very early. (In my experience, people often have existing data
> > types that they use with other systems, and they want to continue using
> > them.) Therefore, I want to put some effort into making it as good as
> > possible. I would actually sacrifice performance for stability/usability
> > here. Our system is flexible enough to replace it later with a more
> > efficient serialization if that becomes an issue. But maybe my suggestion
> > above is already sufficient.
> >
> > We could also think about introducing a configuration variable which
> > allows users to disable the default serializers.
> >
> >
> > Regarding the second question: is there a downside to registering all
> > types for tagging? Tagging lets Kryo write a small integer ID instead of
> > the full class name, so we reduce the overall I/O, which is good for
> > performance.
> >
> > Best,
> > Robert
> >
> >
> >
> > On Mon, Jan 19, 2015 at 8:24 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > Hi all!
> > >
> > > We have various pending pull requests that add support for certain
> > > types by adding extra Kryo serializers.
> > >
> > > I think we need to decide how we want to handle the support for extra
> > > types, because more are certainly to come.
> > >
> > > As I understand it, we have three broad options:
> > >
> > > (1)
> > > Add as many serializers to Kryo by default as possible.
> > >  Pro:
> > >     - Many types work out of the box
> > >  Contra:
> > >     - We may eventually overload the Kryo registry with serializers
> > >       that are not needed for most cases and suffer in performance
> > >     - It is hard to guess which types work out of the box (not
> > >       transparent)
> > >
> > >
> > > (2)
> > > We create a collection of serializers and a registration util.
> > > --------
> > > val env = ExecutionEnvironment.getExecutionEnvironment()
> > >
> > > Serializers.registerProtoBufSerializers(env);
> > > Serializers.registerJavaUtilSerializers(env);
> > > ---------
> > > Pro:
> > >   - Easy for users
> > >   - We can grow the set of supported types very large without
> > >     overloading Kryo
> > >   - It is transparent what gets registered
> > >
> > > Contra:
> > >   - Not quite as convenient as things just working out of the box
> > >
> > >
> > > (3)
> > > We do nothing and let the user create and register whatever is needed.
> > >
> > > We could have a library and utility for serializers for certain
> > > libraries. Users could use this to conveniently add serializers for the
> > > libraries they use.
> > > Pro:
> > >   - Simple for us ;-)
> > > Contra:
> > >   - More repeated work for users
> > >
> > >
> > > ========================
> > >
> > > For approaches (1) and (2), there is an orthogonal question of whether
> > > we want to simply register default serializers (which make those types
> > > work) or also register the types for tags, to speed up the serialization
> > > of those types.
> > >
> > >
> > > Greetings,
> > > Stephan
> > >
> >
>
