Yes, I agree that the Avro serializer should be available by default. That is one case of a typical type that should work out of the box, given that we support Avro file formats.
Let me summarize how I understood that suggestion:

 - We make Avro available by default by registering a default serializer
   for the SpecificBase

 - We create a library of serializers. We do not register them by default.

 - Via FLINK-1417, we analyze the types. For any (nested) type that we
   encounter for which we have a serializer in the library, we register
   that serializer as the default serializer. Also, for every (nested)
   type we encounter, we register a tag at Kryo.

I like that, it should give a nice and smooth user experience.

Greetings,
Stephan

On Mon, Jan 19, 2015 at 12:32 PM, Robert Metzger <rmetz...@apache.org> wrote:

> Hi,
>
> thank you for putting our discussion to the mailing list. This is indeed
> where such discussions belong. For the others, we started discussing here:
> https://github.com/apache/flink/pull/304
>
> I think there is one additional approach, which is probably close to (1):
> We only register those serializers by default which we don't see in the
> pre-flight phase (I think right now that's only GenericData.Array from
> Avro).
> We would come across all the other classes (Jodatime, Protobuf, Avro,
> Thrift, ...) when traversing the class hierarchy, as proposed in
> FLINK-1417. With this approach, users get the best out-of-the-box
> experience and the number of registered classes / serializers is kept at a
> minimum.
> We can still offer means to register additional serializers (I think that's
> already merged to master).
>
> My main concern with this particular issue is a good out-of-the-box user
> experience. If there is an issue with type serialization, users will notice
> it very early. (In my experience, people often have existing data types
> that they use with other systems, and they want to continue using them.)
> Therefore, I want to put some effort into making it as good as possible. I
> would actually sacrifice performance for stability/usability here.
> Our system is flexible enough to replace it later with a more efficient
> serialization if that becomes an issue. But maybe my suggestion above is
> already sufficient.
>
> We could also think about introducing a configuration variable which allows
> users to disable the default serializers.
>
>
> Regarding the second question: Is there a downside to registering all types
> for tagging? We reduce the overall I/O, which is good for performance.
>
> Best,
> Robert
>
>
>
> On Mon, Jan 19, 2015 at 8:24 PM, Stephan Ewen <se...@apache.org> wrote:
>
> > Hi all!
> >
> > We have various pending pull requests that add support for certain types
> > by adding extra Kryo serializers.
> >
> > I think we need to decide how we want to handle the support for extra
> > types, because more are certainly to come.
> >
> > As I understand it, we have three broad options:
> >
> > (1)
> > Add as many serializers to Kryo by default as possible.
> >  Pro:
> >    - Many types work out of the box
> >  Contra:
> >    - We may eventually overload the Kryo registry with serializers
> >      that are not needed for most cases and suffer in performance
> >    - It is hard to guess which types work out of the box (not transparent)
> >
> >
> > (2)
> > We create a collection of serializers and a registration util.
> > --------
> > val env = ExecutionEnvironment.getExecutionEnvironment()
> >
> > Serializers.registerProtoBufSerializers(env);
> > Serializers.registerJavaUtilSerializers(env);
> > ---------
> >  Pro:
> >    - Easy for users
> >    - We can grow the set of supported types very large without overloading
> >      Kryo
> >    - It is transparent what gets registered
> >
> >  Contra:
> >    - Not quite as convenient as if things just run
> >
> >
> > (3)
> > We do nothing and let the user create and register whatever is needed.
> >
> > We could have a library and utility for serializers for certain libraries.
> > Users could use this to conveniently add serializers for the libraries they
> > use.
> >  Pro:
> >    - Simple for us ;-)
> >  Contra:
> >    - More repeated work for users
> >
> >
> > ========================
> >
> > For approaches (1) and (2), there is an orthogonal question of whether we
> > want to simply register default serializers (which enable the types to
> > work) or also register types for tags, to speed up the serialization of
> > those types.
> >
> >
> > Greetings,
> > Stephan
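
For reference, the approach summarized at the top of this thread (keep a library of serializers, analyze the nested types, register a default serializer only for types that are actually encountered, and register a tag for every encountered type) could be sketched roughly as follows. This is a toy model under stated assumptions and not Flink or Kryo code: `SerializerRegistrationSketch`, `Registry`, `LIBRARY`, `analyze`, and the types `Order`, `Address`, and `DateTimeLike` are all hypothetical names invented for illustration, serializers are represented by plain strings, and the reflection-based field walk merely stands in for the type analysis proposed in FLINK-1417.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are Flink or Kryo API.
class SerializerRegistrationSketch {

    // Hypothetical user types that serve as input to the analysis.
    static class DateTimeLike { long millis; }
    static class Address { String city; }
    static class Order { DateTimeLike created; Address shipTo; Order[] related; }

    // Hypothetical serializer library: known type -> serializer name.
    // Nothing in here is registered until the analysis encounters the type.
    static final Map<Class<?>, String> LIBRARY = new LinkedHashMap<>();
    static { LIBRARY.put(DateTimeLike.class, "DateTimeLikeSerializer"); }

    // Stand-in for the serialization framework's registry.
    static class Registry {
        final Map<Class<?>, String> defaultSerializers = new LinkedHashMap<>();
        final Map<Class<?>, Integer> tags = new LinkedHashMap<>();
        void registerTag(Class<?> type) { tags.putIfAbsent(type, tags.size()); }
    }

    static Registry analyze(Class<?> root) {
        Registry registry = new Registry();
        visit(root, registry, new HashSet<>());
        return registry;
    }

    private static void visit(Class<?> type, Registry registry, Set<Class<?>> seen) {
        // Skip primitives and types already visited (this also breaks cycles).
        if (type == null || type.isPrimitive() || !seen.add(type)) return;
        if (type.isArray()) { visit(type.getComponentType(), registry, seen); return; }

        String serializer = LIBRARY.get(type);
        if (serializer != null) {
            // Encountered a type the library knows: register its serializer and a tag.
            registry.defaultSerializers.put(type, serializer);
            registry.registerTag(type);
            return; // the library serializer is assumed to handle the internals
        }
        // Assumption of this toy: JDK types without a library entry are left alone.
        if (type.getName().startsWith("java.")) return;

        registry.registerTag(type);
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
            visit(field.getType(), registry, seen);
        }
        visit(type.getSuperclass(), registry, seen);
    }

    public static void main(String[] args) {
        Registry r = analyze(Order.class);
        System.out.println("default serializers: " + r.defaultSerializers.values());
        System.out.println("tagged types: " + r.tags.size());
    }
}
```

In this sketch the registry only ever contains entries for types reachable from the analyzed root, which is the property the thread is after: serializers that a program does not need are never registered.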