Yes, that sounds very reasonable.

On Jan 20, 2015 6:40 AM, "Stephan Ewen" <se...@apache.org> wrote:
> Yes, I agree that the Avro serializer should be available by default. That
> is one case of a typical type that should work out of the box, given that
> we support Avro file formats.
>
> Let me summarize how I understood that suggestion:
>
>  - We make Avro available by default by registering a default serializer
>    for the SpecificBase
>
>  - We create a library of serializers. We do not register them by default.
>
>  - Via FLINK-1417, we analyze the types. For any (nested) type that we
>    encounter for which we have a serializer in the library, we register that
>    serializer as the default serializer. Also, for every (nested) type we
>    encounter, we register a tag at Kryo.
>
> I like that, it should give a nice and smooth user experience.
>
> Greetings,
> Stephan
>
>
> On Mon, Jan 19, 2015 at 12:32 PM, Robert Metzger <rmetz...@apache.org>
> wrote:
>
> > Hi,
> >
> > thank you for putting our discussion to the mailing list. This is indeed
> > where such discussions belong. For the others, we started discussing here:
> > https://github.com/apache/flink/pull/304
> >
> > I think there is one additional approach, which is probably close to (1):
> > We only register those serializers by default which we don't see in the
> > pre-flight phase (I think right now that's only GenericData.Array from
> > Avro).
> > We would come across all the other classes (Jodatime, Protobuf, Avro,
> > Thrift, ...) when traversing the class hierarchy, as proposed in
> > FLINK-1417. With this approach, users get the best out-of-the-box
> > experience and the number of registered classes / serializers is kept at a
> > minimum.
> > We can still offer means to register additional serializers (I think that's
> > already merged to master).
> >
> > My main concern with this particular issue is a good out-of-the-box user
> > experience. If there is an issue with type serialization, users will notice
> > it very early.
> > (In my experience people often have their existing datatypes
> > they use with other systems, and they want to continue using them)
> > Therefore, I want to put some effort into making it as good as possible. I
> > would actually sacrifice performance over stability/usability here. Our
> > system is flexible enough to replace it later with a more efficient
> > serialization if that becomes an issue. But maybe my suggestion above is
> > already sufficient.
> >
> > We could also think about introducing a configuration variable which allows
> > users to disable the default serializers.
> >
> >
> > Regarding the second question: Is there a downside registering all types
> > for tagging? We reduce the overall I/O which is good for performance.
> >
> > Best,
> > Robert
> >
> >
> > On Mon, Jan 19, 2015 at 8:24 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > Hi all!
> > >
> > > We have various pending pull requests that add support for certain types
> > > by adding extra kryo serializers.
> > >
> > > I think we need to decide how we want to handle the support for extra
> > > types, because more are certainly to come.
> > >
> > > As I understand it, we have three broad options:
> > >
> > > (1)
> > > Add as many serializers to Kryo by default as possible.
> > >  Pro:
> > >    - Many types work out of the box
> > >  Contra:
> > >    - We may eventually overload the kryo registry with serializers
> > >      that are not needed for most cases and suffer in performance
> > >    - It is hard to guess which types work out of the box (intransparent)
> > >
> > >
> > > (2)
> > > We create a collection of serializers and a registration util.
> > > --------
> > > val env = ExecutionEnvironment.getExecutionEnvironment()
> > >
> > > Serializers.registerProtoBufSerializers(env);
> > > Serializers.registerJavaUtilSerializers(env);
> > > --------
> > >  Pro:
> > >    - Easy for users
> > >    - We can grow the set of supported types very large without overloading
> > >      Kryo
> > >    - It is transparent what gets registered
> > >
> > >  Contra:
> > >    - Not quite as convenient as if things just run
> > >
> > >
> > > (3)
> > > We do nothing and let the user create and register whatever is needed.
> > >
> > > We could have a library and utility for serializers for certain libraries.
> > > Users could use this to conveniently add serializers for the libraries
> > > they use.
> > >  Pro:
> > >    - Simple for us ;-)
> > >  Contra:
> > >    - More repeated work for users
> > >
> > >
> > > ========================
> > >
> > > For approach (1) and (2), there is an orthogonal question of whether we
> > > want to simply register default serializers (that enable those types to
> > > work) or also register types for tags, to speed up the serialization of
> > > those types.
> > >
> > >
> > > Greetings,
> > > Stephan
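
For readers following the thread, the approach summarized at the top (keep a
library of serializers, analyze the nested types of a job, register a default
serializer only for types that are actually encountered, and register a tag
for every encountered type) might look roughly like the sketch below. This is
an illustration under stated assumptions, not Flink or Kryo code: the class
SerializerLibrarySketch and all of its methods are hypothetical, serializers
are represented by plain strings, plain field reflection stands in for the
type analysis proposed in FLINK-1417, and java.util.Date / java.net.URI stand
in for library types such as Jodatime or Protobuf classes.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.net.URI;
import java.util.BitSet;
import java.util.Date;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names exist in Flink or Kryo.
final class SerializerLibrarySketch {

    // Example user types with nested fields (assumed for illustration).
    static class Customer { String name; URI homepage; }
    static class Order { long id; Date created; Customer customer; }

    // Library of available serializers; nothing here is active by default.
    private final Map<Class<?>, String> library = new LinkedHashMap<>();
    // Serializers that were registered because their type was encountered.
    private final Map<Class<?>, String> registeredDefaults = new LinkedHashMap<>();
    // Every encountered type gets an integer tag.
    private final Map<Class<?>, Integer> tags = new LinkedHashMap<>();

    void addToLibrary(Class<?> type, String serializerName) {
        library.put(type, serializerName);
    }

    Map<Class<?>, String> registeredDefaults() { return registeredDefaults; }

    Map<Class<?>, Integer> tags() { return tags; }

    // Walks the (nested) field types of root and registers what it finds.
    void analyze(Class<?> root) {
        visit(root, new HashSet<>());
    }

    private void visit(Class<?> type, Set<Class<?>> seen) {
        if (type == null || type.isPrimitive() || !seen.add(type)) {
            return; // nothing to register, or already handled in this walk
        }
        if (type.isArray()) {
            visit(type.getComponentType(), seen);
            return;
        }
        tags.putIfAbsent(type, tags.size());
        String serializer = library.get(type);
        if (serializer != null) {
            // The library serializer covers this type, so stop descending.
            registeredDefaults.put(type, serializer);
            return;
        }
        if (type.getName().startsWith("java.")) {
            return; // do not walk into JDK internals in this sketch
        }
        for (Field field : type.getDeclaredFields()) {
            if (!Modifier.isStatic(field.getModifiers())) {
                visit(field.getType(), seen);
            }
        }
    }

    public static void main(String[] args) {
        SerializerLibrarySketch sketch = new SerializerLibrarySketch();
        sketch.addToLibrary(Date.class, "DateSerializer");
        sketch.addToLibrary(URI.class, "UriSerializer");
        sketch.addToLibrary(BitSet.class, "BitSetSerializer"); // never encountered

        sketch.analyze(Order.class);

        System.out.println("default serializers: " + sketch.registeredDefaults());
        System.out.println("tagged types: " + sketch.tags().keySet());
    }
}
```

In this sketch the BitSet serializer stays in the library but is never
registered, because no analyzed type contains a BitSet. That mirrors the
stated goal of keeping the number of registered serializers at a minimum
while still tagging every type the analysis comes across.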