An alternative way that may not be applicable in your case: For Sopremo, all types implemented a common interface. When a package is loaded, the Sopremo package manager scans the jar and looks for classes implementing the interfaces (quite fast, because not the entire class must be loaded). All types implementing the interface are automatically added to Kryo with their respective annotated serializers.
If you still have bytecode analysis of jobs, you can also statically determine all types that are used, check for default serializers, and maintain only a minimal set of serializers used for this specific job. You could already warn for unregistered types before submitting jobs. On Tue, Jan 20, 2015 at 6:38 AM, Stephan Ewen <se...@apache.org> wrote: > Yes, I agree that the Avro serializer should be available by default. That > is one case of a typical type that should work out of the box, given that > we support Avro file formats. > > Let me summarize how I understood that suggestion: > > - We make Avro available by default by registering a default serializer > for the SpecificBase > > - We create a library of serializers. We do not register them by default. > > - Via FLINK-1417, we analyze the types. For any (nested) type that we > encounter for which we have a serializer in the library, we register that > serializer as the default serializer. Also, for every (nested) type we > encounter, we register a tag at Kryo. > > I like that, it should give a nice and smooth user experience. > > Greetings, > Stephan > > > > > On Mon, Jan 19, 2015 at 12:32 PM, Robert Metzger <rmetz...@apache.org> > wrote: > > > Hi, > > > > thank you for putting our discussion to the mailing list. This is indeed > > where such discussions belong. For the others, we started discussing > here: > > https://github.com/apache/flink/pull/304 > > > > I think there is one additional approach, which is probably close to (1): > > We only register those serializers by default which we don't see in the > > pre-flight phase (I think right now thats only GenericData.Array from > > Avro). > > We would come across all the other classes (Jodatime, Protobuf, Avro, > > Thrift, ...) when traversing the class hierarchy, as proposed in > > FLINK-1417. With this approach, users get the best out-of-the box > > experience and the number of registered classes / serializers is kept at > a > > minimum. > > We can still offer means to register additional serializers (I think > thats > > already merged to master). > > > > My main concern with this particular issue is a good out of the box user > > experience. If there is an issue with type serialization, users will > notice > > it very early. (In my experience people often have their existing > datatypes > > they use with other systems, and they want to continue using them) > > Therefore, I want to put some effort into making it as good as possible. > I > > would actually sacrifice performance over stability/usability here. Our > > system is flexible enough to replace it later with a more efficient > > serialization if that becomes an issue. But maybe my suggestion above is > > already sufficient. > > > > We could also think about introducing a configuration variable which > allows > > users to disable the default serializers. > > > > > > Regarding the second question: Is there a downside registering all types > > for tagging? We reduce the overall I/O which is good for performance. > > > > Best, > > Robert > > > > > > > > On Mon, Jan 19, 2015 at 8:24 PM, Stephan Ewen <se...@apache.org> wrote: > > > > > Hi all! > > > > > > We have various pending pull requests that add support for certain > types > > by > > > adding extra kryo serializers. > > > > > > I think we need to decide how we want to handle the support for extra > > > types, because more are certainly to come. > > > > > > As I understand it, we have three broad options: > > > > > > (1) > > > Add as many serializers to Kryo by default as possible. > > > Pro: > > > - Many types work out of the box > > > Contra: > > > - We may eventually overload the kryo registry with serializers > > > that are not needed for most cases and suffer in performance > > > - It is hard to guess which types work out of the box > (intransparent) > > > > > > > > > (2) > > > We create a collection of serializers and a registration util. > > > -------- > > > val env = ExecutionEnvironemnt.getExecutionEnviroment() > > > > > > Serializers.registerProtoBufSerializers(env); > > > Serializers.registerJavaUtilSerializers(env); > > > --------- > > > Pro: > > > - Easy for users > > > - We can grow the set of supported types very large without > overloading > > > Kryo > > > - It is transparent what gets registered > > > > > > Contra: > > > - Not quite as convenient as if things just run > > > > > > > > > (3) > > > We do nothing and let the user create and register whatever is needed. > > > > > > We could have a library and utility for serializers for certain > > libraries. > > > Users could use this to conveniently add serializers for the libraries > > they > > > use. > > > Pro: > > > - Simple for us ;-) > > > Contra: > > > - More repeated work for users > > > > > > > > > ======================== > > > > > > For approach (1) and (2), there is an orthogonal question of whether we > > > want to simply register default serializers (that enable that types > work) > > > or also register types for tags, to speed up the serialization of those > > > types. > > > > > > > > > Greetings, > > > Stephan > > > > > >