+1 for program analysis from me too... Should be doable also on a lower level (e.g. analysis of compiled *.class files) with some off-the-shelf libraries, right?
2015-01-20 11:39 GMT+01:00 Till Rohrmann <till.rohrm...@gmail.com>: > I like the idea to automatically figure out which types are used by a > program and to register them at Kryo. Thus, +1 for this idea. > > On Tue, Jan 20, 2015 at 11:34 AM, Robert Metzger <rmetz...@apache.org> > wrote: > > > @Stephan: Yes, you are summarizing it correctly. > > I'll assign FLINK-1417 to myself and implement it as discussed here > (once I > > have resolved the other issues assigned to me) > > > > There is one additional point we forgot in the discussion so far: We are > > initializing Kryo with twitter/chill's "ScalaKryoInstantiator()". I just > > checked and its registering 51 default serializers and 81 registered > types. > > > > I just talked to Till (who implemented the KryoSerializer originally) and > > he also suggests to pool kryo instances using thread-local variables. > I'll > > look into that as well once FLINK-1417 has been resolved. I think that > > helps to mitigate the heavy initialization of Kryo (which is inevitable) > > > > > > @Arvid: Our POJO/Types support does explicitly not require our users to > > implement any interfaces, so that option is not feasible. > > The bytecode analysis is not part of the main flink distribution because > it > > is using a library with an apache incompatible license. > > > > > > > > On Tue, Jan 20, 2015 at 9:58 AM, Arvid Heise <arvid.he...@gmail.com> > > wrote: > > > > > An alternative way that may not be applicable in your case: > > > > > > For Sopremo, all types implemented a common interface. When a package > is > > > loaded, the Sopremo package manager scans the jar and looks for classes > > > implementing the interfaces (quite fast, because not the entire class > > must > > > be loaded). All types implementing the interface are automatically > added > > to > > > Kryo with their respective annotated serializers. > > > > > > If you still have bytecode analysis of jobs, you can also statically > > > determine all types that are used, check for default serializers, and > > > maintain only a minimal set of serializers used for this specific job. > > You > > > could already warn for unregistered types before submitting jobs. > > > > > > On Tue, Jan 20, 2015 at 6:38 AM, Stephan Ewen <se...@apache.org> > wrote: > > > > > > > Yes, I agree that the Avro serializer should be available by default. > > > That > > > > is one case of a typical type that should work out of the box, given > > that > > > > we support Avro file formats. > > > > > > > > Let me summarize how I understood that suggestion: > > > > > > > > - We make Avro available by default by registering a default > > serializer > > > > for the SpecificBase > > > > > > > > - We create a library of serializers. We do not register them by > > > default. > > > > > > > > - Via FLINK-1417, we analyze the types. For any (nested) type that > we > > > > encounter for which we have a serializer in the library, we register > > that > > > > serializer as the default serializer. Also, for every (nested) type > we > > > > encounter, we register a tag at Kryo. > > > > > > > > I like that, it should give a nice and smooth user experience. > > > > > > > > Greetings, > > > > Stephan > > > > > > > > > > > > > > > > > > > > On Mon, Jan 19, 2015 at 12:32 PM, Robert Metzger < > rmetz...@apache.org> > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > thank you for putting our discussion to the mailing list. This is > > > indeed > > > > > where such discussions belong. For the others, we started > discussing > > > > here: > > > > > https://github.com/apache/flink/pull/304 > > > > > > > > > > I think there is one additional approach, which is probably close > to > > > (1): > > > > > We only register those serializers by default which we don't see in > > the > > > > > pre-flight phase (I think right now thats only GenericData.Array > from > > > > > Avro). > > > > > We would come across all the other classes (Jodatime, Protobuf, > Avro, > > > > > Thrift, ...) when traversing the class hierarchy, as proposed in > > > > > FLINK-1417. With this approach, users get the best out-of-the box > > > > > experience and the number of registered classes / serializers is > kept > > > at > > > > a > > > > > minimum. > > > > > We can still offer means to register additional serializers (I > think > > > > thats > > > > > already merged to master). > > > > > > > > > > My main concern with this particular issue is a good out of the box > > > user > > > > > experience. If there is an issue with type serialization, users > will > > > > notice > > > > > it very early. (In my experience people often have their existing > > > > datatypes > > > > > they use with other systems, and they want to continue using them) > > > > > Therefore, I want to put some effort into making it as good as > > > possible. > > > > I > > > > > would actually sacrifice performance over stability/usability here. > > Our > > > > > system is flexible enough to replace it later with a more efficient > > > > > serialization if that becomes an issue. But maybe my suggestion > above > > > is > > > > > already sufficient. > > > > > > > > > > We could also think about introducing a configuration variable > which > > > > allows > > > > > users to disable the default serializers. > > > > > > > > > > > > > > > Regarding the second question: Is there a downside registering all > > > types > > > > > for tagging? We reduce the overall I/O which is good for > performance. > > > > > > > > > > Best, > > > > > Robert > > > > > > > > > > > > > > > > > > > > On Mon, Jan 19, 2015 at 8:24 PM, Stephan Ewen <se...@apache.org> > > > wrote: > > > > > > > > > > > Hi all! > > > > > > > > > > > > We have various pending pull requests that add support for > certain > > > > types > > > > > by > > > > > > adding extra kryo serializers. > > > > > > > > > > > > I think we need to decide how we want to handle the support for > > extra > > > > > > types, because more are certainly to come. > > > > > > > > > > > > As I understand it, we have three broad options: > > > > > > > > > > > > (1) > > > > > > Add as many serializers to Kryo by default as possible. > > > > > > Pro: > > > > > > - Many types work out of the box > > > > > > Contra: > > > > > > - We may eventually overload the kryo registry with > serializers > > > > > > that are not needed for most cases and suffer in > performance > > > > > > - It is hard to guess which types work out of the box > > > > (intransparent) > > > > > > > > > > > > > > > > > > (2) > > > > > > We create a collection of serializers and a registration util. > > > > > > -------- > > > > > > val env = ExecutionEnvironemnt.getExecutionEnviroment() > > > > > > > > > > > > Serializers.registerProtoBufSerializers(env); > > > > > > Serializers.registerJavaUtilSerializers(env); > > > > > > --------- > > > > > > Pro: > > > > > > - Easy for users > > > > > > - We can grow the set of supported types very large without > > > > overloading > > > > > > Kryo > > > > > > - It is transparent what gets registered > > > > > > > > > > > > Contra: > > > > > > - Not quite as convenient as if things just run > > > > > > > > > > > > > > > > > > (3) > > > > > > We do nothing and let the user create and register whatever is > > > needed. > > > > > > > > > > > > We could have a library and utility for serializers for certain > > > > > libraries. > > > > > > Users could use this to conveniently add serializers for the > > > libraries > > > > > they > > > > > > use. > > > > > > Pro: > > > > > > - Simple for us ;-) > > > > > > Contra: > > > > > > - More repeated work for users > > > > > > > > > > > > > > > > > > ======================== > > > > > > > > > > > > For approach (1) and (2), there is an orthogonal question of > > whether > > > we > > > > > > want to simply register default serializers (that enable that > types > > > > work) > > > > > > or also register types for tags, to speed up the serialization of > > > those > > > > > > types. > > > > > > > > > > > > > > > > > > Greetings, > > > > > > Stephan > > > > > > > > > > > > > > > > > > > > >