Hi! I would like to give you a status update on the Google Summer of Code project. I have started implementing the proposed features [1].

Status of code generation in general:
* I can compile the generated code using the Janino compiler.
* I can load the compiled classes and use them.
* For some mysterious reason, when a Janino-compiled object is deserialized, the readObject method is not invoked. When the same code is compiled with another compiler, it works as intended. I am investigating this issue. If any of you have an idea what the problem might be, don't keep it secret :)

While I am trying to solve this issue, I also continue to work on code generation. I can still test the generated code by registering it manually.

Status of generated POJO serializers:
* I could use the generated code on the WordCountPojo example.
* Everything is implemented except for copying stateful serializers and serializing subclasses.
* There are several possible performance advantages of the generated serializers:
  - The serialization/deserialization of the fields is not done in a loop, giving the JVM a better chance to inline and devirtualize
  - Null checks are eliminated for primitive types
  - Subclass checks are eliminated for final classes

Status of generated POJO comparators:
* I started to implement them.

I did some preliminary benchmarks with the generated code using the WordCountPojo example.
* In the baseline (using the default Flink serializers) PojoSerializer.deserialize was one of the hottest methods (with over 11 percent sample rate).
* Using the generated serializers, the percentage of samples from the deserialize method went down below 3 percent.
* A very significant amount of the time is spent in comparators, so there are some potential performance gains there as well.

What's next?
I am trying to solve the problem with the readObject method, and in the meantime I am trying to get the generated comparators working on the WordCountPojo example. Once that is done, I will make a more detailed performance case study. After that I will add support for handling subclasses in the generated code.
At that point the generated code will have all the required features.

Note: I did not change the serialization format, so the generated code can work with the default serializers. This is crucial for backward compatibility with savepoints.

Regards,
Gábor

[1] https://github.com/Xazax-hun/flink/commits/serializer_codegen

On 23 April 2016 at 10:33, Gábor Horváth <xazax....@gmail.com> wrote:

> Hi,
>
> The GSoC project proposal was accepted! Thank you for all your support. I
> will do my best to live up to the challenges and deliver everything that
> was planned for this summer.
>
> Best Regards,
> Gábor
>
> On 20 April 2016 at 16:18, Gábor Horváth <xazax....@gmail.com> wrote:
>
>> On second thought I think you are right. I had the impression that
>> there is cyclic dependency between TypeInformation and the serializers but
>> that is not the case. So there is no rewrite needed for TypeInformation in
>> order to be able to use Scala for serializers.
>>
>> According to the proposal unless someone utilizes the annotations the
>> generated serializers would be compatible to the current ones. There could
>> be a configuration option whether to try to make the layout more compact
>> based on annotations.
>>
>> On 20 April 2016 at 16:03, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Why would you need to rewrite the TypeInformation in Scala?
>>> I think we need a way to replace Serializer implementations anyway unless
>>> the generated serializers are compatible to the current ones.
>>>
>>> 2016-04-20 15:53 GMT+02:00 Gábor Horváth <xazax....@gmail.com>:
>>>
>>> > Hi Fabian,
>>> >
>>> > I agree that it would be awesome to move this to its own module/plugin.
>>> > However in order to be able to write the code generation in Scala I would
>>> > need to rewrite the type information to use Scala as well. I think I will not
>>> > have time to do this during the summer, so I think I will stick to Java and
>>> > this modularization can be done later.
>>> >
>>> > Thanks,
>>> > Gábor
>>> >
>>> > On 19 April 2016 at 11:50, Fabian Hueske <fhue...@gmail.com> wrote:
>>> >
>>> > > Hi Gabor,
>>> > >
>>> > > you are right, a codegen serializer module would depend on flink-core and
>>> > > in the current design flink-core would need to know about the type infos /
>>> > > serializers / comparators.
>>> > >
>>> > > Decoupling implementations of type info, serializers, and comparators from
>>> > > flink-core and resolving the cyclic dependency would be what the plugin
>>> > > architecture would be for.
>>> > > Maybe this can be done by some mechanism to dynamically load
>>> > > TypeInformations for types with overridden serializers / comparators.
>>> > > This would require some design document and discussion in the community.
>>> > >
>>> > > Cheers, Fabian
>>> > >
>>> > > 2016-04-18 21:19 GMT+02:00 Gábor Horváth <xazax....@gmail.com>:
>>> > >
>>> > > > Unfortunately making code generation a separate module would introduce
>>> > > > cyclic dependency.
>>> > > > Code generation requires the TypeInfo which is available in flink-core and
>>> > > > flink-core requires
>>> > > > the generated serializers from the code generation module. Do you have a
>>> > > > solution for this?
>>> > > >
>>> > > > I think if we can come up with a solution I will implement it as a separate
>>> > > > Scala module
>>> > > > otherwise I will stick to Java.
>>> > > >
>>> > > > BR,
>>> > > > Gábor
>>> > > >
>>> > > > On 18 April 2016 at 12:40, Fabian Hueske <fhue...@gmail.com> wrote:
>>> > > >
>>> > > > > +1 for not mixing Java and Scala in flink-core.
>>> > > > >
>>> > > > > Maybe it makes sense to implement the code generated serializers /
>>> > > > > comparators as a separate module which can be plugged-in. This could be
>>> > > > > pure Scala.
>>> > > > > In general, I think it would be good to have some kind of "version
>>> > > > > management" for serializers in place. With features such as safepoints that
>>> > > > > depend on the implementation of serializers, it would be good to have a
>>> > > > > mechanism to switch between implementations.
>>> > > > >
>>> > > > > Best, Fabian
>>> > > > >
>>> > > > > 2016-04-18 10:01 GMT+02:00 Chiwan Park <chiwanp...@apache.org>:
>>> > > > >
>>> > > > > > Yes, I know Janino is a pure Java project. I meant if we add Scala code to
>>> > > > > > flink-core, we should add Scala dependency to flink-core and it could be
>>> > > > > > confusing.
>>> > > > > >
>>> > > > > > Regards,
>>> > > > > > Chiwan Park
>>> > > > > >
>>> > > > > > > On Apr 18, 2016, at 2:49 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>> > > > > > >
>>> > > > > > > Chiwan, just to clarify Janino is a Java project. [1]
>>> > > > > > >
>>> > > > > > > [1] https://github.com/aunkrig/janino
>>> > > > > > >
>>> > > > > > > On Mon, Apr 18, 2016 at 3:40 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>>> > > > > > >
>>> > > > > > >> I prefer to avoid Scala dependencies in flink-core. If flink-core includes
>>> > > > > > >> Scala dependencies, Scala version suffix (_2.10 or _2.11) should be added.
>>> > > > > > >> I think that users could be confused.
>>> > > > > > >>
>>> > > > > > >> Regards,
>>> > > > > > >> Chiwan Park
>>> > > > > > >>
>>> > > > > > >>> On Apr 17, 2016, at 3:49 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>> > > > > > >>>
>>> > > > > > >>> Hi Gábor,
>>> > > > > > >>>
>>> > > > > > >>> I think that adding the Janino dep to flink-core should be fine, as it has
>>> > > > > > >>> quite slim dependencies [1,2] which are generally orthogonal to Flink's
>>> > > > > > >>> main dependency line (also it is already used elsewhere).
>>> > > > > > >>>
>>> > > > > > >>> As for mixing Scala code that is used from the Java parts of the same maven
>>> > > > > > >>> module I am skeptical. We have seen IDE compilation issues with projects
>>> > > > > > >>> using this setup and have decided that the community-wide potential IDE
>>> > > > > > >>> setup pain outweighs the individual implementation convenience with Scala.
>>> > > > > > >>>
>>> > > > > > >>> [1] https://repo1.maven.org/maven2/org/codehaus/janino/janino-parent/2.7.8/janino-parent-2.7.8.pom
>>> > > > > > >>> [2] https://repo1.maven.org/maven2/org/codehaus/janino/janino/2.7.8/janino-2.7.8.pom
>>> > > > > > >>>
>>> > > > > > >>> On Sat, Apr 16, 2016 at 5:51 PM, Gábor Horváth <xazax....@gmail.com> wrote:
>>> > > > > > >>>
>>> > > > > > >>>> Hi!
>>> > > > > > >>>>
>>> > > > > > >>>> Table API already uses code generation and the Janino compiler [1]. Is it a
>>> > > > > > >>>> dependency that is ok to add to flink-core? In case it is ok, I think I
>>> > > > > > >>>> will use the same in order to be consistent with the other code generation
>>> > > > > > >>>> efforts.
>>> > > > > > >>>>
>>> > > > > > >>>> I started to look at the Table API code generation [2] and it uses Scala
>>> > > > > > >>>> extensively. There are several Scala features that can make Java code
>>> > > > > > >>>> generation easier such as pattern matching and string interpolation. I did
>>> > > > > > >>>> not see any Scala code in flink-core yet. Is it ok to implement the code
>>> > > > > > >>>> generation inside the flink-core using Scala?
>>> > > > > > >>>>
>>> > > > > > >>>> Regards,
>>> > > > > > >>>> Gábor
>>> > > > > > >>>>
>>> > > > > > >>>> [1] http://unkrig.de/w/Janino
>>> > > > > > >>>> [2] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/codegen/CodeGenerator.scala
>>> > > > > > >>>>
>>> > > > > > >>>> On 18 March 2016 at 19:37, Gábor Horváth <xazax....@gmail.com> wrote:
>>> > > > > > >>>>
>>> > > > > > >>>>> Thank you! I finalized the project.
>>> > > > > > >>>>>
>>> > > > > > >>>>> On 18 March 2016 at 10:29, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>> > > > > > >>>>>
>>> > > > > > >>>>>> Thanks Gábor, now I also see it on the internal GSoC interface. I have
>>> > > > > > >>>>>> indicated that I wish to mentor your project, I think you can hit finalize
>>> > > > > > >>>>>> on your project there.
>>> > > > > > >>>>>>
>>> > > > > > >>>>>> On Mon, Mar 14, 2016 at 11:16 AM, Gábor Horváth <xazax....@gmail.com> wrote:
>>> > > > > > >>>>>>
>>> > > > > > >>>>>>> Hi,
>>> > > > > > >>>>>>>
>>> > > > > > >>>>>>> I have updated this draft to include preliminary benchmarks, mentioned the
>>> > > > > > >>>>>>> interaction of annotations with savepoints, extended it with a timeline,
>>> > > > > > >>>>>>> and some notes about scala case classes.
>>> > > > > > >>>>>>>
>>> > > > > > >>>>>>> Regards,
>>> > > > > > >>>>>>> Gábor
>>> > > > > > >>>>>>>
>>> > > > > > >>>>>>> On 9 March 2016 at 16:12, Gábor Horváth <xazax....@gmail.com> wrote:
>>> > > > > > >>>>>>>
>>> > > > > > >>>>>>>> Hi!
>>> > > > > > >>>>>>>>
>>> > > > > > >>>>>>>> As far as I can see the formatting was not correct in my previous mail. A
>>> > > > > > >>>>>>>> better formatted version is available here:
>>> > > > > > >>>>>>>> https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk
>>> > > > > > >>>>>>>> Sorry for that.
>>> > > > > > >>>>>>>>
>>> > > > > > >>>>>>>> Regards,
>>> > > > > > >>>>>>>> Gábor
>>> > > > > > >>>>>>>>
>>> > > > > > >>>>>>>> On 9 March 2016 at 15:51, Gábor Horváth <xazax....@gmail.com> wrote:
>>> > > > > > >>>>>>>>
>>> > > > > > >>>>>>>>> Hi, I did not want to send this proposal out before I have some
>>> > > > > > >>>>>>>>> initial benchmarks, but this issue was mentioned on the mailing list (
>>> > > > > > >>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html
>>> > > > > > >>>>>>>>> ), and I wanted to make this information available to be able to incorporate
>>> > > > > > >>>>>>>>> this into that discussion. I have written this draft with the help of Gábor
>>> > > > > > >>>>>>>>> Gévay and Márton Balassi and I am open to every suggestion.
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> The proposal draft:
>>> > > > > > >>>>>>>>> Code Generation in Serializers and Comparators of Apache Flink
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> I am doing my last semester of my MSc studies and I’m a former GSoC
>>> > > > > > >>>>>>>>> student in the LLVM project. I plan to improve the serialization code in
>>> > > > > > >>>>>>>>> Flink during this summer. The current implementation of the serializers can
>>> > > > > > >>>>>>>>> be a performance bottleneck in some scenarios. These performance problems
>>> > > > > > >>>>>>>>> were also reported on the mailing list recently [1]. I plan to implement
>>> > > > > > >>>>>>>>> code generation into the serializers to improve the performance (as Stephan
>>> > > > > > >>>>>>>>> Ewen also suggested.)
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> TODO: I plan to include some preliminary benchmarks in this section.
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> Performance problems with the current serializers
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> 1. PojoSerializer uses reflection for accessing the fields, which is
>>> > > > > > >>>>>>>>>    slow (eg. [2])
>>> > > > > > >>>>>>>>>    - This is also a serious problem for the comparators
>>> > > > > > >>>>>>>>> 2. When deserializing fields of primitive types (eg. int), the reusing
>>> > > > > > >>>>>>>>>    overload of the corresponding field serializers cannot really do any reuse,
>>> > > > > > >>>>>>>>>    because boxed primitive types are immutable in Java. This results in lots
>>> > > > > > >>>>>>>>>    of object creations. [3][7]
>>> > > > > > >>>>>>>>> 3. The loop to call the field serializers makes virtual function calls,
>>> > > > > > >>>>>>>>>    that cannot be speculatively devirtualized by the JVM or predicted by the
>>> > > > > > >>>>>>>>>    CPU, because different serializer subclasses are invoked for the different
>>> > > > > > >>>>>>>>>    fields. (And the loop cannot be unrolled, because the number of iterations
>>> > > > > > >>>>>>>>>    is not a compile time constant.) See also the following discussion on the
>>> > > > > > >>>>>>>>>    mailing list [1].
>>> > > > > > >>>>>>>>> 4. A POJO field can have the value null, so the serializer inserts 1
>>> > > > > > >>>>>>>>>    byte null tags, which wastes space. (Also, the type extractor logic does
>>> > > > > > >>>>>>>>>    not distinguish between primitive types and their boxed versions, so even
>>> > > > > > >>>>>>>>>    an int field has a null tag.)
>>> > > > > > >>>>>>>>> 5. Subclass tags also add a byte at the beginning of every POJO
>>> > > > > > >>>>>>>>> 6. getLength() does not know the size in most cases [4]
>>> > > > > > >>>>>>>>>    Knowing the size of a type when serialized has numerous performance
>>> > > > > > >>>>>>>>>    benefits throughout Flink:
>>> > > > > > >>>>>>>>>    1. Sorters can do in-place, when the type is small [5]
>>> > > > > > >>>>>>>>>    2. Chaining hash tables do not need resizes, because they know how
>>> > > > > > >>>>>>>>>       many buckets to allocate upfront [6]
>>> > > > > > >>>>>>>>>    3. Different hash table architectures could be used, eg. open
>>> > > > > > >>>>>>>>>       addressing with linear probing instead of some chaining
>>> > > > > > >>>>>>>>>    4. It is possible to deserialize, modify, and then serialize back a
>>> > > > > > >>>>>>>>>       record to its original place, because it cannot happen that the modified
>>> > > > > > >>>>>>>>>       version does not fit in the place allocated there for the old version (see
>>> > > > > > >>>>>>>>>       CompactingHashTable and ReduceHashTable for concrete instances of this
>>> > > > > > >>>>>>>>>       problem)
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> Note, that 2. and 3. are problems with not just the PojoSerializer, but
>>> > > > > > >>>>>>>>> also with the TupleSerializer.
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> Solution approaches
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> 1. Run time code generation for every POJO
>>> > > > > > >>>>>>>>>    - 1. and 3. would be automatically solved, if the serializers for
>>> > > > > > >>>>>>>>>      POJOs would be generated on-the-fly (by, for example, Javassist)
>>> > > > > > >>>>>>>>>    - 2. also needs code generation, and also some extra effort in the
>>> > > > > > >>>>>>>>>      type extractor to distinguish between primitive types and their boxed
>>> > > > > > >>>>>>>>>      versions
>>> > > > > > >>>>>>>>>    - could be used for PojoComparator as well (which could greatly
>>> > > > > > >>>>>>>>>      increase the performance of sorting)
>>> > > > > > >>>>>>>>> 2. Annotations on POJOs (by the users)
>>> > > > > > >>>>>>>>>    - Concretely:
>>> > > > > > >>>>>>>>>      - annotate fields that will never be nulls -> no null tag needed
>>> > > > > > >>>>>>>>>        before every field!
>>> > > > > > >>>>>>>>>      - make a POJO final -> no subclass tag needed
>>> > > > > > >>>>>>>>>      - annotating a POJO that it will not be null -> no top level null
>>> > > > > > >>>>>>>>>        tag needed
>>> > > > > > >>>>>>>>>    - These would also help with the getLength problem (6.), because the
>>> > > > > > >>>>>>>>>      length is often not known because currently anything can be null or a
>>> > > > > > >>>>>>>>>      subclass can appear anywhere
>>> > > > > > >>>>>>>>>    - These annotations could be done without code generation, but then
>>> > > > > > >>>>>>>>>      they would add some overhead when there are no annotations present, so this
>>> > > > > > >>>>>>>>>      would work better together with the code generation
>>> > > > > > >>>>>>>>>    - Tuples would become a special case of POJOs, where nothing can be
>>> > > > > > >>>>>>>>>      null, and no subclass can appear, so maybe we could eliminate the
>>> > > > > > >>>>>>>>>      TupleSerializer
>>> > > > > > >>>>>>>>>    - We could annotate some internal types in Flink libraries (Gelly
>>> > > > > > >>>>>>>>>      (Vertex, Edge), FlinkML)
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> TODO: what is the situation with Scala case classes? Run time code
>>> > > > > > >>>>>>>>> generation is probably easier in Scala? (with quasiquotes)
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> About me
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> I am in the last year of my Computer Science MSc studies at Eotvos Lorand
>>> > > > > > >>>>>>>>> University in Budapest, and planning to start a PhD in the autumn. I have
>>> > > > > > >>>>>>>>> been working for almost three years at Ericsson on static analysis tools
>>> > > > > > >>>>>>>>> for C++. In 2014 I participated in GSoC, working on the LLVM project, and I
>>> > > > > > >>>>>>>>> am a frequent contributor ever since. The next summer I was interning at
>>> > > > > > >>>>>>>>> Apple.
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> I learned about the Flink project not too long ago and I like it so far.
>>> > > > > > >>>>>>>>> The last few weeks I was working on some tickets to familiarize myself with
>>> > > > > > >>>>>>>>> the codebase:
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-3422
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-3322
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-3457
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> My CV is available here: http://xazax.web.elte.hu/files/resume.pdf
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> References
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> [2] https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L369
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> [3] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/typeutils/base/IntSerializer.java#L73
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> [4] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/typeutils/TypeSerializer.java#L98
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> [5] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/FixedLengthRecordSorter.java
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> [6] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/hash/CompactingHashTable.java#L861
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> [7] https://issues.apache.org/jira/browse/FLINK-3277
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> Best Regards,
>>> > > > > > >>>>>>>>>
>>> > > > > > >>>>>>>>> Gábor
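
P.S. To make the first performance point from the status update at the top concrete (fields handled without a loop), here is a minimal, hypothetical sketch. It is not the code the generator emits and it does not use any Flink class: the Word POJO, writeTagged, serializeWithLoop and serializeUnrolled are names made up for illustration, and the one-byte null tag per field is an assumption taken from the proposal text. The loop variant goes through reflection and boxing for every field. The unrolled variant writes each field with straight-line code and a constant tag for the int field, while still producing the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Flink.
class UnrolledSerializerSketch {

    // Made-up POJO, loosely modelled on a word-count record.
    static class Word {
        public String word;
        public int frequency;

        Word(String word, int frequency) {
            this.word = word;
            this.frequency = frequency;
        }
    }

    // Writes a one-byte null tag, then the value itself if it is present.
    static void writeTagged(DataOutputStream out, Object value) throws IOException {
        out.writeBoolean(value == null);
        if (value instanceof String) {
            out.writeUTF((String) value);
        } else if (value instanceof Integer) {
            out.writeInt((Integer) value);
        }
    }

    // Loop-based variant: every field is read reflectively (boxing the int)
    // and dispatched on its runtime type.
    static byte[] serializeWithLoop(Word w) throws IOException, ReflectiveOperationException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        Field[] fields = {Word.class.getField("word"), Word.class.getField("frequency")};
        for (Field f : fields) {
            writeTagged(out, f.get(w));
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Unrolled variant, in the style a generator could emit for Word:
    // direct field access, no loop, no boxing.
    static byte[] serializeUnrolled(Word w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeBoolean(w.word == null);
        if (w.word != null) {
            out.writeUTF(w.word);
        }
        // An int can never be null, so the tag is a constant and no check is needed.
        out.writeBoolean(false);
        out.writeInt(w.frequency);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Word w = new Word("flink", 3);
        // Prints whether both variants produced the same byte layout.
        System.out.println(Arrays.equals(serializeWithLoop(w), serializeUnrolled(w)));
    }
}
```

Because both variants emit the same layout in this sketch, data written by one could be read by a reader written for the other, which is the property the note about savepoint compatibility relies on.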