Yes, I know Janino is a pure Java project. I meant that if we add Scala code to flink-core, we would have to add a Scala dependency to flink-core, and that could be confusing.

Regards,
Chiwan Park
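
For readers who have not used it: Janino compiles Java source handed to it as plain strings, in memory and at runtime. Below is a minimal sketch of that pattern; this is not Flink or Table API code, and the "Adder" class and its method are made up purely for illustration.

    import java.lang.reflect.Method;
    import org.codehaus.janino.SimpleCompiler;

    public class JaninoSketch {
        public static void main(String[] args) throws Exception {
            // Source code of a class "generated" at runtime as a plain string.
            String source =
                "public final class Adder {\n" +
                "    public static int add(int a, int b) { return a + b; }\n" +
                "}\n";

            SimpleCompiler compiler = new SimpleCompiler();
            compiler.cook(source);                        // compile the string in memory
            Class<?> adder = compiler.getClassLoader().loadClass("Adder");
            Method add = adder.getMethod("add", int.class, int.class);
            System.out.println(add.invoke(null, 40, 2));  // prints 42
        }
    }

A generated serializer could follow the same pattern: assemble the class body as a string for the concrete POJO or tuple type, cook it, load the class, and cache an instance of it.
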
> On Apr 18, 2016, at 2:49 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>
> Chiwan, just to clarify, Janino is a Java project. [1]
>
> [1] https://github.com/aunkrig/janino
>
> On Mon, Apr 18, 2016 at 3:40 AM, Chiwan Park <chiwanp...@apache.org> wrote:
>
>> I prefer to avoid Scala dependencies in flink-core. If flink-core includes
>> Scala dependencies, the Scala version suffix (_2.10 or _2.11) would have to
>> be added. I think that users could be confused.
>>
>> Regards,
>> Chiwan Park
>>
>>> On Apr 17, 2016, at 3:49 PM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>>
>>> Hi Gábor,
>>>
>>> I think that adding the Janino dependency to flink-core should be fine, as it
>>> has quite slim dependencies [1,2] which are generally orthogonal to Flink's
>>> main dependency line (and it is already used elsewhere).
>>>
>>> As for mixing in Scala code that is used from the Java parts of the same
>>> Maven module, I am skeptical. We have seen IDE compilation issues with
>>> projects using this setup and have decided that the community-wide potential
>>> IDE setup pain outweighs the individual implementation convenience of Scala.
>>>
>>> [1] https://repo1.maven.org/maven2/org/codehaus/janino/janino-parent/2.7.8/janino-parent-2.7.8.pom
>>> [2] https://repo1.maven.org/maven2/org/codehaus/janino/janino/2.7.8/janino-2.7.8.pom
>>>
>>> On Sat, Apr 16, 2016 at 5:51 PM, Gábor Horváth <xazax....@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> The Table API already uses code generation and the Janino compiler [1].
>>>> Is it a dependency that is OK to add to flink-core? If it is, I think I will
>>>> use the same in order to be consistent with the other code generation efforts.
>>>>
>>>> I started to look at the Table API code generation [2] and it uses Scala
>>>> extensively. Several Scala features, such as pattern matching and string
>>>> interpolation, can make generating Java code easier. I did not see any Scala
>>>> code in flink-core yet. Is it OK to implement the code generation inside
>>>> flink-core using Scala?
>>>>
>>>> Regards,
>>>> Gábor
>>>>
>>>> [1] http://unkrig.de/w/Janino
>>>> [2] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/codegen/CodeGenerator.scala
>>>>
>>>> On 18 March 2016 at 19:37, Gábor Horváth <xazax....@gmail.com> wrote:
>>>>
>>>>> Thank you! I finalized the project.
>>>>>
>>>>> On 18 March 2016 at 10:29, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Gábor, now I also see it on the internal GSoC interface. I have
>>>>>> indicated that I wish to mentor your project, so I think you can hit
>>>>>> finalize on your project there.
>>>>>>
>>>>>> On Mon, Mar 14, 2016 at 11:16 AM, Gábor Horváth <xazax....@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have updated this draft to include preliminary benchmarks, mention the
>>>>>>> interaction of annotations with savepoints, and extend it with a timeline
>>>>>>> and some notes about Scala case classes.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gábor
>>>>>>>
>>>>>>> On 9 March 2016 at 16:12, Gábor Horváth <xazax....@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> As far as I can see, the formatting was not correct in my previous mail.
>>>>>>>> A better formatted version is available here:
>>>>>>>> https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk
>>>>>>>> Sorry for that.
>>>>>>>> Regards,
>>>>>>>> Gábor
>>>>>>>>
>>>>>>>> On 9 March 2016 at 15:51, Gábor Horváth <xazax....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I did not want to send this proposal out before I had some initial
>>>>>>>>> benchmarks, but this issue was mentioned on the mailing list (
>>>>>>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html ),
>>>>>>>>> and I wanted to make this information available so that it can be
>>>>>>>>> incorporated into that discussion. I have written this draft with the
>>>>>>>>> help of Gábor Gévay and Márton Balassi, and I am open to every suggestion.
>>>>>>>>>
>>>>>>>>> The proposal draft:
>>>>>>>>>
>>>>>>>>> Code Generation in Serializers and Comparators of Apache Flink
>>>>>>>>>
>>>>>>>>> I am in the last semester of my MSc studies and I am a former GSoC
>>>>>>>>> student in the LLVM project. I plan to improve the serialization code in
>>>>>>>>> Flink during this summer. The current implementation of the serializers
>>>>>>>>> can be a performance bottleneck in some scenarios. These performance
>>>>>>>>> problems were also reported on the mailing list recently [1]. I plan to
>>>>>>>>> implement code generation in the serializers to improve the performance
>>>>>>>>> (as Stephan Ewen also suggested).
>>>>>>>>>
>>>>>>>>> TODO: I plan to include some preliminary benchmarks in this section.
>>>>>>>>>
>>>>>>>>> Performance problems with the current serializers
>>>>>>>>>
>>>>>>>>> 1. PojoSerializer uses reflection for accessing the fields, which is
>>>>>>>>>    slow (e.g. [2]).
>>>>>>>>>    - This is also a serious problem for the comparators.
>>>>>>>>> 2. When deserializing fields of primitive types (e.g. int), the reusing
>>>>>>>>>    overload of the corresponding field serializers cannot really do any
>>>>>>>>>    reuse, because boxed primitive types are immutable in Java. This
>>>>>>>>>    results in lots of object creations. [3][7]
>>>>>>>>> 3. The loop that calls the field serializers makes virtual function
>>>>>>>>>    calls that cannot be speculatively devirtualized by the JVM or
>>>>>>>>>    predicted by the CPU, because different serializer subclasses are
>>>>>>>>>    invoked for the different fields. (And the loop cannot be unrolled,
>>>>>>>>>    because the number of iterations is not a compile-time constant.)
>>>>>>>>>    See also the following discussion on the mailing list [1].
>>>>>>>>> 4. A POJO field can have the value null, so the serializer inserts
>>>>>>>>>    1-byte null tags, which wastes space. (Also, the type extractor logic
>>>>>>>>>    does not distinguish between primitive types and their boxed
>>>>>>>>>    versions, so even an int field has a null tag.)
>>>>>>>>> 5. Subclass tags also add a byte at the beginning of every POJO.
>>>>>>>>> 6. getLength() does not know the size in most cases [4]. Knowing the
>>>>>>>>>    size of a type when serialized has numerous performance benefits
>>>>>>>>>    throughout Flink:
>>>>>>>>>    1. Sorters can work in-place when the type is small [5].
>>>>>>>>>    2. Chaining hash tables do not need resizes, because they know how
>>>>>>>>>       many buckets to allocate upfront [6].
>>>>>>>>>    3. Different hash table architectures could be used, e.g. open
>>>>>>>>>       addressing with linear probing instead of some chaining.
>>>>>>>>>    4. It is possible to deserialize, modify, and then serialize back a
>>>>>>>>>       record to its original place, because it cannot happen that the
>>>>>>>>>       modified version does not fit in the place allocated for the old
>>>>>>>>>       version (see CompactingHashTable and ReduceHashTable for concrete
>>>>>>>>>       instances of this problem).
>>>>>>>>>
>>>>>>>>> Note that 2. and 3. are problems not just with the PojoSerializer, but
>>>>>>>>> also with the TupleSerializer.
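
To make problems 1-4 above concrete, here is a deliberately simplified sketch; it is not actual Flink code, the SerializationSketch class, the FieldWriter interface, and the Point POJO are invented for illustration, and StringBuilder merely stands in for Flink's DataOutputView to keep the sketch self-contained:

    import java.lang.reflect.Field;

    class SerializationSketch {
        // Stand-in for a per-field serializer; a different subclass is used per field type.
        interface FieldWriter { void write(Object boxedValue, StringBuilder out); }

        // Reflection-based path, roughly the shape of the problem described above.
        static void serializeReflective(Object pojo, Field[] fields, FieldWriter[] writers,
                                        StringBuilder out) throws IllegalAccessException {
            for (int i = 0; i < fields.length; i++) {
                Object value = fields[i].get(pojo); // reflective access; boxes an int into an Integer
                out.append(value == null ? 1 : 0);  // 1-byte null tag, even for primitive-typed fields
                if (value != null) {
                    writers[i].write(value, out);   // megamorphic virtual call, hard to devirtualize
                }
            }
        }

        // What code generated for one concrete POJO could look like instead:
        // direct field access, no boxing, no null tags, no virtual dispatch.
        static final class Point { int x; int y; }

        static void serializePoint(Point p, StringBuilder out) {
            out.append(p.x).append(p.y);
        }
    }
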
>>>>>>>>> Solution approaches
>>>>>>>>>
>>>>>>>>> 1. Run-time code generation for every POJO
>>>>>>>>>    - 1. and 3. would be automatically solved if the serializers for
>>>>>>>>>      POJOs were generated on-the-fly (by, for example, Javassist).
>>>>>>>>>    - 2. also needs code generation, plus some extra effort in the type
>>>>>>>>>      extractor to distinguish between primitive types and their boxed
>>>>>>>>>      versions.
>>>>>>>>>    - It could be used for the PojoComparator as well (which could
>>>>>>>>>      greatly increase the performance of sorting).
>>>>>>>>> 2. Annotations on POJOs (by the users)
>>>>>>>>>    - Concretely:
>>>>>>>>>      - annotate fields that will never be null -> no null tag needed
>>>>>>>>>        before every field!
>>>>>>>>>      - make a POJO final -> no subclass tag needed
>>>>>>>>>      - annotate a POJO that it will never be null -> no top-level null
>>>>>>>>>        tag needed
>>>>>>>>>    - These would also help with the getLength problem (6.), because the
>>>>>>>>>      length is often not known simply because currently anything can be
>>>>>>>>>      null or a subclass can appear anywhere.
>>>>>>>>>    - These annotations could be handled without code generation, but
>>>>>>>>>      then they would add some overhead when no annotations are present,
>>>>>>>>>      so this would work better together with the code generation.
>>>>>>>>>    - Tuples would become a special case of POJOs, where nothing can be
>>>>>>>>>      null and no subclass can appear, so maybe we could eliminate the
>>>>>>>>>      TupleSerializer.
>>>>>>>>>    - We could annotate some internal types in the Flink libraries
>>>>>>>>>      (Gelly (Vertex, Edge), FlinkML).
>>>>>>>>>
>>>>>>>>> TODO: What is the situation with Scala case classes? Run-time code
>>>>>>>>> generation is probably easier in Scala (with quasiquotes)?
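
To make the annotation idea (solution approach 2) a bit more tangible, here is a purely hypothetical sketch in Java; the @NeverNull annotation and the ExamplePojo class do not exist in Flink and are invented only for illustration:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    class AnnotationSketch {

        // Hypothetical marker: the annotated field is never null, so a generated
        // serializer could omit the per-field null tag.
        @Retention(RetentionPolicy.RUNTIME)
        @Target(ElementType.FIELD)
        @interface NeverNull {}

        // A final POJO cannot have subclasses, so the subclass tag could be
        // omitted as well. If, additionally, every field has a statically known
        // serialized size, getLength() could return a constant (point 6 above).
        static final class ExamplePojo {
            long id;                 // primitive: no null tag, once primitives and boxed types are distinguished
            @NeverNull String name;  // reference field declared never-null: no null tag
            double weight;
        }
    }

A code-generating serializer factory could then inspect these annotations (and the finality of the class) and simply leave out the corresponding tag bytes and branches when emitting the serializer source.
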
>>>>>>>>> About me
>>>>>>>>>
>>>>>>>>> I am in the last year of my Computer Science MSc studies at Eotvos
>>>>>>>>> Lorand University in Budapest, and I am planning to start a PhD in the
>>>>>>>>> autumn. I have been working for almost three years at Ericsson on
>>>>>>>>> static analysis tools for C++. In 2014 I participated in GSoC, working
>>>>>>>>> on the LLVM project, and I have been a frequent contributor ever since.
>>>>>>>>> The following summer I interned at Apple.
>>>>>>>>>
>>>>>>>>> I learned about the Flink project not too long ago and I like it so
>>>>>>>>> far. Over the last few weeks I have been working on some tickets to
>>>>>>>>> familiarize myself with the codebase:
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-3422
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-3322
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-3457
>>>>>>>>>
>>>>>>>>> My CV is available here: http://xazax.web.elte.hu/files/resume.pdf
>>>>>>>>>
>>>>>>>>> References
>>>>>>>>>
>>>>>>>>> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html
>>>>>>>>> [2] https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L369
>>>>>>>>> [3] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/typeutils/base/IntSerializer.java#L73
>>>>>>>>> [4] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/typeutils/TypeSerializer.java#L98
>>>>>>>>> [5] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/sort/FixedLengthRecordSorter.java
>>>>>>>>> [6] https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/hash/CompactingHashTable.java#L861
>>>>>>>>> [7] https://issues.apache.org/jira/browse/FLINK-3277
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>>
>>>>>>>>> Gábor