Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-03-08 Greg Hogan
Hi Pat, I’m still trying to understand the implications of Java’s Class Hierarchy Analysis [0]. Flink currently uses only a single implementation of InMemorySorter, which is NormalizedKeySorter. FLINK-4705 adds support for FixedLengthRecordSorter for Flink’s Value types and Tuples. This propo

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-03-07 pat.chormai
Hi all, We have almost finished implementing the functionality. The implementation is available at [1]. We have also included the FLINK-3722 benchmark results in the FLIP [2], along with other implementation details. We would appreciate it if you could give us feedback on this before actual

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-14 Greg Hogan
Pat, Thanks for adding the new test results. The idea for this implementation was Gábor's, from the FLINK-3722 description. Since you will be filing a FLIP, I recommend including these benchmarks for consideration and discussion on the mailing list, in part because the PR is 4 months old and need

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-14 Gábor Gévay
Hello, Pat, the table in your email is somehow not visible in my Gmail, but it is visible here: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/FLINK-5734-Code-Generation-for-NormalizedKeySorter-tt15804.html#a15936 . Maybe the problem is caused by the formatting. > FLINK-3722 > appro

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-14 pat.chormai
Hi [~greghogan] I have run a benchmark comparing FLINK-3722 with our approaches. As the *Score* column (which represents sorting time) shows, the FLINK-3722 approach seems to be the fastest.

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-09 Greg Hogan
On Thu, Feb 9, 2017 at 3:50 PM, pat.chormai wrote:
> Hi Greg,
>
> Thanks for your feedback. I would like to answer your questions here.
>
> Q: Do I understand correctly that the generated code is only dependent on the length of the sort key?
>
> A: Yes, you're right. The generated code is mai

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-09 pat.chormai
Hi Greg, Thanks for your feedback. I would like to answer your questions here. Q: Do I understand correctly that the generated code is only dependent on the length of the sort key? A: Yes, you're right. The generated code is mainly dependent on the length of the sort key. However, I'm not sure
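The point that the generated code depends mainly on the sort-key length can be illustrated with a small, hypothetical Java sketch (not the code from the FLIP or from Flink; all names are invented here). `compareGeneric` compares normalized keys of any length byte by byte, while `generateCompareSource` emits Java source for one fixed length: it compares 8-byte chunks as big-endian unsigned longs, which yields the same order as unsigned byte-wise comparison, and then the remaining tail bytes. `compare10` shows by hand what the emitted code amounts to for a 10-byte key.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: specializing a normalized-key comparison for a fixed key length.
// Not Flink's NormalizedKeySorter code; all names are invented for illustration.
public class FixedLengthKeyCompareSketch {

    // Generic comparison: lexicographic order of unsigned bytes, for any key length.
    static int compareGeneric(byte[] a, int offA, byte[] b, int offB, int len) {
        for (int i = 0; i < len; i++) {
            int cmp = Integer.compare(a[offA + i] & 0xFF, b[offB + i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    // Emits Java source for a comparator specialized to keyLen bytes. Reading 8-byte
    // chunks big-endian and comparing them as unsigned longs gives the same order as
    // comparing the bytes one by one, so the loop unrolls into a fixed statement list.
    static String generateCompareSource(String className, int keyLen) {
        StringBuilder src = new StringBuilder();
        src.append("public final class ").append(className).append(" {\n");
        src.append("  public static int compare(byte[] a, int offA, byte[] b, int offB) {\n");
        src.append("    java.nio.ByteBuffer x = java.nio.ByteBuffer.wrap(a);\n");
        src.append("    java.nio.ByteBuffer y = java.nio.ByteBuffer.wrap(b);\n");
        src.append("    int c;\n");
        int pos = 0;
        for (; pos + 8 <= keyLen; pos += 8) {
            src.append("    c = Long.compareUnsigned(x.getLong(offA + ").append(pos)
               .append("), y.getLong(offB + ").append(pos).append("));\n");
            src.append("    if (c != 0) return c;\n");
        }
        for (; pos < keyLen; pos++) {
            src.append("    c = Integer.compare(a[offA + ").append(pos)
               .append("] & 0xFF, b[offB + ").append(pos).append("] & 0xFF);\n");
            src.append("    if (c != 0) return c;\n");
        }
        src.append("    return 0;\n  }\n}\n");
        return src.toString();
    }

    // Hand-written equivalent of the generated code for a 10-byte key:
    // one 8-byte chunk followed by two single-byte comparisons.
    static int compare10(byte[] a, int offA, byte[] b, int offB) {
        ByteBuffer x = ByteBuffer.wrap(a); // big-endian by default
        ByteBuffer y = ByteBuffer.wrap(b);
        int c = Long.compareUnsigned(x.getLong(offA), y.getLong(offB));
        if (c != 0) return c;
        c = Integer.compare(a[offA + 8] & 0xFF, b[offB + 8] & 0xFF);
        if (c != 0) return c;
        return Integer.compare(a[offA + 9] & 0xFF, b[offB + 9] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.print(generateCompareSource("Compare10", 10));
    }
}
```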

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-09 Till Rohrmann
I'm not sure how the code generation and compilation work in detail, but aren't you able to write a class file as the result of the compilation? Then this class file could be uploaded to the JM and from there to all TMs. The class basically becomes another job dependency like the user code jar and
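Till's suggestion can be sketched in Java under a few assumptions: a JDK is available (so `javax.tools.ToolProvider.getSystemJavaCompiler()` returns a compiler), the generated source is compiled once into a `.class` file, and only those bytes are shipped, the way a job dependency would be. The receiving side defines the class from the bytes in its own class loader and instantiates it locally. The class name `GeneratedAdder`, the use of `java.util.function.IntBinaryOperator` as the interface both sides share, and the in-process "shipping" are placeholders; Flink's actual upload path via the JM is not modeled.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.IntBinaryOperator;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical sketch: compile generated code to a class file once ("client side"),
// ship only the class bytes, and define the class from those bytes elsewhere ("TM side").
public class ShipClassFileSketch {

    // Class loader that defines a class directly from shipped bytes.
    static final class BytesClassLoader extends ClassLoader {
        BytesClassLoader(ClassLoader parent) {
            super(parent);
        }

        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // "Client side": compile the generated source and return the class file bytes.
    static byte[] compileToClassBytes(String className, String source) throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler; a JDK is required");
        }
        Path dir = Files.createTempDirectory("generated");
        Path srcFile = dir.resolve(className + ".java");
        Files.write(srcFile, source.getBytes(StandardCharsets.UTF_8));
        int status = javac.run(null, null, null, "-d", dir.toString(), srcFile.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed with status " + status);
        }
        return Files.readAllBytes(dir.resolve(className + ".class"));
    }

    // "TM side": define the class from the shipped bytes and instantiate it locally.
    static IntBinaryOperator loadShipped(String className, byte[] classBytes) throws Exception {
        BytesClassLoader loader = new BytesClassLoader(ShipClassFileSketch.class.getClassLoader());
        Class<?> cls = loader.define(className, classBytes);
        return (IntBinaryOperator) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        String source =
            "public class GeneratedAdder implements java.util.function.IntBinaryOperator {\n"
          + "  public int applyAsInt(int left, int right) { return left + right; }\n"
          + "}\n";
        byte[] shipped = compileToClassBytes("GeneratedAdder", source);
        IntBinaryOperator op = loadShipped("GeneratedAdder", shipped);
        System.out.println(op.applyAsInt(2, 3)); // prints 5
    }
}
```

Since the receiver instantiates the class itself, no instance of a generated class ever has to be serialized, which is the concern Gábor raises about generating on the client side.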

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-08 Greg Hogan
Hi Pat, Serkan, and Gábor, This looks very nice. I'll treat this like a pre-FLIP and ask my question here. Do I understand correctly that the generated code is only dependent on the length of the sort key? So we could separate the writing and reading of keys and records and from the generated cod
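If, as Greg suggests, the generated code depends only on the key length, generated classes could be reused across sorters that share a length. A minimal sketch of that idea, assuming a process-wide cache keyed by normalized-key length; `SorterCodeCache`, `KeyComparator`, and the lambda that stands in for real code generation and compilation are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: reuse generated comparators across sorters with the same key length.
public class SorterCodeCache {

    // Hypothetical stand-in for the interface a generated class would implement.
    interface KeyComparator {
        int compare(byte[] a, int offA, byte[] b, int offB);
    }

    private final Map<Integer, KeyComparator> byKeyLength = new ConcurrentHashMap<>();
    private final AtomicInteger generations = new AtomicInteger();

    // Returns the comparator for this key length, generating it only on first request.
    KeyComparator forKeyLength(int keyLen) {
        return byKeyLength.computeIfAbsent(keyLen, this::generate);
    }

    int generationCount() {
        return generations.get();
    }

    // Placeholder for code generation and compilation: just a closure over keyLen
    // that compares keyLen bytes as unsigned values.
    private KeyComparator generate(int keyLen) {
        generations.incrementAndGet();
        return (a, offA, b, offB) -> {
            for (int i = 0; i < keyLen; i++) {
                int c = Integer.compare(a[offA + i] & 0xFF, b[offB + i] & 0xFF);
                if (c != 0) return c;
            }
            return 0;
        };
    }

    public static void main(String[] args) {
        SorterCodeCache cache = new SorterCodeCache();
        KeyComparator first = cache.forKeyLength(8);
        KeyComparator second = cache.forKeyLength(8);
        cache.forKeyLength(16);
        System.out.println(first == second);         // prints true
        System.out.println(cache.generationCount()); // prints 2
    }
}
```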

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-08 Gábor Gévay
Hello Till, > Why did you decide to generate the code on the TMs? If we generated on the client side, then we would need to serialize instances of the generated classes when shipping the job to the TMs, but we would really like to avoid serializing instances of the generated classes. In the other

Re: FLINK-5734 : Code Generation for NormalizedKeySorter

2017-02-08 Till Rohrmann
Hi Pat, Serkan and Gábor, I really like your design document and the preliminary results look really good. Impressive! I think your document can be converted into a FLIP. I was wondering whether it would make sense to generate the code on the client side. Then we would not have to introduce t