Timo, thanks for picking up this very cool feature! I also think that an integrated approach would be the better solution if it can be done with reasonable effort.
+1 for implementing a prototype using ASM. Let me know if I can help somehow.

Cheers,
Fabian

2015-02-05 14:31 GMT+01:00 Timo Walther (JIRA) <j...@apache.org>:

>
> [ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307203#comment-14307203 ]
>
> Timo Walther commented on FLINK-1319:
> -------------------------------------
>
> Actually, I don't like the "drop-in" approach. I think it would be much
> better if the code analysis could be included in the release. Especially once
> the code is stable enough, it would be great to enable it by default and
> speed up jobs automatically.
>
> I did some research on other frameworks we could use instead. Soot is
> the best framework; however, I think we can also build the code analysis on
> top of the ObjectWeb ASM library[1]. It provides some functionality for
> data flow analysis[2]. The examples for BasicInterpreter and BasicVerifier
> look promising. Other projects use it to determine types[3].
>
> Using ASM requires us to implement more, but it gives us full flexibility
> for further analysis use cases.
>
> I would try to implement a simple proof-of-concept prototype. What do you
> think?
>
> [1] http://asm.ow2.org/
> [2] http://download.forge.objectweb.org/asm/asm4-guide.pdf, pp. 115ff
> [3] https://github.com/hraberg/enumerable/blob/master/src/main/java/org/enumerable/lambda/support/expression/ExpressionInterpreter.java
>
> > Add static code analysis for UDFs
> > ---------------------------------
> >
> >                 Key: FLINK-1319
> >                 URL: https://issues.apache.org/jira/browse/FLINK-1319
> >             Project: Flink
> >          Issue Type: New Feature
> >          Components: Java API, Scala API
> >            Reporter: Stephan Ewen
> >            Assignee: Timo Walther
> >            Priority: Minor
> >
> > Flink's Optimizer takes information that tells it, for UDFs, which fields
> > of the input elements are accessed, modified, or forwarded/copied.
> > This information frequently helps to reuse partitionings, sorts, etc. It may
> > speed up programs significantly, as it can frequently eliminate sorts and
> > shuffles, which are costly.
> >
> > Right now, users can add lightweight annotations to UDFs to provide this
> > information (such as adding {{@ConstantFields("0->3, 1, 2->1")}}).
> >
> > We worked with static code analysis of UDFs before to determine this
> > information automatically. This is an incredible feature, as it "magically"
> > makes programs faster.
> >
> > For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross),
> > this works surprisingly well in many cases. We used the "Soot" toolkit for
> > the static code analysis. Unfortunately, Soot is LGPL licensed, and thus we
> > have not included any of the code so far.
> >
> > I propose to add this functionality to Flink in the form of a drop-in
> > addition, to work around the LGPL incompatibility with ASL 2.0. Users could
> > simply download a special "flink-code-analysis.jar" and drop it into the
> > "lib" folder to enable this functionality. We may even add a script to
> > "tools" that downloads that library automatically into the lib folder. This
> > should be legally fine, since we do not redistribute LGPL code and only
> > dynamically link it (the incompatibility with ASL 2.0 is mainly in the
> > patentability, if I remember correctly).
> >
> > Prior work on this has been done by [~aljoscha] and [~skunert], which
> > could provide a code base to start with.
> >
> > *Appendix*
> >
> > Homepage of the Soot static analysis toolkit:
> > http://www.sable.mcgill.ca/soot/
> >
> > Papers on static analysis for optimization:
> > http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf
> > and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> >
> > Quick introduction to the Optimizer:
> > http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf
> > (Section 6)
> >
> > Optimizer for Iterations:
> > http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf
> > (Sections 4.3 and 5.3)
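For illustration, here is a small Java sketch of what the forwarded-fields information from the issue buys the optimizer: if the input is partitioned on a field that the UDF copies unchanged, the output is still partitioned on the target field, so a shuffle can potentially be skipped. It assumes the syntax of the @ConstantFields("0->3, 1, 2->1") example, i.e. comma-separated entries where "a->b" means input field a is copied to output field b and a bare "a" means field a stays at position a. The class and method names (ForwardedFieldsSketch, parse, preservedKey) are hypothetical and not part of Flink's API, and the demo uses its own example string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical helper (not Flink API): interprets a forwarded-fields string
// and answers on which output field an input field ends up, if at all.
final class ForwardedFieldsSketch {

    // Returns a map from input field position to output field position.
    static Map<Integer, Integer> parse(String spec) {
        Map<Integer, Integer> forwarded = new HashMap<>();
        for (String entry : spec.split(",")) {
            String e = entry.trim();
            if (e.isEmpty()) {
                continue;
            }
            int arrow = e.indexOf("->");
            if (arrow < 0) {
                // A bare index means the field is copied to the same position.
                int pos = Integer.parseInt(e);
                forwarded.put(pos, pos);
            } else {
                int from = Integer.parseInt(e.substring(0, arrow).trim());
                int to = Integer.parseInt(e.substring(arrow + 2).trim());
                forwarded.put(from, to);
            }
        }
        return forwarded;
    }

    // If the input is partitioned on inputKey and that field is forwarded,
    // the output is partitioned on the returned position, so an optimizer
    // could reuse the existing partitioning instead of shuffling again.
    static OptionalInt preservedKey(Map<Integer, Integer> forwarded, int inputKey) {
        Integer out = forwarded.get(inputKey);
        return out == null ? OptionalInt.empty() : OptionalInt.of(out);
    }

    public static void main(String[] args) {
        // Hypothetical UDF that swaps fields 0 and 2 and keeps field 1.
        Map<Integer, Integer> fwd = parse("0->2, 1, 2->0");
        System.out.println(preservedKey(fwd, 0)); // OptionalInt[2]
        System.out.println(preservedKey(fwd, 1)); // OptionalInt[1]
        System.out.println(preservedKey(fwd, 3)); // OptionalInt.empty
    }
}
```

A static code analysis, for example one built on ASM's data flow framework as Timo proposes, would derive such a mapping from the UDF's bytecode instead of relying on the user to write the annotation.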