Hi Max,

thanks for your feedback. I think you are confusing the interfaces "Program" and "ProgramDescription". With "Program", the main method is replaced by "getPlan(...)". "ProgramDescription", however, only adds the method "getDescription()", which returns a string that explains the usage of the program (i.e., a short description and the expected parameters).
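To make the difference concrete, here is a minimal sketch. It uses simplified stand-in interfaces, not the real Flink types (the real "getPlan(...)" returns a Plan, a String is used here only to keep the sketch self-contained), and the classes `LegacyJob` and `WordCount` as well as their texts are made up for illustration:

```java
public class DescriptionSketch {

    /** Stand-in for Flink's "Program": getPlan(...) takes over the role of main(). */
    public interface Program {
        String getPlan(String... args);
    }

    /** Stand-in for Flink's "ProgramDescription": one purely informational method. */
    public interface ProgramDescription {
        String getDescription();
    }

    /** Hypothetical job in the "Program" style: no main method at all. */
    public static class LegacyJob implements Program {
        @Override
        public String getPlan(String... args) {
            return "plan built from " + args.length + " argument(s)";
        }
    }

    /** Hypothetical job that only describes itself. It is still started through main(). */
    public static class WordCount implements ProgramDescription {

        @Override
        public String getDescription() {
            return "Counts words. Parameters: <input path> <output path>";
        }

        // The entry point stays a plain main method; implementing
        // ProgramDescription does not replace it.
        public static void main(String[] args) {
            if (args.length < 2) {
                // Reuse the self-description as the usage message.
                System.err.println(new WordCount().getDescription());
                return;
            }
            System.out.println("would run with input=" + args[0] + ", output=" + args[1]);
        }
    }

    public static void main(String[] args) {
        WordCount.main(args);
    }
}
```

In this sketch `WordCount` is not a `Program`, so nothing about how it is invoked changes; only the description becomes available to whoever asks for it.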
Thus, adding "ProgramDescription" to the examples does not change the examples: the main method will still be used. It only adds the ability for a program to "explain" itself (i.e., give meta info). Furthermore, "ProgramDescription" is not related to the new "ParameterTool".

-Matthias

On 05/26/2015 11:46 AM, Maximilian Michels wrote:
> I don't think `getDisplayName()` is necessary either. The class name and
> the description string should be fine. Adding ProgramDescription to the
> examples is not necessary; as already pointed out, using the main method is
> more convenient for most users. As far as I know, the idea of the
> ParameterTool was to use it only in the user code and not automatically
> handle parameters.
>
> Changing the interface would be quite API breaking but since most programs
> use the main method, IMHO we could do it.
>
> On Fri, May 22, 2015 at 10:09 PM, Matthias J. Sax <
> mj...@informatik.hu-berlin.de> wrote:
>
>> Makes sense to me. :)
>>
>> One more thing: What about extending the "ProgramDescription" interface
>> to have multiple methods as Flavio suggested (with the config(...)
>> method that should be handled by the ParameterTool)?
>>
>>> public interface FlinkJob {
>>>
>>>   /** The name to display in the job submission UI or shell */
>>>   //e.g. "My Flink HelloWorld"
>>>   String getDisplayName();
>>>   //e.g. "This program does this and that etc.."
>>>   String getDescription();
>>>   //e.g. <0,Integer,"An integer representing my first param">,
>>>   //     <1,String,"A string representing my second param">
>>>   List<Tuple3<Integer, TypeInfo, String>> paramDescription;
>>>   /** Set up the flink job in the passed ExecutionEnvironment */
>>>   ExecutionEnvironment config(ExecutionEnvironment env);
>>> }
>>
>> Right now, the interface is used only a couple of times in Flink's code
>> base, so it would not be a problem to update those classes.
>> However, it could break external code that uses the interface already
>> (even if I doubt that the interface is well known and used often [or at all]).
>>
>> I personally don't think that "getDisplayName()" is too helpful.
>> Splitting the program description and the parameter description seems to
>> be useful. For example, if wrong parameters are provided, the parameter
>> description can be included in the error message. If program+parameter
>> description is given in a single string, this is not possible. But this
>> is only a minor issue of course.
>>
>> Maybe we should also add the interface to the current Flink examples,
>> to make people more aware of it. Is there any documentation on the web
>> site?
>>
>> -Matthias
>>
>> On 05/22/2015 09:43 PM, Robert Metzger wrote:
>>> Thank you for working on this.
>>> My responses are inline below:
>>>
>>> (Flavio)
>>>
>>>> My suggestion is to create a specific Flink interface to get also
>>>> description of a job and standardize parameter passing.
>>>
>>> I've recently merged the ParameterTool which is solving the "standardize
>>> parameter passing" problem (at least it presents a best practice):
>>> http://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#parsing-command-line-arguments-and-passing-them-around-in-your-flink-application
>>>
>>> Regarding the description: Maybe we can use the "ProgramDescription"
>>> interface for getting a string describing the program in the web frontend.
>>>
>>> (Matthias)
>>>
>>>> I don't want to start working on it, before it's clear that it has a
>>>> chance to be included in Flink.
>>>
>>> I think the changes discussed here won't change the current behavior, but
>>> they add new functionality which can make the life of our users easier,
>>> so I'll vote to include your changes (given they meet our quality standards).
>>>
>>>> If multiple classes implement "Program" interface an exception should be
>>>> thrown (I think that would make sense). However, I am not sure what
>>>> "good" behavior is, if a single "Program"-class is found and an
>>>> additional main-method class.
>>>> - should "Program"-class be executed (ie, "overwrite" main-method class)
>>>> - or, better to throw an exception?
>>>
>>> I would give a class implementing "Program" priority over a random main()
>>> method in a random class.
>>> Maybe printing a WARN log message informing the user that the "Program"
>>> class has been chosen.
>>>
>>>> If no "Program"-class is found, but a single main-method class, Flink
>>>> could execute using main method. But I am not sure either, if this is
>>>> "good" behavior. If multiple main-method classes are present, throwing
>>>> an exception is the only way to go, I guess.
>>>
>>> I think the best effort approach "one class with main() found" is good. In
>>> case of multiple main methods, a helpful exception is the best approach in
>>> my opinion.
>>>
>>>> If the manifest contains "program-class" or "Main-Class" entry,
>>>> should we check the jar file right away if the specified class is there?
>>>> Right now, no check is performed and an error occurs if the user tries
>>>> to execute the job.
>>>
>>> I'd say the current approach is sufficient. There is no need to have a
>>> special code path which is doing the check.
>>> I think the error message will be pretty similar in both cases and I fear
>>> that this additional code could also introduce new bugs ;)
>>>
>>> On Fri, May 22, 2015 at 9:06 PM, Matthias J. Sax <
>>> mj...@informatik.hu-berlin.de> wrote:
>>>
>>>> Hi,
>>>>
>>>> two more thoughts to this discussion:
>>>>
>>>> 1) looking at the commit history of "CliFrontend", I found the
>>>> following closed issue and the closing pull request
>>>>  * https://issues.apache.org/jira/browse/FLINK-1095
>>>>  * https://github.com/apache/flink/pull/238
>>>> It stands in opposition to Flavio's request to have a job description. Any
>>>> comment on this? Should a removed feature be re-introduced? If not, I
>>>> would suggest to remove the "ProgramDescription" interface completely.
>>>>
>>>> 2) If the manifest contains "program-class" or "Main-Class" entry,
>>>> should we check the jar file right away if the specified class is there?
>>>> Right now, no check is performed and an error occurs if the user tries
>>>> to execute the job.
>>>>
>>>> -Matthias
>>>>
>>>> On 05/22/2015 12:06 PM, Matthias J. Sax wrote:
>>>>> Thanks for your feedback.
>>>>>
>>>>> I agree on the main method "problem". For scanning and listing all stuff
>>>>> that is found it's fine.
>>>>>
>>>>> The tricky question is the automatic invocation mechanism, if "-c" flag
>>>>> is not used, and no manifest program-class or Main-Class entry is found.
>>>>>
>>>>> If multiple classes implement "Program" interface an exception should be
>>>>> thrown (I think that would make sense). However, I am not sure what
>>>>> "good" behavior is, if a single "Program"-class is found and an
>>>>> additional main-method class.
>>>>> - should "Program"-class be executed (ie, "overwrite" main-method class)
>>>>> - or, better to throw an exception?
>>>>>
>>>>> If no "Program"-class is found, but a single main-method class, Flink
>>>>> could execute using main method. But I am not sure either, if this is
>>>>> "good" behavior. If multiple main-method classes are present, throwing
>>>>> an exception is the only way to go, I guess.
>>>>>
>>>>> To sum up: Should Flink consider main-method classes for automatic
>>>>> invocation, or should main-method classes have to be listed in the
>>>>> "program-class" or "Main-Class" manifest parameter (to enable them for
>>>>> automatic invocation)?
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 05/22/2015 09:56 AM, Maximilian Michels wrote:
>>>>>> Hi Matthias,
>>>>>>
>>>>>> Thank you for taking the time to analyze Flink's invocation behavior. I
>>>>>> like your proposal. I'm not sure whether it is a good idea to scan the
>>>>>> entire JAR for main methods. Sometimes, main methods are added solely for
>>>>>> testing purposes and don't really serve any practical use. However, if
>>>>>> you're already going through the JAR to find the ProgramDescription
>>>>>> interface, then you might look for main methods as well. As long as it is
>>>>>> just a listing without execution, that should be fine.
>>>>>>
>>>>>> Best regards,
>>>>>> Max
>>>>>>
>>>>>> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
>>>>>> mj...@informatik.hu-berlin.de> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I had a look into the current workflow of Flink with regard to the
>>>>>>> processing steps of a jar file.
>>>>>>>
>>>>>>> If I got it right it works as follows (not sure if this is documented
>>>>>>> somewhere):
>>>>>>>
>>>>>>> 1) check, if "-c" flag is used to set program entry point
>>>>>>>    -> if yes, goto 4
>>>>>>> 2) try to extract "program-class" property from manifest
>>>>>>>    -> if found, goto 4
>>>>>>> 3) try to extract "Main-Class" property from manifest
>>>>>>>    -> if not found, throw exception (this happens also, if no manifest
>>>>>>>       file is found at all)
>>>>>>>
>>>>>>> 4) check if entry point class implements "Program" interface
>>>>>>>    -> if yes, goto 6
>>>>>>> 5) check if entry point class provides "public static void main(String[]
>>>>>>>    args)" method
>>>>>>>    -> if not, throw exception
>>>>>>>
>>>>>>> 6) execute program (ie, show plan/info or really run it)
>>>>>>>
>>>>>>>
>>>>>>> I also "discovered" the interface "ProgramDescription" with a single
>>>>>>> method "String getDescription()". Even if some examples implement this
>>>>>>> interface (and use it in the example itself), Flink basically ignores
>>>>>>> it... From the CLI there is no way to get this info, and the WebUI does
>>>>>>> actually get it if present, however, doesn't show it anywhere...
>>>>>>>
>>>>>>> I think it would be nice, if we would extend the following functions:
>>>>>>>
>>>>>>> - extend the possibility to specify multiple entry classes in
>>>>>>>   "program-class" or "Main-Class" -> in this case, the user needs to use
>>>>>>>   "-c" flag to pick program to run every time
>>>>>>>
>>>>>>> - add a CLI option that allows the user to see what entry point classes
>>>>>>>   are available
>>>>>>>   for this, consider
>>>>>>>     a) "program-class" entry
>>>>>>>     b) "Main-Class" entry
>>>>>>>     c) if neither is found, scan jar-file for classes implementing
>>>>>>>        "Program" interface
>>>>>>>     d) if still not found, scan jar-file for classes with "main" method
>>>>>>>
>>>>>>> - if user looks for entry point classes via CLI, check for
>>>>>>>   "ProgramDescription" interface and show info
>>>>>>>
>>>>>>> - extend WebUI to show all available entry-classes (pull request
>>>>>>>   already there, for multiple entries in "program-class")
>>>>>>>
>>>>>>> - extend WebUI to show "ProgramDescription" info
>>>>>>>
>>>>>>>
>>>>>>> What do you think? I am not too sure about the "auto scan" of the jar
>>>>>>> file if no manifest entry is provided. We might get some "fat jars" and
>>>>>>> scanning might take some time.
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
>>>>>>>> We actually had an interface like that before ("Program"). It is still
>>>>>>>> supported, but in all new programs we simply use the Java main method.
>>>>>>>> The advantage is that most IDEs can create executable JARs
>>>>>>>> automatically, setting the JAR manifest attributes, etc.
>>>>>>>>
>>>>>>>> The "Program" interface still works, though. Most tool classes (like
>>>>>>>> "PackagedProgram") have a way to figure out whether the code uses
>>>>>>>> "main()" or implements "Program" and call the right method.
>>>>>>>>
>>>>>>>> You can try and extend the program interface.
>>>>>>>> If you want to consistently support multiple programs in one JAR file,
>>>>>>>> you may need to adjust the util classes as well to deal with that.
>>>>>>>>
>>>>>>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
>>>>>>>> mj...@informatik.hu-berlin.de> wrote:
>>>>>>>>
>>>>>>>>> Supporting an interface like this seems to be a nice idea. Any other
>>>>>>>>> opinions on it?
>>>>>>>>>
>>>>>>>>> It seems to be some more work to get it done right. I don't want to
>>>>>>>>> start working on it, before it's clear that it has a chance to be
>>>>>>>>> included in Flink.
>>>>>>>>>
>>>>>>>>> @Flavio: I moved the discussion to dev mailing list (user list is not
>>>>>>>>> appropriate for this discussion). Are you subscribed to it or should I
>>>>>>>>> cc you in each mail?
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
>>>>>>>>>> Nice feature Matthias!
>>>>>>>>>> My suggestion is to create a specific Flink interface to get also
>>>>>>>>>> description of a job and standardize parameter passing.
>>>>>>>>>> Then, somewhere (e.g. Manifest) you could specify the list of packages
>>>>>>>>>> (or also directly the classes) to inspect with reflection to extract
>>>>>>>>>> the list of available Flink jobs.
>>>>>>>>>> Something like:
>>>>>>>>>>
>>>>>>>>>> public interface FlinkJob {
>>>>>>>>>>
>>>>>>>>>>   /** The name to display in the job submission UI or shell */
>>>>>>>>>>   //e.g. "My Flink HelloWorld"
>>>>>>>>>>   String getDisplayName();
>>>>>>>>>>   //e.g. "This program does this and that etc.."
>>>>>>>>>>   String getDescription();
>>>>>>>>>>   //e.g. <0,Integer,"An integer representing my first param">,
>>>>>>>>>>   //     <1,String,"A string representing my second param">
>>>>>>>>>>   List<Tuple3<Integer, TypeInfo, String>> paramDescription;
>>>>>>>>>>   /** Set up the flink job in the passed ExecutionEnvironment */
>>>>>>>>>>   ExecutionEnvironment config(ExecutionEnvironment env);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>>
>>>>>>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
>>>>>>>>>> mj...@informatik.hu-berlin.de> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I like the idea that Flink's WebClient can show different plans for
>>>>>>>>>>> different jobs within a single jar file.
>>>>>>>>>>>
>>>>>>>>>>> I prepared a prototype for this feature. You can find it here:
>>>>>>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
>>>>>>>>>>>
>>>>>>>>>>> To test the feature, you need to prepare a jar file that contains the
>>>>>>>>>>> code of multiple programs and specify each entry class in the manifest
>>>>>>>>>>> file as comma separated values in "program-class" line.
>>>>>>>>>>>
>>>>>>>>>>> Feedback is welcome. :)
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
>>>>>>>>>>>> Thank you all for the support!
>>>>>>>>>>>> It would be a really nice feature if the web client were able to
>>>>>>>>>>>> show me the list of Flink jobs within my jar..
>>>>>>>>>>>> it should be sufficient to mark them with a special annotation and
>>>>>>>>>>>> inspect the classes within the jar..
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <m...@mieo.de> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>   Hi Flavio,
>>>>>>>>>>>>
>>>>>>>>>>>>   you also can put each job in a single class and use the -c
>>>>>>>>>>>>   parameter to execute jobs separately:
>>>>>>>>>>>>
>>>>>>>>>>>>   /bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
>>>>>>>>>>>>   /bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
>>>>>>>>>>>>   …
>>>>>>>>>>>>
>>>>>>>>>>>>   Cheers
>>>>>>>>>>>>   Malte
>>>>>>>>>>>>
>>>>>>>>>>>>   From: Robert Metzger <rmetz...@apache.org>
>>>>>>>>>>>>   Reply-To: <u...@flink.apache.org>
>>>>>>>>>>>>   Date: Friday, May 8, 2015 14:57
>>>>>>>>>>>>   To: "u...@flink.apache.org" <u...@flink.apache.org>
>>>>>>>>>>>>   Subject: Re: Package multiple jobs in a single jar
>>>>>>>>>>>>
>>>>>>>>>>>>   Hi Flavio,
>>>>>>>>>>>>
>>>>>>>>>>>>   the pom from our quickstart is a good reference:
>>>>>>>>>>>>   https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
>>>>>>>>>>>>
>>>>>>>>>>>>   On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
>>>>>>>>>>>>   <pomperma...@okkam.it> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>     Ok, got it.
>>>>>>>>>>>>     And is there a reference pom.xml for shading my application
>>>>>>>>>>>>     into one fat-jar? Which flink dependencies can I exclude?
>>>>>>>>>>>>
>>>>>>>>>>>>     On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske
>>>>>>>>>>>>     <fhue...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>       I didn't say that the main should return the
>>>>>>>>>>>>       ExecutionEnvironment.
>>>>>>>>>>>>       You can define and execute as many programs in a main
>>>>>>>>>>>>       function as you like.
>>>>>>>>>>>>       The program can be defined somewhere else, e.g., in a
>>>>>>>>>>>>       function that receives an ExecutionEnvironment and attaches
>>>>>>>>>>>>       a program such as
>>>>>>>>>>>>
>>>>>>>>>>>>       public void buildMyProgram(ExecutionEnvironment env) {
>>>>>>>>>>>>         DataSet<String> lines = env.readTextFile(...);
>>>>>>>>>>>>         // do something
>>>>>>>>>>>>         lines.writeAsText(...);
>>>>>>>>>>>>       }
>>>>>>>>>>>>
>>>>>>>>>>>>       That method could be invoked from main():
>>>>>>>>>>>>
>>>>>>>>>>>>       public static void main(String[] args) {
>>>>>>>>>>>>         ExecutionEnvironment env = ...
>>>>>>>>>>>>
>>>>>>>>>>>>         if (...) {
>>>>>>>>>>>>           buildMyProgram(env);
>>>>>>>>>>>>         }
>>>>>>>>>>>>         else {
>>>>>>>>>>>>           buildSomeOtherProg(env);
>>>>>>>>>>>>         }
>>>>>>>>>>>>
>>>>>>>>>>>>         env.execute();
>>>>>>>>>>>>
>>>>>>>>>>>>         // run some more programs
>>>>>>>>>>>>       }
>>>>>>>>>>>>
>>>>>>>>>>>>       2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
>>>>>>>>>>>>       <pomperma...@okkam.it>:
>>>>>>>>>>>>
>>>>>>>>>>>>         Hi Fabian,
>>>>>>>>>>>>         thanks for the response.
>>>>>>>>>>>>         So my mains should be converted into a method returning
>>>>>>>>>>>>         the ExecutionEnvironment.
>>>>>>>>>>>>         However, I think that it would be very nice to have a
>>>>>>>>>>>>         syntax like the one of the Hadoop ProgramDriver to
>>>>>>>>>>>>         define jobs to invoke from a single root class.
>>>>>>>>>>>>         Do you think it could be useful?
>>>>>>>>>>>>
>>>>>>>>>>>>         On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
>>>>>>>>>>>>         <fhue...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>           You can easily have multiple Flink programs in a single
>>>>>>>>>>>>           JAR file.
>>>>>>>>>>>>           A program is defined using an ExecutionEnvironment
>>>>>>>>>>>>           and executed when you call
>>>>>>>>>>>>           ExecutionEnvironment.execute().
>>>>>>>>>>>>           Where and how you do that does not matter.
>>>>>>>>>>>>
>>>>>>>>>>>>           You can for example implement a main function such as:
>>>>>>>>>>>>
>>>>>>>>>>>>           public static void main(String... args) {
>>>>>>>>>>>>
>>>>>>>>>>>>             if (today == Monday) {
>>>>>>>>>>>>               ExecutionEnvironment env = ...
>>>>>>>>>>>>               // define Monday prog
>>>>>>>>>>>>               env.execute();
>>>>>>>>>>>>             }
>>>>>>>>>>>>             else {
>>>>>>>>>>>>               ExecutionEnvironment env = ...
>>>>>>>>>>>>               // define other prog
>>>>>>>>>>>>               env.execute();
>>>>>>>>>>>>             }
>>>>>>>>>>>>           }
>>>>>>>>>>>>
>>>>>>>>>>>>           2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
>>>>>>>>>>>>           <pomperma...@okkam.it>:
>>>>>>>>>>>>
>>>>>>>>>>>>             Hi to all,
>>>>>>>>>>>>             is there any way to keep multiple jobs in a jar
>>>>>>>>>>>>             and then choose at runtime the one to execute
>>>>>>>>>>>>             (like what ProgramDriver does in Hadoop)?
>>>>>>>>>>>>
>>>>>>>>>>>>             Best,
>>>>>>>>>>>>             Flavio
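
The entry-point lookup that is listed as steps 1) to 6) in the quoted thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not Flink's actual implementation: `EntryPointSketch`, `resolveEntryClassName`, and `invocationKind` are hypothetical names, and the nested `Program` interface is a simplified stand-in for the real one. Only the `java.util.jar.Manifest` and reflection calls are standard JDK API.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.jar.Manifest;

public class EntryPointSketch {

    /** Simplified stand-in for Flink's "Program" interface (hypothetical signature). */
    public interface Program {
        String getPlan(String... args);
    }

    /** Steps 1-3: take the "-c" value if given, else "program-class", else "Main-Class". */
    public static String resolveEntryClassName(String cFlag, Manifest manifest) {
        if (cFlag != null) {
            return cFlag;                                    // step 1
        }
        if (manifest != null) {
            String name = manifest.getMainAttributes().getValue("program-class"); // step 2
            if (name == null) {
                name = manifest.getMainAttributes().getValue("Main-Class");       // step 3
            }
            if (name != null) {
                return name;
            }
        }
        // Also reached when the jar has no manifest at all.
        throw new IllegalArgumentException("No entry point: use -c or add a manifest entry");
    }

    /** Steps 4-5: decide how the entry class would be invoked in step 6. */
    public static String invocationKind(Class<?> entry) {
        if (Program.class.isAssignableFrom(entry)) {
            return "Program";                                // step 4
        }
        try {
            // getMethod only finds public methods, so "public" is checked implicitly.
            Method main = entry.getMethod("main", String[].class);
            if (Modifier.isStatic(main.getModifiers()) && main.getReturnType() == void.class) {
                return "main";                               // step 5
            }
        } catch (NoSuchMethodException e) {
            // fall through to the exception below
        }
        throw new IllegalArgumentException(entry.getName() + " has neither Program nor main()");
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("Main-Class", EntryPointSketch.class.getName());
        String name = resolveEntryClassName(null, manifest);
        System.out.println(name + " -> " + invocationKind(EntryPointSketch.class));
    }
}
```

Keeping name resolution (steps 1-3) separate from the invocation check (steps 4-5) means a listing-only CLI option, as proposed in the thread, could reuse the first half without executing anything.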