[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545125#comment-14545125
 ] 

ASF GitHub Bot commented on FLINK-1525:
---------------------------------------

Github user uce commented on a diff in the pull request:

    https://github.com/apache/flink/pull/664#discussion_r30391587
  
    --- Diff: docs/apis/best_practices.md ---
    @@ -0,0 +1,155 @@
    +---
    +title: "Best Practices"
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +<a href="#top"></a>
    +
    +
    +This page contains a collection of best practices for Flink programmers on 
how to solve frequently encountered problems.
    +
    +
    +* This will be replaced by the TOC
    +{:toc}
    +
    +## Parsing command line arguments and passing them around in your Flink 
application
    +
    +
    +Almost all Flink applications, both batch and streaming, rely on external configuration parameters:
    +for example, for specifying input and output sources (like paths or addresses), system parameters (parallelism, runtime configuration), and application-specific parameters (often used within the user functions).
    +
    +Since version 0.9, Flink provides a simple utility called `ParameterTool` that offers at least some basic tooling for solving these problems.
    +
    +As you'll see, Flink is very flexible when it comes to parsing input parameters. You are also free to choose any other framework, such as [Commons CLI](https://commons.apache.org/proper/commons-cli/), [argparse4j](http://argparse4j.sourceforge.net/), or others.
    +
    +
    +### Getting your configuration values into the `ParameterTool`
    +
    +The `ParameterTool` provides a set of predefined static methods for reading the configuration. Internally, the tool expects a `Map<String, String>`, so it's very easy to integrate it with your own configuration style.
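
    To make that Map-backed design concrete, here is a minimal, self-contained sketch of the kind of lookup such a tool performs. This is purely illustrative plain Java, not Flink's actual `ParameterTool` implementation; the class and method names here are hypothetical.

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only: a map-backed parameter lookup in the style
    // described above. NOT Flink's code; names are hypothetical.
    public class SimpleParams {
        private final Map<String, String> data;

        public SimpleParams(Map<String, String> data) {
            // Defensive copy so later changes to the caller's map have no effect.
            this.data = new HashMap<>(data);
        }

        // Returns the value for the key, or the default if the key is absent.
        public String get(String key, String defaultValue) {
            return data.getOrDefault(key, defaultValue);
        }

        // Fails fast when a mandatory parameter is missing.
        public String getRequired(String key) {
            String value = data.get(key);
            if (value == null) {
                throw new RuntimeException("No data for required key '" + key + "'");
            }
            return value;
        }
    }
    ```

    Because everything funnels through a plain `Map<String, String>`, any configuration source (files, args, system properties) only has to produce such a map.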
    +
    +
    +#### From `.properties` files
    +
    +The following method will read a 
[Properties](https://docs.oracle.com/javase/tutorial/essential/environment/properties.html)
 file and provide the key/value pairs:
    +{% highlight java %}
    +String propertiesFile = "/home/sam/flink/myjob.properties";
    +ParameterTool parameter = ParameterTool.fromPropertiesFile(propertiesFile);
    +{% endhighlight %}
    +
    +
    +#### From the command line arguments
    +
    +This allows getting arguments like `--input hdfs:///mydata --elements 42` 
from the command line.
    +{% highlight java %}
    +public static void main(String[] args) {
    +   ParameterTool parameter = ParameterTool.fromArgs(args);
    +   // .. regular code ..
    +}
    +{% endhighlight %}
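
    For illustration, turning `--key value` pairs into the `Map<String, String>` form mentioned above can be sketched in plain Java as follows. This is a simplified, hypothetical sketch, not Flink's `fromArgs` implementation:

    ```java
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only (not Flink's code): parse "--key value"
    // style arguments into a Map<String, String>.
    public class ArgsToMap {
        public static Map<String, String> parse(String[] args) {
            Map<String, String> map = new HashMap<>();
            for (int i = 0; i < args.length; i++) {
                if (args[i].startsWith("--")) {
                    String key = args[i].substring(2);
                    // Treat the next token as the value, unless it is another flag.
                    if (i + 1 < args.length && !args[i + 1].startsWith("--")) {
                        map.put(key, args[++i]);
                    } else {
                        map.put(key, "");
                    }
                }
            }
            return map;
        }
    }
    ```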
    +
    +
    +#### From system properties
    +
    +When starting a JVM, you can pass system properties to it: 
`-Dinput=hdfs:///mydata`. You can also initialize the `ParameterTool` from 
these system properties:
    +
    +{% highlight java %}
    +ParameterTool parameter = ParameterTool.fromSystemProperties();
    +{% endhighlight %}
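
    As background, `-Dkey=value` options passed to the JVM surface through the standard `System.getProperties()` API, which is all a `fromSystemProperties()`-style method needs to read. A plain-JDK sketch (again illustrative, not Flink's code):

    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    // Illustrative sketch only (not Flink's code): snapshot the JVM system
    // properties (including -D options) into a Map<String, String>.
    public class SysProps {
        public static Map<String, String> snapshot() {
            Map<String, String> map = new HashMap<>();
            Properties props = System.getProperties();
            for (String name : props.stringPropertyNames()) {
                map.put(name, props.getProperty(name));
            }
            return map;
        }
    }
    ```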
    +
    +
    +### Using the parameters in your Flink program
    +
    +Now that we've got the parameters from somewhere (see above), we can use them in various ways.
    +
    +**Directly from the `ParameterTool`**
    +
    +The `ParameterTool` itself has methods for accessing the values.
    +{% highlight java %}
    +ParameterTool parameters = // ...
    +parameters.getRequired("input");
    +parameters.get("output", "myDefaultValue");
    +parameters.getLong("expectedCount", -1L);
    +parameters.getNumberOfParameters();
    +// .. there are more methods available.
    +{% endhighlight %}
    +
    +You can use the return values of these methods directly in the main() method (i.e., in the client submitting the application).
    +For example, you could set the parallelism of an operator like this:
    +
    +{% highlight java %}
    +ParameterTool parameters = ParameterTool.fromArgs(args);
    +DataSet<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).setParallelism(parameters.getInt("mapParallelism", 2));
    --- End diff --
    
    Maybe to make this pattern more readable... do
    ```
    int parallelism = parameters.getInt("mapParallelism", 2);
    counts = ....setParallelism(parallelism);
    ```


> Provide utils to pass -D parameters to UDFs 
> --------------------------------------------
>
>                 Key: FLINK-1525
>                 URL: https://issues.apache.org/jira/browse/FLINK-1525
>             Project: Flink
>          Issue Type: Improvement
>          Components: flink-contrib
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>              Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
