rmetzger commented on a change in pull request #14346: URL: https://github.com/apache/flink/pull/14346#discussion_r542142908
########## File path: docs/deployment/resource-providers/standalone/index.md ########## @@ -24,153 +24,203 @@ specific language governing permissions and limitations under the License. --> -This page provides instructions on how to run Flink in a *fully distributed fashion* on a *static* (but possibly heterogeneous) cluster. - * This will be replaced by the TOC {:toc} -## Requirements -### Software Requirements +## Getting Started -Flink runs on all *UNIX-like environments*, e.g. **Linux**, **Mac OS X**, and **Cygwin** (for Windows) and expects the cluster to consist of **one master node** and **one or more worker nodes**. Before you start to setup the system, make sure you have the following software installed **on each node**: +This *Getting Started* section guides you through the local setup (on one machine, but in separate processes) of a Flink cluster. This can easily be expanded to set up a distributed standalone cluster, which we describe in the [reference section](#distributed-cluster-setup). -- **Java 1.8.x** or higher, -- **ssh** (sshd must be running to use the Flink scripts that manage - remote components) +### Introduction -If your cluster does not fulfill these software requirements you will need to install/upgrade it. +The standalone mode is the most bare-bones way of deploying Flink: the Flink services described in the [deployment overview]({% link deployment/index.md %}) are just launched as processes on the operating system. Unlike deploying Flink with a resource provider such as Kubernetes or YARN, you have to take care of restarting failed processes and of allocating and de-allocating resources during operation. -Having __passwordless SSH__ and -__the same directory structure__ on all your cluster nodes will allow you to use our scripts to control -everything. 
+In the subpages of the standalone mode resource provider, we describe further deployment methods that are based on the standalone mode: [Deployment in Docker containers]({% link deployment/resource-providers/standalone/docker.md %}), and on [Kubernetes]({% link deployment/resource-providers/standalone/kubernetes.md %}). -{% top %} +### Preparation -### `JAVA_HOME` Configuration +Flink runs on all *UNIX-like environments*, e.g. **Linux**, **Mac OS X**, and **Cygwin** (for Windows). Before you start to set up the system, make sure you have fulfilled the following requirements: -Flink requires the `JAVA_HOME` environment variable to be set on the master and all worker nodes and point to the directory of your Java installation. +- **Java 1.8.x** or higher installed, +- A recent Flink distribution downloaded from the [download page]({{ site.download_url }}) and unpacked. -You can set this variable in `conf/flink-conf.yaml` via the `env.java.home` key. +### Starting a Standalone Cluster (Session Mode) -{% top %} +These steps show how to launch a Flink standalone cluster and submit an example job: + +{% highlight bash %} +# we assume you are in the root directory of the unzipped Flink distribution + +# (1) Start Cluster +./bin/start-cluster.sh + +# (2) You can now access the Flink Web Interface at http://localhost:8081 + +# (3) Submit example job +./bin/flink run ./examples/streaming/TopSpeedWindowing.jar + +# (4) Stop the cluster again +./bin/stop-cluster.sh +{% endhighlight %} + +In step `(1)`, we've started two processes: a JVM for the JobManager and a JVM for the TaskManager. The JobManager serves the web interface accessible at [localhost:8081](http://localhost:8081). +In step `(3)`, we start a Flink Client (a short-lived JVM process) that submits an application to the JobManager. 
+ +## Deployment Modes Supported by the Standalone Cluster + +### Application Mode -## Flink Setup +To start a Flink JobManager with an embedded application, we use the `bin/standalone-job.sh` script. +We demonstrate this mode by locally starting the `TopSpeedWindowing.jar` example and running it on a TaskManager. -Go to the [downloads page]({{ site.download_url }}) and get the ready-to-run package. -After downloading the latest release, copy the archive to your master node and extract it: +The application jar file needs to be available on the classpath. The easiest way to achieve this is to put the jar into the `lib/` folder: {% highlight bash %} -tar xzf flink-*.tgz -cd flink-* +cp ./examples/streaming/TopSpeedWindowing.jar lib/ {% endhighlight %} -### Configuring Flink +Then, we can launch the JobManager: -After having extracted the system files, you need to configure Flink for the cluster by editing *conf/flink-conf.yaml*. +{% highlight bash %} +./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.windowing.TopSpeedWindowing +{% endhighlight %} -Set the `jobmanager.rpc.address` key to point to your master node. You should also define the maximum amount of main memory Flink is allowed to allocate on each node by setting the `jobmanager.memory.process.size` and `taskmanager.memory.process.size` keys. +The web interface is now available at [localhost:8081](http://localhost:8081). However, the application won't be able to start because there are no TaskManagers running yet: -These values are given in MB. If some worker nodes have more main memory which you want to allocate to the Flink system you can overwrite the default value by setting `taskmanager.memory.process.size` or `taskmanager.memory.flink.size` in *conf/flink-conf.yaml* on those specific nodes. 
+{% highlight bash %} +./bin/taskmanager.sh start +{% endhighlight %} -Finally, you must provide a list of all nodes in your cluster that shall be used as worker nodes, i.e., nodes running a TaskManager. Edit the file *conf/workers* and enter the IP/host name of each worker node. +Note: You can start multiple TaskManagers if your application needs more resources. -The following example illustrates the setup with three nodes (with IP addresses from _10.0.0.1_ -to _10.0.0.3_ and hostnames _master_, _worker1_, _worker2_) and shows the contents of the -configuration files (which need to be accessible at the same path on all machines): +Stopping the services is also supported via the scripts: -<div class="row"> - <div class="col-md-6 text-center"> - <img src="{% link /page/img/quickstart_cluster.png %}" style="width: 60%"> - </div> -<div class="col-md-6"> - <div class="row"> - <p class="lead text-center"> - /path/to/<strong>flink/conf/<br>flink-conf.yaml</strong> - <pre>jobmanager.rpc.address: 10.0.0.1</pre> - </p> - </div> -<div class="row" style="margin-top: 1em;"> - <p class="lead text-center"> - /path/to/<strong>flink/<br>conf/workers</strong> - <pre> -10.0.0.2 -10.0.0.3</pre> - </p> -</div> -</div> -</div> +{% highlight bash %} +./bin/taskmanager.sh stop +./bin/standalone-job.sh stop +{% endhighlight %} -The Flink directory must be available on every worker under the same path. You can use a shared NFS directory, or copy the entire Flink directory to every worker node. -Please see the [configuration page]({% link deployment/config.md %}) for details and additional configuration options. +### Session Mode -In particular, +Local deployment in the session mode has already been described in the [introduction](#starting-a-standalone-cluster-session-mode) above. 
- * the amount of available memory per JobManager (`jobmanager.memory.process.size`), - * the amount of available memory per TaskManager (`taskmanager.memory.process.size` and check [memory setup guide]({% link deployment/memory/mem_tuning.md %}#configure-memory-for-standalone-deployment)), - * the number of available CPUs per machine (`taskmanager.numberOfTaskSlots`), - * the total number of CPUs in the cluster (`parallelism.default`) and - * the temporary directories (`io.tmp.dirs`) +## Standalone Cluster Reference -are very important configuration values. +### Configuration -{% top %} +All available configuration options are listed on the [configuration page]({% link deployment/config.md %}). In particular, the [Basic Setup]({% link deployment/config.md %}#basic-setup) section contains good advice on configuring ports, memory, parallelism, etc. + +### Debugging + +If Flink is behaving unexpectedly, we recommend looking at Flink's log files as a starting point for further investigations. + +The log files are located in the `logs/` directory. There's a `.log` file for each Flink service running on this machine. + +Alternatively, logs are available from the Flink web frontend (both for the JobManager and each TaskManager). + +By default, Flink logs on the "INFO" log level, which provides basic information for all obvious issues. For cases where Flink seems to behave wrongly, lowering the log level to "DEBUG" is advised. The logging level is controlled via the `conf/log4j.properties` file. +Setting `rootLogger.level = DEBUG` will start Flink on the DEBUG log level. Note that a restart of Flink is required for the changes to take effect. -### Starting Flink +There's a dedicated page on [logging]({% link deployment/advanced/logging.md %}) in Flink. -The following script starts a JobManager on the local node and connects via SSH to all worker nodes listed in the *workers* file to start the TaskManager on each node. 
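The log-level change described above can also be scripted. The following sketch operates on a stand-in file; in a real setup this would be the distribution's `conf/log4j.properties` (the two lines below are an assumed, minimal excerpt, not the full file shipped with Flink):

```bash
# Create a minimal stand-in for conf/log4j.properties (contents assumed).
cat > log4j.properties <<'EOF'
rootLogger.level = INFO
rootLogger.appenderRef.file.ref = MainAppender
EOF

# Switch the root logger to DEBUG (a restart of Flink is required afterwards).
sed -i 's/^rootLogger.level = .*/rootLogger.level = DEBUG/' log4j.properties

grep '^rootLogger.level' log4j.properties
# prints: rootLogger.level = DEBUG
```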
Now your Flink system is up and running. The JobManager running on the local node will now accept jobs at the configured RPC port. +### The start and stop scripts -Assuming that you are on the master node and inside the Flink directory: +#### start-cluster.sh +The scripts provided with the standalone mode (in the `bin/` directory) use the `conf/masters` and `conf/workers` files to determine the number of cluster instances to start and stop with the `bin/start-cluster.sh` and `bin/stop-cluster.sh` scripts. + +If password-less ssh access to the listed machines is configured, and they share the same directory structure, the scripts also support starting and stopping instances remotely. + +**Example 1: Start a cluster with 2 TaskManagers locally** + +`conf/masters` contents: {% highlight bash %} -bin/start-cluster.sh +localhost {% endhighlight %} -To stop Flink, there is also a `stop-cluster.sh` script. +`conf/workers` contents: +{% highlight bash %} +localhost +localhost +{% endhighlight %} -{% top %} +**Example 2: Start a distributed cluster** + +This assumes a cluster with 4 machines (`master1, worker1, worker2, worker3`), which can all reach each other over the network. + +`conf/masters` contents: +{% highlight bash %} +master1 +{% endhighlight %} + +`conf/workers` contents: +{% highlight bash %} +worker1 +worker2 +worker3 +{% endhighlight %} -### Adding JobManager/TaskManager Instances to a Cluster +Note that the configuration key `jobmanager.rpc.address` needs to be set to `master1` for this to work. -You can add both JobManager and TaskManager instances to your running cluster with the `bin/jobmanager.sh` and `bin/taskmanager.sh` scripts. +We show a third example with a standby JobManager in the [high-availability section](#setting-up-high-availability). 
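The `jobmanager.rpc.address` key for Example 2 lives in `conf/flink-conf.yaml`. A minimal sketch of the relevant fragment, identical on all four machines, could look like the following (the slot count is an illustrative assumption, not a requirement):

```yaml
# conf/flink-conf.yaml -- shared by master1, worker1, worker2, worker3
jobmanager.rpc.address: master1   # TaskManagers connect to this host
taskmanager.numberOfTaskSlots: 2  # illustrative value; tune per machine
```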
-#### Adding a JobManager +#### (jobmanager|taskmanager).sh + +The `bin/jobmanager.sh` and `bin/taskmanager.sh` scripts support starting the respective daemon in the background (using the `start` argument) or in the foreground (using `start-foreground`). In the foreground mode, the logs are printed to standard output. This mode is useful for deployment scenarios where another process is controlling the Flink daemon (e.g. Docker). + +The scripts can be called multiple times, for example if multiple TaskManagers are needed. The instances are tracked by the scripts, and can be stopped one-by-one (using `stop`) or all together (using `stop-all`). + +#### Windows Cygwin Users + +If you are installing Flink from the git repository and you are using the Windows git shell, Cygwin can produce a failure similar to this one: + +{% highlight bash %} +c:/flink/bin/start-cluster.sh: line 30: $'\r': command not found +{% endhighlight %} + +This error occurs because git automatically transforms UNIX line endings into Windows-style line endings when running on Windows. The problem is that Cygwin can only deal with UNIX-style line endings. The solution is to adjust the Cygwin settings to deal with the correct line endings by following these three steps: Review comment: I think you are right. Stuff runs on windows 😕 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org