rmetzger commented on a change in pull request #14346: URL: https://github.com/apache/flink/pull/14346#discussion_r542142908
########## File path: docs/deployment/resource-providers/standalone/index.md ########## @@ -24,153 +24,203 @@ specific language governing permissions and limitations under the License. --> -This page provides instructions on how to run Flink in a *fully distributed fashion* on a *static* (but possibly heterogeneous) cluster. - * This will be replaced by the TOC {:toc} -## Requirements -### Software Requirements +## Getting Started -Flink runs on all *UNIX-like environments*, e.g. **Linux**, **Mac OS X**, and **Cygwin** (for Windows) and expects the cluster to consist of **one master node** and **one or more worker nodes**. Before you start to setup the system, make sure you have the following software installed **on each node**: +This *Getting Started* section guides you through the local setup (on one machine, but in separate processes) of a Flink cluster. This can easily be expanded to set up a distributed standalone cluster, which we describe in the [reference section](#distributed-cluster-setup). -- **Java 1.8.x** or higher, -- **ssh** (sshd must be running to use the Flink scripts that manage - remote components) +### Introduction -If your cluster does not fulfill these software requirements you will need to install/upgrade it. +The standalone mode is the most bare-bones way of deploying Flink: the Flink services described in the [deployment overview]({% link deployment/index.md %}) are just launched as processes on the operating system. Unlike deploying Flink with a resource provider such as Kubernetes or YARN, you have to take care of restarting failed processes and of allocating and de-allocating resources during operation. -Having __passwordless SSH__ and -__the same directory structure__ on all your cluster nodes will allow you to use our scripts to control -everything. 
+In the subpages of the standalone mode resource provider, we describe further deployment methods that are based on the standalone mode: [Deployment in Docker containers]({% link deployment/resource-providers/standalone/docker.md %}), and on [Kubernetes]({% link deployment/resource-providers/standalone/kubernetes.md %}). -{% top %} +### Preparation -### `JAVA_HOME` Configuration +Flink runs on all *UNIX-like environments*, e.g. **Linux**, **Mac OS X**, and **Cygwin** (for Windows). Before you start to set up the system, make sure you have fulfilled the following requirements: -Flink requires the `JAVA_HOME` environment variable to be set on the master and all worker nodes and point to the directory of your Java installation. +- **Java 1.8.x** or higher installed, +- A recent Flink distribution downloaded from the [download page]({{ site.download_url }}) and unpacked. -You can set this variable in `conf/flink-conf.yaml` via the `env.java.home` key. +### Starting a Standalone Cluster (Session Mode) -{% top %} +These steps show how to launch a Flink standalone cluster and submit an example job: + +{% highlight bash %} +# we assume you are in the root directory of the unzipped Flink distribution + +# (1) Start Cluster +./bin/start-cluster.sh + +# (2) You can now access the Flink Web Interface at http://localhost:8081 + +# (3) Submit example job +./bin/flink run ./examples/streaming/TopSpeedWindowing.jar + +# (4) Stop the cluster again +./bin/stop-cluster.sh +{% endhighlight %} + +In step `(1)`, we've started two processes: a JVM for the JobManager and a JVM for the TaskManager. The JobManager serves the web interface accessible at [localhost:8081](http://localhost:8081). +In step `(3)`, we start a Flink Client (a short-lived JVM process) that submits an application to the JobManager. 
+ +## Deployment Modes Supported by the Standalone Cluster + +### Application Mode -## Flink Setup +To start a Flink JobManager with an embedded application, we use the `bin/standalone-job.sh` script. +We demonstrate this mode by locally starting the `TopSpeedWindowing.jar` example and running it on a TaskManager. -Go to the [downloads page]({{ site.download_url }}) and get the ready-to-run package. -After downloading the latest release, copy the archive to your master node and extract it: +The application jar file needs to be available on the classpath. The easiest way to achieve this is to put the jar into the `lib/` folder: {% highlight bash %} -tar xzf flink-*.tgz -cd flink-* +cp ./examples/streaming/TopSpeedWindowing.jar lib/ {% endhighlight %} -### Configuring Flink +Then, we can launch the JobManager: -After having extracted the system files, you need to configure Flink for the cluster by editing *conf/flink-conf.yaml*. +{% highlight bash %} +./bin/standalone-job.sh start --job-classname org.apache.flink.streaming.examples.windowing.TopSpeedWindowing +{% endhighlight %} -Set the `jobmanager.rpc.address` key to point to your master node. You should also define the maximum amount of main memory Flink is allowed to allocate on each node by setting the `jobmanager.memory.process.size` and `taskmanager.memory.process.size` keys. +The web interface is now available at [localhost:8081](http://localhost:8081). However, the application won't be able to start because there are no TaskManagers running yet: -These values are given in MB. If some worker nodes have more main memory which you want to allocate to the Flink system you can overwrite the default value by setting `taskmanager.memory.process.size` or `taskmanager.memory.flink.size` in *conf/flink-conf.yaml* on those specific nodes. 
+{% highlight bash %} +./bin/taskmanager.sh start +{% endhighlight %} -Finally, you must provide a list of all nodes in your cluster that shall be used as worker nodes, i.e., nodes running a TaskManager. Edit the file *conf/workers* and enter the IP/host name of each worker node. +Note: You can start multiple TaskManagers if your application needs more resources. -The following example illustrates the setup with three nodes (with IP addresses from _10.0.0.1_ -to _10.0.0.3_ and hostnames _master_, _worker1_, _worker2_) and shows the contents of the -configuration files (which need to be accessible at the same path on all machines): +Stopping the services is also supported via the scripts: -<div class="row"> - <div class="col-md-6 text-center"> - <img src="{% link /page/img/quickstart_cluster.png %}" style="width: 60%"> - </div> -<div class="col-md-6"> - <div class="row"> - <p class="lead text-center"> - /path/to/<strong>flink/conf/<br>flink-conf.yaml</strong> - <pre>jobmanager.rpc.address: 10.0.0.1</pre> - </p> - </div> -<div class="row" style="margin-top: 1em;"> - <p class="lead text-center"> - /path/to/<strong>flink/<br>conf/workers</strong> - <pre> -10.0.0.2 -10.0.0.3</pre> - </p> -</div> -</div> -</div> +{% highlight bash %} +./bin/taskmanager.sh stop +./bin/standalone-job.sh stop +{% endhighlight %} -The Flink directory must be available on every worker under the same path. You can use a shared NFS directory, or copy the entire Flink directory to every worker node. -Please see the [configuration page]({% link deployment/config.md %}) for details and additional configuration options. +### Session Mode -In particular, +Local deployment in the session mode has already been described in the [introduction](#starting-a-standalone-cluster-session-mode) above. 
- * the amount of available memory per JobManager (`jobmanager.memory.process.size`), - * the amount of available memory per TaskManager (`taskmanager.memory.process.size` and check [memory setup guide]({% link deployment/memory/mem_tuning.md %}#configure-memory-for-standalone-deployment)), - * the number of available CPUs per machine (`taskmanager.numberOfTaskSlots`), - * the total number of CPUs in the cluster (`parallelism.default`) and - * the temporary directories (`io.tmp.dirs`) +## Standalone Cluster Reference -are very important configuration values. +### Configuration -{% top %} +All available configuration options are listed on the [configuration page]({% link deployment/config.md %}). In particular, the [Basic Setup]({% link deployment/config.md %}#basic-setup) section contains good advice on configuring ports, memory, parallelism, etc. + +### Debugging + +If Flink is behaving unexpectedly, we recommend looking at Flink's log files as a starting point for further investigations. + +The log files are located in the `logs/` directory. There's a `.log` file for each Flink service running on this machine. + +Alternatively, logs are available from the Flink web frontend (both for the JobManager and each TaskManager). + +By default, Flink logs on the "INFO" log level, which provides basic information for all obvious issues. For cases where Flink seems to behave wrongly, lowering the log level to "DEBUG" is advised. The logging level is controlled via the `conf/log4j.properties` file. +Setting `rootLogger.level = DEBUG` will start Flink on the DEBUG log level. Note that a restart of Flink is required for the changes to take effect. -### Starting Flink +There's a dedicated page on [logging]({% link deployment/advanced/logging.md %}) in Flink. -The following script starts a JobManager on the local node and connects via SSH to all worker nodes listed in the *workers* file to start the TaskManager on each node. 
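The log-level change described above can also be scripted. The following sketch operates on a stand-in file; in a real setup this would be the distribution's `conf/log4j.properties` (the two lines below are an assumed, minimal excerpt, not the full file shipped with Flink):

```bash
# Create a minimal stand-in for conf/log4j.properties (contents assumed).
cat > log4j.properties <<'EOF'
rootLogger.level = INFO
rootLogger.appenderRef.file.ref = MainAppender
EOF

# Switch the root logger to DEBUG (a restart of Flink is required afterwards).
sed -i 's/^rootLogger.level = .*/rootLogger.level = DEBUG/' log4j.properties

grep '^rootLogger.level' log4j.properties
# prints: rootLogger.level = DEBUG
```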
Now your Flink system is up and running. The JobManager running on the local node will now accept jobs at the configured RPC port. +### The start and stop scripts -Assuming that you are on the master node and inside the Flink directory: +#### start-cluster.sh +The scripts provided with the standalone mode (in the `bin/` directory) use the `conf/masters` and `conf/workers` files to determine the number of cluster instances to start and stop with the `bin/start-cluster.sh` and `bin/stop-cluster.sh` scripts. + +If password-less ssh access to the listed machines is configured, and they share the same directory structure, the scripts also support starting and stopping instances remotely. + +**Example 1: Start a cluster with 2 TaskManagers locally** + +`conf/masters` contents: {% highlight bash %} -bin/start-cluster.sh +localhost {% endhighlight %} -To stop Flink, there is also a `stop-cluster.sh` script. +`conf/workers` contents: +{% highlight bash %} +localhost +localhost +{% endhighlight %} -{% top %} +**Example 2: Start a distributed cluster** + +This assumes a cluster with 4 machines (`master1, worker1, worker2, worker3`), which can all reach each other over the network. + +`conf/masters` contents: +{% highlight bash %} +master1 +{% endhighlight %} + +`conf/workers` contents: +{% highlight bash %} +worker1 +worker2 +worker3 +{% endhighlight %} -### Adding JobManager/TaskManager Instances to a Cluster +Note that the configuration key `jobmanager.rpc.address` needs to be set to `master1` for this to work. -You can add both JobManager and TaskManager instances to your running cluster with the `bin/jobmanager.sh` and `bin/taskmanager.sh` scripts. +We show a third example with a standby JobManager in the [high-availability section](#setting-up-high-availability). 
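The `jobmanager.rpc.address` key for Example 2 lives in `conf/flink-conf.yaml`. A minimal sketch of the relevant fragment, identical on all four machines, could look like the following (the slot count is an illustrative assumption, not a requirement):

```yaml
# conf/flink-conf.yaml -- shared by master1, worker1, worker2, worker3
jobmanager.rpc.address: master1   # TaskManagers connect to this host
taskmanager.numberOfTaskSlots: 2  # illustrative value; tune per machine
```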
-#### Adding a JobManager +#### (jobmanager|taskmanager).sh + +The `bin/jobmanager.sh` and `bin/taskmanager.sh` scripts support starting the respective daemon in the background (using the `start` argument) or in the foreground (using `start-foreground`). In the foreground mode, the logs are printed to standard output. This mode is useful for deployment scenarios where another process is controlling the Flink daemon (e.g. Docker). + +The scripts can be called multiple times, for example if multiple TaskManagers are needed. The instances are tracked by the scripts, and can be stopped one-by-one (using `stop`) or all together (using `stop-all`). + +#### Windows Cygwin Users + +If you are installing Flink from the git repository and you are using the Windows git shell, Cygwin can produce a failure similar to this one: + +{% highlight bash %} +c:/flink/bin/start-cluster.sh: line 30: $'\r': command not found +{% endhighlight %} + +This error occurs because git automatically transforms UNIX line endings into Windows-style line endings when running on Windows. The problem is that Cygwin can only deal with UNIX-style line endings. The solution is to adjust the Cygwin settings to deal with the correct line endings by following these three steps: Review comment: I think you are right. Stuff runs on windows 😕 ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org