Did you by chance use the RemoteEnvironment and pass in 6123 as the
port? If so, try using 8081 instead, which is the REST port.
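For reference, a remote submission against the standalone cluster would look roughly like this (a sketch; host and jar path are placeholders):

import org.apache.flink.api.java.ExecutionEnvironment;

public class SubmitToStandaloneCluster {
    public static void main(String[] args) throws Exception {
        // 8081 is the REST port of the standalone cluster ("rest.port"),
        // not the JobManager RPC port 6123.
        ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                "localhost",           // host of the cluster (placeholder)
                8081,                  // REST port, not 6123
                "target/my-job.jar");  // jar with the user code (placeholder path)

        long count = env.generateSequence(1, 100).count(); // placeholder job
        System.out.println(count);
    }
}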
On 06.09.2018 18:24, Miguel Coimbra wrote:
Hello Chesnay,
Thanks for the information.
I decided to move straight away to launching a standalone cluster.
I'm now having another problem when trying to submit a job through my
Java program after launching the standalone cluster.
I configured the cluster (flink-1.6.0/conf/flink-conf.yaml) to use 2
TaskManager instances and assigned port ranges for most Flink cluster
entities (to avoid port collisions with more than 1 TaskManager):
query.server.ports: 30000-35000
query.proxy.ports: 35001-40000
taskmanager.rpc.port: 45001-50000
taskmanager.data.port: 50001-55000
blob.server.port: 55001-60000
I'm launching in Linux with:
./start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host xxxxxxx.
Starting taskexecutor daemon on host xxxxxxx.
[INFO] 1 instance(s) of taskexecutor are already running on xxxxxxx.
Starting taskexecutor daemon on host xxxxxxx.
However, my Java program ends up hanging as soon as I perform
an execute() call (for example, by calling count() on a DataSet).
Checking the JobManager log, I find the following exception whenever
my Java program calls execute() over the ExecutionEnvironment (either
using Maven on the terminal or from IntelliJ IDEA):
WARN akka.remote.transport.netty.NettyTransport - Remote connection to
[/127.0.0.1:47774] failed with
org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException:
Adjusted frame length exceeds 10485760: 1347375960 - discarded
I checked that the problem is happening on a count(), so I don't think
it has to do with the JobManager/TaskManagers trying to exchange
excessively-big messages.
While searching for the cause, I also made sure my program compiles
against the same library versions as this cluster version of Flink.
I downloaded the Apache Flink 1.6 binaries to launch the cluster:
https://www.apache.org/dyn/closer.lua/flink/flink-1.6.0/flink-1.6.0-bin-scala_2.11.tgz
I then checked the library versions used in the pom.xml of the 1.6.0
branch of the Flink repository:
https://github.com/apache/flink/blob/release-1.6/pom.xml
On my project's pom.xml, I have the following:
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<flink.version>1.6.0</flink.version>
<slf4j.version>1.7.7</slf4j.version>
<log4j.version>1.2.17</log4j.version>
<scala.version>2.11.12</scala.version>
<scala.binary.version>2.11</scala.binary.version>
<akka.version>2.4.20</akka.version>
<junit.version>4.12</junit.version>
<junit.jupiter.version>5.0.0</junit.jupiter.version>
<junit.vintage.version>${junit.version}.1</junit.vintage.version>
<junit.platform.version>1.0.1</junit.platform.version>
<aspectj.version>1.9.1</aspectj.version>
</properties>
My project's dependency versions match those of the Flink 1.6
repository (for libraries such as akka).
However, I'm having difficulty understanding what else may be causing
this problem.
Thanks for your attention.
Best,
On Wed, 5 Sep 2018 at 20:18, Chesnay Schepler <ches...@apache.org> wrote:
No, the cluster isn't shared. For each job, a separate cluster is
spun up when execute() is called and shut down once the job finishes.
For explicit creation and shutdown of a cluster, I would suggest
executing your jobs as a test that contains a MiniClusterResource.
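Roughly along these lines (assuming flink-test-utils is on the test classpath; the exact package names and builder methods of MiniClusterResource have changed between Flink versions, so treat this only as an outline):

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.test.util.MiniClusterResource;
import org.apache.flink.test.util.MiniClusterResourceConfiguration;
import org.junit.ClassRule;
import org.junit.Test;

public class MyJobsTest {

    // Starts one local Flink cluster before the tests and shuts it down afterwards.
    // NOTE: class/package names as in recent Flink versions; adjust for 1.6 if needed.
    @ClassRule
    public static final MiniClusterResource MINI_CLUSTER = new MiniClusterResource(
            new MiniClusterResourceConfiguration.Builder()
                    .setNumberTaskManagers(2)
                    .setNumberSlotsPerTaskManager(1)
                    .build());

    @Test
    public void runMyJob() throws Exception {
        // getExecutionEnvironment() picks up the cluster started by the rule.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        long count = env.generateSequence(1, 100).count(); // placeholder job
        // ... assertions on the result ...
    }
}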
On 05.09.2018 20:59, Miguel Coimbra wrote:
Thanks for the reply.
However, I think my case differs because I am running a sequence
of independent Flink jobs on the same environment instance.
I only create the LocalExecutionEnvironment once.
The web manager shows the job ID changing correctly every time a
new job is executed.
Since it is the same execution environment (and therefore the
same cluster instance I imagine), those completed jobs should
show as well, no?
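Roughly, the structure of my program is the following (a sketch; the generateSequence jobs below are placeholders for my real ones):

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class MultiJobExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env =
                ExecutionEnvironment.createLocalEnvironmentWithWebUI(new Configuration());

        // The same environment instance is reused; each count() runs as an
        // independent job with its own job ID.
        long first = env.generateSequence(1, 100).count();   // job 1
        long second = env.generateSequence(1, 1000).count(); // job 2
        System.out.println(first + " " + second);
    }
}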
On Wed, 5 Sep 2018 at 18:40, Chesnay Schepler <ches...@apache.org> wrote:
When you create an environment that way, the cluster is shut down
once the job completes.
The WebUI can _appear_ to still be working because all the files
and data about the job are cached in the browser.
On 05.09.2018 17:39, Miguel Coimbra wrote:
Hello,
I'm having difficulty reading the status (such as time taken
for each dataflow operator in a job) of jobs that have
completed.
First, when I click on "Completed jobs" on the web interface
(by default at 8081), no job shows up.
I see jobs listed as "Running", but as soon as they finish, I
would expect them to appear in the "Completed jobs" section; no
luck so far.
Note that I am running locally on 8081 (the web UI is up; I
checked that it is reachable via browser).
None of these links worked for checking jobs that have
already finished, such as the job ID
618fac9da6ea458f5091a9c40e54cbcc that had been running:
http://127.0.0.1:8081/jobs/618fac9da6ea458f5091a9c40e54cbcc
http://127.0.0.1:8081/completed-jobs/618fac9da6ea458f5091a9c40e54cbcc
I'm running with a LocalExecutionEnvironment created with the
method:
ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
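For completeness, the environment is created roughly like this (a sketch; I'm assuming RestOptions.PORT is the relevant option for the UI/REST port, and 8081 is its default anyway):

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;

public class LocalWebUIExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed option name for the web UI / REST port; 8081 is the default.
        conf.setInteger(RestOptions.PORT, 8081);

        ExecutionEnvironment env =
                ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);
        long count = env.generateSequence(1, 100).count(); // placeholder job
        System.out.println(count);
    }
}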
I hope anyone may be able to help.
Best,