Hi,
There is a release note for Flink 1.7 that could be relevant for you [1]
Granularity of latency metrics
The default granularity for latency metrics has been modified. To
restore the previous behavior, users have to explicitly set the granularity
to subtask.
Best,
Gary
[1]
https://ci.
Hi Puneet,
Can you describe how you validated that the state is not restored properly?
Specifically, how did you introduce faults to the cluster?
Best,
Gary
On Tue, Mar 3, 2020 at 11:08 AM Puneet Kinra <
puneet.ki...@customercentria.com> wrote:
> Sorry for the missed information
>
> On recovery
Hi Flavio,
I am looping in Becket (cc'ed) who might be able to answer your question.
Best,
Gary
On Tue, Mar 3, 2020 at 12:19 PM Flavio Pompermaier
wrote:
> Hi to all,
> since Alink has been open sourced, is there any good reason to keep both
> Flink ML and Alink?
> From what I understood Alink
Hi,
Can you post the complete stacktrace?
Best,
Gary
On Tue, Mar 3, 2020 at 1:08 PM kant kodali wrote:
> Hi All,
>
> I am just trying to read edges which has the following format in Kafka
>
> 1,2
> 1,3
> 1,5
>
> using the Table API and then converting to DataStream of Edge Objects and
> printi
ly(CaseStatements.scala:21)
>>
>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>>
>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>>
>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>>
>
Hi Morgan,
> I am interested in knowing more about the failure detection mechanism
used by Flink, unfortunately information is a little thin on the ground and
I was hoping someone could shed a little light on the topic.
It is probably best to look into the implementation (see my answers below).
>
Hi David,
> Would it be safe to automatically clear the temporary storage every time
when a TaskManager is started?
> (Note: the temporary volumes in use are dedicated to the TaskManager and
not shared :-)
Yes, it is safe in your case.
Best,
Gary
On Tue, Mar 10, 2020 at 6:39 PM David Maddison
w
>
> Additionally even though I add all necessary dependencies defiend in [1] I
> cannot see ProcessFunctionTestHarnesses class.
>
That class was added in Flink 1.10 [1].
[1]
https://github.com/apache/flink/blame/f765ad09ae2b2aa478c887b988e11e92a8b730bd/flink-streaming-java/src/test/java/org/apach
Can you try to set the config option taskmanager.numberOfTaskSlots to 2? By
default, the TMs only offer one slot [1], independent of the number of CPU
cores.
Best,
Gary
[1]
https://github.com/apache/flink/blob/da3082764117841d885f41c645961f8993a331a0/flink-core/src/main/java/org/apache/flink/configur
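For example, the option above could be set in flink-conf.yaml (a sketch; adjust the slot count to your workload):

```yaml
# Offer two slots per TaskManager (the default is a single slot,
# regardless of the number of CPU cores).
taskmanager.numberOfTaskSlots: 2
```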
> Bytes Sent but Records Sent is always 0
Sounds like a bug. However, I am unable to reproduce this using the
AsyncIOExample [1]. Can you provide a minimal working example?
> Is there an Async Sink? Or do I just rewrite my Sink as an AsyncFunction
followed by a dummy sink?
You will have to impl
Hi Suraj,
This question has been asked before:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-kafka-consumers-don-t-honor-group-id-td21054.html
Best,
Gary
On Wed, Apr 22, 2020 at 6:08 PM Suraj Puvvada wrote:
>
> Hello,
>
> I have two JVMs that run LocalExecution
Hi Nick,
Are you able to upgrade to Flink 1.9? Beginning with Flink 1.9 you can use
KafkaSerializationSchema to produce a ProducerRecord [1][2].
Best,
Gary
[1] https://issues.apache.org/jira/browse/FLINK-11693
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.9/api/java/org/apache/f
Hi Averell,
If you are seeing the log message from [1] and Scheduled#report() is
not called, the thread in the "Flink-MetricRegistry" thread pool might
be blocked. You can use the jstack utility to see on which task the
thread pool is blocked.
Best,
Gary
[1]
https://github.com/apache/flink/blob
to me so quickly.
>
> Best,
> Nick
>
> On Tue, May 12, 2020 at 3:37 AM Gary Yao wrote:
>>
>> Hi Nick,
>>
>> Are you able to upgrade to Flink 1.9? Beginning with Flink 1.9 you can use
>> KafkaSerializationSchema to produce a ProducerRecord [1][2].
>>
ctory in flink the artifact is
flink-core-1.7.2.jar . I need to pack flink-core-1.9.0.jar from application
and use child first class loading to use newer version of flink-core. If I
have it as provided scope, sure it will work in IntelliJ but not outside of
it .
>
> Best,
> Nick
>
>
Hi Bhaskar,
> Why the reset counter is not zero after streaming job restart is successful?
The short answer is that the fixed delay restart strategy is not
implemented like that (see [1] if you are using Flink 1.10 or above).
There are also other systems that behave similarly, e.g., Apache
Hadoop
Congratulations Andrey, well deserved!
Best,
Gary
On Thu, Aug 15, 2019 at 7:50 AM Bowen Li wrote:
> Congratulations Andrey!
>
> On Wed, Aug 14, 2019 at 10:18 PM Rong Rong wrote:
>
>> Congratulations Andrey!
>>
>> On Wed, Aug 14, 2019 at 10:14 PM chaojianok wrote:
>>
>> > Congratulations Andre
Hi Felipe,
I am glad that you were able to fix the problem yourself.
> But I suppose that Mesos will allocate Slots and Task Managers
dynamically.
> Is that right?
Yes, that is the case since Flink 1.5 [1].
> Then I put the number of "mesos.resourcemanager.tasks.cpus" to be equal or
> less the
Hi,
You are not supposed to change that part of the exercise code. You have to
pass the path to the input file as a program argument (e.g., --input
/path/to/file). See [1] and [2] on how to configure program arguments in
IntelliJ.
Best,
Gary
[1]
https://www.jetbrains.com/help/idea/run-debug-conf
Program arguments should be set to "--input /home/alaa/nycTaxiRides.gz"
(without the quotes).
On Wed, Sep 11, 2019 at 10:39 AM alaa wrote:
> Hallo
>
> I put arguments but the same error appear .. what should i do ?
>
>
> <
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/fi
Hi Steve,
> I also tried attaching a shared NFS folder between the two machines and
> tried to set their web.tmpdir property to the shared folder, however it
> appears that each job manager creates a separate job inside that
directory.
You can create a fixed upload directory via the config option
Hi community,
Because we have approximately one month of development time left until the
targeted Flink 1.10 feature freeze, we thought now would be a good time to
give another progress update. Below we have included a list of the ongoing
efforts that have made progress since our last release prog
Hi Eric,
What you say should be possible because your job will be executed in a
MiniCluster [1] which has HA support. I have not tried this out myself,
and I am not aware that people are doing this in production. However,
there are integration tests that use MiniCluster + ZooKeeper [2].
Best,
Gar
Hi,
You can use
yarn application -status
to find the host and port that the server is listening on (AM host & RPC
Port). If you need to access that information programmatically, take a look
at
the YarnClient [1].
Best,
Gary
[1]
https://hadoop.apache.org/docs/r2.8.5/api/org/apache/hadoop/
Hi David,
Before with the both n and -s it was not the case.
>
What do you mean by before? At least in 1.8 "-s" could be used to specify
the
number of slots per TM.
how can I be sure that my Sink that uses this lib is in one JVM ?
>
Is it enough that no other parallel instance of your sink run
Hi Vishal,
Could it be that you are not using the 1.5.0 client? The stacktrace you
posted
does not reference valid lines of code in the release-1.5.0-rc6 tag.
If you have a HA setup, the host and port of the leading JM will be looked
up
from ZooKeeper before job submission. Therefore, the flink-c
ea07
>>
>> 2018-06-26 13:32:01 INFO ZooKeeper:684 - Session: 0x35add547801ea07
>> closed
>>
>> 2018-06-26 13:32:01 INFO ClientCnxn:519 - EventThread shut down for
>> session: 0x35add547801ea07
>>
>> 2018-06-26 13:32:01 DEBUG ClientCnxn:1146 - An exception was t
Hi Vishal,
The znode /flink_test/da_15/leader/rest_server_lock should exist as long as
your
Flink 1.5 cluster is running. In 1.4 this znode will not be created. Are you
sure that the znode does not exist? Unfortunately you only attached the
output
of "ls /flink_test/da_15".
Can you share the comp
Hi Oleksandr,
I think your conclusions are correct. Thank you for looking into it. You can
open a JIRA ticket describing the issue.
Best,
Gary
On Wed, Jul 4, 2018 at 9:30 AM, Oleksandr Nitavskyi
wrote:
> Hello guys,
>
> We have discovered minor issue with Flink 1.5 on YARN particularly which
>
Hi,
If you are able to re-produce this reliably, can you post the jobmanager
logs?
Best,
Gary
On Wed, Jul 18, 2018 at 10:33 AM, Renjie Liu
wrote:
> Hi, all:
>
> I'm testing flink 1.5.0 and find that flink mesos resource manager unable
> to connect to mesos after restart. Have you seen this hap
Hi Lukas,
It seems that when using MiniCluster, the config key akka.ask.timeout is not
respected. Instead, a hardcoded timeout of 10s is used [1]. Since all
communication is locally, it would be interesting to see in detail what your
job looks like that it exceeds the timeout.
The key akka.ask.ti
Hi,
The first exception should be only logged on info level. It's expected to
see
this exception when a TaskManager unregisters from the ResourceManager.
Heartbeats can be configured via heartbeat.interval and heartbeat.timeout
[1].
The default timeout is 50s, which should be a generous value. It
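These heartbeat options could be set in flink-conf.yaml, for example (values in milliseconds; the values shown here are the defaults):

```yaml
# Heartbeat interval and timeout between JM and TMs (defaults: 10 s / 50 s).
heartbeat.interval: 10000
heartbeat.timeout: 50000
```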
just need to kill job manager and restart
>> it.
>>
>> Attached is jobmanager's log, but I don't find anyting valuable since it
>> just keep reporting unable to connect to mesos master.
>>
>> On Thu, Jul 19, 2018 at 4:55 AM Gary Yao wrote:
>>
>
Hi Joey,
If the other components (e.g., Dispatcher, ResourceManager) are able to
finish
the leader election in a timely manner, I currently do not see a reason why
it
should take the REST server 20 - 45 minutes.
You can check the contents of znode /flink/.../leader/rest_server_lock to
see
if ther
Hi,
The per-job YARN mode also supports HA. Even if the YARN ResourceManager
only
brings up one JobManager at a time, you require ZooKeeper for storing
metadata, such as which jobs are supposed to be running. In fact, the
distributed components will still take part in a leader election with only
o
Hi Elias,
If you are using an ad blocker, can you turn it off and try it again?
I have forwarded your email internally at data Artisans, and we are working
on
a proper solution.
Best,
Gary
On Sun, Aug 5, 2018 at 8:30 PM, Elias Levy
wrote:
> It appears the Flink Forwards 2018 videos are FUBAR.
Hi,
nc exits after the first connection is closed. Are you re-running the nc
command every time the job finishes?
The stacktrace you copied does not indicate that a TaskManager cannot
connect
to the JobManager. I can only see that the SocketTextStreamFunction (from
the
SocketWindowWordCount job?)
nal dependencies that I didn`t
>>> include in my deploy. Do you have any clue?
>>>
>>> I am following the original quickstart (https://ci.apache.org/
>>> projects/flink/flink-docs-master/quickstart/setup_quickstart.html)
>>>
>>> Kind Regards,
>&
Hi Florian,
You write that Flink 1.4.2 works but what version is not working for you?
Best,
Gary
On Tue, Aug 7, 2018 at 8:25 AM, Florian Simond
wrote:
> Hi all,
>
>
> I'm trying to run the wordCount example on my YARN cluster and this is not
> working.. I get the error message specified in ti
(CliFrontend.
> java:1101)
> Caused by: java.io.FileNotFoundException: JAR file does not exist: -yn
> at org.apache.flink.client.cli.CliFrontend.buildProgram(
> CliFrontend.java:828)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.
> java:205)
>
4.2, that's why I missed that
> part...
>
>
> Do you have any pointer about the dynamic number of TaskManagers ? I'm
> curious to know how it works. Is it possible to still fix it ?
>
>
> Thank you,
>
> Florian
>
>
> -
everal jobs, it could be harder to make sure they do not
> interfere with each other...
>
>
> --
> *From:* Gary Yao
> *Sent:* Tuesday, August 7, 2018 12:27
>
> *To:* Florian Simond
> *Cc:* vino yang; user@flink.apache.org
> *Subject:* Re: Coul
k longer to start (10 minutes)
>
> And completed on this line:
>
> 2018-08-07 14:31:11,852 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph- Source:
> Custom Source -> Sink: Unnamed (1/1) (655509c673d8ae19aac195276ad2c3e6)
> switched from DEPLOYING to RUNNING.
>
>
>
> Thanks a lot for yo
to be correct afterall, I see "
> 1a9b648" there too: https://github.com/apache/flink/releases
>
>
> But I don't know why it's written version 0.1...
> --
> *From:* Gary Yao
> *Sent:* Tuesday, August 7, 2018 19:30
> *To:* Florian
Hi Vipul,
We are aware of YARN-2031. There are some ideas how to workaround it, which
are tracked here:
https://issues.apache.org/jira/browse/FLINK-9478
At the moment you have the following options:
1. Find out the master's address from ZooKeeper [1] and issue the HTTP
request against t
g them equitably.
>> TMs selectively are under more stress then in a pure RR distribution I
>> think. We may have to lower the slots on each TM to define a good upper
>> bound. You are correct 50s is a a pretty generous value.
>>
>> On Sun, Jul 22, 2018 at 6:55 AM,
his was repeated until all restart attempts had been used (we've set it
> to 50), and then the job finally failed.
>
> I would like to know also how to prevent Flink from going into such bad
> state. At least it should exit immediately instead of retrying in such a
> situation. A
Hello Navneet Kumar Pandey,
org.apache.log4j.rolling.RollingFileAppender is part of Apache Extras
Companion for Apache log4j [1]. Is that library in your classpath?
Are there hints in taskmanager.err?
Can you run:
cat /usr/lib/flink/conf/log4j.properties
on the EMR master node and show the
ink-queryable-state-runtime_2.11-1.4.2.jar log4j-1.2.17.jar
> flink-dist_2.11-1.4.2.jar flink-python_2.11-1.4.2.jar
> flink-shaded-hadoop2-uber-1.4.2.jar slf4j-log4j12-1.7.7.jar
>
> On Fri, Aug 17, 2018 at 5:15 PM, Gary Yao wrote:
>
>> Hello Navneet Kumar P
Hi Yubraj Singh,
Can you try submitting with HADOOP_CLASSPATH=`hadoop classpath` set? [1]
For example:
HADOOP_CLASSPATH=`hadoop classpath` bin/flink run [...]
Best,
Gary
[1]
https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/hadoop.html#configuring-flink-with-hadoop-classpat
Hi Greg,
Can you describe the steps to reproduce the problem, or can you attach the
full jobmanager logs? Because JobExecutionResultHandler appears in your
log, I
assume that you are starting a job cluster on YARN. Without seeing the
complete logs, I cannot be sure what exactly happens. For now, y
setting
>> and reply again if the problem comes up again. Thanks again for your quick
>> response!
>>
>> On Fri, Aug 31, 2018 at 1:38 PM Gary Yao wrote:
>>
>>> Hi Greg,
>>>
>>> Can you describe the steps to reproduce the problem, or can yo
Hi James,
What version of Flink are you running? In 1.5.0, tasks can spread out due to
changes that were introduced to support "local recovery". There is a
mitigation in 1.5.1 that prevents tasks from spreading out, but local recovery must be
disabled [2].
Best,
Gary
[1] https://issues.apache.org/jira/bro
In 1.5.0, I’ve not enable local
> recovery.
>
>
>
> So whether I need manual disable local recovery via flink.conf?
>
>
>
> Regards
>
>
>
> James
>
>
>
> *From: *"James (Jian Wu) [FDS Data Platform]"
> *Date: *Monday, September 3, 20
Hi Jelmer,
I saw that you have already found the JIRA issue tracking this problem [1]
but
I will still answer on the mailing list for transparency.
The timeout for "cancel with savepoint" should be RpcUtils.INF_TIMEOUT.
Unfortunately Flink is currently not respecting this timeout. A pull request
Hi Jason,
From the stacktrace it seems that you are using the 1.4.0 client to list
jobs
on a 1.5.x cluster. This will not work. You have to use the 1.5.x client.
Best,
Gary
On Wed, Sep 5, 2018 at 5:35 PM, Jason Kania wrote:
> Hello,
>
> Thanks for the response. I had already tried setting the
Hi Austin,
The config options rest.port, jobmanager.web.port, etc. are intentionally
ignored on YARN. The port should be chosen randomly to avoid conflicts with
other containers [1]. I do not see a way how you can set a fixed port at the
moment but there is a related ticket for that [2]. The Flink
Hi all,
The question is being handled on the dev mailing list [1].
Best,
Gary
[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Cancel-flink-job-occur-exception-td24056.html
On Tue, Sep 4, 2018 at 2:21 PM, rileyli(李瑞亮) wrote:
> Hi all,
> I submit a flink job through yar
Hi Tony,
You are right that with FLIP-6 Akka is abstracted away. If you want custom
heartbeat settings, you can configure the options below [1]:
heartbeat.interval
heartbeat.timeout
The config option taskmanager.exit-on-fatal-akka-error is also not relevant
anymore. The closest I can think
nation-via-akka
On Mon, Sep 10, 2018 at 7:38 AM, Gary Yao wrote:
> Hi Tony,
>
> You are right that with FLIP-6 Akka is abstracted away. If you want custom
> heartbeat settings, you can configure the options below [1]:
>
> heartbeat.interval
> heartbeat.timeout
>
>
LIP-6 mode.
>
> 1. akka.transport.heartbeat.interval
> 2. akka.transport.heartbeat.pause
>
> It seems they are different from HeartbeatServices and possibly still
> valid.
>
> Best,
> tison.
>
>
> On Mon, Sep 10, 2018 at 1:50 PM, Gary Yao wrote:
>
>> I should add that
Hi,
Do you also have pmml-model-moxy as a dependency in your job? Using mvn
dependency:tree, I do not see that pmml-evaluator has a compile time
dependency on jaxb-api. The jaxb-api dependency actually comes from pmml-
model-moxy. The exclusion should be therefore defined on pmml-model-moxy.
You
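A sketch of how such an exclusion might look in the pom (the Maven coordinates below are assumptions based on the JPMML project layout; verify them against the output of mvn dependency:tree):

```xml
<dependency>
  <groupId>org.jpmml</groupId>
  <artifactId>pmml-model-moxy</artifactId>
  <version>${pmml.version}</version>
  <exclusions>
    <!-- Exclude jaxb-api here, on the dependency that actually pulls it in. -->
    <exclusion>
      <groupId>javax.xml.bind</groupId>
      <artifactId>jaxb-api</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```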
Hi Tony,
You are right that these metrics are missing. There is already a ticket for
that [1]. At the moment you can obtain these information from the REST API
(/overview) [2].
Since FLIP-6, the JM is no longer responsible for these metrics but for
backwards compatibility we can leave them in the
Hi Jose,
You can find an example here:
https://github.com/apache/flink/blob/1a94c2094b8045a717a92e232f9891b23120e0f2/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/avro/ParquetStreamingFileSinkITCase.java#L58
Best,
Gary
On Tue, Sep 11, 2018 at 11:59 AM, jose farfan
Hi Averell,
According to the AWS documentation [1], the master node only runs the YARN
ResourceManager and the HDFS NameNode. Containers can only be launched on
nodes that are running the YARN NodeManager [2]. Therefore, if you want TMs
or
JMs to be launched on your EMR master node, you have to st
Hi Averell,
Flink compares the number of user selected vcores to the vcores configured
in
the yarn-site.xml of the submitting node, i.e., in your case the master
node.
If there are not enough configured vcores, the client throws an exception.
This behavior is not ideal and I found an old JIRA tick
Hi Henry,
The URL below looks like the one from the YARN proxy (note that "proxy"
appears in the URL):
http://storage4.test.lan:8089/proxy/application_1532081306755_0124/jobs/269608367e0f548f30d98aa4efa2211e/savepoints
You can use
yarn application -status
to find the host and port of the
Hi Averell,
There is no general answer to your question. If you are running more TMs,
you
get better isolation between different Flink jobs because one TM is backed
by
one JVM [1]. However, every TMs brings additional overhead (heartbeating,
running more threads, etc.) [1]. It also depends on the
Hi Borys,
To debug how many containers Flink is requesting, you can look out for the
log
statement below [1]:
Requesting new TaskExecutor container with resources [...]
If you need help debugging, can you attach the full JM logs (preferably on
DEBUG level)? Would it be possible for you to te
Hi Averell,
It is up to the YARN scheduler on which hosts the containers are started.
What Flink version are you using? I assume you are using 1.4 or earlier
because you are specifying a fixed number of TMs. If you launch Flink with
-yn
2, you should be only seeing 2 TMs in total (not 4). Are you
Hi Pawel,
As far as I know, the application attempt is incremented if the application
master fails and a new one is brought up. Therefore, what you are seeing
should not happen. I have just deployed on AWS EMR 5.17.0 (Hadoop 2.8.4) and
killed the container running the application master – the cont
Hi Borys,
I remember that another user reported a similar issue recently [1] –
attached
to the ticket you can find his log file. If I recall correctly, we concluded
that YARN returned the containers very quickly. At the time, Flink's debug
level logs were inconclusive because we did not log the re
Hi,
Could it be that you are submitting the job in attached mode, i.e., without
-d
parameter? In the "job cluster attached mode", we actually start a Flink
session cluster (and stop it again from the CLI) [1]. Therefore, in attached
mode, the config option "yarn.per-job-cluster.include-user-jar" i
Hi,
If the job is actually running and consuming from Kinesis, the log you
posted
is unrelated to your problem. To understand why the process function is not
invoked, we would need to see more of your code, or you would need to
provide
an executable example. The log only shows that all offered slo
Hi,
You are using event time but are you assigning watermarks [1]? I do not see
it
in the code.
Best,
Gary
[1]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kinesis.html#event-time-for-consumed-records
On Fri, Nov 9, 2018 at 6:58 PM Vijay Balakrishnan
wrote:
> Hi,
> An
Hi Paul,
Can you share the complete logs, or at least the logs after invoking the
cancel command?
If you want to debug it yourself, check if
MiniDispatcher#jobReachedGloballyTerminalState [1] is invoked, and see how
the
jobTerminationFuture is used.
Best,
Gary
[1]
https://github.com/apache/flin
Hi,
We only propagate the exception message but not the complete stacktrace [1].
Can you create a ticket for that?
Best,
Gary
[1]
https://github.com/apache/flink/blob/091cff3299aed4bb143619324f6ec8165348d3ae/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/AbstractRestHandler.ja
Hi Henry,
What you see in the API documentation is a schema definition and not a
sample
request. The request body should be:
{
  "target-directory": "hdfs:///flinkDsl",
  "cancel-job": false
}
Let me know if that helps.
Best,
Gary
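For illustration, a small Python sketch serializing the same request body as it would be sent to the savepoint trigger endpoint (the target directory is taken from the thread; the job ID is omitted, as it belongs in the URL path):

```python
import json

# Request body for POST /jobs/:jobid/savepoints.
payload = {
    "target-directory": "hdfs:///flinkDsl",
    "cancel-job": False,
}

body = json.dumps(payload)
print(body)
```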
On Mon, Nov 12, 2018 at 7:15 AM vino yang
commands.
>
> And also thank you for the pointer, I’ll keep tracking this problem.
>
> Best,
> Paul Lam
>
>
> On Nov 10, 2018, at 02:10, Gary Yao wrote:
>
> Hi Paul,
>
> Can you share the complete logs, or at least the logs after invoking the
> cancel command?
>
Hi Abhi Thakur,
We need more information to help you. What docker images are you using? Can
you share the kubernetes resource definitions? Can you share the complete
logs
of the JM and TMs? Did you follow the steps outlined in the Flink
documentation [1]?
Best,
Gary
[1]
https://ci.apache.org/pro
Hi,
You can use the YARN client to list all applications on your YARN cluster:
yarn application -list
If this does not show any running applications, the Flink cluster must have
somehow terminated. If you have YARN's log aggregation enabled, you should
be
able to view the Flink logs by runni
Hi all,
I think increasing the default value of the config option web.timeout [1] is
what you are looking for.
Best,
Gary
[1]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/RestHandlerConfiguration.java#L76
[2]
https://github.com/apa
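For example, in flink-conf.yaml (value in milliseconds; 10000 is the default per [1]):

```yaml
# Raise the asynchronous REST handler timeout from the default 10 s to 60 s.
web.timeout: 60000
```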
Hi Wei,
Did you build Flink with Maven 3.2.5 as recommended in the documentation
[1]?
Also, did you use the -Pvendor-repos flag to add the cloudera repository
when
building?
Best,
Gary
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.7/flinkDev/building.html#build-flink
[2]
https://
Hi,
The API still returns the location of a completed savepoint. See the example
in the Javadoc [1].
Best,
Gary
[1]
https://github.com/apache/flink/blob/1325599153b162fc85679589cab0c2691bf398f1/flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.jav
Hi Piotr,
What was the version you were using before 1.7.1?
How do you deploy your cluster, e.g., YARN, standalone?
Can you attach full TM and JM logs?
Best,
Gary
On Fri, Jan 18, 2019 at 3:05 PM Piotr Szczepanek
wrote:
> Hello,
> we have scenario with running Data Processing jobs that generate
ld you like to have log entries with DEBUG enabled or
> INFO would be enough?
>
> Thanks,
> Piotr
>
> pt., 18 sty 2019 o 15:14 Gary Yao napisał(a):
>
>> Hi Piotr,
>>
>> What was the version you were using before 1.7.1?
>> How do you deploy your cluster, e.g.,
Hi Henry,
Can you share your pom.xml and the full stacktrace with us? It is expected
behavior that org.elasticsearch.client.RestClientBuilder is not shaded. That
class comes from the elasticsearch Java client, and we only shade its
transitive dependencies. Could it be that you have a dependency in
Hi Averell,
What Flink version are you using? Can you attach the full logs from JM and
TMs? Since Flink 1.5, the -n parameter (number of taskmanagers) should be
omitted unless you are in legacy mode [1].
> As per that screenshot, it looks like there are 2 tasks manager still
> running (one on eac
Hi Averell,
> Then I have another question: when JM cannot start/connect to the JM on
.88,
> why didn't it try on .82 where resource are still available?
When you are deploying on YARN, the TM container placement is decided by the
YARN scheduler and not by Flink. Without seeing the complete logs,
Hi Averell,
> Is there any way to avoid this? As if I run this as an AWS EMR job, the
job
> would be considered failed, while it is actually be restored
automatically by
> YARN after 10 minutes).
You are writing that it takes YARN 10 minutes to restart the application
master (AM). However, in my
Hi Tim,
There is an end-to-end test in the Flink repository that starts a job
cluster
in Kubernetes (minikube) [1]. If that does not help you, can you answer the
questions below?
What docker images are you using? Can you share the kubernetes resource
definitions? Can you share the complete logs o
Hi Averell,
That log file does not look complete. I do not see any INFO level log
messages
such as [1].
Best,
Gary
[1]
https://github.com/apache/flink/blob/46326ab9181acec53d1e9e7ec8f4a26c672fec31/flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java#L544
On Fri, Feb 1, 2019 a
Hi,
Can you define what you mean by "job logs"? For code that is run on the
cluster, i.e., JM or TM, you should add your config to log4j.properties. The
log4j-cli.properties file is only used by the Flink CLI process.
Best,
Gary
On Mon, Feb 11, 2019 at 7:39 AM simpleusr wrote:
> Hi Chesnay,
>
Hi Averell,
Logback has this feature [1] but is not enabled out of the box. You will
have
to enable the JMX agent by setting the com.sun.management.jmxremote system
property [2][3]. I have not tried this out, though.
Best,
Gary
[1] https://logback.qos.ch/manual/jmxConfig.html
[2]
https://docs.or
Hi,
Are you logging from your own operator implementations, and you expect these
log messages to end up in a file prefixed with XYZ-? If that is the case,
modifying log4j-cli.properties will not be expedient as I wrote earlier.
You should modify the log4j.properties on all hosts that are running
Hi Jins George,
This has been asked before [1]. The bottom line is that you currently cannot
pre-allocate TMs and distribute your tasks evenly. You might be able to
achieve a better distribution across hosts by configuring fewer slots in
your
TMs.
Best,
Gary
[1]
http://apache-flink-user-mailing-
Hi Averell,
The TM containers fetch the Flink binaries and config files from HDFS (or
another DFS if configured) [1]. I think you should be able to change the log
level by patching the logback configuration in HDFS, and kill all Flink
containers on all hosts. If you are running an HA setup, your c
machine(8 core), I have 4 nodes,
> that will end up 28 taskmanagers and 1 job manager. I was wondering if this
> can bring additional burden on jobmanager? Is it recommended?
>
> Thanks,
>
> Jins George
> On 2/14/19 8:49 AM, Gary Yao wrote:
>
> Hi Jins George,
>
>
Hi Henry,
If I understand you correctly, you want YARN to allocate 4 vcores per TM
container. You can achieve this by enabling the FairScheduler in YARN
[1][2].
Best,
Gary
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#yarn-containers-vcores
[2]
https://hadoop.ap
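For example, in flink-conf.yaml:

```yaml
# Request 4 vcores per TaskManager container; if unset, the number of
# configured slots per TaskManager is used.
yarn.containers.vcores: 4
```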
Hi,
Beginning with Flink 1.7, you cannot use the legacy mode anymore [1][2]. I
am
currently working on removing references to the legacy mode in the
documentation [3]. Is there any reason you cannot use the "new mode"?
Best,
Gary
[1] https://flink.apache.org/news/2018/11/30/release-1.7.0.html
[