Thank you for telling me about the cause.
It recovered by restarting jobmanager-5 and jobmanager-1.
I restarted jobmanager-1 because, when I restarted jobmanager-5,
checkpointing started to
fail with the following message.
2016-07-28 10:42:28,217 WARN
org.apache.flink.runtime.checkpoint.Checkp
Hi,
We can see an occasional OOM issue with our Flink jobs. Maybe the
input got more diverse and the grouping has many more keys, but we are
not really sure about that part.
How do you usually tackle these issues? We are running with
parallelism between 5 and 30. Would it help if we turned that down?
We do set
Thanks for the logs. Looking through them it's caused by this issue:
https://issues.apache.org/jira/browse/FLINK-3800. The ExecutionGraph
(Flink's internal scheduling structure) is not terminated properly and
tries to restart the job over and over again.
This is fixed for 1.1.0. Is it an option fo
Hello users,
I rebuilt the project, and now on running mvn clean package I get two jar
files. I can run the fat jar in the local JVM properly; however,
when executing on the cluster I get the following error:
Source: Custom Source -> Flat Map -> Sink: Unnamed(1/1) switched to FAILED
java.la
Hello,
I am comparing Flink, Spark and some other streaming frameworks to find the
best fit for my project.
Currently we have a system which runs on a single server and uses off-heap
memory to store data. We now want to go distributed, with streaming support.
So we have designed a rough data flow for th
I just tried playing with the source parallelism setting, and I got a very
strange result:
If I specify the source parallelism
using env.addSource(kafka).setParallelism(N), results are printed correctly
for any number N except for N=4. I guess that's related to the number of
task slots since I have a 4
Hi Kostas,
When I remove the window and the apply() and put print() after
assignTimestampsAndWatermarks,
the messages are printed correctly:
2> Request{ts=2015-01-01, 06:15:34:000}
2> Request{ts=2015-01-02, 16:38:10:000}
2> Request{ts=2015-01-02, 18:58:41:000}
2> Request{ts=2015-01-02, 19:10:00:0
Which version of Flink are you running on? I think this might have
been fixed for the 1.1 release
(http://people.apache.org/~uce/flink-1.1.0-rc1/).
It looks like the ExecutionGraph is still trying to restart although
the JobManager is not the leader anymore. If you could provide the
complete logs
Hey Gary,
your configuration looks good to me. I think that it could be an issue
with S3 as you suggest. It might help to decrease the checkpointing
interval (if your use case requirements allow for this) in order to
have less interaction with S3. In general, your program should still
continue as e
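To put a number on "less interaction with S3": if each checkpoint costs roughly one round of S3 writes, the write rate scales inversely with the checkpointing interval. A tiny sketch of that arithmetic (checkpointsPerHour is an illustrative helper, not a Flink API):

```java
public class CheckpointRateDemo {
    // How many checkpoints (and hence rounds of S3 interaction) per hour
    // a given checkpointing interval implies.
    static long checkpointsPerHour(long intervalMillis) {
        return 3_600_000L / intervalMillis;
    }

    public static void main(String[] args) {
        System.out.println(checkpointsPerHour(10_000)); // 10 s interval -> 360 per hour
        System.out.println(checkpointsPerHour(60_000)); // 1 min interval -> 60 per hour
    }
}
```

Lengthening the interval from 10 seconds to a minute cuts the S3 traffic sixfold, at the cost of more reprocessing on recovery.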
Hi Kostas,
Thank you very much for the explanation.
Best,
Yassine
On Wed, Jul 27, 2016 at 1:09 PM, Kostas Kloudas wrote:
> Hi Yassine,
>
> When the WindowFunction is applied to the content of a window, the
> timestamp of the resulting record
> is the window.maxTimestamp, which is the endOfWind
Hello,
I have standalone Flink cluster with JobManager HA.
Last night, the JobManager failed over because of a connection timeout to
Zookeeper.
The job is running successfully under the new leader JobManager, but when
I look at the old leader JobManager's log, it is trying to re-submit the job
and getting errors. ( fo
Yes, the back pressure behaviour you describe is correct. With
checkpointing enabled, the job should resume as soon as the sink can
contact the backend service again (you would see that the job fails
many times until the service is live again, but in the end it should
work). You can control the rest
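The fail-and-resume loop described above (the job fails while the sink's backend is down, then succeeds once it is back) has the same shape as a fixed-delay restart strategy. A minimal self-contained sketch; the method retryWithFixedDelay and the fake flaky sink are illustrative stand-ins, not Flink API:

```java
import java.util.function.Supplier;

// Sketch of fixed-delay restarts: retry a failing action a bounded number
// of times, sleeping between attempts, the way a job is restarted while
// its sink cannot reach the backend service.
public class FixedDelayRetryDemo {
    static <T> T retryWithFixedDelay(Supplier<T> action, int maxAttempts, long delayMillis)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();         // success: stop retrying
            } catch (RuntimeException e) {
                last = e;                    // failure: wait, then retry
                Thread.sleep(delayMillis);
            }
        }
        throw last;                          // attempts exhausted
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake sink that fails twice before the backend "comes back".
        int[] calls = {0};
        String result = retryWithFixedDelay(() -> {
            if (++calls[0] < 3) throw new RuntimeException("backend unreachable");
            return "written";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: written after 3 attempts
    }
}
```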
Hi Yassine,
Could you just remove the window and the apply, and just put a print() after
the:
> .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Request>() {
> @Override
> public long extractAscendingTimestamp(Request req) {
> return req.ts;
> }
> })
This at least will
Hi Yassine,
When the WindowFunction is applied to the content of a window, the timestamp of
the resulting record
is the window.maxTimestamp, which is the endOfWindow-1.
You can imagine: if you have a tumbling window from 0 to 2000, the result will
have a timestamp of 1999.
Window boundaries are
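The boundary arithmetic above can be sketched with a minimal stand-in for Flink's TimeWindow (the class and method names mirror Flink's, but this toy version is self-contained and not the real implementation):

```java
// Minimal model of a tumbling event-time window [start, end).
// Mirrors the rule described above: the timestamp of a window
// result is the window's maxTimestamp, i.e. end - 1.
public class TimeWindowDemo {
    static final class TimeWindow {
        final long start; // inclusive
        final long end;   // exclusive
        TimeWindow(long start, long end) { this.start = start; this.end = end; }
        long maxTimestamp() { return end - 1; }
    }

    // Assign a (non-negative) element timestamp to its tumbling window.
    static TimeWindow assignTumbling(long timestamp, long size) {
        long start = timestamp - (timestamp % size);
        return new TimeWindow(start, start + size);
    }

    public static void main(String[] args) {
        TimeWindow w = assignTumbling(1500, 2000); // lands in window [0, 2000)
        System.out.println(w.start + " " + w.end + " " + w.maxTimestamp());
        // prints: 0 2000 1999
    }
}
```

So a record emitted by the window from 0 to 2000 carries timestamp 1999, exactly as described above.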
Hi Gábor, hi Ufuk, hi Greg,
thank you for your very helpful responses!
> You can try to make your `RichGroupReduceFunction` implement the
> `GroupCombineFunction` interface, so that Flink can do combining
> before the shuffle, which might significantly reduce the network load.
> (How much the
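The combining idea can be illustrated with a self-contained sketch: plain Java standing in for Flink's GroupCombineFunction, where each local partition collapses its records to one partial sum per key before anything crosses the network (method names here are illustrative, not Flink API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of combine-before-shuffle: instead of shipping every
// (key, value) record to the reducer, each local partition first collapses
// its records into one partial sum per key, shrinking the shuffled data.
public class CombineDemo {
    // Local combine step, analogous to what a GroupCombineFunction does.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> partition) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> rec : partition) {
            partial.merge(rec.getKey(), rec.getValue(), Long::sum);
        }
        return partial;
    }

    // Final reduce step merges the (much smaller) partial results.
    static Map<String, Long> reduce(List<Map<String, Long>> partials) {
        Map<String, Long> result = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> result.merge(k, v, Long::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> partition =
            List.of(Map.entry("a", 1L), Map.entry("a", 1L), Map.entry("b", 1L));
        Map<String, Long> partial = combine(partition);
        System.out.println(partition.size() + " records -> " + partial.size() + " partial sums");
        // prints: 3 records -> 2 partial sums
    }
}
```

How much this helps depends on how many duplicate keys each partition sees; with highly skewed keys the reduction in network load can be large.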
Dear Flink community,
Please vote on releasing the following candidate as Apache Flink version 1.1.0.
I've CC'd user@flink.apache.org as users are encouraged to help
testing Flink 1.1.0 for their specific use cases. Please feel free to
report issues and successful tests on d...@flink.apache.org.
Hi all,
I am using the filesystem state backend with checkpointing to S3.
From the JobManager logs, I can see that it works most of the time, e.g.,
2016-07-26 17:49:07,311 INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering
checkpoint 3 @ 1469555347310
2016-07-26 17
Hi,
I was wondering how Flink's fault tolerance works, because this page
is short on the details:
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/fault_tolerance.html
My environment has a backend service that may be out for a couple of
hours (sad, but working on fixing that). I