Hi Nicolaus,
I double-checked our HDFS config again; it is set to 1 instead of 2.
I will try the solution you provided.
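As a side note, here is a minimal sketch (assuming the plain Hadoop client API; the checkpoint path below is only a placeholder) of how the effective replication can be double-checked programmatically:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckReplication {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();

        // Default replication requested for newly written files.
        System.out.println("dfs.replication = " + conf.get("dfs.replication", "3"));

        // Actual replication of an existing file (placeholder path), assuming
        // HDFS is the configured default file system.
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path("/flink/checkpoints/some-file"));
        System.out.println("file replication = " + status.getReplication());
    }
}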
Thanks again.
Best regards
Rainie
On Wed, Aug 4, 2021 at 10:40 AM Rainie Li wrote:
> Thanks for the context Nicolaus.
> We are using S3 instead of HDFS.
>
>
stacktrace:
> https://stackoverflow.com/questions/64400280/flink-unable-to-recover-after-yarn-node-termination
> Do you replicate data on multiple HDFS nodes as suggested in the answer
> there?
>
> Best,
> Nico
>
> On Wed, Aug 4, 2021 at 9:24 AM Rainie Li wrote:
>
>> Thanks Till.
the actively
> maintained Flink versions (1.12 or 1.13) and try whether it works with this
> version.
>
> Cheers,
> Till
>
> On Tue, Aug 3, 2021 at 9:56 AM Rainie Li wrote:
>
>> Hi Flink Community,
>>
>> My flink application is running version 1.9 and it failed to recover
Hi Flink Community,
My flink application is running version 1.9 and it failed to recover
(the application was running but checkpointing failed and the job stopped
processing data) during hadoop yarn node termination.
*Here is the job manager log error:*
*2021-07-26 18:02:58,605 INFO
org.apache.hadoop.io.retry.
I see, then more than 5 minutes had passed.
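For reference, a rough sketch (using Java 11's built-in HttpClient; the host, job id, trigger id, and target directory below are only placeholders) of triggering a savepoint and then polling its status against the same JobManager within that window:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SavepointStatusCheck {
    public static void main(String[] args) throws Exception {
        // Both requests must go to the SAME JobManager REST address (placeholder).
        String rest = "http://jobmanager-host:8081";
        String jobId = "0123456789abcdef0123456789abcdef"; // placeholder job id

        HttpClient client = HttpClient.newHttpClient();

        // Trigger the savepoint; the response contains a "request-id" (the trigger id).
        HttpRequest trigger = HttpRequest.newBuilder()
                .uri(URI.create(rest + "/jobs/" + jobId + "/savepoints"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"target-directory\": \"s3://my-bucket/savepoints\", \"cancel-job\": false}"))
                .build();
        System.out.println("trigger response: "
                + client.send(trigger, HttpResponse.BodyHandlers.ofString()).body());

        // Poll the status on the same JobManager within a few minutes,
        // using the request-id returned above (placeholder here).
        String triggerId = "request-id-from-the-trigger-response";
        HttpRequest status = HttpRequest.newBuilder()
                .uri(URI.create(rest + "/jobs/" + jobId + "/savepoints/" + triggerId))
                .GET()
                .build();
        System.out.println("status response: "
                + client.send(status, HttpResponse.BodyHandlers.ofString()).body());
    }
}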
Thanks for the help.
Best regards
Rainie
On Tue, Jun 29, 2021 at 12:29 AM Chesnay Schepler
wrote:
> How much time has passed between the requests? (You can only query the
> status for about 5 minutes)
>
> On 6/29/2021 6:37 AM, Rai
epoint.
> The meta information for such requests is only stored locally on each JM
> and neither distributed to all JMs nor persisted anywhere.
>
> Did you send both requests (the one for creating a savepoint and the one
> for querying the status) to the same JM?
>
> On 6/26/202
Hi Flink Community,
I found this error when I tried to create a savepoint for my flink job.
It's in version 1.9.
{
"errors": [
"Operation not found under key:
org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@57b9711e"
]
}
Here is the error from the JM log:
2021-06-2
of a bigger issue?) addressed in Flink 1.13 (see
> FLINK-21066). I'm pulling Yun Gao into this thread. Let's see whether Yun
> can confirm that finding.
>
> I hope that helps.
> Matthias
>
> [1] https://issues.apache.org/jira/browse/FLINK-21066
>
> On Mon, May 3
Hi Flink Community,
Our flink jobs are on version 1.11 and we use this command to trigger savepoints:
$ bin/flink savepoint :jobId [:targetDirectory]
We can get the trigger ID with the savepoint path successfully.
But we saw these errors when querying the savepoint status endpoint:
https://ci.apache.org/projects/flink/flink-d
data.
>
> Regards,
> David
>
>
>
> On Sat, Mar 20, 2021 at 12:02 AM Rainie Li wrote:
>
>> Hi Arvid,
>>
>> After increasing producer.kafka.request.timeout.ms from 9 to 12,
>> the job has been running fine for almost 5 days, but one of the tasks
1 at 7:12 AM Rainie Li wrote:
> Thanks for the suggestion, Arvid.
> Currently my job is using producer.kafka.request.timeout.ms=9
> I will try to increase to 12.
>
> Best regards
> Rainie
>
> On Thu, Mar 11, 2021 at 3:58 AM Arvid Heise wrote:
>
>> Hi R
loss. So you probably also want to increase the
> transaction timeout.
>
> [1]
> https://stackoverflow.com/questions/53223129/kafka-producer-timeoutexception
>
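As an illustration of the advice above, a minimal sketch (assuming the universal FlinkKafkaProducer; the broker address and timeout values are placeholders, not recommendations) of passing both timeouts through the producer properties:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class ProducerTimeoutsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties producerProps = new Properties();
        producerProps.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        // How long a single produce request may wait before timing out.
        producerProps.setProperty("request.timeout.ms", "120000");     // illustrative value
        // When the sink runs with exactly-once semantics, the transaction timeout
        // usually needs to cover the checkpoint interval plus recovery time.
        producerProps.setProperty("transaction.timeout.ms", "900000"); // illustrative value

        env.fromElements("a", "b", "c")
           .addSink(new FlinkKafkaProducer<>("output-topic", new SimpleStringSchema(), producerProps));

        env.execute("producer-timeouts-sketch");
    }
}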
> On Mon, Mar 8, 2021 at 8:34 PM Rainie Li wrote:
>
>> Thanks for the info, David.
>>
loss, depending on how you manage
> the offsets, transactions, and other state during the restart. What
> happened in this case?
>
> David
>
> On Mon, Mar 8, 2021 at 7:53 PM Rainie Li wrote:
>
>> Thanks Yun and David.
>> There were some tasks that got restarted. We
ts, or is this discrepancy observed without
> any disruption to the processing?
>
> Regards,
> David
>
> On Mon, Mar 8, 2021 at 10:14 AM Rainie Li wrote:
>
>> Thanks for the quick response, Smile.
>> I don't use window operators or flatmap.
>> Here is the core log
Thanks for the quick response, Smile.
I don't use window operators or flatmap.
Here is the core logic of my filter; it only iterates over the filters list.
Will *rebalance()* cause it?
Thanks again.
Best regards
Rainie
SingleOutputStreamOperator<...> matchedRecordsStream =
    eventStream
        .rebalance()
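For context, here is a minimal self-contained sketch of the same pattern (hypothetical element type and filter list, not the real job), mainly to show where rebalance() sits; rebalance() only redistributes records round-robin across downstream subtasks and should not drop data by itself:

import java.util.Arrays;
import java.util.List;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RebalanceFilterSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the Kafka source in the real job.
        DataStream<String> eventStream = env.fromElements("a", "b", "match-1", "match-2");

        // Hypothetical filter list; the real job iterates over its own filters.
        List<String> filters = Arrays.asList("match-1", "match-2");

        SingleOutputStreamOperator<String> matchedRecordsStream = eventStream
                .rebalance()
                .filter(record -> filters.stream().anyMatch(record::contains));

        matchedRecordsStream.print();
        env.execute("rebalance-filter-sketch");
    }
}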
Hello Flink Community,
Our flink application is on v1.9; the basic logic of this application is
consuming one large kafka topic and filtering some fields, then producing data
to a new kafka topic.
After comparing the original kafka topic's count with the generated kafka
topic's count based on the same field by usin
If you think the issue is due to high latency in communicating with your
> Kafka cluster, increase your configured timeouts.
>
>
>
> If you think the issue is due to (possibly temporary) dynamic config
> changes, check your error handling. Retrying the fetch from the Flink side
> wil
destroyed." in your
> case. It looks like you had some timeout problem while fetching data from a
> Kafka topic.
>
> Matthias
>
> On Tue, Mar 2, 2021 at 10:39 AM Rainie Li wrote:
>
>> Thanks for checking, Matthias.
>>
>> I have another flink job which
handling the operator-related network buffer is destroyed. That
> causes the "java.lang.RuntimeException: Buffer pool is destroyed." in your
> case. It looks like you had some timeout problem while fetching data from a
> Kafka topic.
>
> Matthias
>
> On Tue, Mar 2, 2021 at
problem between the TaskManager and the JobManager that caused the shutdown
>> of the operator, resulting in the NetworkBufferPool being destroyed. For
>> this scenario I would expect other failures to occur besides the ones you
>> shared.
>>
>> Best,
>> Ma
Matthias
>
>
> On Fri, Feb 26, 2021 at 8:39 AM Rainie Li wrote:
>
>> Hi All,
>>
>> Our flink application kept restarting and it did lots of RPC calls to a
>> dependency service.
>>
>> *We saw this exception in the failed task manager log:*
>> o
Hi All,
Our flink application kept restarting and it did lots of RPC calls to a
dependency service.
*We saw this exception in the failed task manager log:*
org.apache.flink.streaming.runtime.tasks.ExceptionInChainedOperatorException:
Could not forward element to next operator
at
org.apache.flink.s
right before that time.
>
> In both cases, Flink should restart the job with the correct restart
> policies if configured.
>
> On Sat, Feb 20, 2021 at 10:07 PM Rainie Li wrote:
>
>> Hello,
>>
>> I launched a job with a larger load on a hadoop yarn cluster.
>&g
Hello,
I launched a job with a larger load on a hadoop yarn cluster.
The job finished after running for 5 hours. I didn't find any error in the
JobManager log besides this connection exception.
*2021-02-20 13:20:14,110 WARN akka.remote.transport.netty.NettyTransport
- Remote connection
than 1.
>
>
> BTW, the logs you provided are not Yarn NodeManager logs. And if you could
> provide the full jobmanager log, it would help a lot.
>
>
>
> Best,
> Yang
>
> Rainie Li wrote on Wed, Jul 22, 2020 at 3:54 PM:
>
>> Hi Flink help,
>>
>> I a
Hi Flink help,
I am new to Flink.
I am investigating one flink app that cannot restart when we lose the yarn node
manager (tc.yarn.rm.cluster.NumActiveNMs=0), while other flink apps can
restart automatically.
*Here is the job's restartPolicy setting:*
*env.setRestartStrategy(RestartStrategies.fixedDelay
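Purely as an illustration (hypothetical values, not the job's actual settings), a fixed-delay restart strategy is typically configured like this:

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartStrategySketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical values: restart at most 3 times, waiting 10 seconds between attempts.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
                3,                               // number of restart attempts
                Time.of(10, TimeUnit.SECONDS))); // delay between attempts

        // ... the rest of the job definition would follow here ...
    }
}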
Rainie Li wrote:
> Thank you, Jesse.
>
> Here is more log info:
>
> 2020-07-15 18:19:36,456 INFO org.apache.flink.client.cli.CliFrontend
> -
>
> 2020
send more
> of the log or find an error line that might help others debug.
>
>
>
> Thanks,
>
> Jesse
>
>
>
> *From: *Rainie Li
> *Date: *Wednesday, July 15, 2020 at 10:54 AM
> *To: *"user@flink.apache.org"
> *Subject: *flink app crashed
>
>
Hi All,
I am new to Flink. Any idea why the flink app's Job Manager is stuck? Here is
the bottom part of the Job Manager log. Any suggestion will be appreciated.
2020-07-15 16:49:52,749 INFO
org.apache.flink.runtime.rpc.akka.AkkaRpcService - Starting RPC endpoint
for org.apache.flink.runtime.dispatcher.Sta