Re: Flink application has slightly data loss using Processing Time

2021-03-22 Thread Rainie Li
I will try that. Thanks for your help, David. Best regards Rainie On Sat, Mar 20, 2021 at 9:46 AM David Anderson wrote: > You should increase the kafka transaction timeout -- > transaction.max.timeout.ms -- to something much larger than the default, > which I believe is 15 minutes. Suitable val

Re: Flink application has slightly data loss using Processing Time

2021-03-20 Thread David Anderson
You should increase the kafka transaction timeout -- transaction.max.timeout.ms -- to something much larger than the default, which I believe is 15 minutes. Suitable values are more on the order of a few hours to a few days -- long enough to allow for any conceivable outage. This way, if a request
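A minimal sketch of the advice above, with some assumptions spelled out. transaction.max.timeout.ms is a broker-side cap (set in the broker configuration), while the producer requests its own timeout through transaction.timeout.ms, which must not exceed that cap. The class name KafkaTransactionTimeouts, the helper producerProperties, and the 6-hour and 1-day values are hypothetical and chosen only for illustration; they do not come from this thread.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Flink or Kafka.
class KafkaTransactionTimeouts {

    /**
     * Builds producer properties with transaction.timeout.ms set.
     * Rejects a producer timeout larger than the broker-side cap
     * (transaction.max.timeout.ms), since the broker would refuse it.
     */
    static Properties producerProperties(Duration producerTimeout, Duration brokerMaxTimeout) {
        if (producerTimeout.compareTo(brokerMaxTimeout) > 0) {
            throw new IllegalArgumentException(
                "transaction.timeout.ms (" + producerTimeout.toMillis()
                    + ") exceeds broker transaction.max.timeout.ms ("
                    + brokerMaxTimeout.toMillis() + ")");
        }
        Properties props = new Properties();
        props.setProperty("transaction.timeout.ms", Long.toString(producerTimeout.toMillis()));
        return props;
    }

    public static void main(String[] args) {
        // Assumed values: the broker cap is raised to 1 day, the producer asks for 6 hours.
        Duration brokerMax = Duration.ofDays(1);
        Properties props = producerProperties(Duration.ofHours(6), brokerMax);
        System.out.println(props);
    }
}
```

In a Flink job, properties like these would typically be passed as the producer config when constructing the Kafka sink, assuming the sink runs with exactly-once semantics; whether that is the case for the job in this thread is not stated in the messages shown here.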

Re: Flink application has slightly data loss using Processing Time

2021-03-19 Thread Rainie Li
Hi Arvid, After increasing producer.kafka.request.timeout.ms from 9 to 12, the job ran fine for almost 5 days, but one of the tasks recently failed again with the same timeout error (stack trace attached below). Should I keep increasing the producer.kafka.request.timeout.ms value?

Re: Flink application has slightly data loss using Processing Time

2021-03-11 Thread Rainie Li
Thanks for the suggestion, Arvid. Currently my job is using producer.kafka.request.timeout.ms=9. I will try increasing it to 12. Best regards Rainie On Thu, Mar 11, 2021 at 3:58 AM Arvid Heise wrote: > Hi Rainie, > > This looks like the record batching in Kafka producer timed out. At this

Re: Flink application has slightly data loss using Processing Time

2021-03-11 Thread Arvid Heise
Hi Rainie, This looks like the record batching in Kafka producer timed out. At this point, the respective records are lost forever. You probably want to tweak your Kafka settings [1]. Usually, Flink should fail and restart at this point and recover without data loss. However, if the transactions

Re: Flink application has slightly data loss using Processing Time

2021-03-08 Thread Rainie Li
Thanks for the info, David. The job has checkpointing. I saw some tasks fail due to "org.apache.flink.streaming.connectors.kafka.FlinkKafkaException: Failed to send data to Kafka". Here is the stack trace from the JM log: container_e17_1611597945897_8007_01_000240 @ worker-node-host (dataPort=42321). 2021

Re: Flink application has slightly data loss using Processing Time

2021-03-08 Thread David Anderson
Rainie, A restart after a failure can cause data loss if you aren't using checkpointing, or if you experience a transaction timeout. A manual restart can also lead to data loss, depending on how you manage the offsets, transactions, and other state during the restart. What happened in this case?

Re: Flink application has slightly data loss using Processing Time

2021-03-08 Thread Rainie Li
Thanks Yun and David. There were some tasks that got restarted. We configured the restart policy and the job didn't fail. Will task restart cause data loss? Thanks Rainie On Mon, Mar 8, 2021 at 10:42 AM David Anderson wrote: > Rainie, > > Were there any failures/restarts, or is this discrepanc

Re: Flink application has slightly data loss using Processing Time

2021-03-08 Thread David Anderson
Rainie, Were there any failures/restarts, or is this discrepancy observed without any disruption to the processing? Regards, David On Mon, Mar 8, 2021 at 10:14 AM Rainie Li wrote: > Thanks for the quick response, Smile. > I don't use window operators or flatmap. > Here is the core logic of my

Re: Re: Flink application has slightly data loss using Processing Time

2021-03-08 Thread Yun Gao
Hi Rainie, From the code, it seems the job does not use time-related functionality such as windows or timers? If so, the problem would be independent of the time type used. It is also unlikely to be caused by rebalance(), since the network layer checks sequence numbers. If there are

Re: Flink application has slightly data loss using Processing Time

2021-03-08 Thread Rainie Li
Thanks for the quick response, Smile. I don't use window operators or flatmap. Here is the core logic of my filter; it only iterates over the list of filters. Will rebalance() cause it? Thanks again. Best regards Rainie SingleOutputStreamOperator> matchedRecordsStream = eventStream .rebalanc

Re: Flink application has slightly data loss using Processing Time

2021-03-08 Thread Smile
Hi Rainie, Could you please provide more information about your processing logic? Do you use window operators? If there's no time-based operator in your logic, late arrival data won't be dropped by default and there might be something wrong with your flat map or filter operator. Otherwise, you ca