Hi Pasquale,

if you configured a checkpoint directory, then the MemoryStateBackend will also write the checkpoint data to disk in order to persist it.
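For illustration, the difference looks roughly like this; a minimal sketch against the Flink 1.7-era API, with a placeholder checkpoint path:

    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Purely in-memory: checkpointed state is sent to the JobManager and
    // kept on its heap, with no disk I/O involved.
    env.setStateBackend(new MemoryStateBackend());

    // With a checkpoint directory configured (here via the constructor; the
    // same applies if state.checkpoints.dir is set in flink-conf.yaml), the
    // checkpoint data is also persisted to the filesystem.
    env.setStateBackend(new MemoryStateBackend("file:///tmp/checkpoints", null));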
Cheers,
Till

On Tue, Jan 15, 2019 at 1:08 PM Pasquale Vazzana <p.vazz...@mwam.com> wrote:

> I can send you some debug logs and the execution plan, can I use your
> personal email? There might be sensitive info in the logs.
>
> Incoming and outgoing records are fairly distributed across subtasks, with
> similar but alternating loads. When the checkpoint is triggered, the load
> drops to nearly zero, all the fetch requests sent to Kafka (2.0.1) time
> out and the clients often disconnect from the brokers.
>
> Both source topics are 30 partitions each; they get keyed, connected and
> co-processed.
>
> I am checkpointing with EOS. As I said, I've tried all the backends with
> either DELETE_ON_CANCELLATION or RETAIN_ON_CANCELLATION. I assume that
> using the MemoryStateBackend and CANCELLATION should remove any
> possibility of disk/IO congestion, am I wrong?
>
> Pasquale
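For reference, the setup described above maps to configuration along these lines (a sketch against the Flink 1.7 API; the interval is a placeholder). Note that the cleanup mode only controls what happens to the checkpoint files on cancellation; it does not change whether or where checkpoint data is written while the job runs:

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Exactly-once checkpoints (EOS), triggered every 10 seconds.
    env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

    // Externalized checkpoints: DELETE_ON_CANCELLATION vs RETAIN_ON_CANCELLATION
    // only decides whether the checkpoint files are kept after a cancel; the
    // write path during a checkpoint is identical in both cases.
    env.getCheckpointConfig().enableExternalizedCheckpoints(
            ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);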
> *From:* Till Rohrmann <trohrm...@apache.org>
> *Sent:* 15 January 2019 10:33
> *To:* Pasquale Vazzana <p.vazz...@mwam.com>
> *Cc:* Bruno Aranda <bara...@apache.org>; user <user@flink.apache.org>
> *Subject:* Re: Subtask much slower than the others when creating checkpoints
>
> Same here Pasquale, the logs on DEBUG log level could be helpful. My guess
> would be that the respective tasks are overloaded or there is some
> resource congestion (network, disk, etc.).
>
> You should see in the web UI the number of incoming and outgoing events.
> It would be good to check that the events are similarly sized and can be
> computed in roughly the same time.
>
> Cheers,
> Till
>
> On Mon, Jan 14, 2019 at 4:07 PM Pasquale Vazzana <p.vazz...@mwam.com> wrote:
>
> I have the same problem, even more impactful. Some subtasks stall forever
> quite consistently.
> I am using Flink 1.7.1, but I've tried downgrading to 1.6.3 and it didn't
> help.
> The backend doesn't seem to make any difference; I've tried the Memory, FS
> and RocksDB backends but nothing changes. I've also tried to change the
> medium (local spinning disk, SAN or mounted fs) but nothing helps.
> Lowering the parallelism is the only thing which mitigates the stalling:
> with parallelism 1 everything works, but if I increase it then everything
> degrades; 10 makes it very slow, 30 freezes it.
> It's always one or two subtasks: most of them do the checkpoint in a few
> milliseconds, but there is always at least one which stalls for minutes
> until it times out. The alignment seems to be a problem.
> I've been wondering whether some Kafka partitions were empty, but there is
> not much data skew and the keyBy uses the same key strategy as the Kafka
> partitions; I've tried to use murmur2 for hashing but it didn't help
> either.
> The subtask that seems to be causing problems is a CoProcessFunction.
> I am going to debug Flink, but since I'm relatively new to it, it might
> take a while, so any help will be appreciated.
>
> Pasquale
>
> From: Till Rohrmann <trohrm...@apache.org>
> Sent: 08 January 2019 17:35
> To: Bruno Aranda <bara...@apache.org>
> Cc: user <user@flink.apache.org>
> Subject: Re: Subtask much slower than the others when creating checkpoints
>
> Hi Bruno,
>
> there are multiple reasons why one of the subtasks can take longer for
> checkpointing. It looks as if there is not much data skew since the state
> sizes are relatively equal. It also looks as if the individual tasks all
> start the checkpointing at the same time, which indicates that there
> mustn't be a lot of back-pressure in the DAG (or all tasks were equally
> back-pressured). This narrows the problem cause down to the asynchronous
> write operation. One potential problem could be if the external system to
> which you write your checkpoint data has some kind of I/O limit/quota.
> Maybe the sum of write accesses depletes the maximum quota you have. You
> could try whether running the job with a lower parallelism solves the
> problem.
>
> For further debugging it could be helpful to get access to the logs of the
> JobManager and the TaskManagers on DEBUG log level. It could also be
> helpful to learn which state backend you are using.
>
> Cheers,
> Till
>
> On Tue, Jan 8, 2019 at 12:52 PM Bruno Aranda <bara...@apache.org> wrote:
>
> Hi,
>
> We are using Flink 1.6.1 at the moment and we have a streaming job
> configured to create a checkpoint every 10 seconds. Looking at the
> checkpointing times in the UI, we can see that one subtask is much slower
> creating the checkpoint, at least in its "End to End Duration", and this
> seems caused by a longer "Checkpoint Duration (Async)".
>
> For instance, in the attached screenshot, while most of the subtasks take
> half a second, one (and it is always one) takes 2 seconds.
>
> But we have worse problems. We have seen cases where the checkpoint times
> out for one task: while most take one second, the outlier takes more than
> 5 minutes (which is the max time we allow for a checkpoint). This can
> happen if there is back pressure. We only allow one checkpoint at a time
> as well.
>
> Why could one subtask take more time? This job reads from Kafka partitions
> and hashes by key, and we don't see any major data skew between the
> partitions. Does one partition do more work?
>
> We do have a cluster of 20 machines, in EMR, with TMs that have multiple
> slots (in legacy mode).
>
> Is this something that could have been fixed in a more recent version?
>
> Thanks for any insight!
>
> Bruno
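For completeness, the checkpoint settings Bruno describes (a 10 second interval, a 5 minute timeout, at most one concurrent checkpoint) correspond to configuration roughly like this, sketched against the Flink 1.6 API:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Trigger a checkpoint every 10 seconds.
    env.enableCheckpointing(10_000L);
    // Time out any checkpoint that takes longer than 5 minutes.
    env.getCheckpointConfig().setCheckpointTimeout(300_000L);
    // Allow only one checkpoint in flight at a time.
    env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);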