Just to be sure: Is the task whose backpressure you want to monitor the
Kafka Source?

There is an open issue that backpressure monitoring does not work for the
Kafka Source: https://issues.apache.org/jira/browse/FLINK-3456

To circumvent that, add an "IdentityMapper" after the Kafka source, make
sure it is non-chained, and monitor the backpressure on that MapFunction.
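
In code, the workaround might look like this (a minimal sketch against the Flink 1.0-era DataStream API; the topic name, consumer properties, and the print sink are placeholders, not part of Stephan's suggestion):

```java
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class BackpressureProbeJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder Kafka 0.8 consumer configuration.
        Properties props = new Properties();
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("group.id", "backpressure-test");

        DataStream<String> source = env.addSource(
            new FlinkKafkaConsumer08<>("my-topic", new SimpleStringSchema(), props));

        source
            // Identity mapper: does nothing except give the backpressure
            // monitor a non-source task to sample.
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value;
                }
            })
            // Keep the mapper out of the source's (and sink's) operator chain.
            .disableChaining()
            .print(); // placeholder sink

        env.execute("Backpressure probe");
    }
}
```

With `disableChaining()`, the identity mapper runs as its own task, so the backpressure tab samples it separately from the Kafka source.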

Greetings,
Stephan


On Thu, Mar 10, 2016 at 11:23 AM, Robert Metzger <rmetz...@apache.org>
wrote:

> Hi Vijay,
>
> regarding your other questions:
>
> 1) On the TaskManagers, the FlinkKafkaConsumers will log the partitions
> they are going to read. There is currently no way of seeing the state of
> a checkpoint in Flink (which contains the offsets).
> However, once a checkpoint is completed, the Kafka consumer commits the
> offsets to the Kafka broker. (I could not find a tool to get the committed
> offsets from the broker, but they are stored either in ZK or in a special
> topic by the broker. In Kafka 0.8 that's easily doable with
> kafka.tools.ConsumerOffsetChecker.)
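>
> For Kafka 0.8, that check could look roughly like this (the group id and
> ZooKeeper address below are placeholders):
>
> ```shell
> # Show committed offsets and lag per partition for a consumer group
> # (Kafka 0.8; group id and ZooKeeper address are illustrative).
> bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
>     --zookeeper localhost:2181 \
>     --group my-flink-consumer-group
> ```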
>
> 2) Do you see duplicate data written by the rolling file sink? Or do you
> see it somewhere else?
> HDP 2.4 uses Hadoop 2.7.1, so the truncate() of invalid data should
> actually work properly.
>
>
>
>
>
> On Thu, Mar 10, 2016 at 10:44 AM, Ufuk Celebi <u...@apache.org> wrote:
>
>> How many vertices does the web interface show and what parallelism are
>> you running? If the sleeping operator is chained you will not see
>> anything.
>>
>> If your goal is to just see some back pressure warning, you can call
>> env.disableOperatorChaining() and re-run the program. Does this work?
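>>
>> In context, a minimal sketch (the rest of the pipeline is omitted):
>>
>> ```java
>> // Disabling chaining makes every operator its own task, so each one
>> // (including a sleeping mapper) shows up separately in the web UI.
>> StreamExecutionEnvironment env =
>>     StreamExecutionEnvironment.getExecutionEnvironment();
>> env.disableOperatorChaining();
>> ```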
>>
>> – Ufuk
>>
>>
>> On Thu, Mar 10, 2016 at 1:35 AM, Vijay Srinivasaraghavan
>> <vijikar...@yahoo.com> wrote:
>> > Hi Ufuk,
>> >
>> > I have increased the sampling size to 1000 and decreased the refresh
>> > interval by half. In my Kafka topic I have pumped a million messages,
>> > which are read by the KafkaConsumer pipeline and passed to a
>> > transformation step where I have introduced a sleep (3 sec) for every
>> > single message received; the final step is an HDFS sink using the
>> > RollingSink API.
>> >
>> > jobmanager.web.backpressure.num-samples: 1000
>> > jobmanager.web.backpressure.refresh-interval: 30000
>> >
>> >
>> > I was hoping the backpressure tab in the UI would display a warning,
>> > but I still see the "OK" message.
>> >
>> > This makes me wonder whether I am testing the backpressure scenario
>> > properly.
>> >
>> > Regards
>> > Vijay
>> >
>> > On Monday, March 7, 2016 3:19 PM, Ufuk Celebi <u...@apache.org> wrote:
>> >
>> >
>> > Hey Vijay!
>> >
>> > On Mon, Mar 7, 2016 at 8:42 PM, Vijay Srinivasaraghavan
>> > <vijikar...@yahoo.com> wrote:
>> >> 3) How can I simulate and verify backpressure? I have introduced some
>> >> delay (Thread.sleep) in the job before the sink, but the "backpressure"
>> >> tab in the UI does not show any indication of whether backpressure is
>> >> working or not.
>> >
>> > If a task is slow, it is back pressuring upstream tasks, e.g. if your
>> > transformations have the sleep, the sources should be back pressured.
>> > It can happen that even with the sleep the tasks still produce their
>> > data as fast as they can and hence no back pressure is indicated in
>> > the web interface. You can increase the sleep to check this.
>> >
>> > The mechanism used to determine back pressure is based on sampling the
>> > stack traces of running tasks. You can increase the number of samples
>> > and/or decrease the delay between samples via config parameters shown
>> > in [1]. It can happen that the samples miss the back pressure
>> > indicators, but usually the defaults work fine.
>> >
>> >
>> > [1]
>> > https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#jobmanager-web-frontend
>> >
>> >
>> >
>>
>
>
