Hi Seed,

Sorry for confusion, I see now it is separate. Back pressure should still
be created because internal async queue has capacity
but not sure about reporting problem, Ken and Till probably have better
idea.

As for consumption speed up, async operator creates another thread to
collect the result and Cassandra sink probably uses that thread to write
data.
This might parallelize and pipeline previous steps like Kafka fetching and
Cassandra IO but I am also not sure about this explanation.

Best,
Andrey


On Tue, Mar 19, 2019 at 8:05 PM Ken Krugler <[email protected]>
wrote:

> Hi Seed,
>
> I was assuming the Cassandra sink was separate from and after your async
> function.
>
> I was trying to come up for an explanation as to why adding the async
> function would improve your performance.
>
> The only very unlikely reason I thought of was that the async function
> somehow caused data arriving at the sink to be more “batchy”, which (if the
> Cassandra sink had an “every x seconds do a write” batch mode) could
> improve performance.
>
> — Ken
>
> On Mar 19, 2019, at 11:35 AM, Seed Zeng <[email protected]> wrote:
>
> Hi Ken and Andrey,
>
> Thanks for the response. I think there is a confusion that the writes to
> Cassandra are happening within the Async function.
> In my test, the async function is just a pass-through without doing any
> work.
>
> So any Cassandra related batching or buffering should not be the cause for
> this.
>
> Thanks,
>
> Seed
>
> On Tue, Mar 19, 2019 at 12:35 PM Ken Krugler <[email protected]>
> wrote:
>
>> Hi Seed,
>>
>> It’s a known issue that Flink doesn’t report back pressure properly for
>> AsyncFunctions, due to how it monitors the output collector to gather back
>> pressure statistics.
>>
>> But that wouldn’t explain how you get a faster processing with the
>> AsyncFunction inserted into your workflow.
>>
>> I haven’t looked at how the Cassandra sink handles batching, but if the
>> AsyncFunction somehow caused fewer, bigger Cassandra writes to happen then
>> that’s one (serious hand waving) explanation.
>>
>> — Ken
>>
>> On Mar 18, 2019, at 7:48 PM, Seed Zeng <[email protected]> wrote:
>>
>> Flink Version - 1.6.1
>>
>> In our application, we consume from Kafka and sink to Cassandra in the
>> end. We are trying to introduce a custom async function in front of the
>> Sink to carry out some customized operations. In our testing, it appears
>> that the Async function is not generating backpressure to slow down our
>> Kafka Source when Cassandra becomes unhappy. Essentially compared to an
>> almost identical job where the only difference is the lack of the Async
>> function, Kafka source consumption speed is much higher under the same
>> settings and identical Cassandra cluster. The experiment is like this.
>>
>> Job 1 - without async function in front of Cassandra
>> Job 2 - with async function in front of Cassandra
>>
>> Job 1 is backpressured because Cassandra cannot handle all the writes and
>> eventually slows down the source rate to 6.5k/s.
>> Job 2 is slightly backpressured but was able to run at 14k/s.
>>
>> Is the AsyncFunction somehow not reporting the backpressure correctly?
>>
>> Thanks,
>> Seed
>>
>>
>> --------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://www.scaleunlimited.com
>> Custom big data solutions & training
>> Flink, Solr, Hadoop, Cascading & Cassandra
>>
>>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
>
>

Reply via email to