Hi Seed,

when you create `AsyncDataStream.(un)orderedWait` which capacity do you
pass in or you use the default one (100)?

Best,
Andrey

On Thu, Mar 21, 2019 at 2:49 AM Seed Zeng <[email protected]> wrote:

> Hey Andrey and Ken,
> Sorry about the late reply. I might not have been clear in my question
> The performance of writing to Cassandra is the same in both cases, only
> that the source rate was higher in the case of the async function is
> present.
> Something is "buffering" and not propagating backpressure to slow down the
> source speed from Kafka.
>
> In our use case, we prefer the backpressure to slow down the source so
> that the write to Cassandra is not delayed while the source is consuming
> fast.
>
> Thanks,
> Seed
>
> On Wed, Mar 20, 2019 at 9:38 AM Andrey Zagrebin <[email protected]>
> wrote:
>
>> Hi Seed,
>>
>> Sorry for confusion, I see now it is separate. Back pressure should still
>> be created because internal async queue has capacity
>> but not sure about reporting problem, Ken and Till probably have better
>> idea.
>>
>> As for consumption speed up, async operator creates another thread to
>> collect the result and Cassandra sink probably uses that thread to write
>> data.
>> This might parallelize and pipeline previous steps like Kafka fetching
>> and Cassandra IO but I am also not sure about this explanation.
>>
>> Best,
>> Andrey
>>
>>
>> On Tue, Mar 19, 2019 at 8:05 PM Ken Krugler <[email protected]>
>> wrote:
>>
>>> Hi Seed,
>>>
>>> I was assuming the Cassandra sink was separate from and after your async
>>> function.
>>>
>>> I was trying to come up for an explanation as to why adding the async
>>> function would improve your performance.
>>>
>>> The only very unlikely reason I thought of was that the async function
>>> somehow caused data arriving at the sink to be more “batchy”, which (if the
>>> Cassandra sink had an “every x seconds do a write” batch mode) could
>>> improve performance.
>>>
>>> — Ken
>>>
>>> On Mar 19, 2019, at 11:35 AM, Seed Zeng <[email protected]> wrote:
>>>
>>> Hi Ken and Andrey,
>>>
>>> Thanks for the response. I think there is a confusion that the writes to
>>> Cassandra are happening within the Async function.
>>> In my test, the async function is just a pass-through without doing any
>>> work.
>>>
>>> So any Cassandra related batching or buffering should not be the cause
>>> for this.
>>>
>>> Thanks,
>>>
>>> Seed
>>>
>>> On Tue, Mar 19, 2019 at 12:35 PM Ken Krugler <
>>> [email protected]> wrote:
>>>
>>>> Hi Seed,
>>>>
>>>> It’s a known issue that Flink doesn’t report back pressure properly for
>>>> AsyncFunctions, due to how it monitors the output collector to gather back
>>>> pressure statistics.
>>>>
>>>> But that wouldn’t explain how you get a faster processing with the
>>>> AsyncFunction inserted into your workflow.
>>>>
>>>> I haven’t looked at how the Cassandra sink handles batching, but if the
>>>> AsyncFunction somehow caused fewer, bigger Cassandra writes to happen then
>>>> that’s one (serious hand waving) explanation.
>>>>
>>>> — Ken
>>>>
>>>> On Mar 18, 2019, at 7:48 PM, Seed Zeng <[email protected]> wrote:
>>>>
>>>> Flink Version - 1.6.1
>>>>
>>>> In our application, we consume from Kafka and sink to Cassandra in the
>>>> end. We are trying to introduce a custom async function in front of the
>>>> Sink to carry out some customized operations. In our testing, it appears
>>>> that the Async function is not generating backpressure to slow down our
>>>> Kafka Source when Cassandra becomes unhappy. Essentially compared to an
>>>> almost identical job where the only difference is the lack of the Async
>>>> function, Kafka source consumption speed is much higher under the same
>>>> settings and identical Cassandra cluster. The experiment is like this.
>>>>
>>>> Job 1 - without async function in front of Cassandra
>>>> Job 2 - with async function in front of Cassandra
>>>>
>>>> Job 1 is backpressured because Cassandra cannot handle all the writes
>>>> and eventually slows down the source rate to 6.5k/s.
>>>> Job 2 is slightly backpressured but was able to run at 14k/s.
>>>>
>>>> Is the AsyncFunction somehow not reporting the backpressure correctly?
>>>>
>>>> Thanks,
>>>> Seed
>>>>
>>>>
>>>> --------------------------
>>>> Ken Krugler
>>>> +1 530-210-6378
>>>> http://www.scaleunlimited.com
>>>> Custom big data solutions & training
>>>> Flink, Solr, Hadoop, Cascading & Cassandra
>>>>
>>>>
>>> --------------------------
>>> Ken Krugler
>>> +1 530-210-6378
>>> http://www.scaleunlimited.com
>>> Custom big data solutions & training
>>> Flink, Solr, Hadoop, Cascading & Cassandra
>>>
>>>

Reply via email to