Hi,

Does this seem like it would help?

Thanks!

On Tue, Nov 10, 2020 at 10:38 PM Rex Fenley <r...@remind101.com> wrote:

> Thanks! We did give that a shot and ran into the bug that I reported here
> https://issues.apache.org/jira/browse/FLINK-20036 .
>
> I'm also seeing this function
>
>   public void emitUpdateWithRetract(ACC accumulator, RetractableCollector<T> out); // OPTIONAL
>
> and it says it's more performant in some cases vs
>
>   public void emitValue(ACC accumulator, Collector<T> out); // OPTIONAL
>
> I'm having some trouble understanding in which cases it benefits
> performance and if it would help our case. Would using
> `emitUpdateWithRetract` instead of `emitValue` reduce the number of
> retracts we're seeing yet preserve the same end results, where our
> Elasticsearch documents stay up to date?
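
A minimal sketch of how the two methods differ, for reference. As far as I can tell from the Flink docs, both belong to `TableAggregateFunction` (for example a Top-N), not to a plain `AggregateFunction`, so they only apply if the UDF is a table aggregate. The `Collector` and `RetractableCollector` interfaces below are local stand-ins that mirror the shape of Flink's, and `Top2Acc`, `runEmitValue` and `runEmitUpdate` are hypothetical names. The driver assumes the runtime retracts everything previously emitted for a key before calling `emitValue` again; that is an assumption about the runtime, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

class EmitSketch {
    // Local stand-ins mirroring the shape of Flink's collector interfaces.
    interface Collector<T> { void collect(T record); }
    interface RetractableCollector<T> extends Collector<T> { void retract(T record); }

    // Accumulator for a Top-2 table aggregate; old* hold what was last emitted.
    static class Top2Acc {
        int first = Integer.MIN_VALUE, second = Integer.MIN_VALUE;
        int oldFirst = Integer.MIN_VALUE, oldSecond = Integer.MIN_VALUE;
    }

    static void accumulate(Top2Acc acc, int v) {
        if (v > acc.first) { acc.second = acc.first; acc.first = v; }
        else if (v > acc.second) { acc.second = v; }
    }

    // Emits the complete current result every time it is called.
    static void emitValue(Top2Acc acc, Collector<Integer> out) {
        if (acc.first != Integer.MIN_VALUE) out.collect(acc.first);
        if (acc.second != Integer.MIN_VALUE) out.collect(acc.second);
    }

    // Emits only the rows that changed since the last call.
    static void emitUpdateWithRetract(Top2Acc acc, RetractableCollector<Integer> out) {
        if (acc.first != acc.oldFirst) {
            if (acc.oldFirst != Integer.MIN_VALUE) out.retract(acc.oldFirst);
            out.collect(acc.first);
            acc.oldFirst = acc.first;
        }
        if (acc.second != acc.oldSecond) {
            if (acc.oldSecond != Integer.MIN_VALUE) out.retract(acc.oldSecond);
            out.collect(acc.second);
            acc.oldSecond = acc.second;
        }
    }

    // Assumed runtime behaviour: retract all previous rows, then call emitValue.
    static List<String> runEmitValue(int[] input) {
        List<String> log = new ArrayList<>();
        List<Integer> previous = new ArrayList<>();
        Top2Acc acc = new Top2Acc();
        for (int v : input) {
            accumulate(acc, v);
            for (int p : previous) log.add("-" + p);
            previous.clear();
            emitValue(acc, r -> { log.add("+" + r); previous.add(r); });
        }
        return log;
    }

    static List<String> runEmitUpdate(int[] input) {
        List<String> log = new ArrayList<>();
        Top2Acc acc = new Top2Acc();
        RetractableCollector<Integer> out = new RetractableCollector<Integer>() {
            public void collect(Integer r) { log.add("+" + r); }
            public void retract(Integer r) { log.add("-" + r); }
        };
        for (int v : input) {
            accumulate(acc, v);
            emitUpdateWithRetract(acc, out);
        }
        return log;
    }

    public static void main(String[] args) {
        int[] input = {5, 3, 1};
        System.out.println(runEmitValue(input));  // 8 messages
        System.out.println(runEmitUpdate(input)); // 2 messages
    }
}
```

Both runs end with the same net result (5 and 3), but the incremental version sends far fewer messages downstream. If the UDFs in question are ordinary aggregates that return a single value per group, this distinction probably does not apply to them.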
>
> On Sun, Nov 8, 2020 at 6:43 PM Jark Wu <imj...@gmail.com> wrote:
>
>> Hi Rex,
>>
>> A similar question was asked recently [1], and I think it has the same
>> cause, which is called retraction amplification.
>> You can try turning on the mini-batch optimization [2] to reduce the
>> retraction amplification.
>>
>> Best,
>> Jark
>>
>> [1]:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/A-question-about-flink-sql-retreact-stream-td39216.html
>> [2]:
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tuning/streaming_aggregation_optimization.html#minibatch-aggregation
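
To make the suggestion above concrete: the page in [2] describes mini-batch as being enabled through the options `table.exec.mini-batch.enabled`, `table.exec.mini-batch.allow-latency` and `table.exec.mini-batch.size`. The sketch below is not Flink code. It is a toy model, under stated assumptions, of why buffering input per key reduces the number of retract/update pairs sent downstream. `MiniBatchSketch`, `aggregate` and `flush` are hypothetical names, the aggregate is a simple per-key count, and only the size trigger is modelled (the latency trigger is left out).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MiniBatchSketch {
    // Running COUNT per key. Returns the changelog emitted downstream.
    // batchSize = 1 models the behaviour without mini-batch.
    static List<String> aggregate(String[] keys, int batchSize) {
        Map<String, Integer> state = new LinkedHashMap<>();  // current count per key
        Map<String, Integer> buffer = new LinkedHashMap<>(); // pending increments per key
        List<String> changelog = new ArrayList<>();
        int buffered = 0;
        for (String key : keys) {
            buffer.merge(key, 1, Integer::sum);
            if (++buffered >= batchSize) {
                flush(state, buffer, changelog);
                buffered = 0;
            }
        }
        flush(state, buffer, changelog); // flush whatever is left at the end
        return changelog;
    }

    // Applies the buffered increments: one retract/update pair per key at most.
    static void flush(Map<String, Integer> state, Map<String, Integer> buffer,
                      List<String> changelog) {
        for (Map.Entry<String, Integer> e : buffer.entrySet()) {
            Integer old = state.get(e.getKey());
            int updated = (old == null ? 0 : old) + e.getValue();
            if (old != null) {
                changelog.add("-U(" + e.getKey() + "," + old + ")");
            }
            changelog.add((old == null ? "+I(" : "+U(") + e.getKey() + "," + updated + ")");
            state.put(e.getKey(), updated);
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        String[] keys = {"a", "a", "a", "b", "a"};
        System.out.println(aggregate(keys, 1)); // 8 messages
        System.out.println(aggregate(keys, 5)); // [+I(a,4), +I(b,1)]
    }
}
```

The final counts are identical in both runs; only the number of intermediate messages differs. The trade-off is added latency, since results are held back until a batch is flushed.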
>>
>> On Fri, 6 Nov 2020 at 03:56, Rex Fenley <r...@remind101.com> wrote:
>>
>>> Also, just to be clear our ES connector looks like this:
>>>
>>> CREATE TABLE sink_es_groups (
>>> id BIGINT,
>>> -- ... a bunch of scalar fields
>>> array_of_ids ARRAY<BIGINT NOT NULL>,
>>> PRIMARY KEY (id) NOT ENFORCED
>>> ) WITH (
>>> 'connector' = 'elasticsearch-7',
>>> 'hosts' = '${env:ELASTICSEARCH_HOSTS}',
>>> 'index' = '${env:GROUPS_ES_INDEX}',
>>> 'format' = 'json',
>>> 'sink.bulk-flush.max-actions' = '512',
>>> 'sink.bulk-flush.max-size' = '1mb',
>>> 'sink.bulk-flush.interval' = '5000',
>>> 'sink.bulk-flush.backoff.delay' = '1000',
>>> 'sink.bulk-flush.backoff.max-retries' = '4',
>>> 'sink.bulk-flush.backoff.strategy' = 'CONSTANT'
>>> )
>>>
>>>
>>> On Thu, Nov 5, 2020 at 11:52 AM Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm using the Table API to do a bunch of stateful transformations on
>>>> CDC Debezium rows and then insert final documents into Elasticsearch via
>>>> the ES connector.
>>>>
>>>> I've noticed that Elasticsearch is constantly deleting and then
>>>> inserting documents as they update. Ideally, there would be no delete
>>>> operation for a row update, only for a delete. I'm using the Elasticsearch
>>>> 7 SQL connector, which I assume uses `Elasticsearch7UpsertTableSink` under
>>>> the hood, so it should be capable of true upserts.
>>>>
>>>> So I suspect my table plan is what turns row upserts into deletes +
>>>> inserts. The plan is essentially a series of Joins and GroupBys followed
>>>> by UDF Aggregates (aggregating arrays of data). Possibly the UDF Aggs
>>>> following the Joins + GroupBys split each upsert into a delete + insert.
>>>> If so, is it possible to write UDFs that preserve upserts? Or am I totally
>>>> off-base with my assumptions?
>>>>
>>>> Thanks!
>>>> --
>>>>
>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>
>>>>
>>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>>> <https://www.facebook.com/remindhq>
>>>>
>>>
>>>
>>
>


