Hello,

I'm using the Table API to run a series of stateful transformations on
Debezium CDC rows and then insert the final documents into Elasticsearch via
the Elasticsearch connector.

I've noticed that Elasticsearch is constantly deleting and then re-inserting
documents as they update. Ideally, a row update would not produce a delete
operation at all; only a row delete would. I'm using the Elasticsearch 7 SQL
connector, which I assume uses `Elasticsearch7UpsertTableSink` under the
hood, so the sink itself should be capable of real upserts.

I therefore suspect that my table plan is what turns row upserts into
deletes + inserts. The plan is essentially a series of Joins and GroupBys
followed by UDF Aggregates (aggregating arrays of data). My guess is that
the UDF Aggs following the Joins + GroupBys are somehow causing each upsert
to be split into a delete + insert. If that is correct, is it possible to
write UDFs that preserve upserts? Or am I totally off-base with my
assumptions?
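
To make the suspicion concrete, here is a toy model in plain Python. It is
not Flink code and not how the connector is actually implemented. The row
kind names mirror Flink's `RowKind`, but `to_es_requests` and the
`drop_update_before` flag are hypothetical names made up for illustration.
It only shows how one logical update could reach Elasticsearch either as a
delete followed by an upsert, or as a single upsert.

```python
from enum import Enum


class RowKind(Enum):
    """Change types, named after Flink's RowKind (toy copy, not the real class)."""
    INSERT = "+I"
    UPDATE_BEFORE = "-U"
    UPDATE_AFTER = "+U"
    DELETE = "-D"


def to_es_requests(changelog, drop_update_before):
    """Map (RowKind, doc_id, doc) changes to hypothetical sink requests.

    drop_update_before is an assumed switch: when True, the "before" half
    of an update is skipped because the following +U overwrites the document.
    """
    requests = []
    for kind, doc_id, doc in changelog:
        if kind in (RowKind.INSERT, RowKind.UPDATE_AFTER):
            requests.append(("upsert", doc_id, doc))
        elif kind is RowKind.UPDATE_BEFORE and drop_update_before:
            continue
        else:
            # DELETE, or an UPDATE_BEFORE that is treated as a retraction
            requests.append(("delete", doc_id, None))
    return requests


# One logical update of document "1", expressed as a -U / +U pair.
update = [
    (RowKind.UPDATE_BEFORE, "1", {"tags": ["a"]}),
    (RowKind.UPDATE_AFTER, "1", {"tags": ["a", "b"]}),
]

retract_style = to_es_requests(update, drop_update_before=False)
upsert_style = to_es_requests(update, drop_update_before=True)

print(retract_style)
print(upsert_style)
```

In this model, `retract_style` contains a delete and then an upsert for the
same document, which matches what I observe, while `upsert_style` contains
only the upsert, which is what I would like to get.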

Thanks!
-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> |
FOLLOW US <https://twitter.com/remindhq> | LIKE US
<https://www.facebook.com/remindhq>
