Hello, I still don't have a good understanding of how UDAF in the Table API handles deletes. If every row aggregated into one groupBy(key) gets a retract, meaning nothing should be grouped by that key, will the state get deleted? Is there a way to delete the state for that row i.e. forward a retract but not an append and remove the state from RocksDB?
Thanks! On Fri, Dec 11, 2020 at 9:15 AM Rex Fenley <r...@remind101.com> wrote: > Hi, > > Does this question make sense or am I missing something? > > Thanks! > > On Thu, Dec 10, 2020 at 10:24 AM Rex Fenley <r...@remind101.com> wrote: > >> Ok, that makes sense. >> >> > You just need to correct the acc state to what it expects to be (say >> re-evaluate the acc without the record that needs retraction) when you >> received the retraction message. >> >> So for example, if i just remove all items from acc.groupIdSet on >> retraction it will know to clear out the state entirely from rocks? >> >> If a user gets deleted altogether (and my groupby is on user_id) what >> sort of retraction do I need to evaluate then? Because I'm thinking now it >> will need to just delete the state entirely and pass a full retraction of >> the state downstream, but I don't know if deleting state from rocks happens >> automatically or I need to make it do that in the retract method somehow. >> >> On Wed, Dec 9, 2020 at 6:16 PM Danny Chan <danny0...@apache.org> wrote: >> >>> No, the group agg, stream-stream join and rank are all stateful >>> operators which need a state-backend to bookkeep the acc values. >>> >>> But it is only required to emit the retractions when the stateful >>> operator A has a downstream operator B that is also stateful, because the B >>> needs the retractions to correct the accs. If B is not stateful, just >>> emitting the new record to override is enough. >>> >>> You just need to correct the acc state to what it expects to be (say >>> re-evaluate the acc without the record that needs retraction) when you >>> received the retraction message. >>> >>> Rex Fenley <r...@remind101.com> 于2020年12月10日周四 上午2:44写道: >>> >>>> So from what I'm understanding, the aggregate itself is not a "stateful >>>> operator" but one may follow it? How does the aggregate accumulator keep >>>> old values then? It can't all just live in memory, actually, looking at the >>>> savepoints it looks like there's state associated with our aggregate >>>> operator. >>>> >>>> To clarify my concern too, in my retract function impl in the aggregate >>>> function class, all I do is remove a value (a group id) from the >>>> accumulator set (which is an array). For example, if there is only 1 >>>> group_id left for a user and it gets deleted, that group_id will be removed >>>> from the accumulator set and the set will be empty. I would hope that at >>>> that point, given that there are no remaining rows for the aggregate, that >>>> I could or flink will just delete the associated stored accumulator >>>> altogether i.e. delete `user_id_1 -> []`. Is it possible that both the >>>> groups and the user need to be deleted for everything to clear from >>>> storage? That might make more sense actually.. >>>> >>>> If this doesn't happen, since users delete themselves and their groups >>>> all the time, we'll be storing all these empty data sets in rocks for no >>>> reason. To clarify, we're using Debezium as our source and using Flink as a >>>> materialization engine, so we never want to explicitly set a timeout on any >>>> of our data, we just want to scale up predictably with our user growth. >>>> >>>> Thanks! >>>> >>>> On Wed, Dec 9, 2020 at 4:14 AM Danny Chan <danny0...@apache.org> wrote: >>>> >>>>> Hi, Rex Fenley ~ >>>>> >>>>> If there is stateful operator as the output of the aggregate function. >>>>> Then each time the function receives an update (or delete) for the key, >>>>> the >>>>> agg operator would emit 2 messages, one for retracting the old record, one >>>>> for the new message. For your case, the new message is the DELETE. >>>>> >>>>> If there is no stateful operator, the aggregate operator would just >>>>> emit the update after (the new) message which is the delete. >>>>> >>>>> Rex Fenley <r...@remind101.com> 于2020年12月9日周三 上午4:30写道: >>>>> >>>>>> Hello, >>>>>> >>>>>> I'd like to better understand delete behavior of AggregateFunctions. >>>>>> Let's assume there's an aggregate of `user_id` to a set of `group_ids` >>>>>> for >>>>>> groups belonging to that user. >>>>>> `user_id_1 -> [group_id_1, group_id_2, etc.]` >>>>>> Now let's assume sometime later that deletes arrive for all rows >>>>>> which produce user_id_1's group_id's. >>>>>> >>>>>> Would the aggregate function completely delete the associated state >>>>>> from RocksDB or would it leave something like `user_id_1 -> []` sitting >>>>>> in >>>>>> RocksDB forever? >>>>>> >>>>>> We have an aggregate similar to this where users could delete >>>>>> themselves and we want to make sure we're not accumulating data forever >>>>>> for >>>>>> those users. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> -- >>>>>> >>>>>> Rex Fenley | Software Engineer - Mobile and Backend >>>>>> >>>>>> >>>>>> Remind.com <https://www.remind.com/> | BLOG >>>>>> <http://blog.remind.com/> | FOLLOW US >>>>>> <https://twitter.com/remindhq> | LIKE US >>>>>> <https://www.facebook.com/remindhq> >>>>>> >>>>> >>>> >>>> -- >>>> >>>> Rex Fenley | Software Engineer - Mobile and Backend >>>> >>>> >>>> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> >>>> | FOLLOW US <https://twitter.com/remindhq> | LIKE US >>>> <https://www.facebook.com/remindhq> >>>> >>> >> >> -- >> >> Rex Fenley | Software Engineer - Mobile and Backend >> >> >> Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> >> | FOLLOW US <https://twitter.com/remindhq> | LIKE US >> <https://www.facebook.com/remindhq> >> > > > -- > > Rex Fenley | Software Engineer - Mobile and Backend > > > Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | > FOLLOW US <https://twitter.com/remindhq> | LIKE US > <https://www.facebook.com/remindhq> > -- Rex Fenley | Software Engineer - Mobile and Backend Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> | FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>