+1. I have no concerns about the practicality or value of this feature itself. I've left some comments on ongoing maintenance and compatibility, which we can continue to discuss.
On Tue, Oct 17, 2023 at 05:23, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Thanks Bartosz and Anish for your support!
>
> I'll wait for a couple more days to see whether we can hear more voices on
> this. We could probably look for initiating a VOTE thread if there is no
> objection.
>
> On Tue, Oct 17, 2023 at 5:48 AM Anish Shrigondekar <anish.shrigonde...@databricks.com> wrote:
>
>> Hi Jungtaek,
>>
>> Thanks for putting this together. +1 from me and looks good overall.
>> Posted some minor comments/questions to the doc.
>>
>> Thanks,
>> Anish
>>
>> On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny <bartkoniec...@gmail.com> wrote:
>>
>>> Thank you, Jungtaek, for your answers! It's clear now.
>>>
>>> +1 for me. It seems like a prerequisite for further ops-related
>>> improvements for the state store management. I mean especially here the
>>> state rebalancing that could rely on this read+write state store API. I
>>> don't mean here the dynamic state rebalancing that could probably be
>>> implemented with a lower latency directly in the stateful API. Instead I'm
>>> thinking more of an offline job to rebalance the state and later restart
>>> the stateful pipeline with the changed number of shuffle partitions.
>>>
>>> Best,
>>> Bartosz.
>>>
>>> On Mon, Oct 16, 2023 at 6:19 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> bump for better reach
>>>>
>>>> On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>
>>>>> Sorry, please use this link instead for SPIP doc:
>>>>> https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing
>>>>>
>>>>> On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>>>>>
>>>>>> Hi dev,
>>>>>>
>>>>>> I'd like to start a discussion on "State Data Source - Reader".
>>>>>>
>>>>>> This proposal aims to introduce a new data source "statestore" which
>>>>>> enables reading the state rows from existing checkpoint via offline
>>>>>> (batch) query. This will enable users to 1) create unit tests against
>>>>>> stateful query verifying the state value (especially
>>>>>> flatMapGroupsWithState), 2) gather more context on the status when an
>>>>>> incident occurs, especially for incorrect output.
>>>>>>
>>>>>> *SPIP*:
>>>>>> https://docs.google.com/document/d/1HjEupRv8TRFeULtJuxRq_tEG1Wq-9UNu-ctGgCYRke0/edit?usp=sharing
>>>>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-45511
>>>>>>
>>>>>> Looking forward to your feedback!
>>>>>>
>>>>>> Thanks,
>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>
>>>>>> ps. The scope of the project is narrowed to the reader in this SPIP,
>>>>>> since the writer requires us to consider more cases. We are planning on
>>>>>> it.
>>>
>>> --
>>> Bartosz Konieczny
>>> freelance data engineer
>>> https://www.waitingforcode.com
>>> https://github.com/bartosz25/
>>> https://twitter.com/waitingforcode