Hi all,

Just to clarify: there actually is a position file. Adding it was a small
detail of the IQv2 implementation; without it, a persistent store's position
would be lost after a restart.
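For a persistent store, the point of the position file is simply that the
Position survives a restart. Here is a rough Java sketch of that round trip.
PositionFile and its one-entry-per-line text format ("topic partition offset")
are hypothetical, not the format Streams actually writes. The write goes to a
temporary file that is then atomically renamed into place, so a process crash
mid-write should not leave a half-written file behind.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the "topic partition offset" line format is an assumption,
// not the on-disk format Kafka Streams uses.
final class PositionFile {

    // Persist input topic -> (partition -> offset), one entry per line.
    static void write(Path file, Map<String, Map<Integer, Long>> position) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Long>> topic : position.entrySet()) {
            for (Map.Entry<Integer, Long> p : topic.getValue().entrySet()) {
                lines.add(topic.getKey() + " " + p.getKey() + " " + p.getValue());
            }
        }
        // Write a temp file next to the target, then rename it into place.
        // Whether ATOMIC_MOVE replaces an existing target is implementation specific
        // per the Files.move docs; on POSIX file systems it maps to rename(2), which does.
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, lines);
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reload the position after a restart; a missing file means an empty position.
    static Map<String, Map<Integer, Long>> read(Path file) throws IOException {
        Map<String, Map<Integer, Long>> position = new HashMap<>();
        if (!Files.exists(file)) {
            return position;
        }
        for (String line : Files.readAllLines(file)) {
            if (line.isBlank()) {
                continue;
            }
            String[] parts = line.trim().split("\\s+");
            position.computeIfAbsent(parts[0], t -> new HashMap<>())
                    .put(Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }
        return position;
    }
}
```

Kafka topic names cannot contain whitespace, which is what makes a
space-separated format workable in this sketch.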

Otherwise, Sophie is right on the money. The checkpoint refers to an offset in
the changelog, while the position refers to offsets in the task's input
topics. So they are similar in function and structure, but they refer to two
different things.
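To make the distinction concrete, here is a rough Java sketch. StoreOffsets
and its members (checkpoint, position, onWrite) are hypothetical, not real
Kafka Streams classes; the sketch only models the shapes: the checkpoint maps
a changelog partition to an offset, while the position maps each input topic
and partition to the offset of the last input record applied to the store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration only; these are not Kafka Streams classes.
final class StoreOffsets {

    // Checkpoint: changelog partition -> highest changelog offset the local store contains.
    // Encoding the partition as a "topic-partition" string is an assumption of this sketch.
    final Map<String, Long> checkpoint = new HashMap<>();

    // Position: input topic -> (input partition -> offset of the last input record
    // whose effects have been applied to the store).
    final Map<String, Map<Integer, Long>> position = new HashMap<>();

    // Called when processing the input record at (inputTopic, inputPartition, inputOffset)
    // produced a store write that was appended to the changelog at changelogOffset.
    // Both maps are updated eagerly here for simplicity; Streams itself writes the
    // checkpoint file only periodically.
    void onWrite(String inputTopic, int inputPartition, long inputOffset,
                 String changelogPartition, long changelogOffset) {
        position.computeIfAbsent(inputTopic, t -> new HashMap<>())
                .merge(inputPartition, inputOffset, Long::max);
        checkpoint.merge(changelogPartition, changelogOffset, Long::max);
    }

    public static void main(String[] args) {
        StoreOffsets store = new StoreOffsets();
        store.onWrite("orders", 0, 41L, "app-store-changelog-0", 7L);
        store.onWrite("orders", 0, 42L, "app-store-changelog-0", 8L);
        System.out.println("checkpoint = " + store.checkpoint);
        System.out.println("position   = " + store.position);
    }
}
```

The two numbers in the example (42 in the input topic, 8 in the changelog) are
offsets in different topics, so neither can be derived from the other; that is
why putting both in one file wouldn't dedupe anything.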

I agree that, given this, it doesn't seem like consolidating them (for example, 
into one file) would be worth it. It would make the code more complicated 
without deduping any information.

I hope this helps, and I look forward to what you're cooking up, Nick!
-John

On 2022/11/12 00:50:27 Sophie Blee-Goldman wrote:
> Hey Nick,
> 
> I haven't been following the new IQv2 work very closely, so take this with a
> grain of salt, but as far as I'm aware there's no such thing as "position
> files" -- the Position is just an in-memory object and is related to a user's
> query against the state store, whereas a checkpoint file reflects the current
> state of the store, i.e. how much of the changelog it contains.
> 
> In other words, while these might look like they do similar things, the
> actual usage and implementation of Positions vs. checkpoint files are pretty
> much unrelated. So I don't think it would make sense for Streams to try to
> consolidate these or replace one with the other.
> 
> Hope this answers your question, and I'll ping John to make sure I'm not
> misleading you regarding the usage/intention of Positions.
> 
> Sophie
> 
> On Fri, Nov 11, 2022 at 6:48 AM Nick Telford <nick.telf...@gmail.com> wrote:
> 
> > Hi everyone,
> >
> > I'm trying to understand how StateStores work internally for some changes
> > that I plan to propose, and I'd like some clarification around checkpoint
> > files and position files.
> >
> > It appears as though position files are relatively new, and were created as
> > part of the IQv2 initiative, as a means to track the position of the local
> > state store so that reads could be bound by particular positions?
> >
> > Checkpoint files look much older, and are managed by the Task itself
> > (actually, ProcessorStateManager). It looks like this is used exclusively
> > for determining a) whether to restore a store, and b) which offsets to
> > restore from?
> >
> > If I've understood the above correctly, is there any scope to potentially
> > replace checkpoint files with StateStore#position()?
> >
> > Regards,
> >
> > Nick
> >
> 
