iterator state persistence in 2.0.1

Scott Kirklin Wed, 08 Jun 2022 11:40:38 -0700

Hello,

I am trying to do graph traversal with a custom Iterator. Simplifying a
bit, a “node” is a unique row id and edges are represented as an entry
where the Key.row is the source node and the Key.colQualifier is the target
node. The custom iterator maintains a stack and uses a subordinate iterator
to traverse following these edges. For small graphs this works exactly as
hoped, but once the graph becomes large enough to fill a scan batch the
iterator is torn down and when re-init’ed the stack is gone, so I can’t
resume from where it left off. The docs say that "Being torn-down
is equivalent to a new instance of the Iterator being creating and deepCopy
being called on the new instance with the old instance provided as the
argument to deepCopy". I thought that meant that I could carry state
through the life of the traversal, at least as long as the iterator stays
on a single TServer and deepCopy copies the right data, but I cannot find
evidence that this actually happens in the code or by tracing.
IterConfigUtil looks like it is responsible for re-creating the iterator
when resuming a scan, and it only calls ‘init’.
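
To make the setup concrete, here is a minimal sketch of the traversal
logic in plain Java, with no Accumulo classes involved. The class name,
the in-memory map standing in for the table (row = source node, column
qualifier = target node), and the sample edges are all illustrative
assumptions, not the actual iterator. The `stack` and `seen` variables
are the kind of in-memory state that is lost when the iterator is torn
down between scan batches.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class GraphTraversalSketch {

    // Hypothetical stand-in for the edge table:
    // key = row (source node), value = sorted column qualifiers (target nodes).
    static List<String> traverse(Map<String, ? extends SortedSet<String>> edges,
                                 String start) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        // Traversal state held only in memory; it does not survive a teardown.
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);

        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited via another path
            }
            visited.add(node);

            // Equivalent of seeking the subordinate iterator to this row.
            SortedSet<String> targets = edges.get(node);
            if (targets == null) {
                continue; // node has no outgoing edges
            }
            // Push in reverse order so the smallest target is popped first.
            List<String> reversed = new ArrayList<>(targets);
            Collections.reverse(reversed);
            for (String target : reversed) {
                if (!seen.contains(target)) {
                    stack.push(target);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeSet<String>> edges = new TreeMap<>();
        edges.computeIfAbsent("a", k -> new TreeSet<>()).add("b");
        edges.computeIfAbsent("a", k -> new TreeSet<>()).add("c");
        edges.computeIfAbsent("b", k -> new TreeSet<>()).add("d");
        edges.computeIfAbsent("c", k -> new TreeSet<>()).add("d");

        System.out.println(traverse(edges, "a")); // prints [a, b, d, c]
    }
}
```

In this sketch the whole traversal runs in one call, so the stack is
always available. The problem described above arises when the traversal
is interrupted partway through and a fresh instance has to continue
without it.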


Now, my actual question: Is there a supported way to maintain internal
state throughout the lifetime of an Iterator? Is my approach at all
sensible?

I can of course accomplish all of this from the client as well, but that
would perform much worse for many users. A lot of usage comes from users
who connect (over high-latency connections) through the Thrift proxy,
which would make a client-side solution very slow. I am therefore
motivated to find a server-side solution, but I am not married to any
particular pattern. Totally changing the key design is on the table as
well, as this effort is still somewhat greenfield.

Thanks in advance,
Scott
