Matthias, Damian:

Thank you for your replies.

> Can you check if the problem exists for 0.10.2, too?

I will upgrade to 0.10.2 after this development cycle. I'm still in
development so compatibility is not as big an issue as getting to
production.

>  range() should return ordered data,

In my experiments, the order in which the data was returned the first time
is the order in which it was returned every subsequent time. That order was
not lexicographic, though; it appeared random.

> what key type and serializer do you use?

I am using Protocol Buffers, which are ordered structs. You construct the
Protocol Buffers object, and then the serializer calls ".toByteArray()" on
it to get the bytes. I thought this would be a very simple way to create keys
that, when serialized, would facilitate prefixed range scans. For example, a
Protocol Buffers message like

message {
   bytes user_id = 1;
   bytes post_id = 2;
}

when serialized puts the user_id first, then the post_id in the total byte
string. Some Protocol Buffers data types use variable-length encoding, so I
was careful not to use any of these types in my keys.
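
A minimal sketch of the key layout described above, written by hand so it
runs without the protobuf library. It assumes the two fields are written in
field-number order and that every id is between 1 and 127 bytes long, so
each length prefix fits in one byte. The class and method names are
hypothetical and not part of any library. Note that a bytes field is written
as a tag byte, a length prefix, and then the payload, so the length is
compared before the id itself.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class KeyEncodingSketch {

    // Writes one length-delimited field: tag byte, one length byte, payload.
    // Assumption: payloads are 1..127 bytes, so the length fits in one byte.
    static void writeBytesField(ByteArrayOutputStream out, int fieldNumber, byte[] value) {
        if (value.length == 0 || value.length > 127) {
            throw new IllegalArgumentException("sketch only handles payloads of 1..127 bytes");
        }
        out.write((fieldNumber << 3) | 2); // wire type 2 = length-delimited
        out.write(value.length);
        out.write(value, 0, value.length);
    }

    // Hand-rolled stand-in for toByteArray() on a message with
    // `bytes user_id = 1; bytes post_id = 2;`.
    static byte[] encodeKey(byte[] userId, byte[] postId) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeBytesField(out, 1, userId);
        writeBytesField(out, 2, postId);
        return out.toByteArray();
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Equal-length user ids: keys sort by user_id first, then by post_id.
        byte[] k1 = encodeKey(ascii("aa"), ascii("z"));
        byte[] k2 = encodeKey(ascii("ab"), ascii("a"));
        System.out.println(compareUnsigned(k1, k2) < 0);

        // Different-length user ids: the length byte is compared before the
        // payload, so the key for "b" sorts before the key for "ab".
        byte[] k3 = encodeKey(ascii("b"), ascii("x"));
        byte[] k4 = encodeKey(ascii("ab"), ascii("x"));
        System.out.println(compareUnsigned(k3, k4) < 0);
    }
}
```

Under these assumptions, a prefix scan over one user's posts only lines up
with the byte order when all user ids have the same length.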

> When you use Queryable State you are actually querying multiple
> underlying stores, i.e., one per partition.

Huh? I was only querying one partition. In my example, I have a user's
posts. Upon creation, they are routed to a particular partition using a
partitioner that hashes the post's user ID. The posts are then indexed on
that partition by prefixed keys using the method described above. When
querying, I am only querying the one partition that has all of the user's
posts. As far as I know, I am not querying across multiple partitions.
Furthermore, I did not even think this was possible, given the fact that
Interactive Queries require you to manually forward requests that should go
to other partitions.
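
The routing described above can be sketched roughly as follows. This is an
illustration only: the class and method names are hypothetical, and
Arrays.hashCode is a stand-in for whatever hash the real partitioner uses.
The point is just that every post with the same user ID maps to the same
partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class UserPartitionSketch {

    // Maps a user id to a partition in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for the real partitioner's hash.
    static int partitionFor(byte[] userId, int numPartitions) {
        int hash = Arrays.hashCode(userId);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] user = "user-42".getBytes(StandardCharsets.US_ASCII);
        // Two posts by the same user are routed to the same partition.
        int first = partitionFor(user, 8);
        int second = partitionFor(user.clone(), 8);
        System.out.println(first == second);
    }
}
```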

On Thu, Mar 16, 2017 at 2:11 PM, Damian Guy <damian....@gmail.com> wrote:

> I think what you are seeing is that the order is not guaranteed across
> partitions. When you use Queryable State you are actually querying multiple
> underlying stores, i.e., one per partition. The implementation iterates
> over one store/partition at a time, so the ordering will appear random.
> This could be improved.
>
> The tombstone records appearing in the results seem like a bug.
>
> Thanks,
> Damian
>
> On Thu, 16 Mar 2017 at 17:37 Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > Can you check if the problem exists for 0.10.2, too? (0.10.2 is
> > compatible with 0.10.1 brokers -- so you can upgrade your Streams code
> > independently from the brokers).
> >
> > About the range: I did double check this, and I guess my last answer was
> > not correct, and range() should return ordered data, but I have a
> > follow-up question: what key type and serializer do you use? Internally,
> > data is stored in serialized form and ordered according to
> > `LexicographicByteArrayComparator` -- thus, if the serialized bytes
> > don't reflect the order of the deserialized data, the returned range
> > shows up unordered to you.
> >
> >
> > -Matthias
> >
> >
> >
> >
> > On 3/16/17 10:14 AM, Dmitry Minkovsky wrote:
> > > Hi Matthias. Thank you for your response.
> > >
> > > Yes, I was able to reproduce the null issue reliably. I can't open a
> > > JIRA at this time, but I can say I was using 0.10.1.0 and it was
> > > trivial to reproduce. Just send records and the tombstones to a table
> > > topic. Then scan the range. You'll see the tombstones.
> > >
> > > Indeed, ranges are returned with no specific order. I'm not sure what
> > > you mean that default stores are hash-based, but this ordering thing
> > > is a shame because it kind of kills the ability to use KS as a
> > > full-fledged DB that lets you index things like HBase (composite keys
> > > for lists of items). Is that how RocksDB works? Just returns range
> > > scans in random order? I don't know C++ so the documentation is a bit
> > > opaque to me. But what's the point of scanning a range if the data
> > > comes in some random order? That being the case, the number of
> > > possible use-case scenarios seems to become significantly limited.
> > >
> > >
> > > Thank you!
> > > Dmitry
> > >
> > > On Tue, Mar 14, 2017 at 1:12 PM, Matthias J. Sax <matth...@confluent.io>
> > > wrote:
> > >
> > >>> However, for keys that have been tombstoned, it does return null
> > >>> for me.
> > >>
> > >> Sounds like a bug. Can you reliably reproduce this? Would you mind
> > >> opening a JIRA?
> > >>
> > >> Can you check if this happens for both cases: caching enabled and
> > >> disabled? Or only for one case?
> > >>
> > >>
> > >>> "No ordering guarantees are provided."
> > >>
> > >> That is correct. Internally, default stores are hash-based -- thus,
> > >> we don't give a sorted list/iterator back. You could replace RocksDB
> > >> with a custom store though.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 3/13/17 3:56 PM, Dmitry Minkovsky wrote:
> > >>> I am using interactive streams to query tables:
> > >>>
> > >>>     ReadOnlyKeyValueStore<Messages.ByUserAndDate, Messages.UserLetter> store =
> > >>>         streams.store("view-user-drafts", QueryableStoreTypes.keyValueStore());
> > >>>
> > >>> Documentation says that #range() should not return null values.
> > >>> However, for keys that have been tombstoned, it does return null
> > >>> for me.
> > >>>
> > >>> Also, I noticed only just now that "No ordering guarantees are
> > >>> provided." I haven't done enough testing or looked at the code
> > >>> carefully enough yet and wonder if someone who knows could confirm:
> > >>> is this true? Is this common to all store implementations? I was
> > >>> hoping to use interactive streams like HBase to scan ranges. It
> > >>> appears this is not possible.
> > >>>
> > >>> Thank you,
> > >>> Dmitry
> > >>>
> > >>
> > >>
> > >
> >
> >
>
