[
https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369654#comment-16369654
]
Peter Davis commented on KAFKA-5285:
------------------------------------
Noting that after upgrading from 0.11.0.1 to 1.0.1 today, I'm also seeing
severely degraded performance of `(ReadOnly)SessionStore.fetch(key)`. Before the
upgrade we only saw the problem with `fetch(from,to)`. I browsed the source code
and didn't immediately see what changed between 0.11 and 1.0 there. (Another
guess is that it's a subtle side effect of some other change, perhaps
https://issues.apache.org/jira/browse/KAFKA-4868, somehow resulting in different
compacted DB levels.)
Anyway, the workaround for me is to use `findSessions(key, 0,
System.currentTimeMillis() + <some reasonable time in the future>)`, since the
leading 0x00 bytes in a timestamp < Long.MAX_VALUE yield a few extra usable
bytes of maxKey prefix.
Both `ReadOnlySessionStore.fetch(...)` variants are entirely unusable for me at
this time.
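To illustrate why the choice of maximum timestamp matters, here is a minimal, self-contained sketch. It is not Kafka's actual code: `UpperBoundSketch`, `upperBound`, and `prefixLength` are hypothetical names, and the rule it models (always keep the first key byte, then keep further key bytes only while they are >= the first byte of the maximum suffix) is an assumption about how the upper bound is built. The suffix is also simplified to a single 8-byte big-endian timestamp.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified model only; NOT Kafka's actual code. All names are hypothetical.
class UpperBoundSketch {

    // Assumed rule: always keep the first key byte, then keep further key
    // bytes only while they are >= the first byte of the maximum suffix.
    static byte[] upperBound(byte[] key, byte[] maxSuffix) {
        int keep = 0;
        while (keep < key.length
                && (keep < 1 || (key[keep] & 0xFF) >= (maxSuffix[0] & 0xFF))) {
            keep++;
        }
        ByteBuffer buf = ByteBuffer.allocate(keep + maxSuffix.length);
        buf.put(key, 0, keep);   // usable key prefix
        buf.put(maxSuffix);      // maximum timestamp suffix
        return buf.array();
    }

    // Number of leading key bytes that survive in the upper bound when the
    // suffix is the big-endian encoding of maxTimestamp.
    static int prefixLength(byte[] key, long maxTimestamp) {
        byte[] suffix = ByteBuffer.allocate(Long.BYTES).putLong(maxTimestamp).array();
        return upperBound(key, suffix).length - suffix.length;
    }

    public static void main(String[] args) {
        byte[] key = "customer-42".getBytes(StandardCharsets.UTF_8);
        long nearFuture = System.currentTimeMillis() + 24L * 60 * 60 * 1000;
        // Long.MAX_VALUE starts with byte 0x7F; a near-future timestamp starts with 0x00.
        System.out.println("Long.MAX_VALUE: " + prefixLength(key, Long.MAX_VALUE));
        System.out.println("near future:    " + prefixLength(key, nearFuture));
    }
}
```

Under that assumed rule, an ASCII key keeps only its first byte when the maximum timestamp is Long.MAX_VALUE, while a near-future timestamp (whose leading byte is 0x00) lets more of the key survive as a prefix.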
> Without any additional information about the key length or the lower
> bound, we can only assume that keys are at least 1 byte, and that byte has to
> be smaller or equal to the first byte of keyTo (i.e. our upper bound has to
> start with the first byte of keyTo), so our best guess for an upper bound in
> that case is ADFFF.
Doing a range query with *one byte* of prefix will never give acceptable
performance for any database with more than 8 keys(!), or in use cases where
key prefixes are not randomly distributed (common in business applications).
May I suggest a few options, not mutually exclusive, but in order of preference:
1. Optimize where fromKey and toKey are the same or have a common prefix.
(Isn't that your minimum key length right there? I'm not really sure I
understand why it's not just this simple. Note, this is the only case I
personally care about.)
2. Deprecate the `fetch` variants in favor of `findSessions`, and document that
using max=Long.MAX_VALUE is not recommended. Promote findSessions to
ReadOnlySessionStore. (This at least gives a few more bytes of usable key
prefix.)
3. Configuration for default timeStartLatest = currentTimeMillis() +
<reasonable offset like 1 day>. (Same benefit as #2)
4. Configure minimum key length. I don't like this because if natural keys are
used (user names, human-readable business object references like "file number",
etc.) then there isn't necessarily a good minimum key length that can be
enforced by the application. And if there were, it'd likely vary by store,
raising the question of how to easily configure per-store settings.
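As a rough illustration of option 1, the sketch below computes the byte prefix shared by fromKey and toKey. Every key that sorts lexicographically between the two bounds must start with that prefix, so it could serve as a lower limit on the usable key prefix. `CommonPrefixSketch` and `commonPrefix` are hypothetical names, not part of Kafka, and the sketch does not attempt to build the full composite range bound.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper; not part of Kafka.
class CommonPrefixSketch {

    // Returns the longest byte prefix shared by both range bounds.
    static byte[] commonPrefix(byte[] from, byte[] to) {
        int limit = Math.min(from.length, to.length);
        int i = 0;
        while (i < limit && from[i] == to[i]) {
            i++;
        }
        return Arrays.copyOf(from, i);
    }

    public static void main(String[] args) {
        byte[] from = "file-1000".getBytes(StandardCharsets.UTF_8);
        byte[] to = "file-1999".getBytes(StandardCharsets.UTF_8);
        byte[] prefix = commonPrefix(from, to);
        System.out.println(new String(prefix, StandardCharsets.UTF_8));
    }
}
```

When fromKey and toKey are identical (the single-key `fetch(key)` case), the common prefix is the whole key.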
> Optimize upper / lower byte range for key range scan on windowed stores
> -----------------------------------------------------------------------
>
> Key: KAFKA-5285
> URL: https://issues.apache.org/jira/browse/KAFKA-5285
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Xavier Léauté
> Assignee: Guozhang Wang
> Priority: Major
> Labels: performance
>
> The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}}
> {{upperRange}} and {{lowerRange}} does not make any assumptions with respect
> to the other key bound (e.g. the upper byte bound does not depend on the lower
> key bound).
> It should be possible to optimize the byte range somewhat further using the
> information provided by the lower bound.
> More specifically, by incorporating that information, we should be able to
> eliminate the corresponding {{upperRangeFixedSize}} and
> {{lowerRangeFixedSize}}, since the result should be the same if we implement
> that optimization.