[
https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370505#comment-16370505
]
Guozhang Wang commented on KAFKA-5285:
--------------------------------------
[~davispw][~xvrl] Thanks for your inputs. I think there are some
mis-communications here about my proposal. I understand the issue that you
raised, and hence proposing to always use the keyFrom only for lower range. For
upper range I think Xavier's point is valid and I'd like to slightly modify my
proposal accordingly. This time I'd try to propose them in a clearer way:
0. The following bullet points 1) and 2) are ONLY discussed for window stores,
NOT session stores.
1. Note that within the [lowerRange, upperRange] iterating, we will still check
the validity of each key within the range to determine whether or not if it
should be returned. So the goal of setting lowerRange and upperRange is to make
sure they are "safe" to include all possible keys. And from that point setting
lowerRange as keyFrom should be safe, since lexicograpically all keys larger
than keyFrom should have the its [key, timestamp, sequence] bytes
representation larger than just [keyFrom] byte representation. Our current
lowerRange, [keyFrom + 0000] is also safe but it is not "more efficient" than
[keyFrom] since we know there will be not records at all in between these two
lower bounds. On the other hand, the [keyFrom + timestampFrom] is not safe, as
you pointed out:
{code}
The smallest byte sequence for that key will be keyFrom + S + minSuffix which
is greater than keyFrom, but still smaller than keyFrom + minSuffix
{code}
As you mentioned, this key is still larger than {{keyFrom}} bytes. *I think
{{keyFrom}} should be actually safe under all circumstances such that, any key
larger than keyFrom should have its complete [key, timestamp, sequence] larger
than the keyFrom bytes only, and hence they would all be included in the range
for validation, please let me know if this is wrong.*
2. For the upper range, again the goal is to be "safe" first, and then see if
we can do better on "efficiency". What I was proposing is MAX({{keyFrom +
timestampFrom}}, {{keyTo + timestampTo}}), note it is NOT MAX({{keyFrom +
timeEndEarliest}}, {{keyTo + timeStartLatest}}), as {{timeStartLatest}} and
{{timeEndEarliest}} is only from session windows. So going back to your
example, {{keyFrom + timestampFrom = A1000}}, {{keyTo + timestampTo =
AABDFFF}}, so their larger value, {{AABDFFF}} will be selected as upper range.
But Xavier's example exposed that {{AABDFFF}} is not a safe upper range either,
since same records like {{AADFFF}} should be included in consideration but
their bytes are larger than {{AABDFFF}}. *So I'm proposing to consider the
following combinations, and pick the largest*:
a). keyFrom + timestampTo
b). keyTo + timestampTo
Please let me know if you think this works.
3. This bullet point is for session store: for session store the current byte
representation is {{key, session-start-time, session-end-time}}, where
{{session-end-time >= session-start-time}}. Similarly to window store, for the
lower range I'm proposing to use only {{key}} bytes, which is safe.
For the upper range, similarly to my previous modifications on window stores,
*we can consider the following combinations and choose the largest one*:
a) keyFrom + timeStartLatest + MAX_LONG
b) keyTo + timeStartLatest + MAX_LONG
> Optimize upper / lower byte range for key range scan on windowed stores
> -----------------------------------------------------------------------
>
> Key: KAFKA-5285
> URL: https://issues.apache.org/jira/browse/KAFKA-5285
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Xavier Léauté
> Assignee: Guozhang Wang
> Priority: Major
> Labels: performance
>
> The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}}
> {{upperRange}} and {{lowerRange}} does not make any assumptions with respect
> to the other key bound (e.g. the upper byte bound does not depends on lower
> key bound).
> It should be possible to optimize the byte range somewhat further using the
> information provided by the lower bound.
> More specifically, by incorporating that information, we should be able to
> eliminate the corresponding {{upperRangeFixedSize}} and
> {{lowerRangeFixedSize}}, since the result should be the same if we implement
> that optimization.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)