[ 
https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370505#comment-16370505
 ] 

Guozhang Wang commented on KAFKA-5285:
--------------------------------------

[~davispw][~xvrl] Thanks for your inputs. I think there are some 
mis-communications here about my proposal. I understand the issue that you 
raised, and hence proposing to always use the keyFrom only for lower range. For 
upper range I think Xavier's point is valid and I'd like to slightly modify my 
proposal accordingly. This time I'd try to propose them in a clearer way:

0. The following bullet points 1) and 2) are ONLY discussed for window stores, 
NOT session stores.

1. Note that within the [lowerRange, upperRange] iterating, we will still check 
the validity of each key within the range to determine whether or not if it 
should be returned. So the goal of setting lowerRange and upperRange is to make 
sure they are "safe" to include all possible keys. And from that point setting 
lowerRange as keyFrom should be safe, since lexicograpically all keys larger 
than keyFrom should have the its [key, timestamp, sequence] bytes 
representation larger than just [keyFrom] byte representation. Our current 
lowerRange, [keyFrom + 0000] is also safe but it is not "more efficient" than 
[keyFrom] since we know there will be not records at all in between these two 
lower bounds. On the other hand, the [keyFrom + timestampFrom] is not safe, as 
you pointed out:

{code}
The smallest byte sequence for that key will be keyFrom + S + minSuffix which 
is greater than keyFrom, but still smaller than keyFrom + minSuffix
{code}

As you mentioned, this key is still larger than {{keyFrom}} bytes. *I think 
{{keyFrom}} should be actually safe under all circumstances such that, any key 
larger than keyFrom should have its complete [key, timestamp, sequence] larger 
than the keyFrom bytes only, and hence they would all be included in the range 
for validation, please let me know if this is wrong.*

2. For the upper range, again the goal is to be "safe" first, and then see if 
we can do better on "efficiency". What I was proposing is MAX({{keyFrom + 
timestampFrom}}, {{keyTo + timestampTo}}), note it is NOT MAX({{keyFrom + 
timeEndEarliest}}, {{keyTo + timeStartLatest}}), as {{timeStartLatest}} and 
{{timeEndEarliest}} is only from session windows. So going back to your 
example, {{keyFrom + timestampFrom = A1000}}, {{keyTo + timestampTo = 
AABDFFF}}, so their larger value, {{AABDFFF}} will be selected as upper range. 

But Xavier's example exposed that {{AABDFFF}} is not a safe upper range either, 
since same records like {{AADFFF}} should be included in consideration but 
their bytes are larger than {{AABDFFF}}. *So I'm proposing to consider the 
following combinations, and pick the largest*:

a). keyFrom + timestampTo
b). keyTo + timestampTo

Please let me know if you think this works.

3. This bullet point is for session store: for session store the current byte 
representation is {{key, session-start-time, session-end-time}}, where 
{{session-end-time >= session-start-time}}. Similarly to window store, for the 
lower range I'm proposing to use only {{key}} bytes, which is safe.

For the upper range, similarly to my previous modifications on window stores, 
*we can consider the following combinations and choose the largest one*:

a) keyFrom + timeStartLatest + MAX_LONG               
b) keyTo + timeStartLatest + MAX_LONG    

> Optimize upper / lower byte range for key range scan on windowed stores
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-5285
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5285
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Xavier Léauté
>            Assignee: Guozhang Wang
>            Priority: Major
>              Labels: performance
>
> The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}} 
> {{upperRange}} and {{lowerRange}} does not make any assumptions with respect 
> to the other key bound (e.g. the upper byte bound does not depends on lower 
> key bound).
> It should be possible to optimize the byte range somewhat further using the 
> information provided by the lower bound.
> More specifically, by incorporating that information, we should be able to 
> eliminate the corresponding {{upperRangeFixedSize}} and 
> {{lowerRangeFixedSize}}, since the result should be the same if we implement 
> that optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to