[ 
https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366286#comment-16366286
 ] 

Guozhang Wang commented on KAFKA-5285:
--------------------------------------

[~davispw] [~xvrl] Thinking about this a bit more, I think we can provide a 
better heuristic to define the upper and lower bound for window store 
{{fetch(keyFrom, keyTo, timeFrom, timeTo)}} indeed, which is the following:

0. First of all, documentation wise we require users to make sure the 
serialized bytes of {{keyFrom}} is smaller than serialized bytes of {{keyTo}} 
lexicograpically. And inside this call, we first check the bytes for this 
condition. 

1. For lower range, we just use the {{keyFrom bytes}}, instead of {{keyFrom + 
minSuffix (0000s)}}. Since we know there would be no data in between these two 
keys at all, we can save some overhead of bytes alloc and puts.

2. For upper range, we just consider the following two combinations:

 a). keyFrom + minSuffix
 b). keyTo + maxSuffix

where minSuffix = timeFrom + seq0, maxSuffix = timeTo + seqMAX. And then we 
just pick the largest among these two. In all we would replace this with the 
current {{OrderedBytes}} implementations. 

Similarly, for session store {{fetch}}, we do the same thing, except for single 
key fetch we handle it more in a more optimized way:

0. For single key fetch, lower range = key + timeFrom + timFrom, upper range = 
key + timeTo + timeTo (this is because timeTo <= timeFrom, so they can be the 
bound of each other).

1. For range key fetch, lower range = keyFrom.

2. For range key fetch, upper range = MAX (keyFrom + timeFrom + timeFrom, keyTo 
+ timeTo + timeTo).

WDYT?

> Optimize upper / lower byte range for key range scan on windowed stores
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-5285
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5285
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Xavier Léauté
>            Assignee: Xavier Léauté
>            Priority: Major
>              Labels: performance
>
> The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}} 
> {{upperRange}} and {{lowerRange}} does not make any assumptions with respect 
> to the other key bound (e.g. the upper byte bound does not depends on lower 
> key bound).
> It should be possible to optimize the byte range somewhat further using the 
> information provided by the lower bound.
> More specifically, by incorporating that information, we should be able to 
> eliminate the corresponding {{upperRangeFixedSize}} and 
> {{lowerRangeFixedSize}}, since the result should be the same if we implement 
> that optimization.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to