[ https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367892#comment-16367892 ]
Xavier Léauté commented on KAFKA-5285: -------------------------------------- [~davispw] yes I understand your concern, and that's part the reason I filed this JIRA in the first place. Currently our internal APIs assume that knowing keyTo and timeTo is sufficient to determine the upper bound. However, without information about the lower key, or timeFrom it is not possible to determine an optimal upper bound for the byte range. Let me illustrate the problem that [~guozhang] is trying to solve: Let's say we query the key range {{A1}} through {{AAB}}, and timestamps {{0000}} through {{DFFF}} (let's assume timestamps consist of 4 bytes for this example). We could have the following order: {code} A13333 (A1 | 3333) AA0000 (AA | 0000) AAA0000 (AAA | 0000) AAB1234 (AAB | 1234) AAC321 (AA | C321) AADFFF (AA | DFFF) {code} That is, it is possible for keys smaller than keyTo to show up after the bytes for (keyTo+maxSuffix). This happens only if: # keys within the range can be shorter than keyTo. # The suffix (a.k.a. timestamp) bytes are greater than the bytes in the same position for keyTo Without any additional information about the key length or or the lower bound, we can only assume that keys are at least 1 byte, and that byte has to be smaller or equal to the first byte of keyTo (i.e. our upper bound has to start with the first byte of keyTo), so our best guess for and upper bound in that case is ADFFF. NB: [~guozhang] in this particular case MAX(keyFrom + timeEndEarliest, keyTo + timeStartLatest) = max(A1DFFF, AAB0000) = AAB0000 would not work since it does not include the full time range for the AAB keys, or maybe I did not understand your suggestion. Any key smaller than keyTo that appears after (keyTo+maxSuffix) is necessarily a prefix of keyTo, so knowing the lower bound is only helpful if it shares a prefix with keyTo (e.g. if in this case keyFrom was {{7F}}, then the lower bound would not be helpful). However because the lower bound is {{A1}}, we now know keys starting with A have at least two bytes, and the second byte has to be an A based on the previous statement. This means me can reduce our upper bound to {{AADFFFF}}. Also, if we have a smaller timeTo we can include any prefix of keyTo that does not contain any byte larger than the first byte of of timeTo, but it gets trickier once you have matching bytes in the timestamp and the key[ Let me think this though a bit more and I'll try to come up with something that will work optimally in most cases, but my hunch is that unless we introduce optimizations for stores with fixed size keys, or have some information about the minimum key length, we will end up scanning a large portion of the table for long time ranges. > Optimize upper / lower byte range for key range scan on windowed stores > ----------------------------------------------------------------------- > > Key: KAFKA-5285 > URL: https://issues.apache.org/jira/browse/KAFKA-5285 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Xavier Léauté > Assignee: Guozhang Wang > Priority: Major > Labels: performance > > The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}} > {{upperRange}} and {{lowerRange}} does not make any assumptions with respect > to the other key bound (e.g. the upper byte bound does not depends on lower > key bound). > It should be possible to optimize the byte range somewhat further using the > information provided by the lower bound. > More specifically, by incorporating that information, we should be able to > eliminate the corresponding {{upperRangeFixedSize}} and > {{lowerRangeFixedSize}}, since the result should be the same if we implement > that optimization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)