[ https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370505#comment-16370505 ]
Guozhang Wang commented on KAFKA-5285: -------------------------------------- [~davispw][~xvrl] Thanks for your inputs. I think there are some mis-communications here about my proposal. I understand the issue that you raised, and hence proposing to always use the keyFrom only for lower range. For upper range I think Xavier's point is valid and I'd like to slightly modify my proposal accordingly. This time I'd try to propose them in a clearer way: 0. The following bullet points 1) and 2) are ONLY discussed for window stores, NOT session stores. 1. Note that within the [lowerRange, upperRange] iterating, we will still check the validity of each key within the range to determine whether or not if it should be returned. So the goal of setting lowerRange and upperRange is to make sure they are "safe" to include all possible keys. And from that point setting lowerRange as keyFrom should be safe, since lexicograpically all keys larger than keyFrom should have the its [key, timestamp, sequence] bytes representation larger than just [keyFrom] byte representation. Our current lowerRange, [keyFrom + 0000] is also safe but it is not "more efficient" than [keyFrom] since we know there will be not records at all in between these two lower bounds. On the other hand, the [keyFrom + timestampFrom] is not safe, as you pointed out: {code} The smallest byte sequence for that key will be keyFrom + S + minSuffix which is greater than keyFrom, but still smaller than keyFrom + minSuffix {code} As you mentioned, this key is still larger than {{keyFrom}} bytes. *I think {{keyFrom}} should be actually safe under all circumstances such that, any key larger than keyFrom should have its complete [key, timestamp, sequence] larger than the keyFrom bytes only, and hence they would all be included in the range for validation, please let me know if this is wrong.* 2. For the upper range, again the goal is to be "safe" first, and then see if we can do better on "efficiency". What I was proposing is MAX({{keyFrom + timestampFrom}}, {{keyTo + timestampTo}}), note it is NOT MAX({{keyFrom + timeEndEarliest}}, {{keyTo + timeStartLatest}}), as {{timeStartLatest}} and {{timeEndEarliest}} is only from session windows. So going back to your example, {{keyFrom + timestampFrom = A1000}}, {{keyTo + timestampTo = AABDFFF}}, so their larger value, {{AABDFFF}} will be selected as upper range. But Xavier's example exposed that {{AABDFFF}} is not a safe upper range either, since same records like {{AADFFF}} should be included in consideration but their bytes are larger than {{AABDFFF}}. *So I'm proposing to consider the following combinations, and pick the largest*: a). keyFrom + timestampTo b). keyTo + timestampTo Please let me know if you think this works. 3. This bullet point is for session store: for session store the current byte representation is {{key, session-start-time, session-end-time}}, where {{session-end-time >= session-start-time}}. Similarly to window store, for the lower range I'm proposing to use only {{key}} bytes, which is safe. For the upper range, similarly to my previous modifications on window stores, *we can consider the following combinations and choose the largest one*: a) keyFrom + timeStartLatest + MAX_LONG b) keyTo + timeStartLatest + MAX_LONG > Optimize upper / lower byte range for key range scan on windowed stores > ----------------------------------------------------------------------- > > Key: KAFKA-5285 > URL: https://issues.apache.org/jira/browse/KAFKA-5285 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Xavier Léauté > Assignee: Guozhang Wang > Priority: Major > Labels: performance > > The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}} > {{upperRange}} and {{lowerRange}} does not make any assumptions with respect > to the other key bound (e.g. the upper byte bound does not depends on lower > key bound). > It should be possible to optimize the byte range somewhat further using the > information provided by the lower bound. > More specifically, by incorporating that information, we should be able to > eliminate the corresponding {{upperRangeFixedSize}} and > {{lowerRangeFixedSize}}, since the result should be the same if we implement > that optimization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)