Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20496 )

Change subject: IMPALA-12373: Small String Optimization for StringValue
......................................................................


Patch Set 14:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/parquet/parquet-common.h
File be/src/exec/parquet/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/parquet/parquet-common.h@573
PS14, Line 573:   
v->Assign(reinterpret_cast<char*>(const_cast<uint8_t*>(buffer)) + 
sizeof(int32_t),
Can you add a comment about not smallifying the string?


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/text-converter.inline.h
File be/src/exec/text-converter.inline.h:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/exec/text-converter.inline.h@99
PS14, Line 99:
Wouldn't it be clearer to use SimpleString instead of StringValue from the 
start? We always assume that the string is not smallified. It would be also 
nice to have a comment about not smallifying during this function.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/buffered-tuple-stream.cc@948
PS14, Line 948: ComputeRowSize
It could be mentioned in the function comments that this also smallifies 
strings. It may also make sense to change the name to 
ComputeRowSizeAndSmallifyStrings or something similar, as the current name 
gives the impression of being "read only"


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/smallable-string.h
File be/src/runtime/smallable-string.h:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/smallable-string.h@44
PS14, Line 44: SMALL_LIMIT
future hack idea: the layout seems to be able to keep even bigger strings in 
the tuple. The planner could leave some extra bytes before the StringValue, and 
if len fits to 7 bits but > SMALL_LIMIT, then Ptr() could calculate the start 
of string based on length. This would make sense if based on the stats the 
strings are above SMALL_LIMIT but still quite small.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@607
PS14, Line 607:       string_values->push_back(string_val);
              :       if (string_val->IsSmall()) continue;
Do we actually need to collect these strings? The function could be renamed to 
CollectNonNullAndNonSmallVarSlots and skip small strings completely, as they 
need no further extra handling.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@654
PS14, Line 654:     if (string_val->IsSmall()) continue;
see comment at line 607, small strings could be ignore completely.


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/sorter.cc@686
PS14, Line 686:     if (string_val->IsSmall()) continue;
see comment at line 607


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h@67
PS14, Line 67:   void Assign(const StringValue& other) { 
string_impl_.Assign(other.string_impl_); }
             :
             :   void Assign(char* ptr, int len) {
optional: overloads with const std::string& and StringVal could be added as 
convenience functions as there are several places where assing is called with 
their ptr/len (StringVal is only valid if non null).


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/string-value.h@78
PS14, Line 78:   /// 'false' otherwise. In the latter case the object remains 
unmodified.
Can you also mention the case when it is already smallified?


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/tuple.cc
File be/src/runtime/tuple.cc:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/runtime/tuple.cc@91
PS14, Line 91:     if (string_v->IsSmall()) return;
Can you add comments about the reason for return instead of continue?


http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/util/in-list-filter.cc
File be/src/util/in-list-filter.cc:

http://gerrit.cloudera.org:8080/#/c/20496/14/be/src/util/in-list-filter.cc@141
PS14, Line 141:     if (s.IsSmall()) {
              :       values_.insert(s)
This seems inconsistent with newly_inserted_values_.total_len, as 
StringSetWithTotalLen::Insert increases the total size with len even for small 
strings.



--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 13 Nov 2023 17:12:06 +0000
Gerrit-HasComments: Yes

Reply via email to