[Impala-ASF-CR] IMPALA-13186: Tag query option scope for tuple cache

Joe McDonnell (Code Review) Thu, 29 Aug 2024 11:49:24 -0700

Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21698 )


Change subject: IMPALA-13186: Tag query option scope for tuple cache
......................................................................


Patch Set 1:

(3 comments)

I'll do a pass on the individual query options, but here are some first 
thoughts.

http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/service/query-options.h@44
PS1, Line 44: // Scope descriptor for each query option to help with evaluating 
how different aspects of
            : // query execution are affected. Defined as a bitmask for options 
that affect multiple.
            : enum class QueryOptionScope : uint8_t {
            :   None        = 0,      // Not part of any known scope
            :   Performance = 1 << 0, // Solely impacts performance without 
changing plan or results
            :   Plan        = 1 << 1, // Can result in a different query plan
            :   Results     = 1 << 2, // Can produce different results in 
fragments of a query plan
            :   All         = 0xFF    // Can affect any scope (mainly used for 
DEBUG_ACTION)
            : };
My first thought is that this is somewhat abstract. I'm concerned a developer 
may not recognize the importance of getting it right. I do think having some 
categorization of options is useful, independent of tuple caching.

An alternative would be to hash all options unless the option specifies a 
reason that it doesn't need to be included. Some valid reasons:
1. This option is never used outside the planner / coordinator. If we didn't 
send this to the executors, it wouldn't notice. A bunch of options fit here. 
(This is also something we could think about enforcing.)
2. It does go to the executor, but it has no impact on the results (including 
whether or not errors are generated).

The default for any new option is to not specify one of these reasons, so it 
would be hashed. Hashing too much is relatively harmless unless the query 
option is changing a lot.


http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/service/query-options.h@79
PS1, Line 79:   QUERY_OPT_FN(abort_on_error, ABORT_ON_ERROR,                    
                       \
            :       TQueryOptionLevel::REGULAR, QueryOptionScope::None)         
                       \
Here's a scenario:
1. User runs query with ABORT_ON_ERROR=false and it runs into errors but 
continues and produces results.
2. User runs the same query with ABORT_ON_ERROR=true. If it hits the cache, the 
result would be from a run with ABORT_ON_ERROR=false and the query might 
succeed when it otherwise should fail.


http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/thirdparty/datasketches/MurmurHash3.h
File be/src/thirdparty/datasketches/MurmurHash3.h:

http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/thirdparty/datasketches/MurmurHash3.h@91
PS1, Line 91: MURMUR3_FORCE_INLINE void MurmurHash3_x64_128(const void* key, 
int lenBytes, HashState& out) {
> I guess I could use HashUtil instead, just seemed like I'd want to also use
Using MurmurHash3 makes sense. It is ok to make MurmurHash3 more usable for our 
purpose.

At some point we may end up moving the datasketches code to the native 
toolchain. (They have done changes on top of this file.)



--
To view, visit http://gerrit.cloudera.org:8080/21698
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f4802ad9548749cd43df8848b6f46dca3739ae7
Gerrit-Change-Number: 21698
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Thu, 29 Aug 2024 18:45:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-13186: Tag query option scope for tuple cache

Reply via email to