Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/21698 )
Change subject: IMPALA-13186: Tag query option scope for tuple cache ...................................................................... Patch Set 1: (3 comments) I'll do a pass on the individual query options, but here are some first thoughts. http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/service/query-options.h@44 PS1, Line 44: // Scope descriptor for each query option to help with evaluating how different aspects of : // query execution are affected. Defined as a bitmask for options that affect multiple. : enum class QueryOptionScope : uint8_t { : None = 0, // Not part of any known scope : Performance = 1 << 0, // Solely impacts performance without changing plan or results : Plan = 1 << 1, // Can result in a different query plan : Results = 1 << 2, // Can produce different results in fragments of a query plan : All = 0xFF // Can affect any scope (mainly used for DEBUG_ACTION) : }; My first thought is that this is somewhat abstract. I'm concerned a developer may not recognize the importance of getting it right. I do think having some categorization of options is useful, independent of tuple caching. An alternative would be to hash all options unless the option specifies a reason that it doesn't need to be included. Some valid reasons: 1. This option is never used outside the planner / coordinator. If we didn't send this to the executors, it wouldn't notice. A bunch of options fit here. (This is also something we could think about enforcing.) 2. It does go to the executor, but it has no impact on the results (including whether or not errors are generated). The default for any new option is to not specify one of these reasons, so it would be hashed. Hashing too much is relatively harmless unless the query option is changing a lot. http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/service/query-options.h@79 PS1, Line 79: QUERY_OPT_FN(abort_on_error, ABORT_ON_ERROR, \ : TQueryOptionLevel::REGULAR, QueryOptionScope::None) \ Here's a scenario: 1. User runs query with ABORT_ON_ERROR=false and it runs into errors but continues and produces results. 2. User runs the same query with ABORT_ON_ERROR=true. If it hits the cache, the result would be from a run with ABORT_ON_ERROR=false and the query might succeed when it otherwise should fail. http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/thirdparty/datasketches/MurmurHash3.h File be/src/thirdparty/datasketches/MurmurHash3.h: http://gerrit.cloudera.org:8080/#/c/21698/1/be/src/thirdparty/datasketches/MurmurHash3.h@91 PS1, Line 91: MURMUR3_FORCE_INLINE void MurmurHash3_x64_128(const void* key, int lenBytes, HashState& out) { > I guess I could use HashUtil instead, just seemed like I'd want to also use Using MurmurHash3 makes sense. It is ok to make MurmurHash3 more usable for our purpose. At some point we may end up moving the datasketches code to the native toolchain. (They have done changes on top of this file.) -- To view, visit http://gerrit.cloudera.org:8080/21698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f4802ad9548749cd43df8848b6f46dca3739ae7 Gerrit-Change-Number: 21698 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Yida Wu <[email protected]> Gerrit-Comment-Date: Thu, 29 Aug 2024 18:45:13 +0000 Gerrit-HasComments: Yes
