[ https://issues.apache.org/jira/browse/ARROW-11549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sikan Chen updated ARROW-11549: ------------------------------- Description: Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to return the correct cached filter. {{FilterCacheKey}}'s hash function factors in the string representation of a given expression, however {{Expression::ToString()}} doesn't really distinguish null and string literal 'null'. As a result, incorrect filters may be returned from the cache. In our case, we are building a SQL parser on top of gandiva, but, for instance, both of {code:java} WHERE null = null {code} and {code:java} WHERE 'null' = 'null' {code} result in the same string representation of gandiva expression: {code:java} bool equal((const string) null, (const string) null) {code} A simple test to demonstrate the issue: {code:java} auto f = field("foo", utf8()); auto schema = arrow::schema({f}); auto node_a = TreeExprBuilder::MakeStringLiteral("null"); auto node_b = TreeExprBuilder::MakeStringLiteral("null"); auto equal_func = TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean()); auto condition = TreeExprBuilder::MakeCondition(equal_func); std::shared_ptr<Filter> filter1; auto status = Filter::Make(schema, condition, &filter1); EXPECT_TRUE(status.ok()); auto string_type = std::make_shared<arrow::StringType>(); node_a = TreeExprBuilder::MakeNull(string_type); node_b = TreeExprBuilder::MakeNull(string_type); equal_func = TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean()); condition = TreeExprBuilder::MakeCondition(equal_func); std::shared_ptr<Filter> filter2; status = Filter::Make(schema, condition, &filter2); EXPECT_TRUE(status.ok()); EXPECT_TRUE(filter1.get() != filter2.get()); {code} Making {{LiteralToStream}} add quotes around the literal seems like a quick-and-dirty fix. was: Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to return the correct cached filter. {{FilterCacheKey}}'s hash function factors in the string representation of a given expression, however {{Expression::ToString()}} doesn't really distinguish null and string literal 'null'. As a result, incorrect filters may be returned from the cache. In our case, we are building a SQL parser on top of gandiva, but, for instance, both of {code:java} WHERE null = null {code} and {code:java} WHERE 'null' = 'null' {code} result in the same string representation of gandiva expression: {code:java} bool equal((const string) null, (const string) null) {code} A simple test to demonstrate the issue: {code:java} auto f = field("foo", utf8()); auto schema = arrow::schema({f}); auto node_a = TreeExprBuilder::MakeStringLiteral("null"); auto node_b = TreeExprBuilder::MakeStringLiteral("null"); auto equal_func = TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean()); auto condition = TreeExprBuilder::MakeCondition(equal_func); std::shared_ptr<Filter> filter1; auto status = Filter::Make(schema, condition, &filter1); EXPECT_TRUE(status.ok()); auto string_type = std::make_shared<arrow::StringType>(); node_a = TreeExprBuilder::MakeNull(string_type); node_b = TreeExprBuilder::MakeNull(string_type); equal_func = TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean()); condition = TreeExprBuilder::MakeCondition(equal_func); std::shared_ptr<Filter> filter2; status = Filter::Make(schema, condition, &filter2); EXPECT_TRUE(status.ok()); EXPECT_TRUE(filter1.get() != filter2.get()); {code} Making {{LiteralToStream}} adding quotes around the literal seems like a quick-and-dirty fix. > Expression::ToString() doesn't distinguish null and string literal 'null', > causing issues with FilterCacheKey > ------------------------------------------------------------------------------------------------------------- > > Key: ARROW-11549 > URL: https://issues.apache.org/jira/browse/ARROW-11549 > Project: Apache Arrow > Issue Type: Bug > Components: C++, C++ - Gandiva > Reporter: Sikan Chen > Priority: Critical > > Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to > return the correct cached filter. {{FilterCacheKey}}'s hash function factors > in the string representation of a given expression, however > {{Expression::ToString()}} doesn't really distinguish null and string literal > 'null'. As a result, incorrect filters may be returned from the cache. > In our case, we are building a SQL parser on top of gandiva, but, for > instance, both of > {code:java} > WHERE null = null > {code} > and > {code:java} > WHERE 'null' = 'null' > {code} > result in the same string representation of gandiva expression: > {code:java} > bool equal((const string) null, (const string) null) > {code} > A simple test to demonstrate the issue: > {code:java} > auto f = field("foo", utf8()); > auto schema = arrow::schema({f}); > auto node_a = TreeExprBuilder::MakeStringLiteral("null"); > auto node_b = TreeExprBuilder::MakeStringLiteral("null"); > auto equal_func = > TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, > arrow::boolean()); > auto condition = TreeExprBuilder::MakeCondition(equal_func); > std::shared_ptr<Filter> filter1; > auto status = Filter::Make(schema, condition, &filter1); > EXPECT_TRUE(status.ok()); > auto string_type = std::make_shared<arrow::StringType>(); > node_a = TreeExprBuilder::MakeNull(string_type); > node_b = TreeExprBuilder::MakeNull(string_type); > equal_func = > TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, > arrow::boolean()); > condition = TreeExprBuilder::MakeCondition(equal_func); > std::shared_ptr<Filter> filter2; > status = Filter::Make(schema, condition, &filter2); > EXPECT_TRUE(status.ok()); > EXPECT_TRUE(filter1.get() != filter2.get()); > {code} > > Making {{LiteralToStream}} add quotes around the literal seems like a > quick-and-dirty fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)