Sikan Chen created ARROW-11549:
----------------------------------

             Summary: Expression::ToString() doesn't distinguish null and string literal 'null', causing issues with FilterCacheKey
                 Key: ARROW-11549
                 URL: https://issues.apache.org/jira/browse/ARROW-11549
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, C++ - Gandiva
            Reporter: Sikan Chen

Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to return the correct cached filter. {{FilterCacheKey}}'s hash function factors in the string representation of a given expression; however, {{Expression::ToString()}} does not distinguish a null literal from the string literal 'null'. As a result, an incorrect filter may be returned from the cache.

In our case, we are building a SQL parser on top of Gandiva. For instance, both
{code:java}
WHERE null = null
{code}
and
{code:java}
WHERE 'null' = 'null'
{code}
result in the same string representation of the Gandiva expression:
{code:java}
bool equal((const string) null, (const string) null)
{code}

A simple test to demonstrate the issue:
{code:java}
auto f = field("foo", utf8());
auto schema = arrow::schema({f});

// Filter 1: compares two string literals whose value is "null".
auto node_a = TreeExprBuilder::MakeStringLiteral("null");
auto node_b = TreeExprBuilder::MakeStringLiteral("null");
auto equal_func = TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean());
auto condition = TreeExprBuilder::MakeCondition(equal_func);
std::shared_ptr<Filter> filter1;
auto status = Filter::Make(schema, condition, &filter1);
EXPECT_TRUE(status.ok());

// Filter 2: compares two null literals of string type.
auto string_type = std::make_shared<arrow::StringType>();
node_a = TreeExprBuilder::MakeNull(string_type);
node_b = TreeExprBuilder::MakeNull(string_type);
equal_func = TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, arrow::boolean());
condition = TreeExprBuilder::MakeCondition(equal_func);
std::shared_ptr<Filter> filter2;
status = Filter::Make(schema, condition, &filter2);
EXPECT_TRUE(status.ok());

// The two filters are semantically different, so the cache must not return the same object.
EXPECT_TRUE(filter1.get() != filter2.get());
{code}

Making {{LiteralToStream}} add quotes around string literals seems like a quick-and-dirty fix.
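
To illustrate the quoting idea, here is a minimal, self-contained sketch. It does not use Gandiva's actual code: {{StringLiteral}}, {{ToStringUnquoted}} and {{ToStringQuoted}} are hypothetical names made up for this example, and {{std::optional}} stands in for a nullable string literal. It only shows why an unquoted rendering collides and how quoting would separate the two cases.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a string-typed literal node (not a Gandiva type).
// std::nullopt models a null literal; any other value is a real string.
struct StringLiteral {
  std::optional<std::string> value;
};

// Mirrors the reported behavior: a null literal and the string "null"
// produce exactly the same text.
std::string ToStringUnquoted(const StringLiteral& lit) {
  return "(const string) " + (lit.value ? *lit.value : std::string("null"));
}

// Sketch of the suggested fix: wrap real string values in single quotes and
// leave the null marker bare, so the two cases no longer share a rendering.
std::string ToStringQuoted(const StringLiteral& lit) {
  return "(const string) " +
         (lit.value ? "'" + *lit.value + "'" : std::string("null"));
}
```

With these helpers, a null literal renders as {{(const string) null}} and the string literal as {{(const string) 'null'}}, so a cache key derived from the text would differ. This sketch does not escape quotes embedded in the value, so a more robust fix would probably need escaping as well.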