[ https://issues.apache.org/jira/browse/ARROW-11549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson updated ARROW-11549: ------------------------------------ Summary: [C++][Gandiva] Expression::ToString() doesn't distinguish null and string literal 'null', causing issues with FilterCacheKey (was: Expression::ToString() doesn't distinguish null and string literal 'null', causing issues with FilterCacheKey) > [C++][Gandiva] Expression::ToString() doesn't distinguish null and string > literal 'null', causing issues with FilterCacheKey > ---------------------------------------------------------------------------------------------------------------------------- > > Key: ARROW-11549 > URL: https://issues.apache.org/jira/browse/ARROW-11549 > Project: Apache Arrow > Issue Type: Bug > Components: C++, C++ - Gandiva > Reporter: Sikan Chen > Priority: Critical > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Gandiva's caching mechanism for filters relies on {{FilterCacheKey}} to > return the correct cached filter. {{FilterCacheKey}}'s hash function factors > in the string representation of a given expression, however > {{Expression::ToString()}} doesn't really distinguish null and string literal > 'null'. As a result, incorrect filters may be returned from the cache. > In our case, we are building a SQL parser on top of gandiva, but, for > instance, both of > {code:java} > WHERE null = null > {code} > and > {code:java} > WHERE 'null' = 'null' > {code} > result in the same string representation of gandiva expression: > {code:java} > bool equal((const string) null, (const string) null) > {code} > A simple test to demonstrate the issue: > {code:java} > auto f = field("foo", utf8()); > auto schema = arrow::schema({f}); > auto node_a = TreeExprBuilder::MakeStringLiteral("null"); > auto node_b = TreeExprBuilder::MakeStringLiteral("null"); > auto equal_func = > TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, > arrow::boolean()); > auto condition = TreeExprBuilder::MakeCondition(equal_func); > std::shared_ptr<Filter> filter1; > auto status = Filter::Make(schema, condition, &filter1); > EXPECT_TRUE(status.ok()); > auto string_type = std::make_shared<arrow::StringType>(); > node_a = TreeExprBuilder::MakeNull(string_type); > node_b = TreeExprBuilder::MakeNull(string_type); > equal_func = > TreeExprBuilder::MakeFunction("equal", {node_a, node_b}, > arrow::boolean()); > condition = TreeExprBuilder::MakeCondition(equal_func); > std::shared_ptr<Filter> filter2; > status = Filter::Make(schema, condition, &filter2); > EXPECT_TRUE(status.ok()); > EXPECT_TRUE(filter1.get() != filter2.get()); // fails here > {code} > > Making {{LiteralToStream}} add quotes around the literal seems like a > quick-and-dirty fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)