peter-toth commented on code in PR #13315:
URL: https://github.com/apache/datafusion/pull/13315#discussion_r1838577228


##########
datafusion/common/src/cse.rs:
##########
@@ -50,33 +50,63 @@ impl<T: HashNode + ?Sized> HashNode for Arc<T> {
     }
 }
 
+/// The `Normalizeable` trait defines a method to determine whether a node can 
be normalized.
+///
+/// Normalization is the process of converting a node into a canonical form 
that can be used
+/// to compare nodes for equality. This is useful in optimizations like Common 
Subexpression Elimination (CSE),
+/// where semantically equivalent nodes (e.g., `a + b` and `b + a`) should be 
treated as equal.
+pub trait Normalizeable {
+    fn can_normalize(&self) -> bool;
+}
+
+/// The `NormalizeEq` trait extends `Eq` and `Normalizeable` to provide a 
method for comparing
+/// normlized nodes in optimizations like Common Subexpression Elimination 
(CSE).
+///
+/// The `normalize_eq` method ensures that two nodes that are semantically 
equivalent (after normalization)
+/// are considered equal in CSE optimization, even if their original forms 
differ.
+///
+/// This trait allows for equality comparisons between nodes with equivalent 
semantics, regardless of their
+/// internal representations.
+pub trait NormalizeEq: Eq + Normalizeable {
+    fn normalize_eq(&self, other: &Self) -> bool;
+}
+
 /// Identifier that represents a [`TreeNode`] tree.
 ///
 /// This identifier is designed to be efficient and  "hash", "accumulate", 
"equal" and
 /// "have no collision (as low as possible)"
-#[derive(Debug, Eq, PartialEq)]
-struct Identifier<'n, N> {
+#[derive(Debug, Eq)]
+struct Identifier<'n, N: NormalizeEq> {
     // Hash of `node` built up incrementally during the first, visiting 
traversal.
     // Its value is not necessarily equal to default hash of the node. E.g. it 
is not
     // equal to `expr.hash()` if the node is `Expr`.
     hash: u64,
     node: &'n N,
 }
 
-impl<N> Clone for Identifier<'_, N> {
+impl<N: NormalizeEq> Clone for Identifier<'_, N> {
     fn clone(&self) -> Self {
         *self
     }
 }
-impl<N> Copy for Identifier<'_, N> {}
+impl<N: NormalizeEq> Copy for Identifier<'_, N> {}
 
-impl<N> Hash for Identifier<'_, N> {
+impl<N: NormalizeEq> Hash for Identifier<'_, N> {
     fn hash<H: Hasher>(&self, state: &mut H) {
         state.write_u64(self.hash);
     }
 }
 
-impl<'n, N: HashNode> Identifier<'n, N> {
+impl<N: NormalizeEq> PartialEq for Identifier<'_, N> {
+    fn eq(&self, other: &Self) -> bool {
+        self.node.normalize_eq(other.node)

Review Comment:
   Let's compare `hash`es as well. I know that we use `Identifier` in hashmap 
keys and there hashes need to match first to evaluate `eq`, but still this 
would look more `eq` like...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to