alamb commented on code in PR #14038: URL: https://github.com/apache/datafusion/pull/14038#discussion_r1909473831
########## datafusion/physical-expr-common/src/sort_expr.rs: ########## @@ -540,6 +557,33 @@ impl LexRequirement { .collect(), ) } + + /// Constructs a duplicate-free `LexOrderingReq` by filtering out + /// duplicate entries that have same physical expression inside. + /// + /// For example, `vec![a Some(ASC), a Some(DESC)]` collapses to `vec![a + /// Some(ASC)]`. + /// + /// It will also filter out entries that are ordered if the next entry is; + /// for instance, `vec![floor(a) Some(ASC), a Some(ASC)]` will be collapsed to + /// `vec![a Some(ASC)]`. + pub fn collapse(self) -> Self { + let mut output = Vec::<PhysicalSortRequirement>::new(); + let mut exprs = IndexSet::new(); + let mut reqs = vec![]; + for item in self { + let PhysicalSortRequirement { expr, options: req } = item; + // new insertion + if exprs.insert(expr) { + reqs.push(req); + } + } + debug_assert_eq!(reqs.len(), exprs.len()); + for (expr, req) in izip!(exprs, reqs) { + output.push(PhysicalSortRequirement::new(expr, req)); + } + LexRequirement::new(output) + } Review Comment: After some more thought I reverted this change and restored it to the original version. My rationale was that I don't (yet) have evidence that this is a bottleneck for processing and thus this seems a bit like a premature optimization ########## datafusion/physical-expr-common/src/sort_expr.rs: ########## @@ -540,6 +557,33 @@ impl LexRequirement { .collect(), ) } + + /// Constructs a duplicate-free `LexOrderingReq` by filtering out + /// duplicate entries that have same physical expression inside. + /// + /// For example, `vec![a Some(ASC), a Some(DESC)]` collapses to `vec![a + /// Some(ASC)]`. + /// + /// It will also filter out entries that are ordered if the next entry is; Review Comment: I double checked -- and I agree with your assesment so have removed this comment -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org