Hi all,

This is another digest of open backlog issues, following the initial post
last month. It looks like we were able to close 3 of the 5 mentioned, so
thanks to those who picked up the issues and drove them to completion. The
new list of issues is as follows:

- Enable coverage CI jobs: https://github.com/apache/datafusion/issues/3678
  - I don't have much to add here as I haven't looked at it much, but
considering it's separate from the main codebase (CI related) it might be
good to pick up for those not looking to go deep on the main codebase
- Some aggregates silently ignore IGNORE NULLS and ORDER BY on arguments:
https://github.com/apache/datafusion/issues/9924
  - This is mainly a review task, to check our aggregate functions to
ensure that if they support IGNORE NULLS they handle it appropriately, and
if they can be affected by ORDER BY they also handle that appropriately
- Support complete distinct usage for aggregate expressions:
https://github.com/apache/datafusion/issues/2406
  - Like the above, this needs a comprehensive review of our aggregate
functions to see where fixes are needed. We'd need to look at which
aggregate functions already check for distinct and explicitly error out, to
see if we can implement handling for distinct, and also at functions that
DON'T check for distinct and so silently compute an incorrect result. I
believe nth_value is of the latter kind
- Binary string (BYTEA, Binary) concatenation:
https://github.com/apache/datafusion/issues/12709
  - More of a straightforward implementation (I hope!); I've left a
comment on the issue with some pointers
- DataFrame write api should accept Overwrite option when the file already
exist: https://github.com/apache/datafusion/issues/4986
  - This is a bit interesting because the existing behaviour for the API is
not really intuitive; I've left a comment with more details on the issue

I'll repost the two remaining issues from last time:

- Support ANY operator: https://github.com/apache/datafusion/issues/2548
  - There was an attempt made but the PR went stale; I've closed the PR for
now to allow someone else to volunteer to pick it up, but it's worth taking
a look at the previous PR to see if we could build on top of it
- Support ALL operator: https://github.com/apache/datafusion/issues/2547
  - I think this will have a dependency on the ANY operator, as it'll be
easier to implement this in terms of that operator (see discussion on
issue), so this can wait
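
To illustrate why ALL can lean on ANY, here is a rough standalone sketch of
the usual SQL identity `x op ALL (S)` = `NOT (x <negated op> ANY (S))` under
three-valued logic. This is not DataFusion code: the names `Tri`, `cmp_any`
and `cmp_all` are hypothetical, and values are simplified to `Option<i64>`
with `None` standing in for NULL.

```rust
/// SQL three-valued logic: Some(true), Some(false), or None for NULL.
type Tri = Option<bool>;

/// Three-valued NOT: NULL stays NULL.
fn not(v: Tri) -> Tri {
    v.map(|b| !b)
}

/// `x op ANY (set)`: true if any comparison is true; otherwise NULL if
/// any comparison involved a NULL; otherwise false (including empty sets).
fn cmp_any<F>(x: Option<i64>, set: &[Option<i64>], op: F) -> Tri
where
    F: Fn(i64, i64) -> bool,
{
    let mut saw_null = false;
    for item in set {
        match (x, *item) {
            (Some(a), Some(b)) => {
                if op(a, b) {
                    return Some(true);
                }
            }
            // A NULL on either side makes this comparison unknown.
            _ => saw_null = true,
        }
    }
    if saw_null {
        None
    } else {
        Some(false)
    }
}

/// `x op ALL (set)`, written purely in terms of ANY with the negated
/// comparison, then negated again.
fn cmp_all<F>(x: Option<i64>, set: &[Option<i64>], op: F) -> Tri
where
    F: Fn(i64, i64) -> bool,
{
    not(cmp_any(x, set, |a, b| !op(a, b)))
}
```

For example, `cmp_all(Some(5), &[Some(1), None], |a, b| a > b)` comes out as
NULL (`None`), since no element is known to fail the comparison but one is
unknown. A real implementation would of course need to work on Arrow arrays
and subqueries rather than slices.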

As before, I'll be happy to help review PRs and provide as much
clarification as I can for these issues, so feel free to tag me on them.

Cheers, Jeffrey
