GitHub user logan-keede edited a discussion: More thorough contribution guideline
I am opening this discussion to talk about how to approach refactoring, and perhaps changes in general, so that they are easier on downstream repos and the review process becomes more efficient. This came up while discussing my GSoC 2025 proposal for "Optimizing compile time and binary size" with @ozankabak, which is expected to involve a large amount of refactoring.

After some research, I found that almost no open source repository has something like a refactoring guideline, and that is reasonable: generally it is not needed, and a general contribution guideline is enough. However, DataFusion is perhaps a bit too refactoring happy/needy.

DataFusion :-

A repo with 17 times more commits than DataFusion :-

Perhaps a direct comparison is not fair, because we do need refactoring. So the best we can do is to make it easier for everyone.

## Proposed Solution

1. Make a feature branch and do all the major refactoring there. Publish a roadmap on why the refactoring/change is necessary and what it changes. This is perhaps more useful for refactoring epics like #14444. _suggested by @ozankabak over Discord_
2. Use `cargo-semver-checks` to detect unintentional API breakages. The smallest things can break APIs in ways we cannot predict. [Here](https://predr.ag/blog/semver-in-rust-tooling-breakage-and-edge-cases/) is an article about this.
3. Add do's and don'ts to the guideline. Start with a tentative version and refine it over time. DataFusion already has a Contribution Guideline, which explains the general style with which we handle PRs and issues, but it does not go into great detail about what to do and what not to do. While this is not a big problem (if a problem at all) for more experienced members of the community, it is still good to highlight good and bad practices for newer members. This also makes sure that we have a DataFusion way of dealing with problems, and that there are no unexpected or uninformed (as much as possible) API changes/breakages.
It will also save some reviewing bandwidth, as reviewers will not have to explain the same common reasons for rejection again and again.

It will be valuable to collect in this discussion the community's ideas on this, as well as feedback from downstream maintainers on what kinds of DataFusion issues they face that could be avoided through better policy.

GitHub link: https://github.com/apache/datafusion/discussions/15365
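
To illustrate point 2 of the proposal (small changes breaking APIs in ways that are hard to predict), here is a minimal sketch of one such change: adding a variant to a public enum. All names in it (`upstream_v1`, `upstream_v2`, `Compression`, `extension_v1`, `extension_v2`) are hypothetical and are not DataFusion APIs; the two modules stand in for two releases of the same library crate. This is the kind of change that tools such as `cargo-semver-checks` are designed to flag.

```rust
// Hypothetical names throughout: nothing here is taken from DataFusion.

/// The library as first released.
mod upstream_v1 {
    pub enum Compression {
        None,
        Snappy,
    }
}

/// The library after a "harmless" refactor that adds one variant.
mod upstream_v2 {
    pub enum Compression {
        None,
        Snappy,
        Zstd,
    }
}

/// Downstream code written against v1. The match lists every variant and has
/// no wildcard arm, which is perfectly valid for a two-variant enum.
fn extension_v1(c: upstream_v1::Compression) -> &'static str {
    match c {
        upstream_v1::Compression::None => "raw",
        upstream_v1::Compression::Snappy => "snappy",
    }
}

/// The same downstream code adapted to v2. Without the wildcard arm, rustc
/// rejects this match with error E0004 (non-exhaustive patterns), so adding
/// `Zstd` forced a source change on every downstream user.
fn extension_v2(c: upstream_v2::Compression) -> &'static str {
    match c {
        upstream_v2::Compression::None => "raw",
        upstream_v2::Compression::Snappy => "snappy",
        _ => "unknown",
    }
}

fn main() {
    println!("{}", extension_v1(upstream_v1::Compression::None));
    println!("{}", extension_v1(upstream_v1::Compression::Snappy));
    println!("{}", extension_v2(upstream_v2::Compression::None));
    println!("{}", extension_v2(upstream_v2::Compression::Snappy));
    println!("{}", extension_v2(upstream_v2::Compression::Zstd));
}
```

If the enum had been marked `#[non_exhaustive]` from its first release, downstream crates would have been required to write the wildcard arm up front, and adding a variant later would not break them. Rules of this kind are good candidates for the do's and don'ts list.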