Hey everyone, Google Java Format + Spotless integration have been merged as part of https://github.com/apache/iceberg/pull/5266.
The big bang reformatting (aka *spotlessApply*) will happen as part of https://github.com/apache/iceberg/pull/5312 and once this PR is merged, please perform the following steps if you have open PRs: 1. Rebase your PR branch against the commit before the „Big Bang“ 2. Run `*./gradlew spotlessApply*` 3. Squash all commits 4. Rebase against the latest *HEAD* of the master branch 5. Force-push your branch This should hopefully produce a minimum of conflicts on PRs. Additionally, make sure to install the *google-java-format* plugin for your IDE as outlined in https://github.com/apache/iceberg-docs/pull/125 so that you can format code inside the IDE as well. Eduard On Fri, Jul 15, 2022 at 12:45 PM Eduard Tudenhoefner <edu...@tabular.io> wrote: > That's correct Ryan. The Google style is what can be applied across tools > and Spotless is the tool that we use to check and *apply* that style (and > also auto-apply the license header). The *apply* part is the important > piece that Checkstyle for example is missing (as it's tedious to manually > apply what *checkstyle:check* complains about). > > Additionally, Spotless > <https://github.com/diffplug/spotless/tree/main/plugin-gradle> supports a > bunch of other languages, so it would even be possible to use it for > auto-formatting Gradle files (or Scala files using scalafmt) if we wanted > to, but for now I would probably focus on the Java codebase. > > Eduard > > On Thu, Jul 14, 2022 at 6:58 PM Ryan Blue <b...@tabular.io> wrote: > >> Okay, I understand. So it isn't that spotless is supported across tools, >> it is that google style is supported and we can use Spotless to both check >> and apply that style. Is that correct? >> >> On Thu, Jul 14, 2022 at 9:56 AM Eduard Tudenhoefner <edu...@tabular.io> >> wrote: >> >>> Yeah it's how Dmitri said, the Google style is most likely on purpose >>> not configurable, because otherwise it's getting more difficult to achieve >>> the same formatting results across IDEs & CMD line. >>> >>> >>> On Wed, Jul 13, 2022 at 7:39 PM Dmitri Bourlatchkov < >>> dmitri.bourlatch...@dremio.com> wrote: >>> >>>> Yeah, the Google style has very specific wrapping limits and javadoc >>>> format. It is not configurable, if we go with it. >>>> >>>> From my POV, using another style would be fine. However, with a custom >>>> style it can be hard and tricky to ensure that all IDEs enforce it the same >>>> way as spotless. >>>> >>>> The Google style has a plugin at least for IntelliJ (and likely for >>>> Eclipse too) so it will behave the same way in IDE and cmd. line build. >>>> >>>> On Wed, Jul 13, 2022 at 1:28 PM Ryan Blue <b...@tabular.io> wrote: >>>> >>>>> Is there any value in trying to get Spotless to match the current >>>>> format as closely as possible? It would be great to have fewer changes. >>>>> >>>>> For example, when I tested the apply, I see this: >>>>> >>>>> - /** >>>>> - * Returns an initialized {@link AliyunProperties} >>>>> - */ >>>>> + /** Returns an initialized {@link AliyunProperties} */ >>>>> >>>>> Changes like that don’t seem very valuable to me. Similarly, it looks >>>>> like a lot of text is wrapped at a different line length. If we can alter >>>>> that, we could easily avoid a lot of changes. >>>>> >>>>> Ryan >>>>> >>>>> On Wed, Jul 13, 2022 at 7:41 AM Eduard Tudenhoefner <edu...@tabular.io> >>>>> wrote: >>>>> >>>>>> As a first step, I created >>>>>> https://github.com/apache/iceberg/pull/5266 which configures >>>>>> Spotless to use the Google Java Format and also apply the correct >>>>>> copyright >>>>>> header for Java files. >>>>>> >>>>>> Once this PR is merged, the next steps would be: >>>>>> >>>>>> - removing conflicting Checkstyle rules that are not in line with >>>>>> the Google format >>>>>> - formatting the entire code base via `*./gradlew spotlessApply*` >>>>>> - setting `*enforceCheck*` to `*true*` in >>>>>> >>>>>> https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48 >>>>>> so that validation fails if code isn't properly formatted >>>>>> - updating docs around the current Formatter usage and how to >>>>>> configure Eclipse/IntelliJ >>>>>> >>>>>> The first 3 steps should be done together as part of the big bang. >>>>>> >>>>>> Eduard >>>>>> >>>>>> On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org> wrote: >>>>>> >>>>>>> Okay, it sounds like there's mostly agreement for going with >>>>>>> spotless. Let's try that out. We'll work on some changes to add >>>>>>> spotless so >>>>>>> that `spotlessApply` works. Then we can do the big bang migration >>>>>>> (which I >>>>>>> also agree is the best option) just before the 1.0. >>>>>>> >>>>>>> Thanks, everyone! >>>>>>> >>>>>>> On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov < >>>>>>> dmitri.bourlatch...@dremio.com> wrote: >>>>>>> >>>>>>>> My experience with the Google Code Style + Spotless was positive >>>>>>>> too. >>>>>>>> >>>>>>>> I'd be fine with another code style as long as it is >>>>>>>> "deterministic" (e.g. does not make changes on repeated execution) and >>>>>>>> works in IntelliJ IDEA / Eclipse / etc. >>>>>>>> >>>>>>>> Regarding cherry-picking into older branches, I think Robert's >>>>>>>> suggestion can be tweaked slightly to be helpful there too: >>>>>>>> >>>>>>>> 1. Checkout old branch >>>>>>>> 2. Apply the new style (run gradle ...) >>>>>>>> 3. Cherry-pick without committing >>>>>>>> 4. Manually revert to old style >>>>>>>> 5. Commit >>>>>>>> 6. Reset to original branch HEAD >>>>>>>> 7. Cherry pick commit 5 again >>>>>>>> >>>>>>>> It's a bit lengthy and may be a tedious process, but it should >>>>>>>> allow applying the git-level changes mostly automatically. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Dmitri. >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de> wrote: >>>>>>>> >>>>>>>>> From my experience, it’s a big win to have automatic code >>>>>>>>> formatting. >>>>>>>>> >>>>>>>>> In projectnessie we use automatic code formatting for all >>>>>>>>> languages and haven’t serious issues with Spotless. It is just nice >>>>>>>>> to not >>>>>>>>> have to bike shed about whitespaces, line breaks, brackets, etc. It >>>>>>>>> was a >>>>>>>>> bit of discussion, because people had bad memories from past >>>>>>>>> experiences >>>>>>>>> with automatic code formatting breaking code and introducing subtle >>>>>>>>> bugs. >>>>>>>>> >>>>>>>>> I think that using code styles that „do not allow bike shedding“ >>>>>>>>> (Google Code Style) are a very good option. >>>>>>>>> >>>>>>>>> So far none of us has seen issues with any of the Spotless code >>>>>>>>> formatters that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, >>>>>>>>> Scala - >>>>>>>>> relying on the „standard“ settings w/o any customizations. We use this >>>>>>>>> piece of code, externalized into an internal Gradle plugin: >>>>>>>>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt >>>>>>>>> For >>>>>>>>> Iceberg, it would probably be nice to have some Groovy code >>>>>>>>> formatting for >>>>>>>>> the build scripts as well. >>>>>>>>> >>>>>>>>> Sure, the migration will add some pain. IMHO the best option is a >>>>>>>>> „big bang“ across the whole code base, because it happens only once. >>>>>>>>> Migrating one module after another is a „repeated series of pains“. >>>>>>>>> >>>>>>>>> Since the result of a `./gradlew spotlessApply` is deterministic, >>>>>>>>> people that have open PRs could: >>>>>>>>> 1. Rebase their PR branch against the commit before the „Big Bang“ >>>>>>>>> 2. Include a commit with the necessary Gradle build change (one >>>>>>>>> the only contains the changes to add Spotless) >>>>>>>>> 3. Do the `./gradlew spotlessApply` >>>>>>>>> 4. Squash all commits in the PR-branch >>>>>>>>> 5. Rebase again - against the HEAD of the master branch >>>>>>>>> 6. Force-push PR-branch >>>>>>>>> Because git is „clever enough“ to eliminate the >>>>>>>>> „duplicated/unrelated changes“, the final result of the above steps >>>>>>>>> is just >>>>>>>>> the diff with the changes for the open PR. >>>>>>>>> >>>>>>>>> >>>>>>>>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io>: >>>>>>>>> >>>>>>>>> We were just talking about this proposal internally. I think it >>>>>>>>> would be great to have automatic code formatting, especially since we >>>>>>>>> have >>>>>>>>> to point out a lot of changes manually. The main question is how to >>>>>>>>> get >>>>>>>>> there without too much disruption. This came up in our discussions >>>>>>>>> around >>>>>>>>> the upcoming 1.0 release, since that may be a good opportunity to >>>>>>>>> make all >>>>>>>>> of the code changes. >>>>>>>>> >>>>>>>>> For background, the main concern about adding something like this >>>>>>>>> is applying all of the changes needed to get the existing code to >>>>>>>>> conform >>>>>>>>> to the new style. That is really disruptive because it will cause all >>>>>>>>> of >>>>>>>>> the PRs to need to be rebased and makes it really difficult to >>>>>>>>> cherry-pick >>>>>>>>> changes from after the code formatting happens to branches that were >>>>>>>>> created before code formatting. The 1.0 release makes a good >>>>>>>>> opportunity >>>>>>>>> because we are making other changes (removing deprecations) and will >>>>>>>>> hopefully have people upgrading their branches to the new major >>>>>>>>> version, >>>>>>>>> rather than cherry picking. >>>>>>>>> >>>>>>>>> This is as good a time as any to add automatic code formatting, >>>>>>>>> but it's up to the community: so should we refromat the project and >>>>>>>>> apply >>>>>>>>> spotless code formatting everywhere? I'm interested to hear opinions! >>>>>>>>> >>>>>>>>> Ryan >>>>>>>>> >>>>>>>>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner < >>>>>>>>> edu...@dremio.com> wrote: >>>>>>>>> >>>>>>>>>> Hello everyone, >>>>>>>>>> >>>>>>>>>> I would like to get the discussion started around automatic code >>>>>>>>>> formatting + enforcing and how we get there. >>>>>>>>>> >>>>>>>>>> Currently we use Checkstyle *check* to enforce formatting. >>>>>>>>>> However, the problem with that is that you still have to manually do >>>>>>>>>> the >>>>>>>>>> actual formatting. >>>>>>>>>> >>>>>>>>>> What I would like to propose is the usage of *Spotless* ( >>>>>>>>>> https://github.com/diffplug/spotless) for *checking* and >>>>>>>>>> *enforcing* Java code style (it can also enforce code style for >>>>>>>>>> Scala, Markdown, ... btw). Spotless is being used by many projects ( >>>>>>>>>> https://github.com/search?l=gradle&q=spotless&type=Code) and >>>>>>>>>> comes essentially with two tasks: >>>>>>>>>> * *spotlessCheck*: Checks that sourcecode satisfies formatting >>>>>>>>>> steps >>>>>>>>>> * *spotlessApply*: Applies code formatting steps to sourcecode >>>>>>>>>> in-place >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *Code format* >>>>>>>>>> >>>>>>>>>> The problem with code format is that there is no single format >>>>>>>>>> that can satisfy the preferences of everybody. However, from my >>>>>>>>>> experience, >>>>>>>>>> once people start to use *any* code format that produces >>>>>>>>>> consistent results across *Eclipse**/IntelliJ/cmd line*, people >>>>>>>>>> stop worrying about code format details. >>>>>>>>>> This is also one of the reasons why the creators of Go decided to >>>>>>>>>> have a code formatter built-in ( >>>>>>>>>> https://go.dev/doc/effective_go#formatting): >>>>>>>>>> >>>>>>>>>> *Formatting issues are the most contentious but the least >>>>>>>>>>> consequential. People can adapt to different formatting styles but >>>>>>>>>>> it's >>>>>>>>>>> better if they don't have to, and less time is devoted to the topic >>>>>>>>>>> if >>>>>>>>>>> everyone adheres to the same style. The problem is how to approach >>>>>>>>>>> this >>>>>>>>>>> Utopia without a long prescriptive style guide.* >>>>>>>>>>> *With Go we take an unusual approach and let the machine take >>>>>>>>>>> care of most formatting issues. The gofmt program (also available >>>>>>>>>>> as go >>>>>>>>>>> fmt, which operates at the package level rather than source file >>>>>>>>>>> level) >>>>>>>>>>> reads a Go program and emits the source in a standard style of >>>>>>>>>>> indentation >>>>>>>>>>> and vertical alignment, retaining and if necessary reformatting >>>>>>>>>>> comments.* >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I would like to propose using the Google Java Format with >>>>>>>>>> Spotless. The reason for this format is essentially that this is a >>>>>>>>>> widely-adopted code format that is designed specifically for code >>>>>>>>>> reviews >>>>>>>>>> (since we're spending more time reviewing code than writing it). >>>>>>>>>> Additionally, it produces consistent formatting results across >>>>>>>>>> *Eclipse**/IntelliJ/cmd line*, which I think is another very >>>>>>>>>> important factor. >>>>>>>>>> >>>>>>>>>> Thus, our initial Gradle spotless configuration could look >>>>>>>>>> similar to the above below: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *pluginManager.withPlugin('com.diffplug.spotless') { spotless >>>>>>>>>> { // don't run spotlessCheck during gradle check task during the >>>>>>>>>> transition phase enforceCheck = false java { target >>>>>>>>>> 'src/main/java/**/*.java', 'src/test/java/**/*.java', >>>>>>>>>> 'src/jmh/java/**/*.java' googleJavaFormat() } }}* >>>>>>>>>> >>>>>>>>>> We don't have to use Google Java Format. Spotless also supports >>>>>>>>>> formatting the code with other formats, but from previous experience >>>>>>>>>> the >>>>>>>>>> Google Java Format seemed to be really the only one to produce >>>>>>>>>> consistent >>>>>>>>>> results across *Eclipse**/IntelliJ/cmd line*. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *How do we get to a point where the entire codebase is properly >>>>>>>>>> formatted (and enforceCheck = false can be removed)?* >>>>>>>>>> >>>>>>>>>> Now this is a difficult question. Obviously we don't want to have >>>>>>>>>> a single *format-everything* commit, as that would affect lots >>>>>>>>>> of in-flight PRs. >>>>>>>>>> >>>>>>>>>> There would have to be some form of gradual formatting, for >>>>>>>>>> example module by module. Spotless offers something called Ratched ( >>>>>>>>>> https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet) >>>>>>>>>> that allows to enforce code format gradually (but I'm not sure this >>>>>>>>>> would >>>>>>>>>> be a good thing either). >>>>>>>>>> >>>>>>>>>> How exactly we'd like to approach this transitioning phase this >>>>>>>>>> is a completely separate discussion, but I feel like at least we >>>>>>>>>> could get >>>>>>>>>> the ball rolling so that we make it also easier for newcomers to >>>>>>>>>> contribute >>>>>>>>>> to the project, since it would be straightforward for them to make >>>>>>>>>> their >>>>>>>>>> PRs adhere to the code format and also save time during PR reviews. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Eduard >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ryan Blue >>>>>>>>> Tabular >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Ryan Blue >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Ryan Blue >>>>> Tabular >>>>> >>>> >> >> -- >> Ryan Blue >> Tabular >> >