Austin, I don’t think anyone questions the value of consistent code or the value of automatic formatting. It is quite the opposite, actually. I think everyone on this thread actually cares about those things :)
Was automatic formatting helpful? For sure! Did changing the line limit from 120 to 100 on a 4+ year project brought a lot of value, for example? I am not sure. - Anton > On Jul 29, 2022, at 9:44 AM, Yufei Gu <flyrain...@gmail.com> wrote: > > It is the right direction to remove the trivial part of the coding. Just > wondering why we change the max line length from 120 to 100. > > Best, > > Yufei > > `This is not a contribution` > > > On Fri, Jul 29, 2022 at 9:21 AM Austin Bennett <whatwouldausti...@gmail.com > <mailto:whatwouldausti...@gmail.com>> wrote: > Probably worthwhile to really think about the long-term desired outcomes. > Even a large, one-time, disruption of the codebase for the sake of future > consistency could be quite valuable. Remember you'll also lose fidelity with > history/blame. At some point a 'breaking' ( in a different sense ) change > might be useful rather than anchoring on what exists. > > > > On Fri, Jul 29, 2022 at 8:59 AM Anton Okolnychyi > <aokolnyc...@apple.com.invalid> wrote: > On a side note, having automatic formatting was really nice. I think it is > the right direction. If we only had a way to tweak a few rules like the max > line length, one-line Javadoc and maybe consider using the Google format for > 1.8 cause we are using the 1.7 format at the moment, which was not designed > for lambdas. That way we may have all the benefits of automatic formatting > with less disruption to the code. > > - Anton > >> On Jul 28, 2022, at 7:35 PM, Anton Okolnychyi <aokolnyc...@apple.com.INVALID >> <mailto:aokolnyc...@apple.com.INVALID>> wrote: >> >> I had a chance to rebase one of my PRs. It is really hard to justify the new >> max line length of 100 chars. I think the most unreadable code is usually >> around places when single statements are split into multiple lines. Since >> Java has explicit types and we prefer full and elaborate class/var names, >> the new line limit alone triggered so many changes and requires splitting a >> lot of statements that would previously fit on a single line. >> >> The max line length is one of the reasons why Palantir folks forked the >> Google format. >> >> https://github.com/palantir/palantir-java-format >> <https://github.com/palantir/palantir-java-format> >> >> - Anton >> >> >>> On Jul 28, 2022, at 7:42 AM, Eduard Tudenhoefner <edu...@tabular.io >>> <mailto:edu...@tabular.io>> wrote: >>> >>> Thanks for your honest feedback Anton, which I really appreciate. I agree >>> with the points you brought up. >>> >>> I don't prefer the Google Java Format either, but it is at least something >>> that produces consistent results across tools. >>> If you ask me, I would love to customize the code format and have it as >>> close as possible to what the Iceberg project had. But one of the reasons >>> why the format isn't customizable is because it is difficult to support >>> such customizations across tools. >>> >>> The problem with code format is that there is no single format that can >>> satisfy the preferences of everybody. However, from my experience, once >>> people start to use any code format that produces consistent results >>> across Eclipse/IntelliJ/cmd line, people stop worrying about code format >>> details. >>> >>> This is also one of the reasons why the creators of Go decided to have a >>> code formatter built-in (https://go.dev/doc/effective_go#formatting >>> <https://go.dev/doc/effective_go#formatting>). >>> >>> In the long run I think the decision to use Spotless + code formatting >>> helps, especially for new people that want to contribute and don't want to >>> feel frustrated due to review feedback around code style. >>> >>> Eduard >>> >>> On Thu, Jul 28, 2022 at 6:15 AM Anton Okolnychyi <aokolnyc...@apple.com >>> <mailto:aokolnyc...@apple.com>> wrote: >>> I am late to the discussion, but I would like to express my opinion. >>> >>> As someone who deeply cares about the code consistency and leaves a lot of >>> style-related nits, I am definitely happy to see efforts to simplify the >>> life of the reviewers and automate the formatting process. Explaining the >>> code style to new contributors consumes a lot of time and results in more >>> iterations on PRs. Automating this would be great! >>> >>> However, I am concerned regarding the changes merged in PR #5312. >>> >>> - Many changes that went in that PR were against the original code style of >>> the project that we, committers and contributors, tried to maintain for all >>> these years. I think we had a lot of well-structured and consistent places, >>> which attracted devs to Iceberg. >>> - I feel the code quality dropped as some places were written in the old >>> style and now look really weird. If we followed the new style originally, >>> maybe, it would look much better as we would structure the code differently. >>> - As a someone whose team maintains an internal fork that is very close to >>> master (I know we are not the only ones), I feel such a massive change will >>> be a challenge to cherry-pick. I am not convinced introducing automatic >>> formatting had to be so radical. >>> >>> The first two points concern me the most. Since I am late to the >>> discussion, it would not be fair for me to request changes. However, I >>> strongly suggest the community to reconsider the approach we took while we >>> haven’t merged many changes on top of PR #5312. >>> >>> To sum up, I definitely support automating the formatting to simplify >>> reviews and help new contributors but I would try to come with a way to >>> make it less radical and follow whatever we had all these years. We did >>> discuss some of the formatting guidelines a few years ago and it is all >>> different now. >>> >>> - Anton >>> >>>> On Jul 27, 2022, at 11:40 AM, Steven Wu <stevenz...@gmail.com >>>> <mailto:stevenz...@gmail.com>> wrote: >>>> >>>> Eduard, thank you for driving this effort! Great work! >>>> >>>> On Wed, Jul 27, 2022 at 11:11 AM Eduard Tudenhoefner <edu...@tabular.io >>>> <mailto:edu...@tabular.io>> wrote: >>>> Hey everyone, >>>> >>>> Google Java Format + Spotless integration have been merged as part of >>>> https://github.com/apache/iceberg/pull/5266 >>>> <https://github.com/apache/iceberg/pull/5266>. >>>> >>>> The big bang reformatting (aka spotlessApply) will happen as part of >>>> https://github.com/apache/iceberg/pull/5312 >>>> <https://github.com/apache/iceberg/pull/5312> and once this PR is merged, >>>> please perform the following steps if you have open PRs: >>>> >>>> 1. Rebase your PR branch against the commit before the „Big Bang“ >>>> 2. Run `./gradlew spotlessApply` >>>> 3. Squash all commits >>>> 4. Rebase against the latest HEAD of the master branch >>>> 5. Force-push your branch >>>> >>>> This should hopefully produce a minimum of conflicts on PRs. >>>> >>>> Additionally, make sure to install the google-java-format plugin for your >>>> IDE as outlined in https://github.com/apache/iceberg-docs/pull/125 >>>> <https://github.com/apache/iceberg-docs/pull/125> so that you can format >>>> code inside the IDE as well. >>>> >>>> Eduard >>>> >>>> On Fri, Jul 15, 2022 at 12:45 PM Eduard Tudenhoefner <edu...@tabular.io >>>> <mailto:edu...@tabular.io>> wrote: >>>> That's correct Ryan. The Google style is what can be applied across tools >>>> and Spotless is the tool that we use to check and apply that style (and >>>> also auto-apply the license header). The apply part is the important piece >>>> that Checkstyle for example is missing (as it's tedious to manually apply >>>> what checkstyle:check complains about). >>>> >>>> Additionally, Spotless >>>> <https://github.com/diffplug/spotless/tree/main/plugin-gradle> supports a >>>> bunch of other languages, so it would even be possible to use it for >>>> auto-formatting Gradle files (or Scala files using scalafmt) if we wanted >>>> to, but for now I would probably focus on the Java codebase. >>>> >>>> Eduard >>>> >>>> On Thu, Jul 14, 2022 at 6:58 PM Ryan Blue <b...@tabular.io >>>> <mailto:b...@tabular.io>> wrote: >>>> Okay, I understand. So it isn't that spotless is supported across tools, >>>> it is that google style is supported and we can use Spotless to both check >>>> and apply that style. Is that correct? >>>> >>>> On Thu, Jul 14, 2022 at 9:56 AM Eduard Tudenhoefner <edu...@tabular.io >>>> <mailto:edu...@tabular.io>> wrote: >>>> Yeah it's how Dmitri said, the Google style is most likely on purpose not >>>> configurable, because otherwise it's getting more difficult to achieve the >>>> same formatting results across IDEs & CMD line. >>>> >>>> >>>> On Wed, Jul 13, 2022 at 7:39 PM Dmitri Bourlatchkov >>>> <dmitri.bourlatch...@dremio.com <mailto:dmitri.bourlatch...@dremio.com>> >>>> wrote: >>>> Yeah, the Google style has very specific wrapping limits and javadoc >>>> format. It is not configurable, if we go with it. >>>> >>>> From my POV, using another style would be fine. However, with a custom >>>> style it can be hard and tricky to ensure that all IDEs enforce it the >>>> same way as spotless. >>>> >>>> The Google style has a plugin at least for IntelliJ (and likely for >>>> Eclipse too) so it will behave the same way in IDE and cmd. line build. >>>> >>>> On Wed, Jul 13, 2022 at 1:28 PM Ryan Blue <b...@tabular.io >>>> <mailto:b...@tabular.io>> wrote: >>>> Is there any value in trying to get Spotless to match the current format >>>> as closely as possible? It would be great to have fewer changes. >>>> >>>> For example, when I tested the apply, I see this: >>>> >>>> - /** >>>> - * Returns an initialized {@link AliyunProperties} >>>> - */ >>>> + /** Returns an initialized {@link AliyunProperties} */ >>>> Changes like that don’t seem very valuable to me. Similarly, it looks like >>>> a lot of text is wrapped at a different line length. If we can alter that, >>>> we could easily avoid a lot of changes. >>>> >>>> Ryan >>>> >>>> >>>> On Wed, Jul 13, 2022 at 7:41 AM Eduard Tudenhoefner <edu...@tabular.io >>>> <mailto:edu...@tabular.io>> wrote: >>>> As a first step, I created https://github.com/apache/iceberg/pull/5266 >>>> <https://github.com/apache/iceberg/pull/5266> which configures Spotless to >>>> use the Google Java Format and also apply the correct copyright header for >>>> Java files. >>>> >>>> Once this PR is merged, the next steps would be: >>>> removing conflicting Checkstyle rules that are not in line with the Google >>>> format >>>> formatting the entire code base via `./gradlew spotlessApply` >>>> setting `enforceCheck` to `true` in >>>> https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48 >>>> >>>> <https://github.com/apache/iceberg/blob/80318d8cfbeb0d96d0afc27c84bc3dbddde35344/baseline.gradle#L48> >>>> so that validation fails if code isn't properly formatted >>>> updating docs around the current Formatter usage and how to configure >>>> Eclipse/IntelliJ >>>> The first 3 steps should be done together as part of the big bang. >>>> >>>> Eduard >>>> >>>> On Mon, Jul 11, 2022 at 10:54 PM Ryan Blue <b...@apache.org >>>> <mailto:b...@apache.org>> wrote: >>>> Okay, it sounds like there's mostly agreement for going with spotless. >>>> Let's try that out. We'll work on some changes to add spotless so that >>>> `spotlessApply` works. Then we can do the big bang migration (which I also >>>> agree is the best option) just before the 1.0. >>>> >>>> Thanks, everyone! >>>> >>>> On Mon, Jul 11, 2022 at 11:50 AM Dmitri Bourlatchkov >>>> <dmitri.bourlatch...@dremio.com <mailto:dmitri.bourlatch...@dremio.com>> >>>> wrote: >>>> My experience with the Google Code Style + Spotless was positive too. >>>> >>>> I'd be fine with another code style as long as it is "deterministic" (e.g. >>>> does not make changes on repeated execution) and works in IntelliJ IDEA / >>>> Eclipse / etc. >>>> >>>> Regarding cherry-picking into older branches, I think Robert's suggestion >>>> can be tweaked slightly to be helpful there too: >>>> >>>> 1. Checkout old branch >>>> 2. Apply the new style (run gradle ...) >>>> 3. Cherry-pick without committing >>>> 4. Manually revert to old style >>>> 5. Commit >>>> 6. Reset to original branch HEAD >>>> 7. Cherry pick commit 5 again >>>> >>>> It's a bit lengthy and may be a tedious process, but it should allow >>>> applying the git-level changes mostly automatically. >>>> >>>> Cheers, >>>> Dmitri. >>>> >>>> >>>> On Fri, Jul 8, 2022 at 2:53 AM Robert Stupp <sn...@snazy.de >>>> <mailto:sn...@snazy.de>> wrote: >>>> From my experience, it’s a big win to have automatic code formatting. >>>> >>>> In projectnessie we use automatic code formatting for all languages and >>>> haven’t serious issues with Spotless. It is just nice to not have to bike >>>> shed about whitespaces, line breaks, brackets, etc. It was a bit of >>>> discussion, because people had bad memories from past experiences with >>>> automatic code formatting breaking code and introducing subtle bugs. >>>> >>>> I think that using code styles that „do not allow bike shedding“ (Google >>>> Code Style) are a very good option. >>>> >>>> So far none of us has seen issues with any of the Spotless code formatters >>>> that we use: XML, Kotlin/Gradle, Kotlin, Antlr4, Java, Scala - relying on >>>> the „standard“ settings w/o any customizations. We use this piece of code, >>>> externalized into an internal Gradle plugin: >>>> https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt >>>> >>>> <https://github.com/projectnessie/gradle-build-plugins/blob/main/spotless/src/main/kotlin/org/projectnessie/buildtools/spotless/SpotlessHelperPlugin.kt> >>>> For Iceberg, it would probably be nice to have some Groovy code >>>> formatting for the build scripts as well. >>>> >>>> Sure, the migration will add some pain. IMHO the best option is a „big >>>> bang“ across the whole code base, because it happens only once. Migrating >>>> one module after another is a „repeated series of pains“. >>>> >>>> Since the result of a `./gradlew spotlessApply` is deterministic, people >>>> that have open PRs could: >>>> 1. Rebase their PR branch against the commit before the „Big Bang“ >>>> 2. Include a commit with the necessary Gradle build change (one the only >>>> contains the changes to add Spotless) >>>> 3. Do the `./gradlew spotlessApply` >>>> 4. Squash all commits in the PR-branch >>>> 5. Rebase again - against the HEAD of the master branch >>>> 6. Force-push PR-branch >>>> Because git is „clever enough“ to eliminate the „duplicated/unrelated >>>> changes“, the final result of the above steps is just the diff with the >>>> changes for the open PR. >>>> >>>> >>>>> Am 08.07.2022 um 00:59 schrieb Ryan Blue <b...@tabular.io >>>>> <mailto:b...@tabular.io>>: >>>>> >>>>> We were just talking about this proposal internally. I think it would be >>>>> great to have automatic code formatting, especially since we have to >>>>> point out a lot of changes manually. The main question is how to get >>>>> there without too much disruption. This came up in our discussions around >>>>> the upcoming 1.0 release, since that may be a good opportunity to make >>>>> all of the code changes. >>>>> >>>>> For background, the main concern about adding something like this is >>>>> applying all of the changes needed to get the existing code to conform to >>>>> the new style. That is really disruptive because it will cause all of the >>>>> PRs to need to be rebased and makes it really difficult to cherry-pick >>>>> changes from after the code formatting happens to branches that were >>>>> created before code formatting. The 1.0 release makes a good opportunity >>>>> because we are making other changes (removing deprecations) and will >>>>> hopefully have people upgrading their branches to the new major version, >>>>> rather than cherry picking. >>>>> >>>>> This is as good a time as any to add automatic code formatting, but it's >>>>> up to the community: so should we refromat the project and apply spotless >>>>> code formatting everywhere? I'm interested to hear opinions! >>>>> >>>>> Ryan >>>>> >>>>> On Thu, Mar 10, 2022 at 3:00 AM Eduard Tudenhoefner <edu...@dremio.com >>>>> <mailto:edu...@dremio.com>> wrote: >>>>> Hello everyone, >>>>> >>>>> I would like to get the discussion started around automatic code >>>>> formatting + enforcing and how we get there. >>>>> >>>>> Currently we use Checkstyle check to enforce formatting. However, the >>>>> problem with that is that you still have to manually do the actual >>>>> formatting. >>>>> >>>>> What I would like to propose is the usage of Spotless >>>>> (https://github.com/diffplug/spotless >>>>> <https://github.com/diffplug/spotless>) for checking and enforcing Java >>>>> code style (it can also enforce code style for Scala, Markdown, ... btw). >>>>> Spotless is being used by many projects >>>>> (https://github.com/search?l=gradle&q=spotless&type=Code >>>>> <https://github.com/search?l=gradle&q=spotless&type=Code>) and comes >>>>> essentially with two tasks: >>>>> * spotlessCheck: Checks that sourcecode satisfies formatting steps >>>>> * spotlessApply: Applies code formatting steps to sourcecode in-place >>>>> >>>>> >>>>> Code format >>>>> >>>>> The problem with code format is that there is no single format that can >>>>> satisfy the preferences of everybody. However, from my experience, once >>>>> people start to use any code format that produces consistent results >>>>> across Eclipse/IntelliJ/cmd line, people stop worrying about code format >>>>> details. >>>>> This is also one of the reasons why the creators of Go decided to have a >>>>> code formatter built-in (https://go.dev/doc/effective_go#formatting >>>>> <https://go.dev/doc/effective_go#formatting>): >>>>> >>>>> Formatting issues are the most contentious but the least consequential. >>>>> People can adapt to different formatting styles but it's better if they >>>>> don't have to, and less time is devoted to the topic if everyone adheres >>>>> to the same style. The problem is how to approach this Utopia without a >>>>> long prescriptive style guide. >>>>> With Go we take an unusual approach and let the machine take care of most >>>>> formatting issues. The gofmt program (also available as go fmt, which >>>>> operates at the package level rather than source file level) reads a Go >>>>> program and emits the source in a standard style of indentation and >>>>> vertical alignment, retaining and if necessary reformatting comments. >>>>> >>>>> >>>>> I would like to propose using the Google Java Format with Spotless. The >>>>> reason for this format is essentially that this is a widely-adopted code >>>>> format that is designed specifically for code reviews (since we're >>>>> spending more time reviewing code than writing it). >>>>> Additionally, it produces consistent formatting results across >>>>> Eclipse/IntelliJ/cmd line, which I think is another very important factor. >>>>> >>>>> Thus, our initial Gradle spotless configuration could look similar to the >>>>> above below: >>>>> >>>>> >>>>> pluginManager.withPlugin('com.diffplug.spotless') { >>>>> spotless { >>>>> // don't run spotlessCheck during gradle check task during the >>>>> transition phase >>>>> enforceCheck = false >>>>> java { >>>>> target 'src/main/java/**/*.java', 'src/test/java/**/*.java', >>>>> 'src/jmh/java/**/*.java' >>>>> googleJavaFormat() >>>>> } >>>>> } >>>>> } >>>>> >>>>> We don't have to use Google Java Format. Spotless also supports >>>>> formatting the code with other formats, but from previous experience the >>>>> Google Java Format seemed to be really the only one to produce consistent >>>>> results across Eclipse/IntelliJ/cmd line. >>>>> >>>>> >>>>> How do we get to a point where the entire codebase is properly formatted >>>>> (and enforceCheck = false can be removed)? >>>>> >>>>> Now this is a difficult question. Obviously we don't want to have a >>>>> single format-everything commit, as that would affect lots of in-flight >>>>> PRs. >>>>> >>>>> There would have to be some form of gradual formatting, for example >>>>> module by module. Spotless offers something called Ratched >>>>> (https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet >>>>> <https://github.com/diffplug/spotless/tree/main/plugin-gradle#ratchet>) >>>>> that allows to enforce code format gradually (but I'm not sure this would >>>>> be a good thing either). >>>>> >>>>> How exactly we'd like to approach this transitioning phase this is a >>>>> completely separate discussion, but I feel like at least we could get the >>>>> ball rolling so that we make it also easier for newcomers to contribute >>>>> to the project, since it would be straightforward for them to make their >>>>> PRs adhere to the code format and also save time during PR reviews. >>>>> >>>>> >>>>> Eduard >>>>> >>>>> >>>>> -- >>>>> Ryan Blue >>>>> Tabular >>>> >>>> >>>> >>>> -- >>>> Ryan Blue >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Tabular >>>> >>>> >>>> -- >>>> Ryan Blue >>>> Tabular >>> >> >