After a little tinkering, it seems the major obstacle to implementing this via a custom formatter module is that sigil formatters are given the file and line number of each sigil literal's start, but not the column number. Without the column, a multiline sigil formatter has no good way to reason about how to indent its contents.
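To make the limitation concrete, here is a minimal sketch of such a plugin, assuming the `Mix.Tasks.Format` plugin behaviour (`features/1` and `format/2`). The module name `WordSigilFormatter` and the hard-coded `@assumed_indent` are hypothetical, made up for illustration; the fixed indent stands in for the column information the plugin does not receive.

```elixir
defmodule WordSigilFormatter do
  @moduledoc """
  Sketch of a `mix format` plugin that normalizes multiline `~w` sigils.
  """

  @behaviour Mix.Tasks.Format

  # Assumption: the plugin is not told the sigil's starting column, so this
  # sketch falls back to a fixed indentation that may not match the code
  # surrounding the sigil.
  @assumed_indent 2

  @impl true
  def features(_opts), do: [sigils: [:w]]

  @impl true
  def format(contents, _opts) do
    if String.contains?(contents, "\n") do
      pad = String.duplicate(" ", @assumed_indent)

      # ~w splits on whitespace, so rearranging whitespace between words
      # does not change the resulting list.
      body =
        contents
        |> String.split()
        |> Enum.map_join("\n", &(pad <> &1))

      "\n" <> body <> "\n"
    else
      # Single-line sigils are left untouched.
      contents
    end
  end
end
```

Registered through `plugins: [WordSigilFormatter]` in `.formatter.exs`, this puts one word per line, but the closing delimiter always lands at column zero and the words at a fixed two spaces. That only looks right for a sigil that starts at the top level, which is exactly the problem.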
On Wednesday, December 28, 2022 at 4:49:49 PM UTC-6 Christopher Keele wrote:
> This feels like a feature request to me: I understand why sigils are
> generally not touched by the formatter without plugins, but I feel like the
> sigil_w included in the standard library should have smarter formatting by
> default.
>
> *NOTE: I'm using Elixir 1.14.2 here to observe the behaviour of the
> formatter, this may be out-of-date with the mainline branch.*
>
>
> *General formatting of sigils*
>
> Conceptually, to the compiler, the contents of a sigil is a potentially
> multi-line string. However, actually using a multiline string does get
> forced into a format as expected:
>
>     @words """
>
>     ONE
>     TWO
>     THREE
>     FOUR
>     FIVE
>     """
>     |> String.split
>     |> Enum.filter(&String.contains?(&1, "F"))
>
> Elixir *literal* multiline strings have special semantics for stripping
> the whitespace on the left, based on the indentation of the closing """.
> Try increasing and decreasing the indentation of that lexeme and watch how
> the formatter reacts.
>
> I'd personally intuitively expect the sigil_w case to do the same, but you
> can see why we cannot apply multiline string semantics to every sigil: the
> sigil macro, called at compile-time, receives the verbatim contents of the
> string, extra whitespace and all. Correctly handling that whitespace,
> including stripping it, is the job of the sigil itself, which may vary
> depending on the intentions of the sigil developer. (Consider: a custom
> sigil for parsing the Whitespace esolang
> <https://en.wikipedia.org/wiki/Whitespace_(programming_language)> or a
> Python program has different multiline-stripping semantics than literal
> multiline strings or sigil_w.)
>
> Since the formatter cannot know what whitespace semantics any particular
> sigil expects, it cannot modify the contents of the string with the
> knowledge that it will not impact the program, unlike multiline string
> literals.
> So it will do absolutely no work on the sigil's contents, leaving
> your awkward indentation in place. The good news is that if you correct the
> indentation manually, knowing how this particular sigil handles whitespace,
> that rewriting will pass formatting and stay unchanged.
>
>
> *Formatting stdlib sigils*
>
> That being said, the documentation for extending the formatter
> <https://hexdocs.pm/mix/main/Mix.Tasks.Format.html#module-plugins> is
> very sigil-special-casing-oriented. It should be easy to implement a plugin
> that knows how to normalize the currently un-touched
>
>     ~w(
>       ONE
>       TWO
>       THREE
>       FOUR
>       FIVE
>     )
>
> However, *I'd really imagine that the stdlib formatter would understand
> the special whitespace semantics of the stdlib sigil_w and format it
> out-of-the-box*. This is the feature request I see here.
>
> I also think that other stdlib sigil formatting could be improved; for
> example I feel like
>
>     ~D[2022-01-01
>     ]
>
> should automatically be formatted to
>
>     ~D[2022-01-01]
>
> without any plugins.
>
>
> *Formatting module attributes*
>
> > Is there a reason why when I pipe the module attribute that it gets
> > indented differently than when I do not pipe it (compare @other with @xs)?
>
> I believe this is an emergent behaviour of whatever order the formatter
> calculates rules for determining:
>
> - the indentation of the module attribute's argument
> - the indentation of the pipeline
> - the indentation of the (list) argument to the pipeline
> - the indentation of list items within the list
>
> These determinations add up in an unexpected way I do not understand.
> Essentially, pipelines want to have their indentation flush with the
> leftmost character of their argument, so that you get:
>
>     [1, 2, 3]
>     |> Enum.map(&(&1 * 2))
>     |> Enum.reject(&(&1 < 5))
>     |> length()
>
>     [
>       1,
>       2,
>       3
>     ]
>     |> Enum.map(&(&1 * 2))
>     |> Enum.reject(&(&1 < 5))
>     |> length()
>
>
> Somehow this interacts with how module attributes want to indent things,
> and we get
>
>     @nums [
>             1,
>             2,
>             3
>           ]
>           |> Enum.map(&(&1 * 2))
>           |> Enum.reject(&(&1 < 5))
>           |> length()
>
>
> This does not seem like a bug per se, but I also personally think that
> this should format as
>
>     @nums [
>       1,
>       2,
>       3
>     ]
>     |> Enum.map(&(&1 * 2))
>     |> Enum.reject(&(&1 < 5))
>     |> length()
>
>
> This seems like it would be a backwards-compatible enhancement.
>
>
> *Summary*
>
> The combination of sigil_w not being internally normalized with sigil_w
> whitespace semantics, alongside the current behaviour of multi-line
> expressions in module attributes, leads to this particularly unexpected
> appearance.
>
> I feel like improvements to both would be welcome in PRs. It may be worth
> first discussing the impact of releasing changes to the formatter, though.
> Even semantically backwards-compatible changes have the potential to lead
> to a lot of syntactic line diff noise and churn when upgrading Elixir, so
> I'm not certain if there is a more cautious release policy for such
> things, such as only releasing major formatter changes in minor version
> bumps.
> On Wednesday, December 28, 2022 at 5:59:17 AM UTC-6 dario.h...@gmail.com
> wrote:
>
>> Running `mix format --check-formatted` passes with success on the
>> following code:
>>
>>     defmodule Example do
>>       @xs [
>>         1,
>>         2,
>>         3,
>>         4,
>>         5,
>>         6,
>>         7
>>       ]
>>
>>       @other [
>>                1,
>>                2,
>>                3,
>>                4,
>>                5,
>>                6,
>>                7
>>              ]
>>              |> Enum.map(&(&1 * 2))
>>              |> Enum.reject(&(&1 < 5))
>>              |> length()
>>
>>       @words ~w(
>>         ONE
>>         TWO
>>         THREE
>>         FOUR
>>         FIVE
>>       )
>>              |> Enum.filter(&String.contains?(&1, "F"))
>>     end
>>
>> I am wondering whether that is intended or if I should open an issue on
>> GitHub and look into fixing it.
>>
>> `@words` does not seem to be formatted in the same way as `@other` which
>> I would kinda expect and the formatting of `@words` looks kinda weird.
>>
>> Secondly, is there a reason why when I pipe the module attribute that it
>> gets indented differently than when I do not pipe it (compare @other with
>> @xs)?
>>
>> Best regards,
>> Dario