[elixir-core:11230] Re: Formatting of module attributes

Christopher Keele Wed, 28 Dec 2022 17:13:35 -0800

After a little tinkering, it seems like the major obstacle in implementing 
this via a custom formatter module is that sigil formatters are provided 
the file and line number, but not the column number, of each sigil 
literal's start; so there's no great way for any multiline sigil formatter 
to reason about how it should format its contents with respect to 
indentation.
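
For anyone who wants to poke at this, here is a minimal sketch of how I looked at the metadata. It assumes Elixir 1.13 or later, where Code.format_string!/2 accepts a :sigils option; the SigilMetaProbe module and its probe/1 function are hypothetical names made up for this example. It hands the formatter a callback for ~w and reports back whatever keyword list the callback was given:

```elixir
# Sketch only: observe the metadata passed to a sigil formatter callback.
# Assumes Elixir >= 1.13 (the :sigils option of Code.format_string!/2).
defmodule SigilMetaProbe do
  # Hypothetical helper: formats `source` and returns the keyword list
  # the formatter passed to the ~w callback ([] if it was never called).
  def probe(source) do
    parent = self()

    callback = fn contents, meta ->
      send(parent, {:sigil_meta, meta})
      # Return the contents untouched; we only want to observe `meta`.
      contents
    end

    _formatted = Code.format_string!(source, sigils: [w: callback])

    receive do
      {:sigil_meta, meta} -> meta
    after
      0 -> []
    end
  end
end

# The sigil starts on the second line, well away from column 1.
meta = SigilMetaProbe.probe("x =\n      ~w(\n  ONE\n    TWO\n  )\n")
IO.inspect(Keyword.keys(meta), label: "metadata keys")
IO.inspect(meta[:line], label: "line")
```

The printed keys should show where the sigil starts in terms of lines, with nothing that tells the callback how far the sigil itself is indented.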


On Wednesday, December 28, 2022 at 4:49:49 PM UTC-6 Christopher Keele wrote:

> This feels like a feature request to me: I understand why sigils are 
> generally not touched by the formatter without plugins, but I feel like the 
> sigil_w included in the standard library should have smarter formatting by 
> default.
>
> *NOTE: I'm using Elixir 1.14.2 here to observe the behaviour of the 
> formatter, this may be out-of-date with the mainline branch.*
>
>
> *General formatting of sigils*
>
> Conceptually, to the compiler, the contents of a sigil are a potentially 
> multi-line string. However, an actual multiline string literal does get 
> reformatted as you would expect:
>
> @words """
>         ONE
>         TWO
>         THREE
>         FOUR
>         FIVE
>         """
>         |> String.split
>         |> Enum.filter(&String.contains?(&1, "F"))
>
> Elixir *literal* multiline strings have special semantics for stripping 
> the whitespace on the left, based on the indentation of the closing """. 
> Try increasing and decreasing the indentation of that lexeme and watch how 
> the formatter reacts.
>
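
To make the stripping rule concrete, here is a small sketch (HeredocDemo is a made-up module name for illustration). The same two lines of content produce different strings depending only on where the closing delimiter sits:

```elixir
defmodule HeredocDemo do
  # The closing delimiter is indented as far as the content,
  # so all leading whitespace is stripped.
  def flush do
    """
    ONE
    TWO
    """
  end

  # The closing delimiter sits two columns to the left of the content,
  # so two leading spaces survive on every line.
  def offset do
    """
      ONE
      TWO
    """
  end
end

IO.inspect(HeredocDemo.flush())
# => "ONE\nTWO\n"
IO.inspect(HeredocDemo.offset())
# => "  ONE\n  TWO\n"
```

Because the rule is fixed by the language, the formatter can re-indent the whole heredoc without changing the resulting string.
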
> I'd personally intuitively expect the sigil_w case to do the same, but you 
> can see why we cannot apply multiline string semantics to every sigil—the 
> sigil macro, called at compile-time, receives the verbatim contents of the 
> string, extra whitespace and all. Correctly handling that whitespace, 
> including stripping it, is the job of the sigil itself, which may vary 
> depending on the intentions of the sigil developer. (Consider: a custom 
> sigil for parsing the Whitespace esolang 
> <https://en.wikipedia.org/wiki/Whitespace_(programming_language)> or a 
> Python program has different multiline-stripping semantics than literal 
> multiline strings or sigil_w.)
>
> Since the formatter cannot know what whitespace semantics any particular 
> sigil expects, it cannot modify the contents of the string with the 
> knowledge that it will not impact the program, unlike multiline string 
> literals. So it does no work at all on the sigil's contents, leaving 
> your awkward indentation in place. The good news is that if you correct the 
> indentation manually, knowing how this particular sigil handles whitespace, 
> that rewriting will pass formatting and stay unchanged.
>
>
> *Formatting stdlib sigils*
>
> That being said, the documentation for extending the formatter 
> <https://hexdocs.pm/mix/main/Mix.Tasks.Format.html#module-plugins> is 
> very sigil-special-casing-oriented. It should be easy to implement a plugin 
> that knows how to normalize the currently un-touched
>
> ~w(
>   ONE
>   TWO
>   THREE
>     FOUR
>   FIVE
>     )
>
> However, *I'd really imagine that the stdlib formatter would understand 
> the special whitespace semantics of the stdlib sigil_w and format it 
> out-of-the-box*. This is the feature request I see here.
>
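
A rough sketch of why this would be safe for sigil_w in particular: the sigil splits its contents on whitespace, so any re-indentation yields the same list. WordsDemo and its normalize/2 function are hypothetical names for this example, and the indent is passed in by the caller because a formatter callback would have to pick one somehow:

```elixir
defmodule WordsDemo do
  # Ragged indentation, as the formatter currently leaves it.
  def ragged do
    ~w(
      ONE
      TWO
        THREE
      )
  end

  # The same words on a single line.
  def tidy do
    ~w(ONE TWO THREE)
  end

  # Hypothetical normalizer for the contents of a multi-line ~w:
  # one word per line, each padded with `indent` spaces.
  def normalize(contents, indent) do
    pad = String.duplicate(" ", indent)

    lines =
      contents
      |> String.split()
      |> Enum.map_join("\n", &(pad <> &1))

    "\n" <> lines <> "\n"
  end
end

IO.inspect(WordsDemo.ragged() == WordsDemo.tidy())
# => true
IO.inspect(WordsDemo.normalize("\n  ONE\n    TWO\n  ", 2))
# => "\n  ONE\n  TWO\n"
```

Since both functions return the same list, rewriting the whitespace inside ~w cannot change program behaviour, which is exactly the guarantee the formatter lacks for arbitrary sigils.
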
> I also think that other stdlib sigil formatting could be improved; for 
> example I feel like
>
> ~D[2022-01-01
> ]
>
> should automatically be formatted to 
>
> ~D[2022-01-01]
>
> without any plugins.
>
>
> *Formatting module attributes*
>
> > Is there a reason why, when I pipe the module attribute, it gets 
> > indented differently than when I do not pipe it (compare @other with @xs)? 
>
> I believe this is an emergent behaviour of whatever order the formatter 
> calculates rules for determining:
>
> - the indentation of the module attribute's argument
> - the indentation of the pipeline
> - the indentation of the (list) argument to the pipeline
> - the indentation of list items within the list
>
> These determinations add up in a way I do not fully understand. 
> Essentially, pipelines want to have their indentation flush with the 
> leftmost character of their argument, so that you get:
>
> [1, 2, 3]
> |> Enum.map(&(&1 * 2))
> |> Enum.reject(&(&1 < 5))
> |> length()
>
> [
>   1,
>   2,
>   3
> ]
> |> Enum.map(&(&1 * 2))
> |> Enum.reject(&(&1 < 5))
> |> length()
>
>
> Somehow this interacts with how module attributes want to indent things, 
> and we get
>
> @nums [
>         1,
>         2,
>         3
>       ]
>       |> Enum.map(&(&1 * 2))
>       |> Enum.reject(&(&1 < 5))
>       |> length()
>
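
If you want to reproduce this without a Mix project, a sketch along these lines should work on any recent Elixir. It feeds a piped module attribute through Code.format_string!/1 and prints whatever your installed version produces; I am deliberately not stating the expected output, since it may differ between releases:

```elixir
# Source written the way I would personally like it to stay.
source = """
defmodule Example do
  @nums [
    1,
    2,
    3
  ]
  |> Enum.map(&(&1 * 2))
  |> Enum.reject(&(&1 < 5))
  |> length()
end
"""

# Code.format_string!/1 returns iodata, so convert it before printing.
formatted =
  source
  |> Code.format_string!()
  |> IO.iodata_to_binary()

IO.puts(formatted)
```

Comparing the printed text with the source shows how the formatter chooses to indent the list and the pipeline relative to the attribute name.
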
>
> This does not seem like a bug per se, but I also personally think that 
> this should format as
>
> @nums [
>   1,
>   2,
>   3
> ]
> |> Enum.map(&(&1 * 2))
> |> Enum.reject(&(&1 < 5))
> |> length()
>
>
> This seems like it would be a backwards-compatible enhancement.
>
>
> *Summary*
>
> The formatter does not normalize the contents of sigil_w according to 
> sigil_w's whitespace semantics. Combined with the current behaviour of 
> multi-line expressions in module attributes, this leads to the 
> particularly unexpected appearance in your example.
>
> I feel like improvements to both would be welcome in PRs. It may be worth 
> first discussing the impact of releasing changes to the formatter, though. 
> Even semantically backwards-compatible changes have the potential to lead 
> to a lot of syntactic line diff noise and churn when upgrading Elixir, so 
> I'm not certain if there is a more cautious release policy for such 
> things—such as only releasing major formatter changes in minor version 
> bumps.
>
> On Wednesday, December 28, 2022 at 5:59:17 AM UTC-6 dario.h...@gmail.com 
> wrote:
>
>> Running `mix format --check-formatted` passes with success on the 
>> following code:
>>
>> defmodule Example do
>>   @xs [
>>     1,
>>     2,
>>     3,
>>     4,
>>     5,
>>     6,
>>     7
>>   ]
>>
>>   @other [
>>            1,
>>            2,
>>            3,
>>            4,
>>            5,
>>            6,
>>            7
>>          ]
>>          |> Enum.map(&(&1 * 2))
>>          |> Enum.reject(&(&1 < 5))
>>          |> length()
>>
>>   @words ~w(
>>   ONE
>>   TWO
>>   THREE
>>   FOUR
>>   FIVE
>>   )
>>          |> Enum.filter(&String.contains?(&1, "F"))
>> end
>>
>> I am wondering whether that is intended or if I should open an issue on 
>> GitHub and look into fixing it. 
>>
>> `@words` does not seem to be formatted in the same way as `@other` which 
>> I would kinda expect and the formatting of `@words` looks kinda weird.
>>
>> Secondly, is there a reason why, when I pipe the module attribute, it 
>> gets indented differently than when I do not pipe it (compare @other with 
>> @xs)? 
>>
>> Best regards,
>> Dario
>>
>

