Re: [elixir-core:10240] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

i Dorgan Thu, 06 May 2021 20:07:32 -0700

Thanks for your thoughts!

> If anyone is interested, I was playing with these ideas and you can see 
some example use cases here: 
https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But 
exactly because Code.Formatter is private, I had to vendor it and that code 
broke on recent Elixir versions.


I really like what you did there. With "that code broke on recent Elixir 
versions" you mean the addition of stepped ranges? I see that the wrapping 
of literals via the `literal_encoder` hasn't changed since June of 2019.

> There is one issue about quoted_to_algebra though: the AST that the 
formatter works with is a special one, where the literals are wrapped in 
blocks so we can store their metadata. This means that, in order to have 
quotes_to_algebra, we would need to change the formatter to also handle 
“regular AST” or nodes with limited metadata.
> 
> It is doable, but it is work, and a requirement to expose said 
functionality.

I'm exploring ways to achieve this, and I can think of two approaches:
a - traverse the received AST and wrap the literals in blocks
b - change the various functions used to convert the AST into algebra 
fragments so that they can also handle bare literals

After some experimentation I saw that option a is doable, but I feel that I 
am recreating part of the work that the tokenizer already does and that it 
would become a burden to maintain whenever the tokenizer changes.
I haven't tried option b yet, I'm still trying to get familiar with the 
quoted -> algebra conversion(and algebra documents in general), but my 
first thought is that it would be hard to figure out over which literal a 
comment should be placed. I imagine that a quoted expression representing 
this code:
1  def foo() do
2    :hello
3
4    :elixir
5
6
7    :world
8  end
And a comment {5, {_, _}, "Hello!"}
If we don't have line number information, where should the comment be 
placed? We don't know where the literals are, they may be at lines 2, 3 and 
4, or 2, 4 and 7, they could be anywhere. All we can tell is that the 
comment goes somewhere between lines 1 and 8, so any position would be ok. 
So having the formatter accept bare literals would mean having to work with 
insufficient data and it would produce a bad output. This issue is also 
present with option a, but changing the part of the formatter that 
generates the algebra document would essentially mean to *break* the 
formatter in multiple places. In summary, I'm not sure about the usefulness 
of a user supplied ast with no line numbers associated to literals, as it 
is hard to apropiately recreate the source code when comments are involved.
I'm not happy with neither approach and I will try to think of better 
solutions.

On the other hand, the literal encoder is using a public api. A user can 
already generate AST from a string with the literals wrapped in a block via 
the :literal_encoder option of Code.string_to_quoted, so I think the 
literal token metadata should be considered public api(changes to it can 
already break third party code today). Wrapping a literal in a block and 
storing the token metadata in it has proven to be a very simple and 
practical solution to the problem of literals not carrying any metadata 
while still using valid AST, so maybe exposing that can also be considered 
as a viable alternative? This would go against my original position of "nor 
does it need to expose the private version of the AST used internally by 
the formatter", but in practical terms I think it would provide more value 
if it were exposed. The comments data structure would already be exposed 
and we would be requiring users to provide comments in that form, so I 
think requiring an AST with literals shaped in a particular way wouldn't be 
unreasonable. Of course, if that would be done, the fact that 
quoted_to_algebra requires an AST with literals wrapped in that way should 
be properly documented.
El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach escribió:

> Looking forward to any progress on this front.
>
> If anyone is interested, I was playing with these ideas and you can see 
> some example use cases here: 
> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But 
> exactly because Code.Formatter is private, I had to vendor it and that code 
> broke on recent Elixir versions.
>
> On May 5, 2021, "José Valim" <[email protected]> wrote:
>
> I believe those ideas are promising. There is one issue about 
> quoted_to_algebra though: the AST that the formatter works with is a 
> special one, where the literals are wrapped in blocks so we can store their 
> metadata. This means that, in order to have quotes_to_algebra, we would 
> need to change the formatter to also handle “regular AST” or nodes with 
> limited metadata.
> It is doable, but it is work, and a requirement to expose said 
> functionality.
> About the comments, I like the suggestion. Although we should probably 
> move from a tuple to a map to avoid breaking changes in the future. A PR 
> for this particular issue is very welcome!
> Thank you for the proposal and thinking about these problems.
> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote:
>
> Hi all,
>>
>> The motivation for this proposal is to make it easier for tools to alter 
>> and format elixir code while preserving the current behavior of the 
>> formatter. Most of the functionality is already there, and a little change 
>> in the APIs would enable a wide variety of new use cases.
>>
>> If we want to transform a piece of code from one form to another, we can 
>> modify the AST and then convert it back to a string. For example, a tool 
>> could detect the usage of `String.to_atom` and not only warn about unsafe 
>> string to atom conversion, but also give an option to automatically fix the 
>> issue, replacing it with `String.to_existing_atom`. The first part is 
>> already covered by tools like credo, but it seems that manipulating the 
>> source code itself is difficult, mostly because the AST does not contain 
>> information about comments and because `Macro.to_string` doesn't produce 
>> text that complies with the elixir coding conventions. For example, this 
>> code:
>> ```elixir
>> def foo(bar) do
>>   # Some comment
>>   :baz
>> end
>> ```
>> Would be printed as this:
>> ```elixir
>> def(fop(bar)) do
>>   :baz
>> end
>> ```
>> Tools like https://github.com/KronicDeth/intellij-elixir implement their 
>> own parsers to circumvent this issue, but I think it would be nice if it 
>> could be achieved with the existing parser in core elixir.
>>
>> I've seen other conversations where it was suggested to change the elixir 
>> AST to support comments, either adding a new comment node(breaking change) 
>> or using the nodes metadata, but the conclusion was that there was no clear 
>> preference on how to do this at the AST level, and being that the elixir 
>> tokenizer allows us to pass a function to process the comments, José 
>> suggested to keep the them on a side data structure instead of in the AST 
>> itself. This is what the Elixir formatter does.
>>
>> Currently the `Code.Formatter` module is private API used by the 
>> `Code.format_string` function. This means that the only way to format 
>> elixir code is by providing a string(or a file) to a function in the `Code` 
>> module. If we are transforming the code, however, what we have is a quoted 
>> expression, thus we don't have a way to turn it back into a string.
>>
>> At a high level, the `Code.Formatter.to_algebra` does three things:
>> 1. It extracts the comments from the source code
>> 2. It parses the source code into a quoted expression
>> 3. It takes the comments and the quoted expressions and merges them to 
>> produce an algebra document
>>
>> What I propose is to split the `Code.Formatter.to_algebra` considering 
>> those steps, and expose the functionality via the `Code` module. The 
>> reasoning is that if a user has access to both the ast and the comments, 
>> they can then transform the ast, and return back both the ast and comments 
>> to the formatter to produce an algebra document. *How they implement 
>> this is up to them* and Elixir doesn't need to give an opinion on how 
>> comments should be handled during those manipulations, nor does it need to 
>> expose the private version of the AST used internally by the formatter. If 
>> they want to merge comments into the metadata or use custom nodes, it's 
>> completely up to them, they just need to return back a valid quoted 
>> expression and a list of comments with their metadata.
>>
>> The other reason I think this should be done by exposing those functions 
>> is that there's a great a amount of work put into the formatter to turn the 
>> quoted expressions into formatted algebra documents, and I think all of 
>> that could be reused, eliminating the need for custom formatters.
>>
>> The workflow would be something like this:
>> ```elixir
>> {:ok, {quoted, comments}} = File.read!(path) |> 
>> Code.string_to_quoted_with_comments()
>> quoted  = do_some_manipulation(quoted , comments)
>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)
>> new_source = Inspect.Algebra.format(doc, 80)
>> ```
>> I'm not married to those function names and return values, but I hope 
>> they serve to convey the idea.
>>
>> I already have some little examples of source code transformations using 
>> an API like the above, I'm working on reducing and tidying them, but in the 
>> meantime I would like to hear you opinions on this proposal.
>>
>> The only downside I can think of is that the `{line, {previous_eol, 
>> next_eol}, text}` format of the comments may be considered a private data 
>> structure, but the formatter hasn't changed much since 1.6(3 years ago) and 
>> I think it could be considered stable enough to be exposed.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>  -- 
>  You received this message because you are subscribed to the Google Groups 
> "elixir-lang-core" group.
>  To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
>
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com
>  
> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com.

Re: [elixir-core:10240] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

Reply via email to