Re: [elixir-core:10244] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

i Dorgan Fri, 07 May 2021 15:12:37 -0700

I'm currently experimenting on the second point you outlined

I've uploaded what I've been trying so far to this 
repository: https://github.com/doorgan/formatter. The relevant file 
is https://github.com/doorgan/formatter/blob/main/lib/normalizer.ex as well 
as the tests.


The approach I'm taking there is to walk the AST and manually wrap literals 
when I find them, according to the context(when possible). I think I'm able 
to produce the kind of AST the formatter would expect, and produces fairly 
adequate code, except for cases like mixed keyword lists ([1, 2, a: b]) for 
which I still have to figure out how to recreate. I had to add a literal? 
field to the literal blocks to distinguish them from regular blocks and 
prevent the normalizer from recursing infinitely. Maybe this could be 
avoided, I need to figure how, though.

I would like to know if you think this is a good aproach, or I should try 
to change the various block_to_algebra and quoted_to_algebra to handle 
literals.
 I'm starting to believe this may be the cleanest aproach, and I would need 
to expand the state map to hold metadata from the parent node(so I can fill 
line numbers, or metadata like format: :keyword to keyword lists).

El viernes, 7 de mayo de 2021 a las 11:07:31 UTC-3, José Valim escribió:

> I think there are two fronts:
>
> 1. Today, all of the API the formatter uses, except the extraction of code 
> comments, is public. So providing a string_to_quoted_with_comments would be 
> a logical next step. We only need to convert the comments from a tuple to a 
> map before.
>
> 2. The second step is more complex and needs more research, which is to 
> figure out how exactly to change the formatter to accept general AST.
>
> Please coordinate between you because it seems some work may have already 
> been done in those areas. I will be glad to provide assistance and answer 
> questions. :)
>
> On Fri, May 7, 2021 at 3:51 PM Tonći Galić <[email protected]> wrote:
>
>> Hi folks,
>>
>> Nice to see someone got the ball rolling, I'm also interested (and would 
>> like to contribute). For me it's a bit unclear what the next steps would 
>> be? 
>> If someone could clarify, it'd be happy to spend some time on PR
>>
>> Best,
>> Tonći
>>
>> On Friday, May 7, 2021 at 6:15:48 AM UTC+2 José Valim wrote:
>>
>>> Correct. Those are the trade-offs. We could for example have 
>>> Code.string_to_quoted_formatter_options which returns the formatter options 
>>> used for parsing the AST but then users must know that we now have a 
>>> special kind of AST. And nodes introduced by user may still not be 
>>> annotated - which means we still need to care about those scenarios.
>>>
>>> On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote:
>>>
>>>> Thanks for your thoughts!
>>>>
>>>> > If anyone is interested, I was playing with these ideas and you can 
>>>> see some example use cases here: 
>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But 
>>>> exactly because Code.Formatter is private, I had to vendor it and that 
>>>> code 
>>>> broke on recent Elixir versions.
>>>>
>>>> I really like what you did there. With "that code broke on recent 
>>>> Elixir versions" you mean the addition of stepped ranges? I see that the 
>>>> wrapping of literals via the `literal_encoder` hasn't changed since June 
>>>> of 
>>>> 2019.
>>>>
>>>> > There is one issue about quoted_to_algebra though: the AST that the 
>>>> formatter works with is a special one, where the literals are wrapped in 
>>>> blocks so we can store their metadata. This means that, in order to have 
>>>> quotes_to_algebra, we would need to change the formatter to also handle 
>>>> “regular AST” or nodes with limited metadata.
>>>> > 
>>>> > It is doable, but it is work, and a requirement to expose said 
>>>> functionality.
>>>>
>>>> I'm exploring ways to achieve this, and I can think of two approaches:
>>>> a - traverse the received AST and wrap the literals in blocks
>>>> b - change the various functions used to convert the AST into algebra 
>>>> fragments so that they can also handle bare literals
>>>>
>>>> After some experimentation I saw that option a is doable, but I feel 
>>>> that I am recreating part of the work that the tokenizer already does and 
>>>> that it would become a burden to maintain whenever the tokenizer changes.
>>>> I haven't tried option b yet, I'm still trying to get familiar with the 
>>>> quoted -> algebra conversion(and algebra documents in general), but my 
>>>> first thought is that it would be hard to figure out over which literal a 
>>>> comment should be placed. I imagine that a quoted expression representing 
>>>> this code:
>>>> 1  def foo() do
>>>> 2    :hello
>>>> 3
>>>> 4    :elixir
>>>> 5
>>>> 6
>>>> 7    :world
>>>> 8  end
>>>> And a comment {5, {_, _}, "Hello!"}
>>>> If we don't have line number information, where should the comment be 
>>>> placed? We don't know where the literals are, they may be at lines 2, 3 
>>>> and 
>>>> 4, or 2, 4 and 7, they could be anywhere. All we can tell is that the 
>>>> comment goes somewhere between lines 1 and 8, so any position would be ok. 
>>>> So having the formatter accept bare literals would mean having to work 
>>>> with 
>>>> insufficient data and it would produce a bad output. This issue is also 
>>>> present with option a, but changing the part of the formatter that 
>>>> generates the algebra document would essentially mean to *break* the 
>>>> formatter in multiple places. In summary, I'm not sure about the 
>>>> usefulness 
>>>> of a user supplied ast with no line numbers associated to literals, as it 
>>>> is hard to apropiately recreate the source code when comments are involved.
>>>> I'm not happy with neither approach and I will try to think of better 
>>>> solutions.
>>>>
>>>> On the other hand, the literal encoder is using a public api. A user 
>>>> can already generate AST from a string with the literals wrapped in a 
>>>> block 
>>>> via the :literal_encoder option of Code.string_to_quoted, so I think the 
>>>> literal token metadata should be considered public api(changes to it can 
>>>> already break third party code today). Wrapping a literal in a block and 
>>>> storing the token metadata in it has proven to be a very simple and 
>>>> practical solution to the problem of literals not carrying any metadata 
>>>> while still using valid AST, so maybe exposing that can also be considered 
>>>> as a viable alternative? This would go against my original position of 
>>>> "nor 
>>>> does it need to expose the private version of the AST used internally by 
>>>> the formatter", but in practical terms I think it would provide more value 
>>>> if it were exposed. The comments data structure would already be exposed 
>>>> and we would be requiring users to provide comments in that form, so I 
>>>> think requiring an AST with literals shaped in a particular way wouldn't 
>>>> be 
>>>> unreasonable. Of course, if that would be done, the fact that 
>>>> quoted_to_algebra requires an AST with literals wrapped in that way should 
>>>> be properly documented.
>>>> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach escribió:
>>>>
>>>>> Looking forward to any progress on this front.
>>>>>
>>>>> If anyone is interested, I was playing with these ideas and you can 
>>>>> see some example use cases here: 
>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But 
>>>>> exactly because Code.Formatter is private, I had to vendor it and that 
>>>>> code 
>>>>> broke on recent Elixir versions.
>>>>>
>>>>> On May 5, 2021, "José Valim" <[email protected]> wrote:
>>>>>
>>>>> I believe those ideas are promising. There is one issue about 
>>>>> quoted_to_algebra though: the AST that the formatter works with is a 
>>>>> special one, where the literals are wrapped in blocks so we can store 
>>>>> their 
>>>>> metadata. This means that, in order to have quotes_to_algebra, we would 
>>>>> need to change the formatter to also handle “regular AST” or nodes with 
>>>>> limited metadata.
>>>>> It is doable, but it is work, and a requirement to expose said 
>>>>> functionality.
>>>>> About the comments, I like the suggestion. Although we should probably 
>>>>> move from a tuple to a map to avoid breaking changes in the future. A PR 
>>>>> for this particular issue is very welcome!
>>>>> Thank you for the proposal and thinking about these problems.
>>>>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote:
>>>>>
>>>>> Hi all,
>>>>>>
>>>>>> The motivation for this proposal is to make it easier for tools to 
>>>>>> alter and format elixir code while preserving the current behavior of 
>>>>>> the 
>>>>>> formatter. Most of the functionality is already there, and a little 
>>>>>> change 
>>>>>> in the APIs would enable a wide variety of new use cases.
>>>>>>
>>>>>> If we want to transform a piece of code from one form to another, we 
>>>>>> can modify the AST and then convert it back to a string. For example, a 
>>>>>> tool could detect the usage of `String.to_atom` and not only warn about 
>>>>>> unsafe string to atom conversion, but also give an option to 
>>>>>> automatically 
>>>>>> fix the issue, replacing it with `String.to_existing_atom`. The first 
>>>>>> part 
>>>>>> is already covered by tools like credo, but it seems that manipulating 
>>>>>> the 
>>>>>> source code itself is difficult, mostly because the AST does not contain 
>>>>>> information about comments and because `Macro.to_string` doesn't produce 
>>>>>> text that complies with the elixir coding conventions. For example, this 
>>>>>> code:
>>>>>> ```elixir
>>>>>> def foo(bar) do
>>>>>>   # Some comment
>>>>>>   :baz
>>>>>> end
>>>>>> ```
>>>>>> Would be printed as this:
>>>>>> ```elixir
>>>>>> def(fop(bar)) do
>>>>>>   :baz
>>>>>> end
>>>>>> ```
>>>>>> Tools like https://github.com/KronicDeth/intellij-elixir implement 
>>>>>> their own parsers to circumvent this issue, but I think it would be nice 
>>>>>> if 
>>>>>> it could be achieved with the existing parser in core elixir.
>>>>>>
>>>>>> I've seen other conversations where it was suggested to change the 
>>>>>> elixir AST to support comments, either adding a new comment 
>>>>>> node(breaking 
>>>>>> change) or using the nodes metadata, but the conclusion was that there 
>>>>>> was 
>>>>>> no clear preference on how to do this at the AST level, and being that 
>>>>>> the 
>>>>>> elixir tokenizer allows us to pass a function to process the comments, 
>>>>>> José 
>>>>>> suggested to keep the them on a side data structure instead of in the 
>>>>>> AST 
>>>>>> itself. This is what the Elixir formatter does.
>>>>>>
>>>>>> Currently the `Code.Formatter` module is private API used by the 
>>>>>> `Code.format_string` function. This means that the only way to format 
>>>>>> elixir code is by providing a string(or a file) to a function in the 
>>>>>> `Code` 
>>>>>> module. If we are transforming the code, however, what we have is a 
>>>>>> quoted 
>>>>>> expression, thus we don't have a way to turn it back into a string.
>>>>>>
>>>>>> At a high level, the `Code.Formatter.to_algebra` does three things:
>>>>>> 1. It extracts the comments from the source code
>>>>>> 2. It parses the source code into a quoted expression
>>>>>> 3. It takes the comments and the quoted expressions and merges them 
>>>>>> to produce an algebra document
>>>>>>
>>>>>> What I propose is to split the `Code.Formatter.to_algebra` 
>>>>>> considering those steps, and expose the functionality via the `Code` 
>>>>>> module. The reasoning is that if a user has access to both the ast and 
>>>>>> the 
>>>>>> comments, they can then transform the ast, and return back both the ast 
>>>>>> and 
>>>>>> comments to the formatter to produce an algebra document. *How they 
>>>>>> implement this is up to them* and Elixir doesn't need to give an 
>>>>>> opinion on how comments should be handled during those manipulations, 
>>>>>> nor 
>>>>>> does it need to expose the private version of the AST used internally by 
>>>>>> the formatter. If they want to merge comments into the metadata or use 
>>>>>> custom nodes, it's completely up to them, they just need to return back 
>>>>>> a 
>>>>>> valid quoted expression and a list of comments with their metadata.
>>>>>>
>>>>>> The other reason I think this should be done by exposing those 
>>>>>> functions is that there's a great a amount of work put into the 
>>>>>> formatter 
>>>>>> to turn the quoted expressions into formatted algebra documents, and I 
>>>>>> think all of that could be reused, eliminating the need for custom 
>>>>>> formatters.
>>>>>>
>>>>>> The workflow would be something like this:
>>>>>> ```elixir
>>>>>> {:ok, {quoted, comments}} = File.read!(path) |> 
>>>>>> Code.string_to_quoted_with_comments()
>>>>>> quoted  = do_some_manipulation(quoted , comments)
>>>>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)
>>>>>> new_source = Inspect.Algebra.format(doc, 80)
>>>>>> ```
>>>>>> I'm not married to those function names and return values, but I hope 
>>>>>> they serve to convey the idea.
>>>>>>
>>>>>> I already have some little examples of source code transformations 
>>>>>> using an API like the above, I'm working on reducing and tidying them, 
>>>>>> but 
>>>>>> in the meantime I would like to hear you opinions on this proposal.
>>>>>>
>>>>>> The only downside I can think of is that the `{line, {previous_eol, 
>>>>>> next_eol}, text}` format of the comments may be considered a private 
>>>>>> data 
>>>>>> structure, but the formatter hasn't changed much since 1.6(3 years ago) 
>>>>>> and 
>>>>>> I think it could be considered stable enough to be exposed.
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "elixir-lang-core" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>>  -- 
>>>>>  You received this message because you are subscribed to the Google 
>>>>> Groups "elixir-lang-core" group.
>>>>>  To unsubscribe from this group and stop receiving emails from it, 
>>>>> send an email to [email protected].
>>>>>
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elixir-lang-core" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>>
>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>>
> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com.

Re: [elixir-core:10244] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

Reply via email to