Re: [elixir-core:10242] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

Tonći Galić Fri, 07 May 2021 06:51:53 -0700

Hi folks,

Nice to see someone got the ball rolling, I'm also interested (and would 
like to contribute). For me it's a bit unclear what the next steps would 
be? 
If someone could clarify, it'd be happy to spend some time on PR


Best,
Tonći

On Friday, May 7, 2021 at 6:15:48 AM UTC+2 José Valim wrote:

> Correct. Those are the trade-offs. We could for example have 
> Code.string_to_quoted_formatter_options which returns the formatter options 
> used for parsing the AST but then users must know that we now have a 
> special kind of AST. And nodes introduced by user may still not be 
> annotated - which means we still need to care about those scenarios.
>
> On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote:
>
>> Thanks for your thoughts!
>>
>> > If anyone is interested, I was playing with these ideas and you can see 
>> some example use cases here: 
>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But 
>> exactly because Code.Formatter is private, I had to vendor it and that code 
>> broke on recent Elixir versions.
>>
>> I really like what you did there. With "that code broke on recent Elixir 
>> versions" you mean the addition of stepped ranges? I see that the wrapping 
>> of literals via the `literal_encoder` hasn't changed since June of 2019.
>>
>> > There is one issue about quoted_to_algebra though: the AST that the 
>> formatter works with is a special one, where the literals are wrapped in 
>> blocks so we can store their metadata. This means that, in order to have 
>> quotes_to_algebra, we would need to change the formatter to also handle 
>> “regular AST” or nodes with limited metadata.
>> > 
>> > It is doable, but it is work, and a requirement to expose said 
>> functionality.
>>
>> I'm exploring ways to achieve this, and I can think of two approaches:
>> a - traverse the received AST and wrap the literals in blocks
>> b - change the various functions used to convert the AST into algebra 
>> fragments so that they can also handle bare literals
>>
>> After some experimentation I saw that option a is doable, but I feel that 
>> I am recreating part of the work that the tokenizer already does and that 
>> it would become a burden to maintain whenever the tokenizer changes.
>> I haven't tried option b yet, I'm still trying to get familiar with the 
>> quoted -> algebra conversion(and algebra documents in general), but my 
>> first thought is that it would be hard to figure out over which literal a 
>> comment should be placed. I imagine that a quoted expression representing 
>> this code:
>> 1  def foo() do
>> 2    :hello
>> 3
>> 4    :elixir
>> 5
>> 6
>> 7    :world
>> 8  end
>> And a comment {5, {_, _}, "Hello!"}
>> If we don't have line number information, where should the comment be 
>> placed? We don't know where the literals are, they may be at lines 2, 3 and 
>> 4, or 2, 4 and 7, they could be anywhere. All we can tell is that the 
>> comment goes somewhere between lines 1 and 8, so any position would be ok. 
>> So having the formatter accept bare literals would mean having to work with 
>> insufficient data and it would produce a bad output. This issue is also 
>> present with option a, but changing the part of the formatter that 
>> generates the algebra document would essentially mean to *break* the 
>> formatter in multiple places. In summary, I'm not sure about the usefulness 
>> of a user supplied ast with no line numbers associated to literals, as it 
>> is hard to apropiately recreate the source code when comments are involved.
>> I'm not happy with neither approach and I will try to think of better 
>> solutions.
>>
>> On the other hand, the literal encoder is using a public api. A user can 
>> already generate AST from a string with the literals wrapped in a block via 
>> the :literal_encoder option of Code.string_to_quoted, so I think the 
>> literal token metadata should be considered public api(changes to it can 
>> already break third party code today). Wrapping a literal in a block and 
>> storing the token metadata in it has proven to be a very simple and 
>> practical solution to the problem of literals not carrying any metadata 
>> while still using valid AST, so maybe exposing that can also be considered 
>> as a viable alternative? This would go against my original position of "nor 
>> does it need to expose the private version of the AST used internally by 
>> the formatter", but in practical terms I think it would provide more value 
>> if it were exposed. The comments data structure would already be exposed 
>> and we would be requiring users to provide comments in that form, so I 
>> think requiring an AST with literals shaped in a particular way wouldn't be 
>> unreasonable. Of course, if that would be done, the fact that 
>> quoted_to_algebra requires an AST with literals wrapped in that way should 
>> be properly documented.
>> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach escribió:
>>
>>> Looking forward to any progress on this front.
>>>
>>> If anyone is interested, I was playing with these ideas and you can see 
>>> some example use cases here: 
>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But 
>>> exactly because Code.Formatter is private, I had to vendor it and that code 
>>> broke on recent Elixir versions.
>>>
>>> On May 5, 2021, "José Valim" <[email protected]> wrote:
>>>
>>> I believe those ideas are promising. There is one issue about 
>>> quoted_to_algebra though: the AST that the formatter works with is a 
>>> special one, where the literals are wrapped in blocks so we can store their 
>>> metadata. This means that, in order to have quotes_to_algebra, we would 
>>> need to change the formatter to also handle “regular AST” or nodes with 
>>> limited metadata.
>>> It is doable, but it is work, and a requirement to expose said 
>>> functionality.
>>> About the comments, I like the suggestion. Although we should probably 
>>> move from a tuple to a map to avoid breaking changes in the future. A PR 
>>> for this particular issue is very welcome!
>>> Thank you for the proposal and thinking about these problems.
>>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote:
>>>
>>> Hi all,
>>>>
>>>> The motivation for this proposal is to make it easier for tools to 
>>>> alter and format elixir code while preserving the current behavior of the 
>>>> formatter. Most of the functionality is already there, and a little change 
>>>> in the APIs would enable a wide variety of new use cases.
>>>>
>>>> If we want to transform a piece of code from one form to another, we 
>>>> can modify the AST and then convert it back to a string. For example, a 
>>>> tool could detect the usage of `String.to_atom` and not only warn about 
>>>> unsafe string to atom conversion, but also give an option to automatically 
>>>> fix the issue, replacing it with `String.to_existing_atom`. The first part 
>>>> is already covered by tools like credo, but it seems that manipulating the 
>>>> source code itself is difficult, mostly because the AST does not contain 
>>>> information about comments and because `Macro.to_string` doesn't produce 
>>>> text that complies with the elixir coding conventions. For example, this 
>>>> code:
>>>> ```elixir
>>>> def foo(bar) do
>>>>   # Some comment
>>>>   :baz
>>>> end
>>>> ```
>>>> Would be printed as this:
>>>> ```elixir
>>>> def(fop(bar)) do
>>>>   :baz
>>>> end
>>>> ```
>>>> Tools like https://github.com/KronicDeth/intellij-elixir implement 
>>>> their own parsers to circumvent this issue, but I think it would be nice 
>>>> if 
>>>> it could be achieved with the existing parser in core elixir.
>>>>
>>>> I've seen other conversations where it was suggested to change the 
>>>> elixir AST to support comments, either adding a new comment node(breaking 
>>>> change) or using the nodes metadata, but the conclusion was that there was 
>>>> no clear preference on how to do this at the AST level, and being that the 
>>>> elixir tokenizer allows us to pass a function to process the comments, 
>>>> José 
>>>> suggested to keep the them on a side data structure instead of in the AST 
>>>> itself. This is what the Elixir formatter does.
>>>>
>>>> Currently the `Code.Formatter` module is private API used by the 
>>>> `Code.format_string` function. This means that the only way to format 
>>>> elixir code is by providing a string(or a file) to a function in the 
>>>> `Code` 
>>>> module. If we are transforming the code, however, what we have is a quoted 
>>>> expression, thus we don't have a way to turn it back into a string.
>>>>
>>>> At a high level, the `Code.Formatter.to_algebra` does three things:
>>>> 1. It extracts the comments from the source code
>>>> 2. It parses the source code into a quoted expression
>>>> 3. It takes the comments and the quoted expressions and merges them to 
>>>> produce an algebra document
>>>>
>>>> What I propose is to split the `Code.Formatter.to_algebra` considering 
>>>> those steps, and expose the functionality via the `Code` module. The 
>>>> reasoning is that if a user has access to both the ast and the comments, 
>>>> they can then transform the ast, and return back both the ast and comments 
>>>> to the formatter to produce an algebra document. *How they implement 
>>>> this is up to them* and Elixir doesn't need to give an opinion on how 
>>>> comments should be handled during those manipulations, nor does it need to 
>>>> expose the private version of the AST used internally by the formatter. If 
>>>> they want to merge comments into the metadata or use custom nodes, it's 
>>>> completely up to them, they just need to return back a valid quoted 
>>>> expression and a list of comments with their metadata.
>>>>
>>>> The other reason I think this should be done by exposing those 
>>>> functions is that there's a great a amount of work put into the formatter 
>>>> to turn the quoted expressions into formatted algebra documents, and I 
>>>> think all of that could be reused, eliminating the need for custom 
>>>> formatters.
>>>>
>>>> The workflow would be something like this:
>>>> ```elixir
>>>> {:ok, {quoted, comments}} = File.read!(path) |> 
>>>> Code.string_to_quoted_with_comments()
>>>> quoted  = do_some_manipulation(quoted , comments)
>>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)
>>>> new_source = Inspect.Algebra.format(doc, 80)
>>>> ```
>>>> I'm not married to those function names and return values, but I hope 
>>>> they serve to convey the idea.
>>>>
>>>> I already have some little examples of source code transformations 
>>>> using an API like the above, I'm working on reducing and tidying them, but 
>>>> in the meantime I would like to hear you opinions on this proposal.
>>>>
>>>> The only downside I can think of is that the `{line, {previous_eol, 
>>>> next_eol}, text}` format of the comments may be considered a private data 
>>>> structure, but the formatter hasn't changed much since 1.6(3 years ago) 
>>>> and 
>>>> I think it could be considered stable enough to be exposed.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elixir-lang-core" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>  -- 
>>>  You received this message because you are subscribed to the Google 
>>> Groups "elixir-lang-core" group.
>>>  To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>>
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com
>>>  
>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>>
> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com.

Re: [elixir-core:10242] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

Reply via email to