Re: [elixir-core:10261] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

i Dorgan Tue, 18 May 2021 05:49:42 -0700

Hi all,

Now that https://github.com/elixir-lang/elixir/pull/10986/ is merged I was 
meaning to prepare a PR for the normalizing step. Would you prefer to have 
that code in the Code.Formatter module, or in a separate Code.Normalizer 
module? Personally I would prefer the latter, I think it would make the 
code that does the actual formatting easier to follow.


El domingo, 9 de mayo de 2021 a las 0:20:37 UTC-3, i Dorgan escribió:

> I tried manual traversal and that made things so much easier, the literal? 
> field is no longer needed. I made good progress on the conversion and I 
> think I have covered most of the edge cases, judging from the tests I 
> wrote: 
> https://github.com/doorgan/formatter/blob/main/test/normalize_test.exs
> The approach I took for testing was to run the formatter with both the AST 
> generated internally by `Code.Formatter.to_algebra` and with the one I'm 
> creating, as well as feeding the special AST to my normalizer to make sure 
> literals don't get wrapped twice. I did not test the specific shapes the 
> AST should have in most cases but the most basic ones, but it may be a good 
> idea to do so.
>
> My next step would be to test it with comments, I expect them not land in 
> the correct lines in every case because I'm making literals inherit the 
> line number of their parents and heredocs lose delimiter information, but 
> at the very least it the formatter should not break.
>
> If this works then I think I'd be in a good position to submit a PR, once 
> a PR to expose the comments maps is submitted.
> El sábado, 8 de mayo de 2021 a las 2:54:13 UTC-3, José Valim escribió:
>
>> > I had to add a literal? field to the literal blocks to distinguish them 
>> from regular blocks and prevent the normalizer from recursing infinitely.
>>
>> You can always traverse manually. The number of AST nodes are small and 
>> it will give you more control. Alternatively, give postwalk a try, since 
>> the block node you add on postwalk won't be traversed (IIRC).
>>
>> The approach looks good to me. Note for some of those, I would prefer to 
>> push the fix to the formatter (for example, we can assume :delimiter 
>> defaults to ~s"").
>>
>> On Sat, May 8, 2021 at 12:12 AM i Dorgan <[email protected]> wrote:
>>
>>> I'm currently experimenting on the second point you outlined
>>>
>>> I've uploaded what I've been trying so far to this repository: 
>>> https://github.com/doorgan/formatter. The relevant file is 
>>> https://github.com/doorgan/formatter/blob/main/lib/normalizer.ex as 
>>> well as the tests.
>>>
>>> The approach I'm taking there is to walk the AST and manually wrap 
>>> literals when I find them, according to the context(when possible). I think 
>>> I'm able to produce the kind of AST the formatter would expect, and 
>>> produces fairly adequate code, except for cases like mixed keyword lists 
>>> ([1, 2, a: b]) for which I still have to figure out how to recreate. I had 
>>> to add a literal? field to the literal blocks to distinguish them from 
>>> regular blocks and prevent the normalizer from recursing infinitely. Maybe 
>>> this could be avoided, I need to figure how, though.
>>>
>>> I would like to know if you think this is a good aproach, or I should 
>>> try to change the various block_to_algebra and quoted_to_algebra to handle 
>>> literals.
>>>  I'm starting to believe this may be the cleanest aproach, and I would 
>>> need to expand the state map to hold metadata from the parent node(so I can 
>>> fill line numbers, or metadata like format: :keyword to keyword lists).
>>>
>>> El viernes, 7 de mayo de 2021 a las 11:07:31 UTC-3, José Valim escribió:
>>>
>>>> I think there are two fronts:
>>>>
>>>> 1. Today, all of the API the formatter uses, except the extraction of 
>>>> code comments, is public. So providing a string_to_quoted_with_comments 
>>>> would be a logical next step. We only need to convert the comments from a 
>>>> tuple to a map before.
>>>>
>>>> 2. The second step is more complex and needs more research, which is to 
>>>> figure out how exactly to change the formatter to accept general AST.
>>>>
>>>> Please coordinate between you because it seems some work may have 
>>>> already been done in those areas. I will be glad to provide assistance and 
>>>> answer questions. :)
>>>>
>>>> On Fri, May 7, 2021 at 3:51 PM Tonći Galić <[email protected]> wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> Nice to see someone got the ball rolling, I'm also interested (and 
>>>>> would like to contribute). For me it's a bit unclear what the next steps 
>>>>> would be? 
>>>>> If someone could clarify, it'd be happy to spend some time on PR
>>>>>
>>>>> Best,
>>>>> Tonći
>>>>>
>>>>> On Friday, May 7, 2021 at 6:15:48 AM UTC+2 José Valim wrote:
>>>>>
>>>>>> Correct. Those are the trade-offs. We could for example have 
>>>>>> Code.string_to_quoted_formatter_options which returns the formatter 
>>>>>> options 
>>>>>> used for parsing the AST but then users must know that we now have a 
>>>>>> special kind of AST. And nodes introduced by user may still not be 
>>>>>> annotated - which means we still need to care about those scenarios.
>>>>>>
>>>>>> On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks for your thoughts!
>>>>>>>
>>>>>>> > If anyone is interested, I was playing with these ideas and you 
>>>>>>> can see some example use cases here: 
>>>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. 
>>>>>>> But exactly because Code.Formatter is private, I had to vendor it and 
>>>>>>> that 
>>>>>>> code broke on recent Elixir versions.
>>>>>>>
>>>>>>> I really like what you did there. With "that code broke on recent 
>>>>>>> Elixir versions" you mean the addition of stepped ranges? I see that 
>>>>>>> the 
>>>>>>> wrapping of literals via the `literal_encoder` hasn't changed since 
>>>>>>> June of 
>>>>>>> 2019.
>>>>>>>
>>>>>>> > There is one issue about quoted_to_algebra though: the AST that 
>>>>>>> the formatter works with is a special one, where the literals are 
>>>>>>> wrapped 
>>>>>>> in blocks so we can store their metadata. This means that, in order to 
>>>>>>> have 
>>>>>>> quotes_to_algebra, we would need to change the formatter to also handle 
>>>>>>> “regular AST” or nodes with limited metadata.
>>>>>>> > 
>>>>>>> > It is doable, but it is work, and a requirement to expose said 
>>>>>>> functionality.
>>>>>>>
>>>>>>> I'm exploring ways to achieve this, and I can think of two 
>>>>>>> approaches:
>>>>>>> a - traverse the received AST and wrap the literals in blocks
>>>>>>> b - change the various functions used to convert the AST into 
>>>>>>> algebra fragments so that they can also handle bare literals
>>>>>>>
>>>>>>> After some experimentation I saw that option a is doable, but I feel 
>>>>>>> that I am recreating part of the work that the tokenizer already does 
>>>>>>> and 
>>>>>>> that it would become a burden to maintain whenever the tokenizer 
>>>>>>> changes.
>>>>>>> I haven't tried option b yet, I'm still trying to get familiar with 
>>>>>>> the quoted -> algebra conversion(and algebra documents in general), but 
>>>>>>> my 
>>>>>>> first thought is that it would be hard to figure out over which literal 
>>>>>>> a 
>>>>>>> comment should be placed. I imagine that a quoted expression 
>>>>>>> representing 
>>>>>>> this code:
>>>>>>> 1  def foo() do
>>>>>>> 2    :hello
>>>>>>> 3
>>>>>>> 4    :elixir
>>>>>>> 5
>>>>>>> 6
>>>>>>> 7    :world
>>>>>>> 8  end
>>>>>>> And a comment {5, {_, _}, "Hello!"}
>>>>>>> If we don't have line number information, where should the comment 
>>>>>>> be placed? We don't know where the literals are, they may be at lines 
>>>>>>> 2, 3 
>>>>>>> and 4, or 2, 4 and 7, they could be anywhere. All we can tell is that 
>>>>>>> the 
>>>>>>> comment goes somewhere between lines 1 and 8, so any position would be 
>>>>>>> ok. 
>>>>>>> So having the formatter accept bare literals would mean having to work 
>>>>>>> with 
>>>>>>> insufficient data and it would produce a bad output. This issue is also 
>>>>>>> present with option a, but changing the part of the formatter that 
>>>>>>> generates the algebra document would essentially mean to *break* the 
>>>>>>> formatter in multiple places. In summary, I'm not sure about the 
>>>>>>> usefulness 
>>>>>>> of a user supplied ast with no line numbers associated to literals, as 
>>>>>>> it 
>>>>>>> is hard to apropiately recreate the source code when comments are 
>>>>>>> involved.
>>>>>>> I'm not happy with neither approach and I will try to think of 
>>>>>>> better solutions.
>>>>>>>
>>>>>>> On the other hand, the literal encoder is using a public api. A user 
>>>>>>> can already generate AST from a string with the literals wrapped in a 
>>>>>>> block 
>>>>>>> via the :literal_encoder option of Code.string_to_quoted, so I think 
>>>>>>> the 
>>>>>>> literal token metadata should be considered public api(changes to it 
>>>>>>> can 
>>>>>>> already break third party code today). Wrapping a literal in a block 
>>>>>>> and 
>>>>>>> storing the token metadata in it has proven to be a very simple and 
>>>>>>> practical solution to the problem of literals not carrying any metadata 
>>>>>>> while still using valid AST, so maybe exposing that can also be 
>>>>>>> considered 
>>>>>>> as a viable alternative? This would go against my original position of 
>>>>>>> "nor 
>>>>>>> does it need to expose the private version of the AST used internally 
>>>>>>> by 
>>>>>>> the formatter", but in practical terms I think it would provide more 
>>>>>>> value 
>>>>>>> if it were exposed. The comments data structure would already be 
>>>>>>> exposed 
>>>>>>> and we would be requiring users to provide comments in that form, so I 
>>>>>>> think requiring an AST with literals shaped in a particular way 
>>>>>>> wouldn't be 
>>>>>>> unreasonable. Of course, if that would be done, the fact that 
>>>>>>> quoted_to_algebra requires an AST with literals wrapped in that way 
>>>>>>> should 
>>>>>>> be properly documented.
>>>>>>> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach 
>>>>>>> escribió:
>>>>>>>
>>>>>>>> Looking forward to any progress on this front.
>>>>>>>>
>>>>>>>> If anyone is interested, I was playing with these ideas and you can 
>>>>>>>> see some example use cases here: 
>>>>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. 
>>>>>>>> But exactly because Code.Formatter is private, I had to vendor it and 
>>>>>>>> that 
>>>>>>>> code broke on recent Elixir versions.
>>>>>>>>
>>>>>>>> On May 5, 2021, "José Valim" <[email protected]> wrote:
>>>>>>>>
>>>>>>>> I believe those ideas are promising. There is one issue about 
>>>>>>>> quoted_to_algebra though: the AST that the formatter works with is a 
>>>>>>>> special one, where the literals are wrapped in blocks so we can store 
>>>>>>>> their 
>>>>>>>> metadata. This means that, in order to have quotes_to_algebra, we 
>>>>>>>> would 
>>>>>>>> need to change the formatter to also handle “regular AST” or nodes 
>>>>>>>> with 
>>>>>>>> limited metadata.
>>>>>>>> It is doable, but it is work, and a requirement to expose said 
>>>>>>>> functionality.
>>>>>>>> About the comments, I like the suggestion. Although we should 
>>>>>>>> probably move from a tuple to a map to avoid breaking changes in the 
>>>>>>>> future. A PR for this particular issue is very welcome!
>>>>>>>> Thank you for the proposal and thinking about these problems.
>>>>>>>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> The motivation for this proposal is to make it easier for tools to 
>>>>>>>>> alter and format elixir code while preserving the current behavior of 
>>>>>>>>> the 
>>>>>>>>> formatter. Most of the functionality is already there, and a little 
>>>>>>>>> change 
>>>>>>>>> in the APIs would enable a wide variety of new use cases.
>>>>>>>>>
>>>>>>>>> If we want to transform a piece of code from one form to another, 
>>>>>>>>> we can modify the AST and then convert it back to a string. For 
>>>>>>>>> example, a 
>>>>>>>>> tool could detect the usage of `String.to_atom` and not only warn 
>>>>>>>>> about 
>>>>>>>>> unsafe string to atom conversion, but also give an option to 
>>>>>>>>> automatically 
>>>>>>>>> fix the issue, replacing it with `String.to_existing_atom`. The first 
>>>>>>>>> part 
>>>>>>>>> is already covered by tools like credo, but it seems that 
>>>>>>>>> manipulating the 
>>>>>>>>> source code itself is difficult, mostly because the AST does not 
>>>>>>>>> contain 
>>>>>>>>> information about comments and because `Macro.to_string` doesn't 
>>>>>>>>> produce 
>>>>>>>>> text that complies with the elixir coding conventions. For example, 
>>>>>>>>> this 
>>>>>>>>> code:
>>>>>>>>> ```elixir
>>>>>>>>> def foo(bar) do
>>>>>>>>>   # Some comment
>>>>>>>>>   :baz
>>>>>>>>> end
>>>>>>>>> ```
>>>>>>>>> Would be printed as this:
>>>>>>>>> ```elixir
>>>>>>>>> def(fop(bar)) do
>>>>>>>>>   :baz
>>>>>>>>> end
>>>>>>>>> ```
>>>>>>>>> Tools like https://github.com/KronicDeth/intellij-elixir 
>>>>>>>>> implement their own parsers to circumvent this issue, but I think it 
>>>>>>>>> would 
>>>>>>>>> be nice if it could be achieved with the existing parser in core 
>>>>>>>>> elixir.
>>>>>>>>>
>>>>>>>>> I've seen other conversations where it was suggested to change the 
>>>>>>>>> elixir AST to support comments, either adding a new comment 
>>>>>>>>> node(breaking 
>>>>>>>>> change) or using the nodes metadata, but the conclusion was that 
>>>>>>>>> there was 
>>>>>>>>> no clear preference on how to do this at the AST level, and being 
>>>>>>>>> that the 
>>>>>>>>> elixir tokenizer allows us to pass a function to process the 
>>>>>>>>> comments, José 
>>>>>>>>> suggested to keep the them on a side data structure instead of in the 
>>>>>>>>> AST 
>>>>>>>>> itself. This is what the Elixir formatter does.
>>>>>>>>>
>>>>>>>>> Currently the `Code.Formatter` module is private API used by the 
>>>>>>>>> `Code.format_string` function. This means that the only way to format 
>>>>>>>>> elixir code is by providing a string(or a file) to a function in the 
>>>>>>>>> `Code` 
>>>>>>>>> module. If we are transforming the code, however, what we have is a 
>>>>>>>>> quoted 
>>>>>>>>> expression, thus we don't have a way to turn it back into a string.
>>>>>>>>>
>>>>>>>>> At a high level, the `Code.Formatter.to_algebra` does three things:
>>>>>>>>> 1. It extracts the comments from the source code
>>>>>>>>> 2. It parses the source code into a quoted expression
>>>>>>>>> 3. It takes the comments and the quoted expressions and merges 
>>>>>>>>> them to produce an algebra document
>>>>>>>>>
>>>>>>>>> What I propose is to split the `Code.Formatter.to_algebra` 
>>>>>>>>> considering those steps, and expose the functionality via the `Code` 
>>>>>>>>> module. The reasoning is that if a user has access to both the ast 
>>>>>>>>> and the 
>>>>>>>>> comments, they can then transform the ast, and return back both the 
>>>>>>>>> ast and 
>>>>>>>>> comments to the formatter to produce an algebra document. *How 
>>>>>>>>> they implement this is up to them* and Elixir doesn't need to 
>>>>>>>>> give an opinion on how comments should be handled during those 
>>>>>>>>> manipulations, nor does it need to expose the private version of the 
>>>>>>>>> AST 
>>>>>>>>> used internally by the formatter. If they want to merge comments into 
>>>>>>>>> the 
>>>>>>>>> metadata or use custom nodes, it's completely up to them, they just 
>>>>>>>>> need to 
>>>>>>>>> return back a valid quoted expression and a list of comments with 
>>>>>>>>> their 
>>>>>>>>> metadata.
>>>>>>>>>
>>>>>>>>> The other reason I think this should be done by exposing those 
>>>>>>>>> functions is that there's a great a amount of work put into the 
>>>>>>>>> formatter 
>>>>>>>>> to turn the quoted expressions into formatted algebra documents, and 
>>>>>>>>> I 
>>>>>>>>> think all of that could be reused, eliminating the need for custom 
>>>>>>>>> formatters.
>>>>>>>>>
>>>>>>>>> The workflow would be something like this:
>>>>>>>>> ```elixir
>>>>>>>>> {:ok, {quoted, comments}} = File.read!(path) |> 
>>>>>>>>> Code.string_to_quoted_with_comments()
>>>>>>>>> quoted  = do_some_manipulation(quoted , comments)
>>>>>>>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)
>>>>>>>>> new_source = Inspect.Algebra.format(doc, 80)
>>>>>>>>> ```
>>>>>>>>> I'm not married to those function names and return values, but I 
>>>>>>>>> hope they serve to convey the idea.
>>>>>>>>>
>>>>>>>>> I already have some little examples of source code transformations 
>>>>>>>>> using an API like the above, I'm working on reducing and tidying 
>>>>>>>>> them, but 
>>>>>>>>> in the meantime I would like to hear you opinions on this proposal.
>>>>>>>>>
>>>>>>>>> The only downside I can think of is that the `{line, 
>>>>>>>>> {previous_eol, next_eol}, text}` format of the comments may be 
>>>>>>>>> considered a 
>>>>>>>>> private data structure, but the formatter hasn't changed much since 
>>>>>>>>> 1.6(3 
>>>>>>>>> years ago) and I think it could be considered stable enough to be 
>>>>>>>>> exposed.
>>>>>>>>>
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "elixir-lang-core" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com
>>>>>>>>>  
>>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>  -- 
>>>>>>>>  You received this message because you are subscribed to the Google 
>>>>>>>> Groups "elixir-lang-core" group.
>>>>>>>>  To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to [email protected].
>>>>>>>>
>>>>>>>> To view this discussion on the web visit 
>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com
>>>>>>>>  
>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "elixir-lang-core" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to [email protected].
>>>>>>>
>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com
>>>>>>>  
>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>
>>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "elixir-lang-core" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>>
>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elixir-lang-core" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>>
>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/c4fbbfa3-0253-46d0-b90a-679117e8dc39n%40googlegroups.com.

Re: [elixir-core:10261] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

Reply via email to