Re: [elixir-core:10261] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

José Valim Tue, 18 May 2021 06:03:30 -0700

We don't need to add new modules to the user API but internally we can
split them in whatever way we want. Splitting also depends on how large it
is, but probably best to avoid adding it to the formatter since that's
already 2k LOC.



On Tue, May 18, 2021 at 2:49 PM i Dorgan <[email protected]> wrote:

> Hi all,
>
> Now that https://github.com/elixir-lang/elixir/pull/10986/ is merged I
> was meaning to prepare a PR for the normalizing step. Would you prefer to
> have that code in the Code.Formatter module, or in a separate
> Code.Normalizer module? Personally I would prefer the latter, I think it
> would make the code that does the actual formatting easier to follow.
>
> El domingo, 9 de mayo de 2021 a las 0:20:37 UTC-3, i Dorgan escribió:
>
>> I tried manual traversal and that made things so much easier, the
>> literal? field is no longer needed. I made good progress on the conversion
>> and I think I have covered most of the edge cases, judging from the tests I
>> wrote:
>> https://github.com/doorgan/formatter/blob/main/test/normalize_test.exs
>> The approach I took for testing was to run the formatter with both the
>> AST generated internally by `Code.Formatter.to_algebra` and with the one
>> I'm creating, as well as feeding the special AST to my normalizer to make
>> sure literals don't get wrapped twice. I did not test the specific shapes
>> the AST should have in most cases but the most basic ones, but it may be a
>> good idea to do so.
>>
>> My next step would be to test it with comments, I expect them not land in
>> the correct lines in every case because I'm making literals inherit the
>> line number of their parents and heredocs lose delimiter information, but
>> at the very least it the formatter should not break.
>>
>> If this works then I think I'd be in a good position to submit a PR, once
>> a PR to expose the comments maps is submitted.
>> El sábado, 8 de mayo de 2021 a las 2:54:13 UTC-3, José Valim escribió:
>>
>>> > I had to add a literal? field to the literal blocks to distinguish
>>> them from regular blocks and prevent the normalizer from recursing
>>> infinitely.
>>>
>>> You can always traverse manually. The number of AST nodes are small and
>>> it will give you more control. Alternatively, give postwalk a try, since
>>> the block node you add on postwalk won't be traversed (IIRC).
>>>
>>> The approach looks good to me. Note for some of those, I would prefer to
>>> push the fix to the formatter (for example, we can assume :delimiter
>>> defaults to ~s"").
>>>
>>> On Sat, May 8, 2021 at 12:12 AM i Dorgan <[email protected]> wrote:
>>>
>>>> I'm currently experimenting on the second point you outlined
>>>>
>>>> I've uploaded what I've been trying so far to this repository:
>>>> https://github.com/doorgan/formatter. The relevant file is
>>>> https://github.com/doorgan/formatter/blob/main/lib/normalizer.ex as
>>>> well as the tests.
>>>>
>>>> The approach I'm taking there is to walk the AST and manually wrap
>>>> literals when I find them, according to the context(when possible). I think
>>>> I'm able to produce the kind of AST the formatter would expect, and
>>>> produces fairly adequate code, except for cases like mixed keyword lists
>>>> ([1, 2, a: b]) for which I still have to figure out how to recreate. I had
>>>> to add a literal? field to the literal blocks to distinguish them from
>>>> regular blocks and prevent the normalizer from recursing infinitely. Maybe
>>>> this could be avoided, I need to figure how, though.
>>>>
>>>> I would like to know if you think this is a good aproach, or I should
>>>> try to change the various block_to_algebra and quoted_to_algebra to handle
>>>> literals.
>>>>  I'm starting to believe this may be the cleanest aproach, and I would
>>>> need to expand the state map to hold metadata from the parent node(so I can
>>>> fill line numbers, or metadata like format: :keyword to keyword lists).
>>>>
>>>> El viernes, 7 de mayo de 2021 a las 11:07:31 UTC-3, José Valim escribió:
>>>>
>>>>> I think there are two fronts:
>>>>>
>>>>> 1. Today, all of the API the formatter uses, except the extraction of
>>>>> code comments, is public. So providing a string_to_quoted_with_comments
>>>>> would be a logical next step. We only need to convert the comments from a
>>>>> tuple to a map before.
>>>>>
>>>>> 2. The second step is more complex and needs more research, which is
>>>>> to figure out how exactly to change the formatter to accept general AST.
>>>>>
>>>>> Please coordinate between you because it seems some work may have
>>>>> already been done in those areas. I will be glad to provide assistance and
>>>>> answer questions. :)
>>>>>
>>>>> On Fri, May 7, 2021 at 3:51 PM Tonći Galić <[email protected]> wrote:
>>>>>
>>>>>> Hi folks,
>>>>>>
>>>>>> Nice to see someone got the ball rolling, I'm also interested (and
>>>>>> would like to contribute). For me it's a bit unclear what the next steps
>>>>>> would be?
>>>>>> If someone could clarify, it'd be happy to spend some time on PR
>>>>>>
>>>>>> Best,
>>>>>> Tonći
>>>>>>
>>>>>> On Friday, May 7, 2021 at 6:15:48 AM UTC+2 José Valim wrote:
>>>>>>
>>>>>>> Correct. Those are the trade-offs. We could for example have
>>>>>>> Code.string_to_quoted_formatter_options which returns the formatter 
>>>>>>> options
>>>>>>> used for parsing the AST but then users must know that we now have a
>>>>>>> special kind of AST. And nodes introduced by user may still not be
>>>>>>> annotated - which means we still need to care about those scenarios.
>>>>>>>
>>>>>>> On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks for your thoughts!
>>>>>>>>
>>>>>>>> > If anyone is interested, I was playing with these ideas and you
>>>>>>>> can see some example use cases here:
>>>>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs.
>>>>>>>> But exactly because Code.Formatter is private, I had to vendor it and 
>>>>>>>> that
>>>>>>>> code broke on recent Elixir versions.
>>>>>>>>
>>>>>>>> I really like what you did there. With "that code broke on recent
>>>>>>>> Elixir versions" you mean the addition of stepped ranges? I see that 
>>>>>>>> the
>>>>>>>> wrapping of literals via the `literal_encoder` hasn't changed since 
>>>>>>>> June of
>>>>>>>> 2019.
>>>>>>>>
>>>>>>>> > There is one issue about quoted_to_algebra though: the AST that
>>>>>>>> the formatter works with is a special one, where the literals are 
>>>>>>>> wrapped
>>>>>>>> in blocks so we can store their metadata. This means that, in order to 
>>>>>>>> have
>>>>>>>> quotes_to_algebra, we would need to change the formatter to also handle
>>>>>>>> “regular AST” or nodes with limited metadata.
>>>>>>>> >
>>>>>>>> > It is doable, but it is work, and a requirement to expose said
>>>>>>>> functionality.
>>>>>>>>
>>>>>>>> I'm exploring ways to achieve this, and I can think of two
>>>>>>>> approaches:
>>>>>>>> a - traverse the received AST and wrap the literals in blocks
>>>>>>>> b - change the various functions used to convert the AST into
>>>>>>>> algebra fragments so that they can also handle bare literals
>>>>>>>>
>>>>>>>> After some experimentation I saw that option a is doable, but I
>>>>>>>> feel that I am recreating part of the work that the tokenizer already 
>>>>>>>> does
>>>>>>>> and that it would become a burden to maintain whenever the tokenizer
>>>>>>>> changes.
>>>>>>>> I haven't tried option b yet, I'm still trying to get familiar with
>>>>>>>> the quoted -> algebra conversion(and algebra documents in general), 
>>>>>>>> but my
>>>>>>>> first thought is that it would be hard to figure out over which 
>>>>>>>> literal a
>>>>>>>> comment should be placed. I imagine that a quoted expression 
>>>>>>>> representing
>>>>>>>> this code:
>>>>>>>> 1  def foo() do
>>>>>>>> 2    :hello
>>>>>>>> 3
>>>>>>>> 4    :elixir
>>>>>>>> 5
>>>>>>>> 6
>>>>>>>> 7    :world
>>>>>>>> 8  end
>>>>>>>> And a comment {5, {_, _}, "Hello!"}
>>>>>>>> If we don't have line number information, where should the comment
>>>>>>>> be placed? We don't know where the literals are, they may be at lines 
>>>>>>>> 2, 3
>>>>>>>> and 4, or 2, 4 and 7, they could be anywhere. All we can tell is that 
>>>>>>>> the
>>>>>>>> comment goes somewhere between lines 1 and 8, so any position would be 
>>>>>>>> ok.
>>>>>>>> So having the formatter accept bare literals would mean having to work 
>>>>>>>> with
>>>>>>>> insufficient data and it would produce a bad output. This issue is also
>>>>>>>> present with option a, but changing the part of the formatter that
>>>>>>>> generates the algebra document would essentially mean to *break* the
>>>>>>>> formatter in multiple places. In summary, I'm not sure about the 
>>>>>>>> usefulness
>>>>>>>> of a user supplied ast with no line numbers associated to literals, as 
>>>>>>>> it
>>>>>>>> is hard to apropiately recreate the source code when comments are 
>>>>>>>> involved.
>>>>>>>> I'm not happy with neither approach and I will try to think of
>>>>>>>> better solutions.
>>>>>>>>
>>>>>>>> On the other hand, the literal encoder is using a public api. A
>>>>>>>> user can already generate AST from a string with the literals wrapped 
>>>>>>>> in a
>>>>>>>> block via the :literal_encoder option of Code.string_to_quoted, so I 
>>>>>>>> think
>>>>>>>> the literal token metadata should be considered public api(changes to 
>>>>>>>> it
>>>>>>>> can already break third party code today). Wrapping a literal in a 
>>>>>>>> block
>>>>>>>> and storing the token metadata in it has proven to be a very simple and
>>>>>>>> practical solution to the problem of literals not carrying any metadata
>>>>>>>> while still using valid AST, so maybe exposing that can also be 
>>>>>>>> considered
>>>>>>>> as a viable alternative? This would go against my original position of 
>>>>>>>> "nor
>>>>>>>> does it need to expose the private version of the AST used internally 
>>>>>>>> by
>>>>>>>> the formatter", but in practical terms I think it would provide more 
>>>>>>>> value
>>>>>>>> if it were exposed. The comments data structure would already be 
>>>>>>>> exposed
>>>>>>>> and we would be requiring users to provide comments in that form, so I
>>>>>>>> think requiring an AST with literals shaped in a particular way 
>>>>>>>> wouldn't be
>>>>>>>> unreasonable. Of course, if that would be done, the fact that
>>>>>>>> quoted_to_algebra requires an AST with literals wrapped in that way 
>>>>>>>> should
>>>>>>>> be properly documented.
>>>>>>>> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach
>>>>>>>> escribió:
>>>>>>>>
>>>>>>>>> Looking forward to any progress on this front.
>>>>>>>>>
>>>>>>>>> If anyone is interested, I was playing with these ideas and you
>>>>>>>>> can see some example use cases here:
>>>>>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs.
>>>>>>>>> But exactly because Code.Formatter is private, I had to vendor it and 
>>>>>>>>> that
>>>>>>>>> code broke on recent Elixir versions.
>>>>>>>>>
>>>>>>>>> On May 5, 2021, "José Valim" <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I believe those ideas are promising. There is one issue about
>>>>>>>>> quoted_to_algebra though: the AST that the formatter works with is a
>>>>>>>>> special one, where the literals are wrapped in blocks so we can store 
>>>>>>>>> their
>>>>>>>>> metadata. This means that, in order to have quotes_to_algebra, we 
>>>>>>>>> would
>>>>>>>>> need to change the formatter to also handle “regular AST” or nodes 
>>>>>>>>> with
>>>>>>>>> limited metadata.
>>>>>>>>> It is doable, but it is work, and a requirement to expose said
>>>>>>>>> functionality.
>>>>>>>>> About the comments, I like the suggestion. Although we should
>>>>>>>>> probably move from a tuple to a map to avoid breaking changes in the
>>>>>>>>> future. A PR for this particular issue is very welcome!
>>>>>>>>> Thank you for the proposal and thinking about these problems.
>>>>>>>>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> The motivation for this proposal is to make it easier for tools
>>>>>>>>>> to alter and format elixir code while preserving the current 
>>>>>>>>>> behavior of
>>>>>>>>>> the formatter. Most of the functionality is already there, and a 
>>>>>>>>>> little
>>>>>>>>>> change in the APIs would enable a wide variety of new use cases.
>>>>>>>>>>
>>>>>>>>>> If we want to transform a piece of code from one form to another,
>>>>>>>>>> we can modify the AST and then convert it back to a string. For 
>>>>>>>>>> example, a
>>>>>>>>>> tool could detect the usage of `String.to_atom` and not only warn 
>>>>>>>>>> about
>>>>>>>>>> unsafe string to atom conversion, but also give an option to 
>>>>>>>>>> automatically
>>>>>>>>>> fix the issue, replacing it with `String.to_existing_atom`. The 
>>>>>>>>>> first part
>>>>>>>>>> is already covered by tools like credo, but it seems that 
>>>>>>>>>> manipulating the
>>>>>>>>>> source code itself is difficult, mostly because the AST does not 
>>>>>>>>>> contain
>>>>>>>>>> information about comments and because `Macro.to_string` doesn't 
>>>>>>>>>> produce
>>>>>>>>>> text that complies with the elixir coding conventions. For example, 
>>>>>>>>>> this
>>>>>>>>>> code:
>>>>>>>>>> ```elixir
>>>>>>>>>> def foo(bar) do
>>>>>>>>>>   # Some comment
>>>>>>>>>>   :baz
>>>>>>>>>> end
>>>>>>>>>> ```
>>>>>>>>>> Would be printed as this:
>>>>>>>>>> ```elixir
>>>>>>>>>> def(fop(bar)) do
>>>>>>>>>>   :baz
>>>>>>>>>> end
>>>>>>>>>> ```
>>>>>>>>>> Tools like https://github.com/KronicDeth/intellij-elixir
>>>>>>>>>> implement their own parsers to circumvent this issue, but I think it 
>>>>>>>>>> would
>>>>>>>>>> be nice if it could be achieved with the existing parser in core 
>>>>>>>>>> elixir.
>>>>>>>>>>
>>>>>>>>>> I've seen other conversations where it was suggested to change
>>>>>>>>>> the elixir AST to support comments, either adding a new comment
>>>>>>>>>> node(breaking change) or using the nodes metadata, but the 
>>>>>>>>>> conclusion was
>>>>>>>>>> that there was no clear preference on how to do this at the AST 
>>>>>>>>>> level, and
>>>>>>>>>> being that the elixir tokenizer allows us to pass a function to 
>>>>>>>>>> process the
>>>>>>>>>> comments, José suggested to keep the them on a side data structure 
>>>>>>>>>> instead
>>>>>>>>>> of in the AST itself. This is what the Elixir formatter does.
>>>>>>>>>>
>>>>>>>>>> Currently the `Code.Formatter` module is private API used by the
>>>>>>>>>> `Code.format_string` function. This means that the only way to format
>>>>>>>>>> elixir code is by providing a string(or a file) to a function in the 
>>>>>>>>>> `Code`
>>>>>>>>>> module. If we are transforming the code, however, what we have is a 
>>>>>>>>>> quoted
>>>>>>>>>> expression, thus we don't have a way to turn it back into a string.
>>>>>>>>>>
>>>>>>>>>> At a high level, the `Code.Formatter.to_algebra` does three
>>>>>>>>>> things:
>>>>>>>>>> 1. It extracts the comments from the source code
>>>>>>>>>> 2. It parses the source code into a quoted expression
>>>>>>>>>> 3. It takes the comments and the quoted expressions and merges
>>>>>>>>>> them to produce an algebra document
>>>>>>>>>>
>>>>>>>>>> What I propose is to split the `Code.Formatter.to_algebra`
>>>>>>>>>> considering those steps, and expose the functionality via the `Code`
>>>>>>>>>> module. The reasoning is that if a user has access to both the ast 
>>>>>>>>>> and the
>>>>>>>>>> comments, they can then transform the ast, and return back both the 
>>>>>>>>>> ast and
>>>>>>>>>> comments to the formatter to produce an algebra document. *How
>>>>>>>>>> they implement this is up to them* and Elixir doesn't need to
>>>>>>>>>> give an opinion on how comments should be handled during those
>>>>>>>>>> manipulations, nor does it need to expose the private version of the 
>>>>>>>>>> AST
>>>>>>>>>> used internally by the formatter. If they want to merge comments 
>>>>>>>>>> into the
>>>>>>>>>> metadata or use custom nodes, it's completely up to them, they just 
>>>>>>>>>> need to
>>>>>>>>>> return back a valid quoted expression and a list of comments with 
>>>>>>>>>> their
>>>>>>>>>> metadata.
>>>>>>>>>>
>>>>>>>>>> The other reason I think this should be done by exposing those
>>>>>>>>>> functions is that there's a great a amount of work put into the 
>>>>>>>>>> formatter
>>>>>>>>>> to turn the quoted expressions into formatted algebra documents, and 
>>>>>>>>>> I
>>>>>>>>>> think all of that could be reused, eliminating the need for custom
>>>>>>>>>> formatters.
>>>>>>>>>>
>>>>>>>>>> The workflow would be something like this:
>>>>>>>>>> ```elixir
>>>>>>>>>> {:ok, {quoted, comments}} = File.read!(path) |>
>>>>>>>>>> Code.string_to_quoted_with_comments()
>>>>>>>>>> quoted  = do_some_manipulation(quoted , comments)
>>>>>>>>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)
>>>>>>>>>> new_source = Inspect.Algebra.format(doc, 80)
>>>>>>>>>> ```
>>>>>>>>>> I'm not married to those function names and return values, but I
>>>>>>>>>> hope they serve to convey the idea.
>>>>>>>>>>
>>>>>>>>>> I already have some little examples of source code
>>>>>>>>>> transformations using an API like the above, I'm working on reducing 
>>>>>>>>>> and
>>>>>>>>>> tidying them, but in the meantime I would like to hear you opinions 
>>>>>>>>>> on this
>>>>>>>>>> proposal.
>>>>>>>>>>
>>>>>>>>>> The only downside I can think of is that the `{line,
>>>>>>>>>> {previous_eol, next_eol}, text}` format of the comments may be 
>>>>>>>>>> considered a
>>>>>>>>>> private data structure, but the formatter hasn't changed much since 
>>>>>>>>>> 1.6(3
>>>>>>>>>> years ago) and I think it could be considered stable enough to be 
>>>>>>>>>> exposed.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "elixir-lang-core" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to [email protected].
>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com
>>>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>> .
>>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>>  You received this message because you are subscribed to the
>>>>>>>>> Google Groups "elixir-lang-core" group.
>>>>>>>>>  To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>>
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com
>>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "elixir-lang-core" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>>
>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com
>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>> .
>>>>>>>>
>>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "elixir-lang-core" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>>
>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>>
>>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elixir-lang-core" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "elixir-lang-core" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elixir-lang-core/c4fbbfa3-0253-46d0-b90a-679117e8dc39n%40googlegroups.com
> <https://groups.google.com/d/msgid/elixir-lang-core/c4fbbfa3-0253-46d0-b90a-679117e8dc39n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4K7Z%3DTogQGOsMsDbnomvQN5d%3DiHthx5TXYyFdr%2B6_EOVg%40mail.gmail.com.

Re: [elixir-core:10261] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

Reply via email to