Re: [elixir-core:10240] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

José Valim Thu, 06 May 2021 21:16:13 -0700

Correct. Those are the trade-offs. We could for example have
Code.string_to_quoted_formatter_options which returns the formatter options
used for parsing the AST but then users must know that we now have a
special kind of AST. And nodes introduced by user may still not be
annotated - which means we still need to care about those scenarios.


On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote:

> Thanks for your thoughts!
>
> > If anyone is interested, I was playing with these ideas and you can see
> some example use cases here:
> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But
> exactly because Code.Formatter is private, I had to vendor it and that code
> broke on recent Elixir versions.
>
> I really like what you did there. With "that code broke on recent Elixir
> versions" you mean the addition of stepped ranges? I see that the wrapping
> of literals via the `literal_encoder` hasn't changed since June of 2019.
>
> > There is one issue about quoted_to_algebra though: the AST that the
> formatter works with is a special one, where the literals are wrapped in
> blocks so we can store their metadata. This means that, in order to have
> quotes_to_algebra, we would need to change the formatter to also handle
> “regular AST” or nodes with limited metadata.
> >
> > It is doable, but it is work, and a requirement to expose said
> functionality.
>
> I'm exploring ways to achieve this, and I can think of two approaches:
> a - traverse the received AST and wrap the literals in blocks
> b - change the various functions used to convert the AST into algebra
> fragments so that they can also handle bare literals
>
> After some experimentation I saw that option a is doable, but I feel that
> I am recreating part of the work that the tokenizer already does and that
> it would become a burden to maintain whenever the tokenizer changes.
> I haven't tried option b yet, I'm still trying to get familiar with the
> quoted -> algebra conversion(and algebra documents in general), but my
> first thought is that it would be hard to figure out over which literal a
> comment should be placed. I imagine that a quoted expression representing
> this code:
> 1  def foo() do
> 2    :hello
> 3
> 4    :elixir
> 5
> 6
> 7    :world
> 8  end
> And a comment {5, {_, _}, "Hello!"}
> If we don't have line number information, where should the comment be
> placed? We don't know where the literals are, they may be at lines 2, 3 and
> 4, or 2, 4 and 7, they could be anywhere. All we can tell is that the
> comment goes somewhere between lines 1 and 8, so any position would be ok.
> So having the formatter accept bare literals would mean having to work with
> insufficient data and it would produce a bad output. This issue is also
> present with option a, but changing the part of the formatter that
> generates the algebra document would essentially mean to *break* the
> formatter in multiple places. In summary, I'm not sure about the usefulness
> of a user supplied ast with no line numbers associated to literals, as it
> is hard to apropiately recreate the source code when comments are involved.
> I'm not happy with neither approach and I will try to think of better
> solutions.
>
> On the other hand, the literal encoder is using a public api. A user can
> already generate AST from a string with the literals wrapped in a block via
> the :literal_encoder option of Code.string_to_quoted, so I think the
> literal token metadata should be considered public api(changes to it can
> already break third party code today). Wrapping a literal in a block and
> storing the token metadata in it has proven to be a very simple and
> practical solution to the problem of literals not carrying any metadata
> while still using valid AST, so maybe exposing that can also be considered
> as a viable alternative? This would go against my original position of "nor
> does it need to expose the private version of the AST used internally by
> the formatter", but in practical terms I think it would provide more value
> if it were exposed. The comments data structure would already be exposed
> and we would be requiring users to provide comments in that form, so I
> think requiring an AST with literals shaped in a particular way wouldn't be
> unreasonable. Of course, if that would be done, the fact that
> quoted_to_algebra requires an AST with literals wrapped in that way should
> be properly documented.
> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach escribió:
>
>> Looking forward to any progress on this front.
>>
>> If anyone is interested, I was playing with these ideas and you can see
>> some example use cases here:
>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But
>> exactly because Code.Formatter is private, I had to vendor it and that code
>> broke on recent Elixir versions.
>>
>> On May 5, 2021, "José Valim" <[email protected]> wrote:
>>
>> I believe those ideas are promising. There is one issue about
>> quoted_to_algebra though: the AST that the formatter works with is a
>> special one, where the literals are wrapped in blocks so we can store their
>> metadata. This means that, in order to have quotes_to_algebra, we would
>> need to change the formatter to also handle “regular AST” or nodes with
>> limited metadata.
>> It is doable, but it is work, and a requirement to expose said
>> functionality.
>> About the comments, I like the suggestion. Although we should probably
>> move from a tuple to a map to avoid breaking changes in the future. A PR
>> for this particular issue is very welcome!
>> Thank you for the proposal and thinking about these problems.
>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote:
>>
>> Hi all,
>>>
>>> The motivation for this proposal is to make it easier for tools to alter
>>> and format elixir code while preserving the current behavior of the
>>> formatter. Most of the functionality is already there, and a little change
>>> in the APIs would enable a wide variety of new use cases.
>>>
>>> If we want to transform a piece of code from one form to another, we can
>>> modify the AST and then convert it back to a string. For example, a tool
>>> could detect the usage of `String.to_atom` and not only warn about unsafe
>>> string to atom conversion, but also give an option to automatically fix the
>>> issue, replacing it with `String.to_existing_atom`. The first part is
>>> already covered by tools like credo, but it seems that manipulating the
>>> source code itself is difficult, mostly because the AST does not contain
>>> information about comments and because `Macro.to_string` doesn't produce
>>> text that complies with the elixir coding conventions. For example, this
>>> code:
>>> ```elixir
>>> def foo(bar) do
>>>   # Some comment
>>>   :baz
>>> end
>>> ```
>>> Would be printed as this:
>>> ```elixir
>>> def(fop(bar)) do
>>>   :baz
>>> end
>>> ```
>>> Tools like https://github.com/KronicDeth/intellij-elixir implement
>>> their own parsers to circumvent this issue, but I think it would be nice if
>>> it could be achieved with the existing parser in core elixir.
>>>
>>> I've seen other conversations where it was suggested to change the
>>> elixir AST to support comments, either adding a new comment node(breaking
>>> change) or using the nodes metadata, but the conclusion was that there was
>>> no clear preference on how to do this at the AST level, and being that the
>>> elixir tokenizer allows us to pass a function to process the comments, José
>>> suggested to keep the them on a side data structure instead of in the AST
>>> itself. This is what the Elixir formatter does.
>>>
>>> Currently the `Code.Formatter` module is private API used by the
>>> `Code.format_string` function. This means that the only way to format
>>> elixir code is by providing a string(or a file) to a function in the `Code`
>>> module. If we are transforming the code, however, what we have is a quoted
>>> expression, thus we don't have a way to turn it back into a string.
>>>
>>> At a high level, the `Code.Formatter.to_algebra` does three things:
>>> 1. It extracts the comments from the source code
>>> 2. It parses the source code into a quoted expression
>>> 3. It takes the comments and the quoted expressions and merges them to
>>> produce an algebra document
>>>
>>> What I propose is to split the `Code.Formatter.to_algebra` considering
>>> those steps, and expose the functionality via the `Code` module. The
>>> reasoning is that if a user has access to both the ast and the comments,
>>> they can then transform the ast, and return back both the ast and comments
>>> to the formatter to produce an algebra document. *How they implement
>>> this is up to them* and Elixir doesn't need to give an opinion on how
>>> comments should be handled during those manipulations, nor does it need to
>>> expose the private version of the AST used internally by the formatter. If
>>> they want to merge comments into the metadata or use custom nodes, it's
>>> completely up to them, they just need to return back a valid quoted
>>> expression and a list of comments with their metadata.
>>>
>>> The other reason I think this should be done by exposing those functions
>>> is that there's a great a amount of work put into the formatter to turn the
>>> quoted expressions into formatted algebra documents, and I think all of
>>> that could be reused, eliminating the need for custom formatters.
>>>
>>> The workflow would be something like this:
>>> ```elixir
>>> {:ok, {quoted, comments}} = File.read!(path) |>
>>> Code.string_to_quoted_with_comments()
>>> quoted  = do_some_manipulation(quoted , comments)
>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)
>>> new_source = Inspect.Algebra.format(doc, 80)
>>> ```
>>> I'm not married to those function names and return values, but I hope
>>> they serve to convey the idea.
>>>
>>> I already have some little examples of source code transformations using
>>> an API like the above, I'm working on reducing and tidying them, but in the
>>> meantime I would like to hear you opinions on this proposal.
>>>
>>> The only downside I can think of is that the `{line, {previous_eol,
>>> next_eol}, text}` format of the comments may be considered a private data
>>> structure, but the formatter hasn't changed much since 1.6(3 years ago) and
>>> I think it could be considered stable enough to be exposed.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elixir-lang-core" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>  --
>>  You received this message because you are subscribed to the Google
>> Groups "elixir-lang-core" group.
>>  To unsubscribe from this group and stop receiving emails from it, send
>> an email to [email protected].
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com
>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>> .
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "elixir-lang-core" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com
> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4L75YmGv6x3xZSFDAHgwg7zCjc2mQ%3Dbwev-nLkNLgxc9w%40mail.gmail.com.

Re: [elixir-core:10240] [Proposal] Expose functions from `Code.Formatter` via the `Code` module

Reply via email to