I'm currently experimenting on the second point you outlined I've uploaded what I've been trying so far to this repository: https://github.com/doorgan/formatter. The relevant file is https://github.com/doorgan/formatter/blob/main/lib/normalizer.ex as well as the tests.
The approach I'm taking there is to walk the AST and manually wrap literals when I find them, according to the context(when possible). I think I'm able to produce the kind of AST the formatter would expect, and produces fairly adequate code, except for cases like mixed keyword lists ([1, 2, a: b]) for which I still have to figure out how to recreate. I had to add a literal? field to the literal blocks to distinguish them from regular blocks and prevent the normalizer from recursing infinitely. Maybe this could be avoided, I need to figure how, though. I would like to know if you think this is a good aproach, or I should try to change the various block_to_algebra and quoted_to_algebra to handle literals. I'm starting to believe this may be the cleanest aproach, and I would need to expand the state map to hold metadata from the parent node(so I can fill line numbers, or metadata like format: :keyword to keyword lists). El viernes, 7 de mayo de 2021 a las 11:07:31 UTC-3, José Valim escribió: > I think there are two fronts: > > 1. Today, all of the API the formatter uses, except the extraction of code > comments, is public. So providing a string_to_quoted_with_comments would be > a logical next step. We only need to convert the comments from a tuple to a > map before. > > 2. The second step is more complex and needs more research, which is to > figure out how exactly to change the formatter to accept general AST. > > Please coordinate between you because it seems some work may have already > been done in those areas. I will be glad to provide assistance and answer > questions. :) > > On Fri, May 7, 2021 at 3:51 PM Tonći Galić <[email protected]> wrote: > >> Hi folks, >> >> Nice to see someone got the ball rolling, I'm also interested (and would >> like to contribute). For me it's a bit unclear what the next steps would >> be? >> If someone could clarify, it'd be happy to spend some time on PR >> >> Best, >> Tonći >> >> On Friday, May 7, 2021 at 6:15:48 AM UTC+2 José Valim wrote: >> >>> Correct. Those are the trade-offs. We could for example have >>> Code.string_to_quoted_formatter_options which returns the formatter options >>> used for parsing the AST but then users must know that we now have a >>> special kind of AST. And nodes introduced by user may still not be >>> annotated - which means we still need to care about those scenarios. >>> >>> On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote: >>> >>>> Thanks for your thoughts! >>>> >>>> > If anyone is interested, I was playing with these ideas and you can >>>> see some example use cases here: >>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But >>>> exactly because Code.Formatter is private, I had to vendor it and that >>>> code >>>> broke on recent Elixir versions. >>>> >>>> I really like what you did there. With "that code broke on recent >>>> Elixir versions" you mean the addition of stepped ranges? I see that the >>>> wrapping of literals via the `literal_encoder` hasn't changed since June >>>> of >>>> 2019. >>>> >>>> > There is one issue about quoted_to_algebra though: the AST that the >>>> formatter works with is a special one, where the literals are wrapped in >>>> blocks so we can store their metadata. This means that, in order to have >>>> quotes_to_algebra, we would need to change the formatter to also handle >>>> “regular AST” or nodes with limited metadata. >>>> > >>>> > It is doable, but it is work, and a requirement to expose said >>>> functionality. >>>> >>>> I'm exploring ways to achieve this, and I can think of two approaches: >>>> a - traverse the received AST and wrap the literals in blocks >>>> b - change the various functions used to convert the AST into algebra >>>> fragments so that they can also handle bare literals >>>> >>>> After some experimentation I saw that option a is doable, but I feel >>>> that I am recreating part of the work that the tokenizer already does and >>>> that it would become a burden to maintain whenever the tokenizer changes. >>>> I haven't tried option b yet, I'm still trying to get familiar with the >>>> quoted -> algebra conversion(and algebra documents in general), but my >>>> first thought is that it would be hard to figure out over which literal a >>>> comment should be placed. I imagine that a quoted expression representing >>>> this code: >>>> 1 def foo() do >>>> 2 :hello >>>> 3 >>>> 4 :elixir >>>> 5 >>>> 6 >>>> 7 :world >>>> 8 end >>>> And a comment {5, {_, _}, "Hello!"} >>>> If we don't have line number information, where should the comment be >>>> placed? We don't know where the literals are, they may be at lines 2, 3 >>>> and >>>> 4, or 2, 4 and 7, they could be anywhere. All we can tell is that the >>>> comment goes somewhere between lines 1 and 8, so any position would be ok. >>>> So having the formatter accept bare literals would mean having to work >>>> with >>>> insufficient data and it would produce a bad output. This issue is also >>>> present with option a, but changing the part of the formatter that >>>> generates the algebra document would essentially mean to *break* the >>>> formatter in multiple places. In summary, I'm not sure about the >>>> usefulness >>>> of a user supplied ast with no line numbers associated to literals, as it >>>> is hard to apropiately recreate the source code when comments are involved. >>>> I'm not happy with neither approach and I will try to think of better >>>> solutions. >>>> >>>> On the other hand, the literal encoder is using a public api. A user >>>> can already generate AST from a string with the literals wrapped in a >>>> block >>>> via the :literal_encoder option of Code.string_to_quoted, so I think the >>>> literal token metadata should be considered public api(changes to it can >>>> already break third party code today). Wrapping a literal in a block and >>>> storing the token metadata in it has proven to be a very simple and >>>> practical solution to the problem of literals not carrying any metadata >>>> while still using valid AST, so maybe exposing that can also be considered >>>> as a viable alternative? This would go against my original position of >>>> "nor >>>> does it need to expose the private version of the AST used internally by >>>> the formatter", but in practical terms I think it would provide more value >>>> if it were exposed. The comments data structure would already be exposed >>>> and we would be requiring users to provide comments in that form, so I >>>> think requiring an AST with literals shaped in a particular way wouldn't >>>> be >>>> unreasonable. Of course, if that would be done, the fact that >>>> quoted_to_algebra requires an AST with literals wrapped in that way should >>>> be properly documented. >>>> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach escribió: >>>> >>>>> Looking forward to any progress on this front. >>>>> >>>>> If anyone is interested, I was playing with these ideas and you can >>>>> see some example use cases here: >>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But >>>>> exactly because Code.Formatter is private, I had to vendor it and that >>>>> code >>>>> broke on recent Elixir versions. >>>>> >>>>> On May 5, 2021, "José Valim" <[email protected]> wrote: >>>>> >>>>> I believe those ideas are promising. There is one issue about >>>>> quoted_to_algebra though: the AST that the formatter works with is a >>>>> special one, where the literals are wrapped in blocks so we can store >>>>> their >>>>> metadata. This means that, in order to have quotes_to_algebra, we would >>>>> need to change the formatter to also handle “regular AST” or nodes with >>>>> limited metadata. >>>>> It is doable, but it is work, and a requirement to expose said >>>>> functionality. >>>>> About the comments, I like the suggestion. Although we should probably >>>>> move from a tuple to a map to avoid breaking changes in the future. A PR >>>>> for this particular issue is very welcome! >>>>> Thank you for the proposal and thinking about these problems. >>>>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote: >>>>> >>>>> Hi all, >>>>>> >>>>>> The motivation for this proposal is to make it easier for tools to >>>>>> alter and format elixir code while preserving the current behavior of >>>>>> the >>>>>> formatter. Most of the functionality is already there, and a little >>>>>> change >>>>>> in the APIs would enable a wide variety of new use cases. >>>>>> >>>>>> If we want to transform a piece of code from one form to another, we >>>>>> can modify the AST and then convert it back to a string. For example, a >>>>>> tool could detect the usage of `String.to_atom` and not only warn about >>>>>> unsafe string to atom conversion, but also give an option to >>>>>> automatically >>>>>> fix the issue, replacing it with `String.to_existing_atom`. The first >>>>>> part >>>>>> is already covered by tools like credo, but it seems that manipulating >>>>>> the >>>>>> source code itself is difficult, mostly because the AST does not contain >>>>>> information about comments and because `Macro.to_string` doesn't produce >>>>>> text that complies with the elixir coding conventions. For example, this >>>>>> code: >>>>>> ```elixir >>>>>> def foo(bar) do >>>>>> # Some comment >>>>>> :baz >>>>>> end >>>>>> ``` >>>>>> Would be printed as this: >>>>>> ```elixir >>>>>> def(fop(bar)) do >>>>>> :baz >>>>>> end >>>>>> ``` >>>>>> Tools like https://github.com/KronicDeth/intellij-elixir implement >>>>>> their own parsers to circumvent this issue, but I think it would be nice >>>>>> if >>>>>> it could be achieved with the existing parser in core elixir. >>>>>> >>>>>> I've seen other conversations where it was suggested to change the >>>>>> elixir AST to support comments, either adding a new comment >>>>>> node(breaking >>>>>> change) or using the nodes metadata, but the conclusion was that there >>>>>> was >>>>>> no clear preference on how to do this at the AST level, and being that >>>>>> the >>>>>> elixir tokenizer allows us to pass a function to process the comments, >>>>>> José >>>>>> suggested to keep the them on a side data structure instead of in the >>>>>> AST >>>>>> itself. This is what the Elixir formatter does. >>>>>> >>>>>> Currently the `Code.Formatter` module is private API used by the >>>>>> `Code.format_string` function. This means that the only way to format >>>>>> elixir code is by providing a string(or a file) to a function in the >>>>>> `Code` >>>>>> module. If we are transforming the code, however, what we have is a >>>>>> quoted >>>>>> expression, thus we don't have a way to turn it back into a string. >>>>>> >>>>>> At a high level, the `Code.Formatter.to_algebra` does three things: >>>>>> 1. It extracts the comments from the source code >>>>>> 2. It parses the source code into a quoted expression >>>>>> 3. It takes the comments and the quoted expressions and merges them >>>>>> to produce an algebra document >>>>>> >>>>>> What I propose is to split the `Code.Formatter.to_algebra` >>>>>> considering those steps, and expose the functionality via the `Code` >>>>>> module. The reasoning is that if a user has access to both the ast and >>>>>> the >>>>>> comments, they can then transform the ast, and return back both the ast >>>>>> and >>>>>> comments to the formatter to produce an algebra document. *How they >>>>>> implement this is up to them* and Elixir doesn't need to give an >>>>>> opinion on how comments should be handled during those manipulations, >>>>>> nor >>>>>> does it need to expose the private version of the AST used internally by >>>>>> the formatter. If they want to merge comments into the metadata or use >>>>>> custom nodes, it's completely up to them, they just need to return back >>>>>> a >>>>>> valid quoted expression and a list of comments with their metadata. >>>>>> >>>>>> The other reason I think this should be done by exposing those >>>>>> functions is that there's a great a amount of work put into the >>>>>> formatter >>>>>> to turn the quoted expressions into formatted algebra documents, and I >>>>>> think all of that could be reused, eliminating the need for custom >>>>>> formatters. >>>>>> >>>>>> The workflow would be something like this: >>>>>> ```elixir >>>>>> {:ok, {quoted, comments}} = File.read!(path) |> >>>>>> Code.string_to_quoted_with_comments() >>>>>> quoted = do_some_manipulation(quoted , comments) >>>>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments) >>>>>> new_source = Inspect.Algebra.format(doc, 80) >>>>>> ``` >>>>>> I'm not married to those function names and return values, but I hope >>>>>> they serve to convey the idea. >>>>>> >>>>>> I already have some little examples of source code transformations >>>>>> using an API like the above, I'm working on reducing and tidying them, >>>>>> but >>>>>> in the meantime I would like to hear you opinions on this proposal. >>>>>> >>>>>> The only downside I can think of is that the `{line, {previous_eol, >>>>>> next_eol}, text}` format of the comments may be considered a private >>>>>> data >>>>>> structure, but the formatter hasn't changed much since 1.6(3 years ago) >>>>>> and >>>>>> I think it could be considered stable enough to be exposed. >>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "elixir-lang-core" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to [email protected]. >>>>>> To view this discussion on the web visit >>>>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com >>>>>> >>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>> . >>>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "elixir-lang-core" group. >>>>> To unsubscribe from this group and stop receiving emails from it, >>>>> send an email to [email protected]. >>>>> >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com >>>>> >>>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "elixir-lang-core" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> >>> To view this discussion on the web visit >>>> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com >>>> >>>> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>> . >>>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "elixir-lang-core" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> > To view this discussion on the web visit >> https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com >> >> <https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > -- You received this message because you are subscribed to the Google Groups "elixir-lang-core" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com.
