Hi all, Now that https://github.com/elixir-lang/elixir/pull/10986/ is merged I was meaning to prepare a PR for the normalizing step. Would you prefer to have that code in the Code.Formatter module, or in a separate Code.Normalizer module? Personally I would prefer the latter, I think it would make the code that does the actual formatting easier to follow.
El domingo, 9 de mayo de 2021 a las 0:20:37 UTC-3, i Dorgan escribió: > I tried manual traversal and that made things so much easier, the literal? > field is no longer needed. I made good progress on the conversion and I > think I have covered most of the edge cases, judging from the tests I > wrote: > https://github.com/doorgan/formatter/blob/main/test/normalize_test.exs > The approach I took for testing was to run the formatter with both the AST > generated internally by `Code.Formatter.to_algebra` and with the one I'm > creating, as well as feeding the special AST to my normalizer to make sure > literals don't get wrapped twice. I did not test the specific shapes the > AST should have in most cases but the most basic ones, but it may be a good > idea to do so. > > My next step would be to test it with comments, I expect them not land in > the correct lines in every case because I'm making literals inherit the > line number of their parents and heredocs lose delimiter information, but > at the very least it the formatter should not break. > > If this works then I think I'd be in a good position to submit a PR, once > a PR to expose the comments maps is submitted. > El sábado, 8 de mayo de 2021 a las 2:54:13 UTC-3, José Valim escribió: > >> > I had to add a literal? field to the literal blocks to distinguish them >> from regular blocks and prevent the normalizer from recursing infinitely. >> >> You can always traverse manually. The number of AST nodes are small and >> it will give you more control. Alternatively, give postwalk a try, since >> the block node you add on postwalk won't be traversed (IIRC). >> >> The approach looks good to me. Note for some of those, I would prefer to >> push the fix to the formatter (for example, we can assume :delimiter >> defaults to ~s""). >> >> On Sat, May 8, 2021 at 12:12 AM i Dorgan <[email protected]> wrote: >> >>> I'm currently experimenting on the second point you outlined >>> >>> I've uploaded what I've been trying so far to this repository: >>> https://github.com/doorgan/formatter. The relevant file is >>> https://github.com/doorgan/formatter/blob/main/lib/normalizer.ex as >>> well as the tests. >>> >>> The approach I'm taking there is to walk the AST and manually wrap >>> literals when I find them, according to the context(when possible). I think >>> I'm able to produce the kind of AST the formatter would expect, and >>> produces fairly adequate code, except for cases like mixed keyword lists >>> ([1, 2, a: b]) for which I still have to figure out how to recreate. I had >>> to add a literal? field to the literal blocks to distinguish them from >>> regular blocks and prevent the normalizer from recursing infinitely. Maybe >>> this could be avoided, I need to figure how, though. >>> >>> I would like to know if you think this is a good aproach, or I should >>> try to change the various block_to_algebra and quoted_to_algebra to handle >>> literals. >>> I'm starting to believe this may be the cleanest aproach, and I would >>> need to expand the state map to hold metadata from the parent node(so I can >>> fill line numbers, or metadata like format: :keyword to keyword lists). >>> >>> El viernes, 7 de mayo de 2021 a las 11:07:31 UTC-3, José Valim escribió: >>> >>>> I think there are two fronts: >>>> >>>> 1. Today, all of the API the formatter uses, except the extraction of >>>> code comments, is public. So providing a string_to_quoted_with_comments >>>> would be a logical next step. We only need to convert the comments from a >>>> tuple to a map before. >>>> >>>> 2. The second step is more complex and needs more research, which is to >>>> figure out how exactly to change the formatter to accept general AST. >>>> >>>> Please coordinate between you because it seems some work may have >>>> already been done in those areas. I will be glad to provide assistance and >>>> answer questions. :) >>>> >>>> On Fri, May 7, 2021 at 3:51 PM Tonći Galić <[email protected]> wrote: >>>> >>>>> Hi folks, >>>>> >>>>> Nice to see someone got the ball rolling, I'm also interested (and >>>>> would like to contribute). For me it's a bit unclear what the next steps >>>>> would be? >>>>> If someone could clarify, it'd be happy to spend some time on PR >>>>> >>>>> Best, >>>>> Tonći >>>>> >>>>> On Friday, May 7, 2021 at 6:15:48 AM UTC+2 José Valim wrote: >>>>> >>>>>> Correct. Those are the trade-offs. We could for example have >>>>>> Code.string_to_quoted_formatter_options which returns the formatter >>>>>> options >>>>>> used for parsing the AST but then users must know that we now have a >>>>>> special kind of AST. And nodes introduced by user may still not be >>>>>> annotated - which means we still need to care about those scenarios. >>>>>> >>>>>> On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote: >>>>>> >>>>>>> Thanks for your thoughts! >>>>>>> >>>>>>> > If anyone is interested, I was playing with these ideas and you >>>>>>> can see some example use cases here: >>>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. >>>>>>> But exactly because Code.Formatter is private, I had to vendor it and >>>>>>> that >>>>>>> code broke on recent Elixir versions. >>>>>>> >>>>>>> I really like what you did there. With "that code broke on recent >>>>>>> Elixir versions" you mean the addition of stepped ranges? I see that >>>>>>> the >>>>>>> wrapping of literals via the `literal_encoder` hasn't changed since >>>>>>> June of >>>>>>> 2019. >>>>>>> >>>>>>> > There is one issue about quoted_to_algebra though: the AST that >>>>>>> the formatter works with is a special one, where the literals are >>>>>>> wrapped >>>>>>> in blocks so we can store their metadata. This means that, in order to >>>>>>> have >>>>>>> quotes_to_algebra, we would need to change the formatter to also handle >>>>>>> “regular AST” or nodes with limited metadata. >>>>>>> > >>>>>>> > It is doable, but it is work, and a requirement to expose said >>>>>>> functionality. >>>>>>> >>>>>>> I'm exploring ways to achieve this, and I can think of two >>>>>>> approaches: >>>>>>> a - traverse the received AST and wrap the literals in blocks >>>>>>> b - change the various functions used to convert the AST into >>>>>>> algebra fragments so that they can also handle bare literals >>>>>>> >>>>>>> After some experimentation I saw that option a is doable, but I feel >>>>>>> that I am recreating part of the work that the tokenizer already does >>>>>>> and >>>>>>> that it would become a burden to maintain whenever the tokenizer >>>>>>> changes. >>>>>>> I haven't tried option b yet, I'm still trying to get familiar with >>>>>>> the quoted -> algebra conversion(and algebra documents in general), but >>>>>>> my >>>>>>> first thought is that it would be hard to figure out over which literal >>>>>>> a >>>>>>> comment should be placed. I imagine that a quoted expression >>>>>>> representing >>>>>>> this code: >>>>>>> 1 def foo() do >>>>>>> 2 :hello >>>>>>> 3 >>>>>>> 4 :elixir >>>>>>> 5 >>>>>>> 6 >>>>>>> 7 :world >>>>>>> 8 end >>>>>>> And a comment {5, {_, _}, "Hello!"} >>>>>>> If we don't have line number information, where should the comment >>>>>>> be placed? We don't know where the literals are, they may be at lines >>>>>>> 2, 3 >>>>>>> and 4, or 2, 4 and 7, they could be anywhere. All we can tell is that >>>>>>> the >>>>>>> comment goes somewhere between lines 1 and 8, so any position would be >>>>>>> ok. >>>>>>> So having the formatter accept bare literals would mean having to work >>>>>>> with >>>>>>> insufficient data and it would produce a bad output. This issue is also >>>>>>> present with option a, but changing the part of the formatter that >>>>>>> generates the algebra document would essentially mean to *break* the >>>>>>> formatter in multiple places. In summary, I'm not sure about the >>>>>>> usefulness >>>>>>> of a user supplied ast with no line numbers associated to literals, as >>>>>>> it >>>>>>> is hard to apropiately recreate the source code when comments are >>>>>>> involved. >>>>>>> I'm not happy with neither approach and I will try to think of >>>>>>> better solutions. >>>>>>> >>>>>>> On the other hand, the literal encoder is using a public api. A user >>>>>>> can already generate AST from a string with the literals wrapped in a >>>>>>> block >>>>>>> via the :literal_encoder option of Code.string_to_quoted, so I think >>>>>>> the >>>>>>> literal token metadata should be considered public api(changes to it >>>>>>> can >>>>>>> already break third party code today). Wrapping a literal in a block >>>>>>> and >>>>>>> storing the token metadata in it has proven to be a very simple and >>>>>>> practical solution to the problem of literals not carrying any metadata >>>>>>> while still using valid AST, so maybe exposing that can also be >>>>>>> considered >>>>>>> as a viable alternative? This would go against my original position of >>>>>>> "nor >>>>>>> does it need to expose the private version of the AST used internally >>>>>>> by >>>>>>> the formatter", but in practical terms I think it would provide more >>>>>>> value >>>>>>> if it were exposed. The comments data structure would already be >>>>>>> exposed >>>>>>> and we would be requiring users to provide comments in that form, so I >>>>>>> think requiring an AST with literals shaped in a particular way >>>>>>> wouldn't be >>>>>>> unreasonable. Of course, if that would be done, the fact that >>>>>>> quoted_to_algebra requires an AST with literals wrapped in that way >>>>>>> should >>>>>>> be properly documented. >>>>>>> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach >>>>>>> escribió: >>>>>>> >>>>>>>> Looking forward to any progress on this front. >>>>>>>> >>>>>>>> If anyone is interested, I was playing with these ideas and you can >>>>>>>> see some example use cases here: >>>>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. >>>>>>>> But exactly because Code.Formatter is private, I had to vendor it and >>>>>>>> that >>>>>>>> code broke on recent Elixir versions. >>>>>>>> >>>>>>>> On May 5, 2021, "José Valim" <[email protected]> wrote: >>>>>>>> >>>>>>>> I believe those ideas are promising. There is one issue about >>>>>>>> quoted_to_algebra though: the AST that the formatter works with is a >>>>>>>> special one, where the literals are wrapped in blocks so we can store >>>>>>>> their >>>>>>>> metadata. This means that, in order to have quotes_to_algebra, we >>>>>>>> would >>>>>>>> need to change the formatter to also handle “regular AST” or nodes >>>>>>>> with >>>>>>>> limited metadata. >>>>>>>> It is doable, but it is work, and a requirement to expose said >>>>>>>> functionality. >>>>>>>> About the comments, I like the suggestion. Although we should >>>>>>>> probably move from a tuple to a map to avoid breaking changes in the >>>>>>>> future. A PR for this particular issue is very welcome! >>>>>>>> Thank you for the proposal and thinking about these problems. >>>>>>>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> The motivation for this proposal is to make it easier for tools to >>>>>>>>> alter and format elixir code while preserving the current behavior of >>>>>>>>> the >>>>>>>>> formatter. Most of the functionality is already there, and a little >>>>>>>>> change >>>>>>>>> in the APIs would enable a wide variety of new use cases. >>>>>>>>> >>>>>>>>> If we want to transform a piece of code from one form to another, >>>>>>>>> we can modify the AST and then convert it back to a string. For >>>>>>>>> example, a >>>>>>>>> tool could detect the usage of `String.to_atom` and not only warn >>>>>>>>> about >>>>>>>>> unsafe string to atom conversion, but also give an option to >>>>>>>>> automatically >>>>>>>>> fix the issue, replacing it with `String.to_existing_atom`. The first >>>>>>>>> part >>>>>>>>> is already covered by tools like credo, but it seems that >>>>>>>>> manipulating the >>>>>>>>> source code itself is difficult, mostly because the AST does not >>>>>>>>> contain >>>>>>>>> information about comments and because `Macro.to_string` doesn't >>>>>>>>> produce >>>>>>>>> text that complies with the elixir coding conventions. For example, >>>>>>>>> this >>>>>>>>> code: >>>>>>>>> ```elixir >>>>>>>>> def foo(bar) do >>>>>>>>> # Some comment >>>>>>>>> :baz >>>>>>>>> end >>>>>>>>> ``` >>>>>>>>> Would be printed as this: >>>>>>>>> ```elixir >>>>>>>>> def(fop(bar)) do >>>>>>>>> :baz >>>>>>>>> end >>>>>>>>> ``` >>>>>>>>> Tools like https://github.com/KronicDeth/intellij-elixir >>>>>>>>> implement their own parsers to circumvent this issue, but I think it >>>>>>>>> would >>>>>>>>> be nice if it could be achieved with the existing parser in core >>>>>>>>> elixir. >>>>>>>>> >>>>>>>>> I've seen other conversations where it was suggested to change the >>>>>>>>> elixir AST to support comments, either adding a new comment >>>>>>>>> node(breaking >>>>>>>>> change) or using the nodes metadata, but the conclusion was that >>>>>>>>> there was >>>>>>>>> no clear preference on how to do this at the AST level, and being >>>>>>>>> that the >>>>>>>>> elixir tokenizer allows us to pass a function to process the >>>>>>>>> comments, José >>>>>>>>> suggested to keep the them on a side data structure instead of in the >>>>>>>>> AST >>>>>>>>> itself. This is what the Elixir formatter does. >>>>>>>>> >>>>>>>>> Currently the `Code.Formatter` module is private API used by the >>>>>>>>> `Code.format_string` function. This means that the only way to format >>>>>>>>> elixir code is by providing a string(or a file) to a function in the >>>>>>>>> `Code` >>>>>>>>> module. If we are transforming the code, however, what we have is a >>>>>>>>> quoted >>>>>>>>> expression, thus we don't have a way to turn it back into a string. >>>>>>>>> >>>>>>>>> At a high level, the `Code.Formatter.to_algebra` does three things: >>>>>>>>> 1. It extracts the comments from the source code >>>>>>>>> 2. It parses the source code into a quoted expression >>>>>>>>> 3. It takes the comments and the quoted expressions and merges >>>>>>>>> them to produce an algebra document >>>>>>>>> >>>>>>>>> What I propose is to split the `Code.Formatter.to_algebra` >>>>>>>>> considering those steps, and expose the functionality via the `Code` >>>>>>>>> module. The reasoning is that if a user has access to both the ast >>>>>>>>> and the >>>>>>>>> comments, they can then transform the ast, and return back both the >>>>>>>>> ast and >>>>>>>>> comments to the formatter to produce an algebra document. *How >>>>>>>>> they implement this is up to them* and Elixir doesn't need to >>>>>>>>> give an opinion on how comments should be handled during those >>>>>>>>> manipulations, nor does it need to expose the private version of the >>>>>>>>> AST >>>>>>>>> used internally by the formatter. If they want to merge comments into >>>>>>>>> the >>>>>>>>> metadata or use custom nodes, it's completely up to them, they just >>>>>>>>> need to >>>>>>>>> return back a valid quoted expression and a list of comments with >>>>>>>>> their >>>>>>>>> metadata. >>>>>>>>> >>>>>>>>> The other reason I think this should be done by exposing those >>>>>>>>> functions is that there's a great a amount of work put into the >>>>>>>>> formatter >>>>>>>>> to turn the quoted expressions into formatted algebra documents, and >>>>>>>>> I >>>>>>>>> think all of that could be reused, eliminating the need for custom >>>>>>>>> formatters. >>>>>>>>> >>>>>>>>> The workflow would be something like this: >>>>>>>>> ```elixir >>>>>>>>> {:ok, {quoted, comments}} = File.read!(path) |> >>>>>>>>> Code.string_to_quoted_with_comments() >>>>>>>>> quoted = do_some_manipulation(quoted , comments) >>>>>>>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments) >>>>>>>>> new_source = Inspect.Algebra.format(doc, 80) >>>>>>>>> ``` >>>>>>>>> I'm not married to those function names and return values, but I >>>>>>>>> hope they serve to convey the idea. >>>>>>>>> >>>>>>>>> I already have some little examples of source code transformations >>>>>>>>> using an API like the above, I'm working on reducing and tidying >>>>>>>>> them, but >>>>>>>>> in the meantime I would like to hear you opinions on this proposal. >>>>>>>>> >>>>>>>>> The only downside I can think of is that the `{line, >>>>>>>>> {previous_eol, next_eol}, text}` format of the comments may be >>>>>>>>> considered a >>>>>>>>> private data structure, but the formatter hasn't changed much since >>>>>>>>> 1.6(3 >>>>>>>>> years ago) and I think it could be considered stable enough to be >>>>>>>>> exposed. >>>>>>>>> >>>>>>>>> -- >>>>>>>>> You received this message because you are subscribed to the Google >>>>>>>>> Groups "elixir-lang-core" group. >>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>> send an email to [email protected]. >>>>>>>>> To view this discussion on the web visit >>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com >>>>>>>>> >>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>> . >>>>>>>>> >>>>>>>> -- >>>>>>>> You received this message because you are subscribed to the Google >>>>>>>> Groups "elixir-lang-core" group. >>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>> send an email to [email protected]. >>>>>>>> >>>>>>>> To view this discussion on the web visit >>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com >>>>>>>> >>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer> >>>>>>>> . >>>>>>>> >>>>>>>> -- >>>>>>> You received this message because you are subscribed to the Google >>>>>>> Groups "elixir-lang-core" group. >>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>> send an email to [email protected]. >>>>>>> >>>>>> To view this discussion on the web visit >>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com >>>>>>> >>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>> . >>>>>>> >>>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "elixir-lang-core" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> >>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com >>>>> >>>>> <https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>> -- >>> You received this message because you are subscribed to the Google >>> Groups "elixir-lang-core" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> >> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com >>> >>> <https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >> -- You received this message because you are subscribed to the Google Groups "elixir-lang-core" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/c4fbbfa3-0253-46d0-b90a-679117e8dc39n%40googlegroups.com.
