Hi folks, Nice to see someone got the ball rolling, I'm also interested (and would like to contribute). For me it's a bit unclear what the next steps would be? If someone could clarify, it'd be happy to spend some time on PR
Best, Tonći On Friday, May 7, 2021 at 6:15:48 AM UTC+2 José Valim wrote: > Correct. Those are the trade-offs. We could for example have > Code.string_to_quoted_formatter_options which returns the formatter options > used for parsing the AST but then users must know that we now have a > special kind of AST. And nodes introduced by user may still not be > annotated - which means we still need to care about those scenarios. > > On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote: > >> Thanks for your thoughts! >> >> > If anyone is interested, I was playing with these ideas and you can see >> some example use cases here: >> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But >> exactly because Code.Formatter is private, I had to vendor it and that code >> broke on recent Elixir versions. >> >> I really like what you did there. With "that code broke on recent Elixir >> versions" you mean the addition of stepped ranges? I see that the wrapping >> of literals via the `literal_encoder` hasn't changed since June of 2019. >> >> > There is one issue about quoted_to_algebra though: the AST that the >> formatter works with is a special one, where the literals are wrapped in >> blocks so we can store their metadata. This means that, in order to have >> quotes_to_algebra, we would need to change the formatter to also handle >> “regular AST” or nodes with limited metadata. >> > >> > It is doable, but it is work, and a requirement to expose said >> functionality. >> >> I'm exploring ways to achieve this, and I can think of two approaches: >> a - traverse the received AST and wrap the literals in blocks >> b - change the various functions used to convert the AST into algebra >> fragments so that they can also handle bare literals >> >> After some experimentation I saw that option a is doable, but I feel that >> I am recreating part of the work that the tokenizer already does and that >> it would become a burden to maintain whenever the tokenizer changes. >> I haven't tried option b yet, I'm still trying to get familiar with the >> quoted -> algebra conversion(and algebra documents in general), but my >> first thought is that it would be hard to figure out over which literal a >> comment should be placed. I imagine that a quoted expression representing >> this code: >> 1 def foo() do >> 2 :hello >> 3 >> 4 :elixir >> 5 >> 6 >> 7 :world >> 8 end >> And a comment {5, {_, _}, "Hello!"} >> If we don't have line number information, where should the comment be >> placed? We don't know where the literals are, they may be at lines 2, 3 and >> 4, or 2, 4 and 7, they could be anywhere. All we can tell is that the >> comment goes somewhere between lines 1 and 8, so any position would be ok. >> So having the formatter accept bare literals would mean having to work with >> insufficient data and it would produce a bad output. This issue is also >> present with option a, but changing the part of the formatter that >> generates the algebra document would essentially mean to *break* the >> formatter in multiple places. In summary, I'm not sure about the usefulness >> of a user supplied ast with no line numbers associated to literals, as it >> is hard to apropiately recreate the source code when comments are involved. >> I'm not happy with neither approach and I will try to think of better >> solutions. >> >> On the other hand, the literal encoder is using a public api. A user can >> already generate AST from a string with the literals wrapped in a block via >> the :literal_encoder option of Code.string_to_quoted, so I think the >> literal token metadata should be considered public api(changes to it can >> already break third party code today). Wrapping a literal in a block and >> storing the token metadata in it has proven to be a very simple and >> practical solution to the problem of literals not carrying any metadata >> while still using valid AST, so maybe exposing that can also be considered >> as a viable alternative? This would go against my original position of "nor >> does it need to expose the private version of the AST used internally by >> the formatter", but in practical terms I think it would provide more value >> if it were exposed. The comments data structure would already be exposed >> and we would be requiring users to provide comments in that form, so I >> think requiring an AST with literals shaped in a particular way wouldn't be >> unreasonable. Of course, if that would be done, the fact that >> quoted_to_algebra requires an AST with literals wrapped in that way should >> be properly documented. >> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach escribió: >> >>> Looking forward to any progress on this front. >>> >>> If anyone is interested, I was playing with these ideas and you can see >>> some example use cases here: >>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But >>> exactly because Code.Formatter is private, I had to vendor it and that code >>> broke on recent Elixir versions. >>> >>> On May 5, 2021, "José Valim" <[email protected]> wrote: >>> >>> I believe those ideas are promising. There is one issue about >>> quoted_to_algebra though: the AST that the formatter works with is a >>> special one, where the literals are wrapped in blocks so we can store their >>> metadata. This means that, in order to have quotes_to_algebra, we would >>> need to change the formatter to also handle “regular AST” or nodes with >>> limited metadata. >>> It is doable, but it is work, and a requirement to expose said >>> functionality. >>> About the comments, I like the suggestion. Although we should probably >>> move from a tuple to a map to avoid breaking changes in the future. A PR >>> for this particular issue is very welcome! >>> Thank you for the proposal and thinking about these problems. >>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote: >>> >>> Hi all, >>>> >>>> The motivation for this proposal is to make it easier for tools to >>>> alter and format elixir code while preserving the current behavior of the >>>> formatter. Most of the functionality is already there, and a little change >>>> in the APIs would enable a wide variety of new use cases. >>>> >>>> If we want to transform a piece of code from one form to another, we >>>> can modify the AST and then convert it back to a string. For example, a >>>> tool could detect the usage of `String.to_atom` and not only warn about >>>> unsafe string to atom conversion, but also give an option to automatically >>>> fix the issue, replacing it with `String.to_existing_atom`. The first part >>>> is already covered by tools like credo, but it seems that manipulating the >>>> source code itself is difficult, mostly because the AST does not contain >>>> information about comments and because `Macro.to_string` doesn't produce >>>> text that complies with the elixir coding conventions. For example, this >>>> code: >>>> ```elixir >>>> def foo(bar) do >>>> # Some comment >>>> :baz >>>> end >>>> ``` >>>> Would be printed as this: >>>> ```elixir >>>> def(fop(bar)) do >>>> :baz >>>> end >>>> ``` >>>> Tools like https://github.com/KronicDeth/intellij-elixir implement >>>> their own parsers to circumvent this issue, but I think it would be nice >>>> if >>>> it could be achieved with the existing parser in core elixir. >>>> >>>> I've seen other conversations where it was suggested to change the >>>> elixir AST to support comments, either adding a new comment node(breaking >>>> change) or using the nodes metadata, but the conclusion was that there was >>>> no clear preference on how to do this at the AST level, and being that the >>>> elixir tokenizer allows us to pass a function to process the comments, >>>> José >>>> suggested to keep the them on a side data structure instead of in the AST >>>> itself. This is what the Elixir formatter does. >>>> >>>> Currently the `Code.Formatter` module is private API used by the >>>> `Code.format_string` function. This means that the only way to format >>>> elixir code is by providing a string(or a file) to a function in the >>>> `Code` >>>> module. If we are transforming the code, however, what we have is a quoted >>>> expression, thus we don't have a way to turn it back into a string. >>>> >>>> At a high level, the `Code.Formatter.to_algebra` does three things: >>>> 1. It extracts the comments from the source code >>>> 2. It parses the source code into a quoted expression >>>> 3. It takes the comments and the quoted expressions and merges them to >>>> produce an algebra document >>>> >>>> What I propose is to split the `Code.Formatter.to_algebra` considering >>>> those steps, and expose the functionality via the `Code` module. The >>>> reasoning is that if a user has access to both the ast and the comments, >>>> they can then transform the ast, and return back both the ast and comments >>>> to the formatter to produce an algebra document. *How they implement >>>> this is up to them* and Elixir doesn't need to give an opinion on how >>>> comments should be handled during those manipulations, nor does it need to >>>> expose the private version of the AST used internally by the formatter. If >>>> they want to merge comments into the metadata or use custom nodes, it's >>>> completely up to them, they just need to return back a valid quoted >>>> expression and a list of comments with their metadata. >>>> >>>> The other reason I think this should be done by exposing those >>>> functions is that there's a great a amount of work put into the formatter >>>> to turn the quoted expressions into formatted algebra documents, and I >>>> think all of that could be reused, eliminating the need for custom >>>> formatters. >>>> >>>> The workflow would be something like this: >>>> ```elixir >>>> {:ok, {quoted, comments}} = File.read!(path) |> >>>> Code.string_to_quoted_with_comments() >>>> quoted = do_some_manipulation(quoted , comments) >>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments) >>>> new_source = Inspect.Algebra.format(doc, 80) >>>> ``` >>>> I'm not married to those function names and return values, but I hope >>>> they serve to convey the idea. >>>> >>>> I already have some little examples of source code transformations >>>> using an API like the above, I'm working on reducing and tidying them, but >>>> in the meantime I would like to hear you opinions on this proposal. >>>> >>>> The only downside I can think of is that the `{line, {previous_eol, >>>> next_eol}, text}` format of the comments may be considered a private data >>>> structure, but the formatter hasn't changed much since 1.6(3 years ago) >>>> and >>>> I think it could be considered stable enough to be exposed. >>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "elixir-lang-core" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> To view this discussion on the web visit >>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com >>>> >>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>> . >>>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "elixir-lang-core" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com >>> >>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer> >>> . >>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "elixir-lang-core" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> > To view this discussion on the web visit >> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com >> >> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > -- You received this message because you are subscribed to the Google Groups "elixir-lang-core" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com.
