Correct. Those are the trade-offs. We could for example have Code.string_to_quoted_formatter_options which returns the formatter options used for parsing the AST but then users must know that we now have a special kind of AST. And nodes introduced by user may still not be annotated - which means we still need to care about those scenarios.
On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote: > Thanks for your thoughts! > > > If anyone is interested, I was playing with these ideas and you can see > some example use cases here: > https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But > exactly because Code.Formatter is private, I had to vendor it and that code > broke on recent Elixir versions. > > I really like what you did there. With "that code broke on recent Elixir > versions" you mean the addition of stepped ranges? I see that the wrapping > of literals via the `literal_encoder` hasn't changed since June of 2019. > > > There is one issue about quoted_to_algebra though: the AST that the > formatter works with is a special one, where the literals are wrapped in > blocks so we can store their metadata. This means that, in order to have > quotes_to_algebra, we would need to change the formatter to also handle > “regular AST” or nodes with limited metadata. > > > > It is doable, but it is work, and a requirement to expose said > functionality. > > I'm exploring ways to achieve this, and I can think of two approaches: > a - traverse the received AST and wrap the literals in blocks > b - change the various functions used to convert the AST into algebra > fragments so that they can also handle bare literals > > After some experimentation I saw that option a is doable, but I feel that > I am recreating part of the work that the tokenizer already does and that > it would become a burden to maintain whenever the tokenizer changes. > I haven't tried option b yet, I'm still trying to get familiar with the > quoted -> algebra conversion(and algebra documents in general), but my > first thought is that it would be hard to figure out over which literal a > comment should be placed. I imagine that a quoted expression representing > this code: > 1 def foo() do > 2 :hello > 3 > 4 :elixir > 5 > 6 > 7 :world > 8 end > And a comment {5, {_, _}, "Hello!"} > If we don't have line number information, where should the comment be > placed? We don't know where the literals are, they may be at lines 2, 3 and > 4, or 2, 4 and 7, they could be anywhere. All we can tell is that the > comment goes somewhere between lines 1 and 8, so any position would be ok. > So having the formatter accept bare literals would mean having to work with > insufficient data and it would produce a bad output. This issue is also > present with option a, but changing the part of the formatter that > generates the algebra document would essentially mean to *break* the > formatter in multiple places. In summary, I'm not sure about the usefulness > of a user supplied ast with no line numbers associated to literals, as it > is hard to apropiately recreate the source code when comments are involved. > I'm not happy with neither approach and I will try to think of better > solutions. > > On the other hand, the literal encoder is using a public api. A user can > already generate AST from a string with the literals wrapped in a block via > the :literal_encoder option of Code.string_to_quoted, so I think the > literal token metadata should be considered public api(changes to it can > already break third party code today). Wrapping a literal in a block and > storing the token metadata in it has proven to be a very simple and > practical solution to the problem of literals not carrying any metadata > while still using valid AST, so maybe exposing that can also be considered > as a viable alternative? This would go against my original position of "nor > does it need to expose the private version of the AST used internally by > the formatter", but in practical terms I think it would provide more value > if it were exposed. The comments data structure would already be exposed > and we would be requiring users to provide comments in that form, so I > think requiring an AST with literals shaped in a particular way wouldn't be > unreasonable. Of course, if that would be done, the fact that > quoted_to_algebra requires an AST with literals wrapped in that way should > be properly documented. > El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach escribió: > >> Looking forward to any progress on this front. >> >> If anyone is interested, I was playing with these ideas and you can see >> some example use cases here: >> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. But >> exactly because Code.Formatter is private, I had to vendor it and that code >> broke on recent Elixir versions. >> >> On May 5, 2021, "José Valim" <[email protected]> wrote: >> >> I believe those ideas are promising. There is one issue about >> quoted_to_algebra though: the AST that the formatter works with is a >> special one, where the literals are wrapped in blocks so we can store their >> metadata. This means that, in order to have quotes_to_algebra, we would >> need to change the formatter to also handle “regular AST” or nodes with >> limited metadata. >> It is doable, but it is work, and a requirement to expose said >> functionality. >> About the comments, I like the suggestion. Although we should probably >> move from a tuple to a map to avoid breaking changes in the future. A PR >> for this particular issue is very welcome! >> Thank you for the proposal and thinking about these problems. >> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote: >> >> Hi all, >>> >>> The motivation for this proposal is to make it easier for tools to alter >>> and format elixir code while preserving the current behavior of the >>> formatter. Most of the functionality is already there, and a little change >>> in the APIs would enable a wide variety of new use cases. >>> >>> If we want to transform a piece of code from one form to another, we can >>> modify the AST and then convert it back to a string. For example, a tool >>> could detect the usage of `String.to_atom` and not only warn about unsafe >>> string to atom conversion, but also give an option to automatically fix the >>> issue, replacing it with `String.to_existing_atom`. The first part is >>> already covered by tools like credo, but it seems that manipulating the >>> source code itself is difficult, mostly because the AST does not contain >>> information about comments and because `Macro.to_string` doesn't produce >>> text that complies with the elixir coding conventions. For example, this >>> code: >>> ```elixir >>> def foo(bar) do >>> # Some comment >>> :baz >>> end >>> ``` >>> Would be printed as this: >>> ```elixir >>> def(fop(bar)) do >>> :baz >>> end >>> ``` >>> Tools like https://github.com/KronicDeth/intellij-elixir implement >>> their own parsers to circumvent this issue, but I think it would be nice if >>> it could be achieved with the existing parser in core elixir. >>> >>> I've seen other conversations where it was suggested to change the >>> elixir AST to support comments, either adding a new comment node(breaking >>> change) or using the nodes metadata, but the conclusion was that there was >>> no clear preference on how to do this at the AST level, and being that the >>> elixir tokenizer allows us to pass a function to process the comments, José >>> suggested to keep the them on a side data structure instead of in the AST >>> itself. This is what the Elixir formatter does. >>> >>> Currently the `Code.Formatter` module is private API used by the >>> `Code.format_string` function. This means that the only way to format >>> elixir code is by providing a string(or a file) to a function in the `Code` >>> module. If we are transforming the code, however, what we have is a quoted >>> expression, thus we don't have a way to turn it back into a string. >>> >>> At a high level, the `Code.Formatter.to_algebra` does three things: >>> 1. It extracts the comments from the source code >>> 2. It parses the source code into a quoted expression >>> 3. It takes the comments and the quoted expressions and merges them to >>> produce an algebra document >>> >>> What I propose is to split the `Code.Formatter.to_algebra` considering >>> those steps, and expose the functionality via the `Code` module. The >>> reasoning is that if a user has access to both the ast and the comments, >>> they can then transform the ast, and return back both the ast and comments >>> to the formatter to produce an algebra document. *How they implement >>> this is up to them* and Elixir doesn't need to give an opinion on how >>> comments should be handled during those manipulations, nor does it need to >>> expose the private version of the AST used internally by the formatter. If >>> they want to merge comments into the metadata or use custom nodes, it's >>> completely up to them, they just need to return back a valid quoted >>> expression and a list of comments with their metadata. >>> >>> The other reason I think this should be done by exposing those functions >>> is that there's a great a amount of work put into the formatter to turn the >>> quoted expressions into formatted algebra documents, and I think all of >>> that could be reused, eliminating the need for custom formatters. >>> >>> The workflow would be something like this: >>> ```elixir >>> {:ok, {quoted, comments}} = File.read!(path) |> >>> Code.string_to_quoted_with_comments() >>> quoted = do_some_manipulation(quoted , comments) >>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments) >>> new_source = Inspect.Algebra.format(doc, 80) >>> ``` >>> I'm not married to those function names and return values, but I hope >>> they serve to convey the idea. >>> >>> I already have some little examples of source code transformations using >>> an API like the above, I'm working on reducing and tidying them, but in the >>> meantime I would like to hear you opinions on this proposal. >>> >>> The only downside I can think of is that the `{line, {previous_eol, >>> next_eol}, text}` format of the comments may be considered a private data >>> structure, but the formatter hasn't changed much since 1.6(3 years ago) and >>> I think it could be considered stable enough to be exposed. >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "elixir-lang-core" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com >>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >> -- >> You received this message because you are subscribed to the Google >> Groups "elixir-lang-core" group. >> To unsubscribe from this group and stop receiving emails from it, send >> an email to [email protected]. >> >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com >> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer> >> . >> >> -- > You received this message because you are subscribed to the Google Groups > "elixir-lang-core" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com > <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer> > . > -- You received this message because you are subscribed to the Google Groups "elixir-lang-core" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4L75YmGv6x3xZSFDAHgwg7zCjc2mQ%3Dbwev-nLkNLgxc9w%40mail.gmail.com.
