I believe those ideas are promising. There is one issue with quoted_to_algebra though: the AST that the formatter works with is a special one, where the literals are wrapped in blocks so we can store their metadata. This means that, in order to have quoted_to_algebra, we would need to change the formatter to also handle “regular AST” or nodes with limited metadata.
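
To make the difference concrete, here is a minimal sketch of the two AST shapes. It relies on the `:literal_encoder` option of `Code.string_to_quoted/2`; the `wrap_literal` function is a hypothetical name chosen for illustration that mimics the wrapping described above, and the exact metadata keys vary between Elixir versions, so none are shown.

```elixir
source = ~S|foo(1, "two")|

# Regular AST: the literals 1 and "two" appear bare, with no metadata of their own.
{:ok, regular} = Code.string_to_quoted(source)

# Hypothetical encoder (an assumption for illustration): wrap every literal in a
# :__block__ node so that the literal's metadata has somewhere to live.
wrap_literal = fn literal, meta -> {:ok, {:__block__, meta, [literal]}} end

{:ok, wrapped} = Code.string_to_quoted(source, literal_encoder: wrap_literal)

IO.inspect(regular, label: "regular")
IO.inspect(wrapped, label: "wrapped")
```

In the regular form the arguments are plain `1` and `"two"`; in the wrapped form each argument is a three-element `{:__block__, meta, [literal]}` tuple, which is the shape a public quoted_to_algebra would have to either require or learn to do without.
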
That change is doable, but it is work, and it is a requirement for exposing said functionality.

About the comments, I like the suggestion, although we should probably move from a tuple to a map to avoid breaking changes in the future.

A PR for this particular issue is very welcome! Thank you for the proposal and for thinking about these problems.

On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote:

> Hi all,
>
> The motivation for this proposal is to make it easier for tools to alter
> and format Elixir code while preserving the current behavior of the
> formatter. Most of the functionality is already there, and a small change
> in the APIs would enable a wide variety of new use cases.
>
> If we want to transform a piece of code from one form to another, we can
> modify the AST and then convert it back to a string. For example, a tool
> could detect the usage of `String.to_atom` and not only warn about unsafe
> string to atom conversion, but also give an option to automatically fix the
> issue, replacing it with `String.to_existing_atom`. The first part is
> already covered by tools like Credo, but it seems that manipulating the
> source code itself is difficult, mostly because the AST does not contain
> information about comments and because `Macro.to_string` doesn't produce
> text that complies with the Elixir coding conventions. For example, this
> code:
> ```elixir
> def foo(bar) do
>   # Some comment
>   :baz
> end
> ```
> Would be printed as this:
> ```elixir
> def(foo(bar)) do
>   :baz
> end
> ```
> Tools like https://github.com/KronicDeth/intellij-elixir implement their
> own parsers to circumvent this issue, but I think it would be nice if it
> could be achieved with the existing parser in core Elixir.
>
> I've seen other conversations where it was suggested to change the Elixir
> AST to support comments, either by adding a new comment node (a breaking
> change) or by using the node metadata, but the conclusion was that there
> was no clear preference on how to do this at the AST level. Since the Elixir
> tokenizer allows us to pass a function to process the comments, José
> suggested keeping them in a side data structure instead of in the AST
> itself. This is what the Elixir formatter does.
>
> Currently the `Code.Formatter` module is a private API used by the
> `Code.format_string` function. This means that the only way to format
> Elixir code is by providing a string (or a file) to a function in the `Code`
> module. If we are transforming the code, however, what we have is a quoted
> expression, and we don't have a way to turn it back into a formatted string.
>
> At a high level, `Code.Formatter.to_algebra` does three things:
> 1. It extracts the comments from the source code
> 2. It parses the source code into a quoted expression
> 3. It takes the comments and the quoted expression and merges them to
> produce an algebra document
>
> What I propose is to split `Code.Formatter.to_algebra` along those steps
> and expose the functionality via the `Code` module. The reasoning is that
> if a user has access to both the AST and the comments, they can transform
> the AST and then hand both the AST and the comments back to the formatter
> to produce an algebra document. *How they implement this is up to them*
> and Elixir doesn't need to give an opinion on how comments should be
> handled during those manipulations, nor does it need to expose the private
> version of the AST used internally by the formatter. If they want to merge
> comments into the metadata or use custom nodes, it's completely up to them;
> they just need to return a valid quoted expression and a list of comments
> with their metadata.
>
> The other reason I think this should be done by exposing those functions
> is that there's a great amount of work put into the formatter to turn
> quoted expressions into formatted algebra documents, and I think all of
> that could be reused, eliminating the need for custom formatters.
>
> The workflow would be something like this:
> ```elixir
> {:ok, {quoted, comments}} =
>   path |> File.read!() |> Code.string_to_quoted_with_comments()
>
> quoted = do_some_manipulation(quoted, comments)
> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)
> # Inspect.Algebra.format/2 returns iodata, so convert it to a string
> new_source = doc |> Inspect.Algebra.format(80) |> IO.iodata_to_binary()
> ```
> I'm not married to those function names and return values, but I hope they
> serve to convey the idea.
>
> I already have a few small examples of source code transformations using
> an API like the above. I'm working on reducing and tidying them, but in the
> meantime I would like to hear your opinions on this proposal.
>
> The only downside I can think of is that the `{line, {previous_eol,
> next_eol}, text}` format of the comments may be considered a private data
> structure, but the formatter hasn't changed much since 1.6 (3 years ago) and
> I think it could be considered stable enough to be exposed.
>
> --
> You received this message because you are subscribed to the Google Groups
> "elixir-lang-core" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com
> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
