Hi all,

The motivation for this proposal is to make it easier for tools to alter and format Elixir code while preserving the current behavior of the formatter. Most of the functionality is already there, and a small change to the APIs would enable a wide variety of new use cases.

If we want to transform a piece of code from one form to another, we can modify the AST and then convert it back to a string. For example, a tool could detect the usage of `String.to_atom` and not only warn about the unsafe string-to-atom conversion, but also offer to fix the issue automatically by replacing it with `String.to_existing_atom`. The first part is already covered by tools like Credo, but manipulating the source code itself is difficult, mostly because the AST does not contain information about comments and because `Macro.to_string` doesn't produce text that complies with the Elixir coding conventions. For example, this code:

```elixir
def foo(bar) do
  # Some comment
  :baz
end
```

Would be printed as this:

```elixir
def(foo(bar)) do
  :baz
end
```

Tools like https://github.com/KronicDeth/intellij-elixir implement their own parsers to circumvent this issue, but I think it would be nice if it could be achieved with the existing parser in core Elixir.

I've seen other conversations where it was suggested to change the Elixir AST to support comments, either by adding a new comment node (a breaking change) or by using the node metadata, but the conclusion was that there was no clear preference on how to do this at the AST level. Since the Elixir tokenizer allows us to pass a function to process the comments, José suggested keeping them in a side data structure instead of in the AST itself. This is what the Elixir formatter does.

Currently the `Code.Formatter` module is a private API used by the `Code.format_string` function. This means that the only way to format Elixir code is by providing a string (or a file) to a function in the `Code` module. If we are transforming the code, however, what we have is a quoted expression, and there is no way to turn it back into a formatted string.

At a high level, `Code.Formatter.to_algebra` does three things:

1. It extracts the comments from the source code
2. It parses the source code into a quoted expression
3. It takes the comments and the quoted expression and merges them to produce an algebra document

What I propose is to split `Code.Formatter.to_algebra` along those steps and expose the functionality via the `Code` module. The reasoning is that if a user has access to both the AST and the comments, they can transform the AST and then hand both the AST and the comments back to the formatter to produce an algebra document. *How they implement this is up to them*, and Elixir doesn't need to give an opinion on how comments should be handled during those manipulations, nor does it need to expose the private version of the AST used internally by the formatter. If they want to merge comments into the metadata or use custom nodes, it's completely up to them; they just need to return a valid quoted expression and a list of comments with their metadata.

The other reason I think this should be done by exposing those functions is that a great amount of work has been put into the formatter to turn quoted expressions into formatted algebra documents, and I think all of that could be reused, eliminating the need for custom formatters.

The workflow would be something like this:

```elixir
# Proposed API: names and return values are illustrative.
{:ok, {quoted, comments}} = File.read!(path) |> Code.string_to_quoted_with_comments()
quoted = do_some_manipulation(quoted, comments)
{:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)
# Inspect.Algebra.format/2 returns iodata, so convert it to a binary.
new_source = doc |> Inspect.Algebra.format(80) |> IO.iodata_to_binary()
```

I'm not married to those function names and return values, but I hope they serve to convey the idea.

I already have some small examples of source code transformations using an API like the above. I'm working on reducing and tidying them, but in the meantime I would like to hear your opinions on this proposal.

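
To make the `do_some_manipulation` step more concrete, here is a minimal sketch of the `String.to_atom` rewrite mentioned earlier. It uses only `Code.string_to_quoted!/1`, `Macro.prewalk/2` and `Macro.to_string/1`, which exist today; the `AtomFixer` module and its `fix/1` function are hypothetical names made up for this illustration, and the sketch ignores comments entirely, which is exactly the gap the proposal is about.

```elixir
defmodule AtomFixer do
  # Hypothetical helper for illustration only; not part of the proposal.
  # Walks the quoted expression and rewrites remote calls to
  # String.to_atom into String.to_existing_atom, keeping the metadata.
  def fix(quoted) do
    Macro.prewalk(quoted, fn
      {{:., dot_meta, [{:__aliases__, _, [:String]} = mod, :to_atom]}, call_meta, args} ->
        {{:., dot_meta, [mod, :to_existing_atom]}, call_meta, args}

      other ->
        other
    end)
  end
end

source = """
# Convert user input
String.to_atom(input)
"""

fixed = source |> Code.string_to_quoted!() |> AtomFixer.fix()

# The call is rewritten, but the comment is gone, because the AST
# produced by Code.string_to_quoted!/1 never contained it.
IO.puts(Macro.to_string(fixed))
```

With the proposed functions, the same `fix/1` could sit between parsing and formatting, and the comments would travel alongside the AST instead of being dropped.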
The only downside I can think of is that the `{line, {previous_eol, next_eol}, text}` format of the comments may be considered a private data structure, but the formatter hasn't changed much since 1.6 (3 years ago), and I think it could be considered stable enough to be exposed.
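
As a rough illustration of what working with that comment format could look like, the sketch below treats comments as plain `{line, {previous_eol, next_eol}, text}` tuples and shifts their line numbers after a transformation inserts lines. The `CommentShift` module, its `shift/3` function and the sample values are hypothetical, and the sketch assumes that the two `eol` fields count the newlines before and after the comment.

```elixir
defmodule CommentShift do
  # Hypothetical helper for illustration only.
  # Moves every comment at or after `from_line` by `offset` lines,
  # leaving the eol counts and the text untouched.
  def shift(comments, from_line, offset) do
    Enum.map(comments, fn
      {line, eols, text} when line >= from_line -> {line + offset, eols, text}
      comment -> comment
    end)
  end
end

# Sample data in the tuple shape described above (values are made up).
comments = [
  {1, {0, 1}, "# Module-level note"},
  {4, {1, 1}, "# Some comment"}
]

# Pretend a transformation inserted two lines at line 3.
shifted = CommentShift.shift(comments, 3, 2)
IO.inspect(shifted)
```

Because the comments are ordinary data, a tool is free to keep them in a list like this, fold them into node metadata, or use any other scheme, as long as it hands a list in the expected shape back to the formatter.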
