Hi all,

The motivation for this proposal is to make it easier for tools to alter
and format Elixir code while preserving the current behavior of the
formatter. Most of the functionality is already there, and a small change
to the APIs would enable a wide variety of new use cases.

If we want to transform a piece of code from one form to another, we can
modify the AST and then convert it back to a string. For example, a tool
could detect usages of `String.to_atom` and, besides warning about unsafe
string-to-atom conversion, offer to fix the issue automatically by
replacing it with `String.to_existing_atom`. The first part is already
covered by tools like Credo, but manipulating the source code itself seems
difficult, mostly because the AST does not contain information about
comments and because `Macro.to_string` doesn't produce text that complies
with the Elixir coding conventions. For example, this code:
```elixir
def foo(bar) do
  # Some comment
  :baz
end
```
would be printed as:
```elixir
def(foo(bar)) do
  :baz
end
```
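The loss of the comment can be reproduced with today's public API alone: the comment never reaches the quoted expression, so nothing printed from it can contain the comment. A minimal check (the exact layout `Macro.to_string/1` produces differs between Elixir versions, so only the comment's absence matters here):

```elixir
source = """
def foo(bar) do
  # Some comment
  :baz
end
"""

# Parsing discards comments: the AST only holds the code itself.
quoted = Code.string_to_quoted!(source)

# Whatever layout the printer chooses, "# Some comment" is not in the output.
printed = Macro.to_string(quoted)
IO.puts(printed)
```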
Tools like https://github.com/KronicDeth/intellij-elixir implement their 
own parsers to circumvent this issue, but I think it would be nice if it 
could be achieved with the existing parser in core elixir.

I've seen other conversations where it was suggested to change the Elixir
AST to support comments, either by adding a new comment node (a breaking
change) or by using the nodes' metadata. The conclusion was that there was
no clear preference on how to do this at the AST level, and since the
Elixir tokenizer allows us to pass a function to process the comments,
José suggested keeping them in a side data structure instead of in the AST
itself. This is what the Elixir formatter does.

Currently the `Code.Formatter` module is a private API used by the
`Code.format_string!` function. This means that the only way to format
Elixir code is by providing a string (or a file) to a function in the
`Code` module. If we are transforming the code, however, what we have is a
quoted expression, and there is no way to turn it back into the string the
formatter would produce.
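For comparison, this is the string-based entry point that exists today: `Code.format_string!/2` receives source text, which still contains its comments, and returns iodata. A small sketch showing that the comment survives this path:

```elixir
source = """
def foo(bar) do
  # Some comment
  :baz
end
"""

# Code.format_string!/2 works on text, so it sees and keeps the comment.
# It returns iodata, which we convert into a binary for printing.
formatted = source |> Code.format_string!() |> IO.iodata_to_binary()
IO.puts(formatted)
```

Once the code has been parsed and modified, only the quoted expression is left, and the text-based function above can no longer be used on it.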

At a high level, `Code.Formatter.to_algebra` does three things:
1. It extracts the comments from the source code
2. It parses the source code into a quoted expression
3. It merges the comments and the quoted expression to produce an algebra
document

What I propose is to split `Code.Formatter.to_algebra` along those steps
and expose the functionality via the `Code` module. The reasoning is that
if users have access to both the AST and the comments, they can transform
the AST and then hand both the AST and the comments back to the formatter
to produce an algebra document. *How they implement this is up to them*:
Elixir doesn't need to give an opinion on how comments should be handled
during those manipulations, nor does it need to expose the private version
of the AST used internally by the formatter. Whether they merge comments
into the metadata or use custom nodes is completely up to them; they just
need to return a valid quoted expression and a list of comments with their
metadata.
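As one illustration of a choice that would be left to the tool author, comments could be folded into node metadata. This is a sketch under stated assumptions: comments are taken to be `{line, {previous_eol, next_eol}, text}` tuples, `CommentMerger` and the `:leading_comments` metadata key are made up, and the comment list is written by hand because no public function returns it today.

```elixir
defmodule CommentMerger do
  # Hypothetical helper: attaches each comment to the first AST node, in
  # pre-order, that starts on a later line. :leading_comments is a made-up
  # metadata key; Elixir itself attaches no meaning to it.
  def merge(quoted, comments) do
    # Thread the not-yet-attached comments (sorted by line) through the walk.
    Macro.prewalk(quoted, Enum.sort(comments), fn
      {form, meta, args} = node, pending when is_list(meta) ->
        attach(node, form, meta, args, pending)

      other, pending ->
        {other, pending}
    end)
  end

  defp attach(node, form, meta, args, pending) do
    case Keyword.get(meta, :line) do
      nil ->
        {node, pending}

      line ->
        # Comments on lines before this node become its leading comments.
        {leading, rest} =
          Enum.split_while(pending, fn {comment_line, _eol, _text} -> comment_line < line end)

        if leading == [] do
          {node, pending}
        else
          {{form, Keyword.put(meta, :leading_comments, leading), args}, rest}
        end
    end
  end
end

source = """
def foo(bar) do
  # Some comment
  String.to_atom(bar)
end
"""

# Hand-written stand-in for the comment list a proposed API would return.
comments = [{2, {1, 1}, "# Some comment"}]

# Comments with no node after them (e.g. at the end of a block) stay in `leftover`.
{merged, leftover} = CommentMerger.merge(Code.string_to_quoted!(source), comments)
```

Returning the leftover comments instead of dropping them keeps the sketch honest about trailing comments, which a real tool would have to place somewhere else.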

The other reason I think this should be done by exposing those functions
is that a great amount of work has gone into the formatter to turn quoted
expressions into formatted algebra documents, and all of that could be
reused, eliminating the need for custom formatters.

The workflow would be something like this:
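```elixir
# Proposed functions: names and return values are not final.
{:ok, {quoted, comments}} =
  path
  |> File.read!()
  |> Code.string_to_quoted_with_comments()

# do_some_manipulation/2 stands for any user-defined transformation.
quoted = do_some_manipulation(quoted, comments)
{:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments)

# Inspect.Algebra.format/2 returns iodata, so convert it into a string.
new_source = doc |> Inspect.Algebra.format(80) |> IO.iodata_to_binary()
```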
I'm not married to those function names and return values, but I hope they 
serve to convey the idea.
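For illustration, the `do_some_manipulation` step could be the `String.to_atom` fix mentioned at the beginning. The sketch below uses only `Macro.prewalk/2`, which exists today; `AtomFixer` is a hypothetical module name, and the sketch ignores comments entirely.

```elixir
defmodule AtomFixer do
  # Hypothetical rewrite: turns direct one-argument calls to String.to_atom/1
  # into String.to_existing_atom/1, keeping the original metadata.
  def fix(quoted) do
    Macro.prewalk(quoted, fn
      {{:., dot_meta, [{:__aliases__, alias_meta, [:String]}, :to_atom]}, call_meta, [arg]} ->
        {{:., dot_meta, [{:__aliases__, alias_meta, [:String]}, :to_existing_atom]}, call_meta,
         [arg]}

      other ->
        other
    end)
  end
end

quoted = Code.string_to_quoted!("def foo(bar), do: String.to_atom(bar)")
fixed = AtomFixer.fix(quoted)
IO.puts(Macro.to_string(fixed))
```

Only direct calls with one argument are matched; piped calls such as `bar |> String.to_atom()` are left alone. The rewrite itself is the easy part: printing the result back with its comments and in the formatter's style is what the proposed functions would add.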

I already have a few small examples of source code transformations using
an API like the above. I'm working on reducing and tidying them, but in the
meantime I would like to hear your opinions on this proposal.

The only downside I can think of is that the comment format,
`{line, {previous_eol, next_eol}, text}`, may be considered a private data
structure, but the formatter hasn't changed much since 1.6 (3 years ago),
and I think it is stable enough to be exposed.
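To make that tuple shape concrete, here is a deliberately naive sketch that builds `{line, {previous_eol, next_eol}, text}` tuples for full-line comments. It is not how the tokenizer does it: it would misfire on `#` inside strings, heredocs or sigils, it skips comments at the end of a code line, and the eol counting (newlines separating the comment from the nearest non-blank line before and after it) is an approximation. `NaiveComments` is a hypothetical name.

```elixir
defmodule NaiveComments do
  # Hypothetical helper: collects full-line comments as
  # {line, {previous_eol, next_eol}, text} tuples.
  def extract(source) do
    lines = String.split(source, "\n")

    lines
    |> Enum.with_index(1)
    |> Enum.filter(fn {text, _line} -> String.starts_with?(String.trim(text), "#") end)
    |> Enum.map(fn {text, line} ->
      {line, {eol_before(lines, line), eol_after(lines, line)}, String.trim(text)}
    end)
  end

  # Newlines between the previous non-blank line and this one (1 + blank lines).
  defp eol_before(lines, line) do
    lines
    |> Enum.take(line - 1)
    |> Enum.reverse()
    |> Enum.take_while(&(String.trim(&1) == ""))
    |> length()
    |> Kernel.+(1)
  end

  # Newlines between this line and the next non-blank line (1 + blank lines).
  defp eol_after(lines, line) do
    lines
    |> Enum.drop(line)
    |> Enum.take_while(&(String.trim(&1) == ""))
    |> length()
    |> Kernel.+(1)
  end
end

NaiveComments.extract("def foo(bar) do\n  # Some comment\n  :baz\nend\n")
# => [{2, {1, 1}, "# Some comment"}]
```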

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.