Yes, the split is meant for the internal API. User API would remain unchanged except for the addition of quoted_to_algebra and string_to_quoted_with_comments to the Code module
El martes, 18 de mayo de 2021 a las 10:03:26 UTC-3, José Valim escribió: > We don't need to add new modules to the user API but internally we can > split them in whatever way we want. Splitting also depends on how large it > is, but probably best to avoid adding it to the formatter since that's > already 2k LOC. > > > On Tue, May 18, 2021 at 2:49 PM i Dorgan <[email protected]> wrote: > >> Hi all, >> >> Now that https://github.com/elixir-lang/elixir/pull/10986/ is merged I >> was meaning to prepare a PR for the normalizing step. Would you prefer to >> have that code in the Code.Formatter module, or in a separate >> Code.Normalizer module? Personally I would prefer the latter, I think it >> would make the code that does the actual formatting easier to follow. >> >> El domingo, 9 de mayo de 2021 a las 0:20:37 UTC-3, i Dorgan escribió: >> >>> I tried manual traversal and that made things so much easier, the >>> literal? field is no longer needed. I made good progress on the conversion >>> and I think I have covered most of the edge cases, judging from the tests I >>> wrote: >>> https://github.com/doorgan/formatter/blob/main/test/normalize_test.exs >>> The approach I took for testing was to run the formatter with both the >>> AST generated internally by `Code.Formatter.to_algebra` and with the one >>> I'm creating, as well as feeding the special AST to my normalizer to make >>> sure literals don't get wrapped twice. I did not test the specific shapes >>> the AST should have in most cases but the most basic ones, but it may be a >>> good idea to do so. >>> >>> My next step would be to test it with comments, I expect them not land >>> in the correct lines in every case because I'm making literals inherit the >>> line number of their parents and heredocs lose delimiter information, but >>> at the very least it the formatter should not break. >>> >>> If this works then I think I'd be in a good position to submit a PR, >>> once a PR to expose the comments maps is submitted. >>> El sábado, 8 de mayo de 2021 a las 2:54:13 UTC-3, José Valim escribió: >>> >>>> > I had to add a literal? field to the literal blocks to distinguish >>>> them from regular blocks and prevent the normalizer from recursing >>>> infinitely. >>>> >>>> You can always traverse manually. The number of AST nodes are small and >>>> it will give you more control. Alternatively, give postwalk a try, since >>>> the block node you add on postwalk won't be traversed (IIRC). >>>> >>>> The approach looks good to me. Note for some of those, I would prefer >>>> to push the fix to the formatter (for example, we can assume :delimiter >>>> defaults to ~s""). >>>> >>>> On Sat, May 8, 2021 at 12:12 AM i Dorgan <[email protected]> wrote: >>>> >>>>> I'm currently experimenting on the second point you outlined >>>>> >>>>> I've uploaded what I've been trying so far to this repository: >>>>> https://github.com/doorgan/formatter. The relevant file is >>>>> https://github.com/doorgan/formatter/blob/main/lib/normalizer.ex as >>>>> well as the tests. >>>>> >>>>> The approach I'm taking there is to walk the AST and manually wrap >>>>> literals when I find them, according to the context(when possible). I >>>>> think >>>>> I'm able to produce the kind of AST the formatter would expect, and >>>>> produces fairly adequate code, except for cases like mixed keyword lists >>>>> ([1, 2, a: b]) for which I still have to figure out how to recreate. I >>>>> had >>>>> to add a literal? field to the literal blocks to distinguish them from >>>>> regular blocks and prevent the normalizer from recursing infinitely. >>>>> Maybe >>>>> this could be avoided, I need to figure how, though. >>>>> >>>>> I would like to know if you think this is a good aproach, or I should >>>>> try to change the various block_to_algebra and quoted_to_algebra to >>>>> handle >>>>> literals. >>>>> I'm starting to believe this may be the cleanest aproach, and I would >>>>> need to expand the state map to hold metadata from the parent node(so I >>>>> can >>>>> fill line numbers, or metadata like format: :keyword to keyword lists). >>>>> >>>>> El viernes, 7 de mayo de 2021 a las 11:07:31 UTC-3, José Valim >>>>> escribió: >>>>> >>>>>> I think there are two fronts: >>>>>> >>>>>> 1. Today, all of the API the formatter uses, except the extraction of >>>>>> code comments, is public. So providing a string_to_quoted_with_comments >>>>>> would be a logical next step. We only need to convert the comments from >>>>>> a >>>>>> tuple to a map before. >>>>>> >>>>>> 2. The second step is more complex and needs more research, which is >>>>>> to figure out how exactly to change the formatter to accept general AST. >>>>>> >>>>>> Please coordinate between you because it seems some work may have >>>>>> already been done in those areas. I will be glad to provide assistance >>>>>> and >>>>>> answer questions. :) >>>>>> >>>>>> On Fri, May 7, 2021 at 3:51 PM Tonći Galić <[email protected]> wrote: >>>>>> >>>>>>> Hi folks, >>>>>>> >>>>>>> Nice to see someone got the ball rolling, I'm also interested (and >>>>>>> would like to contribute). For me it's a bit unclear what the next >>>>>>> steps >>>>>>> would be? >>>>>>> If someone could clarify, it'd be happy to spend some time on PR >>>>>>> >>>>>>> Best, >>>>>>> Tonći >>>>>>> >>>>>>> On Friday, May 7, 2021 at 6:15:48 AM UTC+2 José Valim wrote: >>>>>>> >>>>>>>> Correct. Those are the trade-offs. We could for example have >>>>>>>> Code.string_to_quoted_formatter_options which returns the formatter >>>>>>>> options >>>>>>>> used for parsing the AST but then users must know that we now have a >>>>>>>> special kind of AST. And nodes introduced by user may still not be >>>>>>>> annotated - which means we still need to care about those scenarios. >>>>>>>> >>>>>>>> On Fri, May 7, 2021 at 5:07 AM i Dorgan <[email protected]> wrote: >>>>>>>> >>>>>>>>> Thanks for your thoughts! >>>>>>>>> >>>>>>>>> > If anyone is interested, I was playing with these ideas and you >>>>>>>>> can see some example use cases here: >>>>>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. >>>>>>>>> But exactly because Code.Formatter is private, I had to vendor it and >>>>>>>>> that >>>>>>>>> code broke on recent Elixir versions. >>>>>>>>> >>>>>>>>> I really like what you did there. With "that code broke on recent >>>>>>>>> Elixir versions" you mean the addition of stepped ranges? I see that >>>>>>>>> the >>>>>>>>> wrapping of literals via the `literal_encoder` hasn't changed since >>>>>>>>> June of >>>>>>>>> 2019. >>>>>>>>> >>>>>>>>> > There is one issue about quoted_to_algebra though: the AST that >>>>>>>>> the formatter works with is a special one, where the literals are >>>>>>>>> wrapped >>>>>>>>> in blocks so we can store their metadata. This means that, in order >>>>>>>>> to have >>>>>>>>> quotes_to_algebra, we would need to change the formatter to also >>>>>>>>> handle >>>>>>>>> “regular AST” or nodes with limited metadata. >>>>>>>>> > >>>>>>>>> > It is doable, but it is work, and a requirement to expose said >>>>>>>>> functionality. >>>>>>>>> >>>>>>>>> I'm exploring ways to achieve this, and I can think of two >>>>>>>>> approaches: >>>>>>>>> a - traverse the received AST and wrap the literals in blocks >>>>>>>>> b - change the various functions used to convert the AST into >>>>>>>>> algebra fragments so that they can also handle bare literals >>>>>>>>> >>>>>>>>> After some experimentation I saw that option a is doable, but I >>>>>>>>> feel that I am recreating part of the work that the tokenizer already >>>>>>>>> does >>>>>>>>> and that it would become a burden to maintain whenever the tokenizer >>>>>>>>> changes. >>>>>>>>> I haven't tried option b yet, I'm still trying to get familiar >>>>>>>>> with the quoted -> algebra conversion(and algebra documents in >>>>>>>>> general), >>>>>>>>> but my first thought is that it would be hard to figure out over >>>>>>>>> which >>>>>>>>> literal a comment should be placed. I imagine that a quoted >>>>>>>>> expression >>>>>>>>> representing this code: >>>>>>>>> 1 def foo() do >>>>>>>>> 2 :hello >>>>>>>>> 3 >>>>>>>>> 4 :elixir >>>>>>>>> 5 >>>>>>>>> 6 >>>>>>>>> 7 :world >>>>>>>>> 8 end >>>>>>>>> And a comment {5, {_, _}, "Hello!"} >>>>>>>>> If we don't have line number information, where should the comment >>>>>>>>> be placed? We don't know where the literals are, they may be at lines >>>>>>>>> 2, 3 >>>>>>>>> and 4, or 2, 4 and 7, they could be anywhere. All we can tell is that >>>>>>>>> the >>>>>>>>> comment goes somewhere between lines 1 and 8, so any position would >>>>>>>>> be ok. >>>>>>>>> So having the formatter accept bare literals would mean having to >>>>>>>>> work with >>>>>>>>> insufficient data and it would produce a bad output. This issue is >>>>>>>>> also >>>>>>>>> present with option a, but changing the part of the formatter that >>>>>>>>> generates the algebra document would essentially mean to *break* the >>>>>>>>> formatter in multiple places. In summary, I'm not sure about the >>>>>>>>> usefulness >>>>>>>>> of a user supplied ast with no line numbers associated to literals, >>>>>>>>> as it >>>>>>>>> is hard to apropiately recreate the source code when comments are >>>>>>>>> involved. >>>>>>>>> I'm not happy with neither approach and I will try to think of >>>>>>>>> better solutions. >>>>>>>>> >>>>>>>>> On the other hand, the literal encoder is using a public api. A >>>>>>>>> user can already generate AST from a string with the literals wrapped >>>>>>>>> in a >>>>>>>>> block via the :literal_encoder option of Code.string_to_quoted, so I >>>>>>>>> think >>>>>>>>> the literal token metadata should be considered public api(changes to >>>>>>>>> it >>>>>>>>> can already break third party code today). Wrapping a literal in a >>>>>>>>> block >>>>>>>>> and storing the token metadata in it has proven to be a very simple >>>>>>>>> and >>>>>>>>> practical solution to the problem of literals not carrying any >>>>>>>>> metadata >>>>>>>>> while still using valid AST, so maybe exposing that can also be >>>>>>>>> considered >>>>>>>>> as a viable alternative? This would go against my original position >>>>>>>>> of "nor >>>>>>>>> does it need to expose the private version of the AST used internally >>>>>>>>> by >>>>>>>>> the formatter", but in practical terms I think it would provide more >>>>>>>>> value >>>>>>>>> if it were exposed. The comments data structure would already be >>>>>>>>> exposed >>>>>>>>> and we would be requiring users to provide comments in that form, so >>>>>>>>> I >>>>>>>>> think requiring an AST with literals shaped in a particular way >>>>>>>>> wouldn't be >>>>>>>>> unreasonable. Of course, if that would be done, the fact that >>>>>>>>> quoted_to_algebra requires an AST with literals wrapped in that way >>>>>>>>> should >>>>>>>>> be properly documented. >>>>>>>>> El jueves, 6 de mayo de 2021 a las 3:07:33 UTC-3, Wojtek Mach >>>>>>>>> escribió: >>>>>>>>> >>>>>>>>>> Looking forward to any progress on this front. >>>>>>>>>> >>>>>>>>>> If anyone is interested, I was playing with these ideas and you >>>>>>>>>> can see some example use cases here: >>>>>>>>>> https://github.com/wojtekmach/fix/blob/master/test/fix_test.exs. >>>>>>>>>> But exactly because Code.Formatter is private, I had to vendor it >>>>>>>>>> and that >>>>>>>>>> code broke on recent Elixir versions. >>>>>>>>>> >>>>>>>>>> On May 5, 2021, "José Valim" <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> I believe those ideas are promising. There is one issue about >>>>>>>>>> quoted_to_algebra though: the AST that the formatter works with is a >>>>>>>>>> special one, where the literals are wrapped in blocks so we can >>>>>>>>>> store their >>>>>>>>>> metadata. This means that, in order to have quotes_to_algebra, we >>>>>>>>>> would >>>>>>>>>> need to change the formatter to also handle “regular AST” or nodes >>>>>>>>>> with >>>>>>>>>> limited metadata. >>>>>>>>>> It is doable, but it is work, and a requirement to expose said >>>>>>>>>> functionality. >>>>>>>>>> About the comments, I like the suggestion. Although we should >>>>>>>>>> probably move from a tuple to a map to avoid breaking changes in the >>>>>>>>>> future. A PR for this particular issue is very welcome! >>>>>>>>>> Thank you for the proposal and thinking about these problems. >>>>>>>>>> On Wed, May 5, 2021 at 21:16 i Dorgan <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> The motivation for this proposal is to make it easier for tools >>>>>>>>>>> to alter and format elixir code while preserving the current >>>>>>>>>>> behavior of >>>>>>>>>>> the formatter. Most of the functionality is already there, and a >>>>>>>>>>> little >>>>>>>>>>> change in the APIs would enable a wide variety of new use cases. >>>>>>>>>>> >>>>>>>>>>> If we want to transform a piece of code from one form to >>>>>>>>>>> another, we can modify the AST and then convert it back to a >>>>>>>>>>> string. For >>>>>>>>>>> example, a tool could detect the usage of `String.to_atom` and not >>>>>>>>>>> only >>>>>>>>>>> warn about unsafe string to atom conversion, but also give an >>>>>>>>>>> option to >>>>>>>>>>> automatically fix the issue, replacing it with >>>>>>>>>>> `String.to_existing_atom`. >>>>>>>>>>> The first part is already covered by tools like credo, but it seems >>>>>>>>>>> that >>>>>>>>>>> manipulating the source code itself is difficult, mostly because >>>>>>>>>>> the AST >>>>>>>>>>> does not contain information about comments and because >>>>>>>>>>> `Macro.to_string` >>>>>>>>>>> doesn't produce text that complies with the elixir coding >>>>>>>>>>> conventions. For >>>>>>>>>>> example, this code: >>>>>>>>>>> ```elixir >>>>>>>>>>> def foo(bar) do >>>>>>>>>>> # Some comment >>>>>>>>>>> :baz >>>>>>>>>>> end >>>>>>>>>>> ``` >>>>>>>>>>> Would be printed as this: >>>>>>>>>>> ```elixir >>>>>>>>>>> def(fop(bar)) do >>>>>>>>>>> :baz >>>>>>>>>>> end >>>>>>>>>>> ``` >>>>>>>>>>> Tools like https://github.com/KronicDeth/intellij-elixir >>>>>>>>>>> implement their own parsers to circumvent this issue, but I think >>>>>>>>>>> it would >>>>>>>>>>> be nice if it could be achieved with the existing parser in core >>>>>>>>>>> elixir. >>>>>>>>>>> >>>>>>>>>>> I've seen other conversations where it was suggested to change >>>>>>>>>>> the elixir AST to support comments, either adding a new comment >>>>>>>>>>> node(breaking change) or using the nodes metadata, but the >>>>>>>>>>> conclusion was >>>>>>>>>>> that there was no clear preference on how to do this at the AST >>>>>>>>>>> level, and >>>>>>>>>>> being that the elixir tokenizer allows us to pass a function to >>>>>>>>>>> process the >>>>>>>>>>> comments, José suggested to keep the them on a side data structure >>>>>>>>>>> instead >>>>>>>>>>> of in the AST itself. This is what the Elixir formatter does. >>>>>>>>>>> >>>>>>>>>>> Currently the `Code.Formatter` module is private API used by the >>>>>>>>>>> `Code.format_string` function. This means that the only way to >>>>>>>>>>> format >>>>>>>>>>> elixir code is by providing a string(or a file) to a function in >>>>>>>>>>> the `Code` >>>>>>>>>>> module. If we are transforming the code, however, what we have is a >>>>>>>>>>> quoted >>>>>>>>>>> expression, thus we don't have a way to turn it back into a string. >>>>>>>>>>> >>>>>>>>>>> At a high level, the `Code.Formatter.to_algebra` does three >>>>>>>>>>> things: >>>>>>>>>>> 1. It extracts the comments from the source code >>>>>>>>>>> 2. It parses the source code into a quoted expression >>>>>>>>>>> 3. It takes the comments and the quoted expressions and merges >>>>>>>>>>> them to produce an algebra document >>>>>>>>>>> >>>>>>>>>>> What I propose is to split the `Code.Formatter.to_algebra` >>>>>>>>>>> considering those steps, and expose the functionality via the >>>>>>>>>>> `Code` >>>>>>>>>>> module. The reasoning is that if a user has access to both the ast >>>>>>>>>>> and the >>>>>>>>>>> comments, they can then transform the ast, and return back both the >>>>>>>>>>> ast and >>>>>>>>>>> comments to the formatter to produce an algebra document. *How >>>>>>>>>>> they implement this is up to them* and Elixir doesn't need to >>>>>>>>>>> give an opinion on how comments should be handled during those >>>>>>>>>>> manipulations, nor does it need to expose the private version of >>>>>>>>>>> the AST >>>>>>>>>>> used internally by the formatter. If they want to merge comments >>>>>>>>>>> into the >>>>>>>>>>> metadata or use custom nodes, it's completely up to them, they just >>>>>>>>>>> need to >>>>>>>>>>> return back a valid quoted expression and a list of comments with >>>>>>>>>>> their >>>>>>>>>>> metadata. >>>>>>>>>>> >>>>>>>>>>> The other reason I think this should be done by exposing those >>>>>>>>>>> functions is that there's a great a amount of work put into the >>>>>>>>>>> formatter >>>>>>>>>>> to turn the quoted expressions into formatted algebra documents, >>>>>>>>>>> and I >>>>>>>>>>> think all of that could be reused, eliminating the need for custom >>>>>>>>>>> formatters. >>>>>>>>>>> >>>>>>>>>>> The workflow would be something like this: >>>>>>>>>>> ```elixir >>>>>>>>>>> {:ok, {quoted, comments}} = File.read!(path) |> >>>>>>>>>>> Code.string_to_quoted_with_comments() >>>>>>>>>>> quoted = do_some_manipulation(quoted , comments) >>>>>>>>>>> {:ok, doc} = Code.quoted_to_algebra(quoted, comments: comments) >>>>>>>>>>> new_source = Inspect.Algebra.format(doc, 80) >>>>>>>>>>> ``` >>>>>>>>>>> I'm not married to those function names and return values, but I >>>>>>>>>>> hope they serve to convey the idea. >>>>>>>>>>> >>>>>>>>>>> I already have some little examples of source code >>>>>>>>>>> transformations using an API like the above, I'm working on >>>>>>>>>>> reducing and >>>>>>>>>>> tidying them, but in the meantime I would like to hear you opinions >>>>>>>>>>> on this >>>>>>>>>>> proposal. >>>>>>>>>>> >>>>>>>>>>> The only downside I can think of is that the `{line, >>>>>>>>>>> {previous_eol, next_eol}, text}` format of the comments may be >>>>>>>>>>> considered a >>>>>>>>>>> private data structure, but the formatter hasn't changed much since >>>>>>>>>>> 1.6(3 >>>>>>>>>>> years ago) and I think it could be considered stable enough to be >>>>>>>>>>> exposed. >>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>>> Google Groups "elixir-lang-core" group. >>>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>>> it, send an email to [email protected]. >>>>>>>>>>> To view this discussion on the web visit >>>>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com >>>>>>>>>>> >>>>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/5f431518-f555-48bb-a999-ec49f6423463n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>>>> . >>>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> You received this message because you are subscribed to the >>>>>>>>>> Google Groups "elixir-lang-core" group. >>>>>>>>>> To unsubscribe from this group and stop receiving emails from >>>>>>>>>> it, send an email to [email protected]. >>>>>>>>>> >>>>>>>>>> To view this discussion on the web visit >>>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com >>>>>>>>>> >>>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4Jxt-nAbwYKYkTv0wUAamm%2Bhm24B3DdKatD%2Bdgn0six1g%40mail.gmail.com?utm_medium=email&utm_source=footer> >>>>>>>>>> . >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>> You received this message because you are subscribed to the Google >>>>>>>>> Groups "elixir-lang-core" group. >>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>> send an email to [email protected]. >>>>>>>>> >>>>>>>> To view this discussion on the web visit >>>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com >>>>>>>>> >>>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/10a1735e-ed95-4c2d-8eb4-beabf193c214n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>>>> . >>>>>>>>> >>>>>>>> -- >>>>>>> You received this message because you are subscribed to the Google >>>>>>> Groups "elixir-lang-core" group. >>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>> send an email to [email protected]. >>>>>>> >>>>>> To view this discussion on the web visit >>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com >>>>>>> >>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/43be368f-7383-43ad-886c-0e63d8bca3a9n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>>>> . >>>>>>> >>>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "elixir-lang-core" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> >>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com >>>>> >>>>> <https://groups.google.com/d/msgid/elixir-lang-core/30f42fa5-b8c9-47ce-8f46-e90b800ecb3cn%40googlegroups.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>> -- >> You received this message because you are subscribed to the Google Groups >> "elixir-lang-core" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> > To view this discussion on the web visit >> https://groups.google.com/d/msgid/elixir-lang-core/c4fbbfa3-0253-46d0-b90a-679117e8dc39n%40googlegroups.com >> >> <https://groups.google.com/d/msgid/elixir-lang-core/c4fbbfa3-0253-46d0-b90a-679117e8dc39n%40googlegroups.com?utm_medium=email&utm_source=footer> >> . >> > -- You received this message because you are subscribed to the Google Groups "elixir-lang-core" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/f8ae03ab-ccf2-4973-8a3b-d47846daf947n%40googlegroups.com.
