[elixir-core:11435] [Proposal] Overload capture operator to support tagged variable captures

Christopher Keele Wed, 28 Jun 2023 16:56:23 -0700

This is a formalization of my concept here 
<https://groups.google.com/g/elixir-lang-core/c/oFbaOT7rTeU/m/BWF24zoAAgAJ>, 
as a first-class proposal for explicit discussion/feedback, since I now 
have a working prototype 
<https://github.com/elixir-lang/elixir/compare/main...christhekeele:elixir:tagged-variable-capture>.


*Goal*

The aim of this proposal is to support a commonly-requested feature: 
*short-hand construction and pattern matching of key/value pairs of 
associative data structures, based on variable names* in the current scope.

*Context*

Similar shorthand syntax sugar exists in many programming languages today, 
known variously as:

   - Field Punning <https://dev.realworldocaml.org/records.html> — OCaml
   - Record Puns 
   <https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/record_puns.html> 
   — Haskell
   - Object Property Value Shorthand 
   <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer#property_definitions> 
   — ES6 Javascript

This feature has been in discussion for a decade, on this mailing list (1 
<https://groups.google.com/g/elixir-lang-core/c/4w9eOeLvt-8/m/WOkoPSMm6kEJ>, 
2 
<https://groups.google.com/g/elixir-lang-core/c/NoUo2gqQR3I/m/WTpArTGMKSIJ>, 
3 
<https://groups.google.com/g/elixir-lang-core/c/3XrVXEVSixc/m/NHU2M4QFAQAJ>, 
4 
<https://groups.google.com/g/elixir-lang-core/c/OvSQkvXxsmk/m/bKKHbBxiCwAJ>, 
5 
<https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/1W-d_XAlBgAJ>, 
6 <https://groups.google.com/g/elixir-lang-core/c/oFbaOT7rTeU>) and the 
Elixir forum (1 
<https://elixirforum.com/t/proposal-add-field-puns-map-shorthand-to-elixir/15452>,
 
2 <https://elixirforum.com/t/shorthand-for-passing-variables-by-name/30583>, 
3 
<https://elixirforum.com/t/if-you-could-change-one-thing-in-elixir-language-what-you-would-change/19902/17>,
 
4 
<https://elixirforum.com/t/has-map-shorthand-syntax-in-other-languages-caused-you-any-problems/15403>,
 
5 
<https://elixirforum.com/t/es6-ish-property-value-shorthands-for-maps/1524>, 
6 
<https://elixirforum.com/t/struct-creation-pattern-matching-short-hand/7544>), 
and has motivated many libraries (1 
<https://github.com/whatyouhide/short_maps>, 2 
<https://github.com/meyercm/shorter_maps>, 3 
<https://hex.pm/packages/shorthand>, 4 <https://hex.pm/packages/synex>). 
These narrow margins cannot fit the full history of possibilities, 
proposals, and problems with this feature, and I will not attempt to 
summarize them all. For context, I suggest reading this mailing list 
proposal 
<https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/1W-d_XAlBgAJ> 
and this community discussion 
<https://elixirforum.com/t/proposal-add-field-puns-map-shorthand-to-elixir/15452> 
in particular.

In short, this particular proposal tries to resolve four past sticking 
points:

   1. Atom vs String 
   <https://groups.google.com/g/elixir-lang-core/c/NoUo2gqQR3I/m/IpZQHbZk4xEJ> 
   key support
   2. Visual clarity 
   <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/NBkAVto0BAAJ> 
   that atom/string matching is occurring
   3. Limitations of string-based sigil parsing 
   <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/TiZw6xM3BAAJ>
   4. Easy confusion 
   <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/WRhXxHDfBAAJ> 
   with tuples

I have a working fork of Elixir here 
<https://github.com/christhekeele/elixir/tree/tagged-variable-capture> 
where this proposed syntax can be experimented with. Be warned, it is buggy.

*Proposal: Tagged Variable Captures*

I propose we overload the unary capture operator (*&*) to accept 
compile-time atoms and strings as arguments, for example *&:foo* and 
*&"bar"*. This would *expand at compile time* into *a tagged tuple with the 
atom/string and a variable reference*. For now, I am calling this a 
*"tagged-variable capture"* to differentiate it from a function capture.

For the purposes of this proposal, assume:

{foo, bar} = {1, 2}

Additionally,

   - Lines beginning with # ==  indicate what the compiler expands an 
   expression to.
   - Lines beginning with # =>  represent the result of evaluating that 
   expression.
   - Lines beginning with *# !> * represent an exception.

*Bare Captures*

I'm not sure if we should support *bare* tagged-variable captures, but they 
are illustrative for this proposal, so I left them in my prototype. They 
would look like:

&:foo
# == *{:foo, foo}*
# => {:foo, 1}
&"foo"
# == *{"foo", foo}*
# => {"foo", 1}

If bare usage is supported, this expansion would work as expected in match 
and guard contexts as well, since it expands before variable references are 
resolved:

{:foo, baz} = &:foo
*# == {:foo, baz} = {:foo, foo}*
# => {:foo, 1}
baz
# => 1
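
For anyone who wants to poke at the idea without building the fork, the bare expansion can be roughly approximated in stock Elixir with an ordinary macro. This is only a sketch: the module names and the *tagged/1* macro are hypothetical and are not part of the proposal or the prototype, and it assumes the expansion is exactly the tagged tuple described above.

```elixir
defmodule TaggedCaptureSketch do
  # Hypothetical stand-in for the proposed `&:foo` / `&"foo"` syntax.
  # `tagged/1` is an illustrative name only.
  defmacro tagged(key) when is_atom(key) do
    # A two-element tuple is its own AST, so this expands to `{:foo, foo}`.
    # `Macro.var(key, nil)` builds a variable visible in the caller's scope.
    {key, Macro.var(key, nil)}
  end

  defmacro tagged(key) when is_binary(key) do
    # Expands to `{"foo", foo}`; the key must be a compile-time string.
    {key, Macro.var(String.to_atom(key), nil)}
  end
end

defmodule TaggedCaptureDemo do
  import TaggedCaptureSketch, only: [tagged: 1]

  # Construction: reads `foo` and `bar` from the local scope.
  def build do
    {foo, bar} = {1, 2}
    [tagged(:foo), tagged("bar")]
  end

  # Matching: the macro expands before the match is compiled,
  # so it binds `foo` on the left-hand side.
  def match do
    tagged(:foo) = {:foo, 1}
    foo
  end
end
```

Calling *TaggedCaptureDemo.build()* should give the two tagged tuples and *TaggedCaptureDemo.match()* should return the bound value. A macro like this cannot be used directly inside *%{...}* without the parser change discussed below.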

*List Captures*

Since capture expressions are allowed in lists, this can be used to 
construct Keyword lists from the local variable scope elegantly:

list = [&:foo, &:bar]
# == *list = [{:foo, foo}, {:bar, bar}]*
# => [foo: 1, bar: 2]

This would work with other list operators like *|*:

baz = 3
list = [&:baz | list]
# == *list = [{:baz, baz} | list]*
# => [baz: 3, foo: 1, bar: 2]

And list destructuring:

{foo, bar, baz} = {nil, nil, nil}
[&:baz, &:foo, &:bar] = list
*# == [{:baz, baz}, {:foo, foo}, {:bar, bar}] = list*
# => [baz: 3, foo: 1, bar: 2]
{foo, bar, baz}
# => {1, 2, 3}
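
One consequence worth keeping in mind, assuming the expansion is exactly the one shown above: list patterns are positional, so list destructuring only matches when the keys appear in the same order as in the pattern. The sketch below uses the hand-expanded forms in stock Elixir; the variable names are illustrative.

```elixir
list = [baz: 3, foo: 1, bar: 2]

# Same order as the list: the match succeeds and binds all three variables.
[{:baz, baz}, {:foo, foo}, {:bar, bar}] = list

# Different order: list patterns are positional, so this does not match.
reordered_matches? = match?([{:foo, _}, {:bar, _}, {:baz, _}], list)

# Order-independent access to a keyword list goes through Keyword today.
foo_via_keyword = Keyword.fetch!(list, :foo)
```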

*Map Captures*

With a small change to the parser 
<https://github.com/elixir-lang/elixir/commit/0a4f5376c0f9b4db7d71514d05df6b8b6abc96a9>, 
we can allow this expression inside map literals. Because each capture is 
expanded into a tagged tuple before the map's association list is processed 
as a whole, this syntax works in all existing map/struct constructs, like 
map construction:

map = %{&:foo, &"bar"}
*# == %{:foo => foo, "bar" => bar}*
# => %{:foo => 1, "bar" => 2}

Map updates:

foo = 3
map = %{map | &:foo}
*# == %{map | :foo => foo}*
# => %{:foo => 3, "bar" => 2}

And map destructuring:

{foo, bar} = {nil, nil}
%{&:foo, &"bar"} = map
*# == %{:foo => foo, "bar" => bar} = map*
# => %{:foo => 3, "bar" => 2}
{foo, bar}
# => {3, 2}
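
The expanded forms above already run in stock Elixir, which makes it easy to check what the sugar would inherit from existing map semantics. A small sketch, assuming the expansions are exactly as shown; *matched_foo*, *matched_bar* and *missing_key* are illustrative names:

```elixir
{foo, bar} = {1, 2}

# Hand-expanded form of `%{&:foo, &"bar"}`.
map = %{:foo => foo, "bar" => bar}

# Hand-expanded form of `%{map | &:foo}` after rebinding `foo`.
foo = 3
map = %{map | :foo => foo}

# Hand-expanded form of the destructuring match.
%{:foo => matched_foo, "bar" => matched_bar} = map

# The update syntax requires the key to exist already, so a capture of a
# variable whose key is absent from the map would presumably raise as well.
missing_key =
  try do
    %{map | :baz => 0}
  rescue
    error in KeyError -> error.key
  end
```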

*Considerations*

Though just based on an errant thought 
<https://groups.google.com/g/elixir-lang-core/c/oFbaOT7rTeU/m/BWF24zoAAgAJ> 
that popped into my head yesterday, I'm unreasonably pleased with how well 
this works and reads in practice. I will present my thoughts here, though 
again I encourage you to grab my branch 
<https://github.com/christhekeele/elixir/tree/tagged-variable-capture>, compile 
it from source 
<https://github.com/christhekeele/elixir/tree/tagged-variable-capture#compiling-from-source>,
 and 
play with it yourself!

*Pro: solves existing pain points*

As mentioned, this solves flaws that previous proposals suffer from:

   1. Atom vs String 
   <https://groups.google.com/g/elixir-lang-core/c/NoUo2gqQR3I/m/IpZQHbZk4xEJ> 
   key support
   This supports both.
   2. Visual clarity 
   <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/NBkAVto0BAAJ> 
   that atom/string matching is occurring
   This leverages the appropriate literal in question within the syntax 
   sugar.
   3. Limitations of string-based sigil parsing 
   <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/TiZw6xM3BAAJ>
   This is compiler-expansion-native.
   4. Easy confusion 
   <https://groups.google.com/g/elixir-lang-core/c/XxnrGgZsyVc/m/WRhXxHDfBAAJ> 
   with tuples
   %{&:foo, &"bar"} is very different from {foo, bar}, rather than 
   differing by a single character.
   
Additionally, it solves my main complaint with historical proposals: syntax 
to combine a variable identifier with a literal must either obscure that we 
are building an identifier, or obscure the atom/string typing of the literal.

I'm proposing overloading the capture operator rather than introducing a 
new operator because the capture operator already has a semantic 
association with messing with variable scope, via the nested integer-based 
positional function argument syntax (ex *& &1*).

By using the capture operator we indicate that we are messing with an 
identifier in scope, but via a literal atom/string we want to associate 
with, to get the best of both worlds.

*Pro: works with existing code*

The capture operator today has well-defined compile-time-error semantics if 
you try to pass it an atom or a string. All compiling Elixir code today 
will continue to compile as before.
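
This can be checked against a stock Elixir install. The sketch below relies only on *Code.string_to_quoted!/1* and *Code.eval_string/1*; the exact exception module and message are not asserted here, since they may differ between Elixir versions, and the compiler may also print diagnostics to stderr.

```elixir
# Both forms already parse today; the literal is carried as the sole argument.
{:&, _meta, [:foo]} = Code.string_to_quoted!("&:foo")
{:&, _meta, ["bar"]} = Code.string_to_quoted!(~S(&"bar"))

# Compiling them is what fails on stock Elixir, so no existing program
# that compiles can contain these forms.
outcome =
  try do
    Code.eval_string("&:foo")
    :compiled
  rescue
    error -> {:rejected, error.__struct__}
  end
```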

*Pro: works with existing tooling*

By overloading an existing operator, this approach works seamlessly for me 
with the syntax highlighters I have tried it with so far, and reasonably 
well with the formatter.

In my experimentation I've found that the formatter wants to rewrite 
*&:baz* to *(&:baz)* pretty often. That's good, because there are several 
edge cases in my prototype where not doing so causes it to behave strangely; 
I'm sure it's resolving ambiguities that would occur in function captures 
and that impact my proposal in ways I have yet to fully anticipate.

*Pro: minimizes surface area of the language*

By overloading the capture operator instead of introducing a new operator or 
sigil, we are able to keep the surface area of this feature slim.

*Con: overloads the capture operator*

Of course, many of the virtues of this proposal come from overloading the 
capture operator. But it is already a semantically fraught piece of 
syntactic sugar that confuses newcomers, and this would place more strain 
on it.

We would need more than the meager error message modification 
<https://github.com/elixir-lang/elixir/commit/3d83d21ada860d03cece8c6f90dbcf7bf9e737ec#diff-92b98063d1e86837fae15261896c265ab502b8d556141aaf1c34e67a3ef3717cL199-R207> 
in my prototype: the documentation would need updating as well, and we 
should anticipate a new wave of questions from the community upon release.

This inelegance really shows when considering embedding a tagged variable 
capture inside an anonymous function capture, ex *& &1 = &:foo*. In my 
prototype I've chosen to allow this rather than error on "nested captures 
not allowed" (would probably become: "nested *function* captures not 
allowed"), but I'm not sure I found all the edge-cases of mixing them in 
all possible constructions.

Additionally, since my proposal now allows the capture operator as an 
associative element inside map literal parsing, the error reported when 
providing a function capture as an associative element would be generated 
during expansion rather than during parsing. I am not fluent enough in leex 
to have updated the parser to preserve the exact old error. Serendipitously, 
what my prototype reports today is pretty good regardless, though I prefer 
the old behaviour:

Old:
%{& &1}
# !> ** (SyntaxError) syntax error before '}'
# !> |
# !> 1 | %{& &1}
# !> | ^
New:
%{& &1}
*# !> error: expected key-value pairs in a map, got: & &1*
*# !> ** (CompileError) cannot compile code (errors have been logged)*
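
The "Old" behaviour can be observed on stock Elixir without compiling anything, because the failure happens in the parser. A minimal sketch using *Code.string_to_quoted/1*; the shape of the error tuple is left unasserted since it may vary across versions:

```elixir
# A bare capture as a map element is rejected at parse time on stock Elixir.
result = Code.string_to_quoted("%{& &1}")
parse_failed? = match?({:error, _}, result)

# The same capture is accepted by the parser when it is a map *value*.
{:ok, _ast} = Code.string_to_quoted("%{f: & &1}")
```

With the prototype's parser change, the first call would presumably return an *:ok* tuple instead, with the error surfacing later during expansion.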

*Con: here there be dragons I cannot see*

I'm quite sure a full implementation would require a lot more knowledge of 
the compiler than I am able to provide. For example, *&:foo = &:foo* raises 
an exception where *(&:foo) = &:foo* behaves as expected. I also find the 
variable/context/binding environment implementation in the Erlang part of 
the compiler during expansion to be impenetrable, and I'm sure my prototype 
fails on edge cases there.

*Open Question: the pin operator*

As this feature constructs a variable reference for you, it is not clear 
if/how we should support pinning the generated variable to avoid new 
bindings. In my prototype, I have tried to support the pin operator via the 
*&^:atom* syntax, though I'm pretty sure it's buggy for bare, 
out-of-data-structure cases; I only got it far enough to work for basic map 
pattern matching in function heads.

*Open Question: charlists*

I did not add support for charlist tagged variable captures in my 
prototype, as it would be more involved to differentiate a capture of a 
list meant to become a tagged tuple from a list representing the AST of a 
function capture. I would not lose a lot of sleep over this.
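
The ambiguity is visible in the parsed AST on stock Elixir: a single-quoted charlist literal and a list-shaped function capture both reach the capture operator as plain lists. A small sketch using *Code.string_to_quoted!/1*; newer Elixir versions may print a deprecation warning for the single-quoted charlist.

```elixir
# A charlist argument arrives as a plain list of code points...
{:&, _, [charlist_arg]} = Code.string_to_quoted!("&'foo'")

# ...and so does the body of a list-building function capture.
{:&, _, [list_arg]} = Code.string_to_quoted!("&[&1, &2]")

# Both are lists, so telling them apart means inspecting the elements.
both_lists? = is_list(charlist_arg) and is_list(list_arg)
```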

*Open Question: allowed contexts*

Would we even want to allow this syntax construct outside of map literals? 
Or list literals?

I can certainly see people abusing the 
bare-outside-of-associative-datastructure syntax to write some 
nigh-impenetrable code where it's really unclear where assignment and 
pattern matching are occurring, and relatedly this is where I see a lot of 
odd edge-case behaviour in my prototype. I allowed it to speed up the 
implementation, but it merits more discussion.

On the other hand, this does seem like an... interesting use-case:

error = "rate limit exceeded"
&:error # return error tuple

*Thanks for reading! What do you think?*
