*An accompanying livebook is attached if you'd prefer to read this proposal interactively.* Introduction
Enum.split_while/2 (“split_while/2“) is one of the few (only?) functions in
the Enum suite that does not throw away the rest of the Enumerable.t() when
accumulation halts.
split_while/2 is useful for processing part of a list, but passing the rest
somewhere else. For example, if I have a list of integers and I only want
to collect integers within a certain range from the beginning, I can use
split_while/2 like this:
ints = [0, 4, 3, 18, 20, -7, -4, 6]
{first, rest} = Enum.split_while(ints, &(&1 >= 0 && &1 <= 10))
=> {[0, 4, 3], [18, 20, -7, -4, 6]}
And process them individually:
Enum.sum(first)
=> 7
Enum.product(rest)
=> 60480
Problem Description
However, there is no way to use split_while/2 with state. It will only run
the predicate function it receives on the current element. Take the
following pseudocode example:
split_while(ints, 0, fn
_int, acc when acc > 10 -> {:halt, acc}
int, acc -> {:cont, acc + int}
end)
Here, it is advantageous to use an accumulation of previous results to know
if we should halt. The remaining list should be returned for further
processing.
Without state in split_while/2, we have to do something rather inelegant:
result =
Enum.reduce_while(ints, %{ints: [], sum: 0, encountered: 0}, fn
_int, %{sum: sum} = acc when sum > 10 ->
{:halt, %{acc | ints: Enum.reverse(acc.ints)}}
int, acc ->
{:cont,
%{
acc
| ints: [int | acc.ints],
sum: acc.sum + int,
encountered: acc.encountered + 1
}}
end)
=> %{encountered: 4, ints: [0, 4, 3, 18], sum: 25}
{ints, rest} = Enum.split(ints, result.encountered)
=> {[0, 4, 3, 18], [20, -7, -4, 6]}
{:ok, %{ints: ints, sum: result.sum}, rest}
=> {:ok, %{ints: [0, 4, 3, 18], sum: 25}, [20, -7, -4, 6]}
Or write boilerplate code every time this sort of operation is needed. A
“stateful split while” should be a part of the standard library.
I propose a version of split_while/2 that takes the Enumerable.t(),
collecting an accumulator until the discriminator function returns {:halt,
any}. (*Or something like it, see below.*) Here’s the first draft:
EnumExt
Here is an example of how that might be achieved by modifying
Enum.take_while/2:
defmodule EnumExt do
@spec split_while_reduce(
Enumerable.t(),
any,
(Enum.element(), any ->
{:cont, any} | {:halt, any})
) ::
{{[Enum.element()], any}, [Enum.element()]}
def split_while_reduce(enumerable, initial_state, fun) when
is_list(enumerable) do
split_while_reduce_list(enumerable, fun, {[], initial_state})
end
def split_while_reduce(enumerable, initial_state, fun) do
{{list1, state}, list2} =
Enum.reduce(enumerable, {{[], initial_state}, []}, fn
entry, {{acc1, state}, []} ->
case fun.(entry, state) do
{:cont, new_state} -> {{[entry | acc1], new_state}, []}
{:halt, new_state} -> {{acc1, new_state}, [entry]}
end
entry, {{acc1, state}, acc2} ->
{{acc1, state}, [entry | acc2]}
end)
{{:lists.reverse(list1), state}, :lists.reverse(list2)}
end
defp split_while_reduce_list([head | tail], fun, {acc, state}) do
case fun.(head, state) do
{:cont, new_state} -> split_while_reduce_list(tail, fun, {[head |
acc], new_state})
{:halt, new_state} -> {{:lists.reverse(acc), new_state}, [head |
tail]}
end
end
defp split_while_reduce_list([], _fun, {acc, state}) do
{{:lists.reverse(acc), state}, []}
end
end
=> {:module, EnumExt, <<70, 79, 82, 49, 0, 0, 12, ...>>,
{:split_while_reduce_list, 3}}
Let’s try it out. We have a list of binaries. We want to halt accumulation
as soon as our data reaches a size or length of 100:
items =
[
<<226, 188, 146, 226, 164, 184, 226, 135, 138, 226, 135, 162, 226, 183,
169, 226, 168, 166,
226, 178, 165>>,
<<226, 161, 181, 226, 134, 162, 226, 168, 189, 226, 133, 128, 226, 149,
157>>,
<<226, 140, 176, 226, 156, 180, 226, 151, 182, 226, 190, 150, 226, 138,
156, 226, 155, 186>>,
<<226, 173, 134, 226, 185, 134, 226, 179, 176, 226, 156, 163, 226, 139,
128, 226, 189, 185>>,
<<226, 156, 188, 226, 171, 187, 226, 163, 182, 226, 133, 145>>,
<<226, 174, 185, 226, 186, 148, 226, 160, 163, 226, 133, 143, 226, 161,
147>>,
<<226, 137, 182, 226, 152, 133, 226, 147, 167, 226, 142, 171, 226, 173,
153>>,
<<226, 159, 143, 226, 173, 172, 226, 153, 183, 226, 145, 172>>,
<<226, 176, 157, 226, 157, 183, 226, 136, 160>>,
<<226, 181, 140, 226, 152, 172, 226, 162, 154, 226, 153, 172, 226, 137,
176, 226, 169, 176>>
]
=> ["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹",
"✼⫻⣶⅑", "⮹⺔⠣⅏⡓", "≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠",
"ⵌ☬⢚♬≰⩰"]
import EnumExt, only: [split_while_reduce: 3]
split_while_reduce(items, %{size: 0, length: 0, string: ""}, fn item, state
->
size = byte_size(item)
length = String.length(item)
new_size = state.size + size
new_length = state.length + length
if new_size < 100 and new_length < 100 do
{:cont, %{state | size: new_size, length: new_length, string:
state.string <> item}}
else
{:halt, state}
end
end)
=> {{["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹",
"✼⫻⣶⅑", "⮹⺔⠣⅏⡓"],
%{
length: 31,
size: 99,
string: "⼒⤸⇊⇢ⷩ⨦ⲥ⡵↢⨽⅀╝⌰✴◶⾖⊜⛺⭆⹆⳰✣⋀⽹✼⫻⣶⅑⮹⺔⠣⅏⡓"
}}, ["≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"]}
It’s still a little clunky, no? I mean, the part of the list that is being
accumulated could be collected into a list in the state as well. I'm
thinking of something similar to how Enum.map_reduce/3 returns a list and
an accumulator.
In this case, however, the list returned would be the *untouched values*.
The simplified version:
EnumExt2
defmodule EnumExt2 do
@spec preserve_reduce(
Enumerable.t(),
any,
(Enum.element(), any ->
{:cont, any} | {:halt, any})
) ::
{any, [Enum.element()]}
def preserve_reduce(enumerable, initial_state, fun) when
is_list(enumerable) do
preserve_reduce_list(enumerable, fun, initial_state)
end
def preserve_reduce(enumerable, initial_state, fun) do
{state, list} =
Enum.reduce(enumerable, {initial_state, []}, fn
entry, {state, []} ->
case fun.(entry, state) do
{:cont, new_state} -> {new_state, []}
{:halt, new_state} -> {new_state, [entry]}
end
entry, {state, acc} ->
{state, [entry | acc]}
end)
{state, :lists.reverse(list)}
end
defp preserve_reduce_list([head | tail], fun, state) do
case fun.(head, state) do
{:cont, new_state} -> preserve_reduce_list(tail, fun, new_state)
{:halt, new_state} -> {new_state, [head | tail]}
end
end
defp preserve_reduce_list([], _fun, state) do
{state, []}
end
end
=> {:module, EnumExt2, <<70, 79, 82, 49, 0, 0, 11, ...>>,
{:preserve_reduce_list, 3}}
import EnumExt2, only: [preserve_reduce: 3]
initial_state = %{size: 0, length: 0, string: "", collected: []}
preserve_reduce(items, initial_state, fn item, state ->
size = byte_size(item)
length = String.length(item)
new_size = state.size + size
new_length = state.length + length
if new_size < 100 and new_length < 100 do
{:cont,
%{
state
| size: new_size,
length: new_length,
string: state.string <> item,
collected: [item | state.collected]
}}
else
{:halt, %{state | collected: Enum.reverse(state.collected)}}
end
end)
=> {%{
collected: ["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺",
"⭆⹆⳰✣⋀⽹", "✼⫻⣶⅑", "⮹⺔⠣⅏⡓"],
length: 31,
size: 99,
string: "⼒⤸⇊⇢ⷩ⨦ⲥ⡵↢⨽⅀╝⌰✴◶⾖⊜⛺⭆⹆⳰✣⋀⽹✼⫻⣶⅑⮹⺔⠣⅏⡓"
}, ["≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"]}
*I have no idea why the fourth element in the collected list is displaying
like that. The source HTML has the correct characters.*
While the developer now has to reverse any list elements they want to keep,
it gives us a cleaner return. In the case that the developer does not want
to accumulate the processed list, they just don’t accumulate it.
Conclusion
The standard library would benefit from such a pattern. I’m unsure of a
good name (split_while_reduce and preserve_reduce kind of seem out of place
compared to the other names in the library) for such a function. I was
considering partial_reduce, but that collides with the concept of partial
functions.
I’d love to hear your feedback or your ideas about how this can already be
done in the standard library!
Best regards,
Douglas
--
You received this message because you are subscribed to the Google Groups
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to elixir-lang-core+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elixir-lang-core/ab2d6114-1d1f-424a-89b6-85b6648080d8n%40googlegroups.com.
2024-08-06_proposal-enum-preserve-reduce-3.output.livemd
Description: Binary data
2024-08-06_proposal-enum-preserve-reduce-3.nooutput.livemd
Description: Binary data
