On Tue, Dec 24, 2013 at 7:33 PM, Patrick Walton <[email protected]> wrote:
> Suppose you have a photo app that fetches a list of images from the
> network, sorts them by date, and places the result into a folder. You might
> structure it as two tasks, A and B, and two channels, "Images" and "Done". A
> fetches images one by one and places them into the "Images" channel. When
> it's done it sends a ping on the "Done" channel. B waits on the "Done"
> channel, then drains the "Images" channel, sorts the results, and writes the
> result into a folder.
>
> Despite having only one-way communication, this app contains a nasty
> deadlock hazard with bounded channels. If there are more images than the
> size of the "Images" channel then you will deadlock. Your unit tests may not
> catch this, because it only happens if you have a lot of images. Unbounded
> channels do not have this problem.
>
> You could argue that this is bad design, and of course it is bad design if
> you have bounded channels, but one of the purposes of API design is to
> minimize the fallout of ill-thought-through designs.
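
For concreteness, here is a sketch of that A/B structure in today's
std::sync::mpsc (the 2013 channel API looked different); the bound of 4,
the function names, and the image names are all made up:

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;

// Sketch of the A/B structure: "Images" is bounded, "Done" signals completion.
// If `n` exceeds `capacity`, A blocks on a full "Images" channel while B is
// still blocked on "Done" -- the deadlock described above.
fn run_pipeline(n: usize, capacity: usize) -> Vec<String> {
    let (img_tx, img_rx) = sync_channel::<String>(capacity);
    let (done_tx, done_rx) = channel::<()>();

    // Task A: fetch images one by one, then ping "Done".
    let a = thread::spawn(move || {
        for i in 0..n {
            img_tx.send(format!("img{}.jpg", i)).unwrap();
        }
        done_tx.send(()).unwrap();
    });

    // Task B: wait on "Done", then drain "Images" and sort.
    done_rx.recv().unwrap();
    let mut images: Vec<String> = img_rx.try_iter().collect();
    images.sort();
    a.join().unwrap();
    images
}

fn main() {
    // Safe only because 4 <= 4; run_pipeline(5, 4) would hang forever.
    assert_eq!(run_pipeline(4, 4).len(), 4);
}
```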

To address the last sentence – bounded channels with default size 0
_do_ minimize the fallout of this design: The program would reliably
deadlock every time it is tested with a nonzero number of images,
since A will try to write to "Images" while B is blocked receiving
from "Done", not listening on "Images" yet.  I don't see this deadlock
as a nasty hazard – the code wouldn't work at all, and the programmer
would immediately notice.  If the programmer uses a nonzero buffer
size for the channel, that size is a magic number they chose
themselves, so they should know to test inputs around that magnitude.
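
To illustrate the rendezvous behavior with today's std::sync::mpsc (a
sketch, not the 2013 API): with bound 0, a send can only complete while
some receiver is already blocked in recv(), so A's first write stops
dead while B sits on "Done".  try_send makes that visible without
actually hanging the example:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// With a rendezvous channel (bound 0), a send can only complete while a
// receiver is simultaneously blocked in recv().  Nobody is receiving here,
// so the send cannot go through -- a blocking send() would stop right at
// this point, which is the reliable first-test-run deadlock.
fn send_with_nobody_listening() -> bool {
    let (tx, _rx) = sync_channel::<u32>(0);
    matches!(tx.try_send(1), Err(TrySendError::Full(1)))
}

fn main() {
    assert!(send_with_nobody_listening());
}
```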


But it looks like a bad design, regardless of whether channels are
bounded or unbounded:

The processing is sequential, so using two tasks seems rather
contrived.  The natural way to program this would be to read the
images into a vector, sort the vector, and write the images out, all
in one task.
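
That one-task version is just straight-line code; a sketch, with
fetch_images standing in for the network I/O:

```rust
// Stand-in for network I/O: (date, filename) pairs, hypothetical data.
fn fetch_images() -> Vec<(u64, String)> {
    vec![
        (3, "c.jpg".to_string()),
        (1, "a.jpg".to_string()),
        (2, "b.jpg".to_string()),
    ]
}

// One task: read everything into a vector, sort by date, write out.
fn sorted_names() -> Vec<String> {
    let mut images = fetch_images();
    images.sort_by_key(|&(date, _)| date);
    // Writing each image to the folder would go here; we return the order.
    images.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    assert_eq!(sorted_names(), vec!["a.jpg", "b.jpg", "c.jpg"]);
}
```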

If we really want to use two tasks, the first task should read all
images into a vector, then send that vector as part of the "done"
message over the channel.  There is no point in sending the images
one-by-one over the channel if they aren't going to be processed
one-by-one.
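
In that two-task version, a single recv() is simultaneously the "done"
signal and the data handoff.  A sketch with made-up data:

```rust
use std::sync::mpsc::channel;
use std::thread;

// Task A gathers everything locally and sends one message; receiving that
// message is both the "done" ping and the delivery of the vector.
fn fetch_then_handoff() -> Vec<String> {
    let (tx, rx) = channel::<Vec<(u64, String)>>();
    thread::spawn(move || {
        let batch = vec![(2, "b.jpg".to_string()), (1, "a.jpg".to_string())];
        tx.send(batch).unwrap(); // one send, not one per image
    });
    let mut images = rx.recv().unwrap();
    images.sort_by_key(|&(date, _)| date);
    images.into_iter().map(|(_, name)| name).collect()
}

fn main() {
    assert_eq!(fetch_then_handoff(), vec!["a.jpg", "b.jpg"]);
}
```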

Both of these approaches avoid deadlocks.  The approach you describe
uses a channel as nothing but an unbounded buffer.  Using a channel
where a vector is called for, and concluding that channels should
behave more like vectors (by being unbounded), is not very compelling.
It's a little bit like implementing a loop recursively and concluding
that recursion should behave more like loops, asking for tail call
optimization.


As soon as we switch to an approach where feeding items into a channel
one-by-one makes sense – by allowing task B to run concurrently with
task A, rather than waiting for "done" – we immediately benefit from
the boundedness of channels, since they force the scheduler to
allocate cycles to B when A runs too far ahead, keeping the memory
consumption caused by scheduling decisions in check.  B still puts
everything in a vector, so memory consumption is still unbounded; but
that is the way the algorithm works, not a scheduling artifact.

(In this approach, it would also be more natural to have the "done"
message on the same channel as the images.)
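
A sketch of that concurrent version in today's std::sync::mpsc (the
bound of 2 is arbitrary; in this API, dropping the sender closes the
channel, which serves as the in-band "done"):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// A and B run concurrently.  The small bound is the point: if A runs too
// far ahead, send() blocks and the scheduler hands cycles to B, keeping
// the channel's memory footprint in check.
fn concurrent_pipeline() -> Vec<(u64, String)> {
    let (tx, rx) = sync_channel::<(u64, String)>(2);
    let a = thread::spawn(move || {
        for i in 0..10u64 {
            tx.send((10 - i, format!("img{}.jpg", i))).unwrap();
        }
        // tx is dropped here: B's iterator ends -- the "done" message,
        // delivered on the same channel as the images.
    });
    // Task B consumes items as they arrive instead of waiting for "done";
    // the vector it builds is the algorithm's memory use, not a scheduling
    // artifact.
    let mut images: Vec<_> = rx.iter().collect();
    images.sort_by_key(|&(date, _)| date);
    a.join().unwrap();
    images
}

fn main() {
    let images = concurrent_pipeline();
    assert_eq!(images.len(), 10);
    assert_eq!(images[0].0, 1); // sorted by date
}
```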
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev