On Tue, Dec 24, 2013 at 7:33 PM, Patrick Walton <[email protected]> wrote:
> Suppose you have a photo app that fetches a list of images from the
> network, sorts them by date, and places the result into a folder. You might
> structure it as two tasks, A and B, and two channels, "Images" and "Done". A
> fetches images one by one and places them into the "Images" channel. When
> it's done it sends a ping on the "Done" channel. B waits on the "Done"
> channel, then drains the "Images" channel, sorts the results, and writes the
> result into a folder.
>
> Despite having only one-way communication, this app contains a nasty
> deadlock hazard with bounded channels. If there are more images than the
> size of the "Images" channel then you will deadlock. Your unit tests may not
> catch this, because it only happens if you have a lot of images. Unbounded
> channels do not have this problem.
>
> You could argue that this is bad design, and of course it is bad design if
> you have bounded channels, but one of the purposes of API design is to
> minimize the fallout of ill-thought-through designs.
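For concreteness, the quoted arrangement can be sketched in modern Rust, using std::sync::mpsc::sync_channel for the bounded channels (the 2013 channel API differed; fetch_and_sort, the dates standing in for image data, and the bound of 16 are all illustrative):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Two tasks, two bounded channels, exactly as in the quoted scenario.
// The "Images" bound (16) is larger than the number of images, so this
// works; shrink the bound below the image count and A blocks forever on
// send() while B is still blocked on done_rx.recv() -- the deadlock in
// question.
fn fetch_and_sort() -> Vec<u32> {
    let (images_tx, images_rx) = sync_channel::<u32>(16);
    let (done_tx, done_rx) = sync_channel::<()>(0);

    // Task A: "fetch" images one by one, then ping "Done".
    let a = thread::spawn(move || {
        for date in [3u32, 1, 2] {
            images_tx.send(date).unwrap();
        }
        done_tx.send(()).unwrap();
    });

    // Task B: wait on "Done", then drain "Images" and sort.
    let b = thread::spawn(move || {
        done_rx.recv().unwrap();
        let mut dates: Vec<u32> = images_rx.try_iter().collect();
        dates.sort();
        dates
    });

    a.join().unwrap();
    b.join().unwrap()
}

fn main() {
    println!("{:?}", fetch_and_sort()); // prints [1, 2, 3]
}
```

Note that correctness here hangs entirely on the bound exceeding the image count, which is the magic-number hazard at issue.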
To address the last sentence – bounded channels with default size 0 _do_ minimize the fallout of this design: the program would reliably deadlock every time it is tested with a nonzero number of images, since A will try to write to "Images" while B is blocked receiving from "Done", not yet listening on "Images". I don't see this deadlock as a nasty hazard – the code wouldn't work at all, and the programmer would immediately notice. If the programmer uses a nonzero buffer size for the channel, it's a magic number that they came up with, so they should know to test inputs around that magnitude.

But it looks like a bad design regardless of whether channels are bounded or unbounded: the processing is sequential, so using two tasks seems rather contrived. The natural way to program this would be to read the images into a vector, sort it, and write them out, all in one task. If we really want to use two tasks, the first task should read all images into a vector, then send that vector as part of the "done" message over the channel. There is no point in sending the images one-by-one over the channel if they aren't going to be processed one-by-one. Both of these approaches avoid deadlocks.

The approach you describe uses a channel as nothing but an unbounded buffer. Using a channel where a vector is called for, and then concluding that channels should behave more like vectors (by being unbounded), is not very compelling. It's a little like implementing a loop recursively and concluding that recursion should behave more like loops, asking for tail call optimization.

As soon as we switch to an approach where feeding items into a channel one-by-one makes sense – by allowing task B to run concurrently with task A, rather than waiting for "done" – we immediately benefit from the boundedness of channels: they force the scheduler to allocate cycles to B whenever A runs too far ahead, keeping the memory consumption caused by scheduling decisions in check.
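That concurrent arrangement can be sketched as follows (again using the modern std::sync::mpsc::sync_channel rather than the 2013 API; the pipeline function and its deliberately tiny bound of 2 are illustrative):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Concurrent variant: B drains "Images" while A is still producing, so
// even a tiny bound (2) cannot deadlock. It only applies backpressure:
// whenever A runs more than two items ahead of B, send() blocks and the
// scheduler gives B a chance to catch up.
fn pipeline(dates: Vec<u32>) -> Vec<u32> {
    let (tx, rx) = sync_channel::<u32>(2);
    let a = thread::spawn(move || {
        for d in dates {
            tx.send(d).unwrap();
        }
        // Dropping tx closes the channel: the "done" signal travels on
        // the same channel as the images, no second channel needed.
    });
    // Task B runs here, receiving concurrently until the channel closes.
    let mut out: Vec<u32> = rx.iter().collect();
    out.sort();
    a.join().unwrap();
    out
}

fn main() {
    println!("{:?}", pipeline(vec![5, 3, 9, 1, 7])); // prints [1, 3, 5, 7, 9]
}
```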
B still puts everything in a vector, so memory consumption is still unbounded; but that is the way the algorithm works, not a scheduling artifact. (In this approach, it would also be more natural to send the "done" message on the same channel as the images.)

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
