Thanks for the great reply, Alex. This is the approach we are going to take. Rust is not going to move away from green threads; the plan is to support both use cases in the standard library.

On 11/13/2013 10:32 AM, Alex Crichton wrote:
The situation may not be as dire as you think. The runtime is still in a state
of flux, and don't forget that in one summer the entire runtime was rewritten in
rust and was entirely redesigned. I personally still think that M:N is a viable
model for various applications, and it seems especially unfortunate to just
remove everything because it's not tailored for all use cases.

Rust made an explicit design decision early on to pursue lightweight/green
tasks, and it was made with the understanding that there were drawbacks to the
strategy. Using libuv as a backend for driving I/O was also an explicit decision
with known drawbacks.

That being said, I do not believe that all is lost. I don't believe that the
rust standard library as-is today can support *every* use case, but it's getting
to a point where it can get pretty close. In the recent redesign of the I/O
implementation, all I/O was abstracted behind trait objects that are synchronous
in their interface. This I/O interface is all implemented in librustuv by
talking to the rust scheduler under the hood. Additionally, in pull #10457, I'm
starting to add support for a native implementation of this I/O interface. The
great boon of this strategy is that all std::io primitives have no idea if their
underlying interface is native and blocking or libuv and asynchronous. The exact
same rust code works for one as it does for the other.

I personally don't see why the same strategy shouldn't work for the task model
as well. When you link a program to the librustuv crate, then you're choosing to
have a runtime with M:N scheduling and asynchronous I/O. Perhaps, though, if you
didn't link to librustuv, you would get 1:1 scheduling with blocking I/O. You
would still have all the benefits of the standard library's communication
primitives, spawning primitives, I/O, task-local-storage etc. The only
difference is that everything would be powered by OS-level threads instead of
rust-level green tasks.

I would very much like to see a standard library which supports this
abstraction, and I believe that it is very realistically possible. Right now we
have an EventLoop interface which defines how I/O is performed, serving as the
abstraction between asynchronous and blocking I/O. It sounds like we need a
similarly formalized Scheduler interface which abstracts M:N scheduling from
1:1 scheduling.

The main goal of all of this would be to allow the exact same rust code to work
in both M:N and 1:1 environments. This ability would allow authors to specialize
their code for the task at hand. Those writing web servers would be sure to
link to librustuv, while those writing command-line utilities would simply omit
it. Additionally, as a library author, I don't really care which implementation
you're using. I can write a mysql database driver and then you as a consumer of
my library decide whether my network calls are blocking or not.

This is a fairly new concept to me (I haven't thought much about it before), but
this sounds like it may be the right way forward to addressing your concerns
without compromising too much existing functionality. There would certainly be
plenty of work to do in this realm, and I'm not sure if this goal would block
the 1.0 milestone or not. Ideally, this would be a completely
backwards-compatible change, but there would perhaps be unintended consequences.
As always, this would need plenty of discussion to see whether this is even a
reasonable strategy to take.


On Wed, Nov 13, 2013 at 2:45 AM, Daniel Micay <danielmi...@gmail.com> wrote:
Before getting right into the gritty details about why I think we should think
about a path away from M:N scheduling, I'll go over the details of the
concurrency model we currently use.

Rust uses a user-mode scheduler to cooperatively schedule many tasks onto OS
threads. Due to the lack of preemption, tasks need to manually yield control
back to the scheduler. Performing I/O with the standard library will block the
*task*, but yield control back to the scheduler until the I/O is completed.

The scheduler manages a thread pool where the unit of work is a task rather
than a queue of closures to be executed or data to be passed to a function. A
task consists of a stack, register context and task-local storage much like an
OS thread.

In the world of high-performance computing, this is a proven model for
maximizing throughput of CPU-bound tasks. By abandoning preemption, the
overhead of involuntary context switches disappears. For socket servers with
only negligible server-side computation, the avoidance of context switching is
a boon for scalability and predictable performance.

# Lightweight?

Rust's tasks are often called *lightweight*, but at least on Linux the only
optimization is the lack of preemption. Since segmented stacks have been
dropped, resident and virtual memory usage will be identical.

# Spawning performance

An OS thread can actually spawn nearly as fast as a Rust task on a system with
one CPU. On a multi-core system, there's a high chance of the new thread being
spawned on a different CPU resulting in a performance loss.

Sample C program, if you need to see it to believe it:

```
#include <pthread.h>
#include <stddef.h>

static const size_t n_thread = 100000;

static void *foo(void *arg) {
    return arg;
}

int main(void) {
    for (size_t i = 0; i < n_thread; i++) {
        pthread_attr_t attr;
        /* pthread functions return 0 on success and a positive error
         * number on failure, so check != 0 rather than < 0. */
        if (pthread_attr_init(&attr) != 0) {
            return 1;
        }
        if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0) {
            return 1;
        }
        pthread_t thread;
        if (pthread_create(&thread, &attr, foo, NULL) != 0) {
            return 1;
        }
    }
    /* Exit only the main thread, without killing the detached threads. */
    pthread_exit(NULL);
}
```

Sample Rust program:

```
fn main() {
     for _ in range(0, 100000) {
         do spawn {
         }
     }
}
```

For both programs, I get around 0.9s consistently when pinned to a core. When
not pinned, the Rust version slows to 1.1s and the OS thread one to about 2s.
The Rust version slows further when asked to allocate 8MiB stacks like C is
doing, and will slow more once it has to make the `mmap` and `mprotect` calls
the pthread API does.

# Asynchronous I/O

Rust's requirements for asynchronous I/O would be met well by direct usage
of IOCP on Windows. However, Linux only has solid support for non-blocking
sockets; file operations usually just retrieve a result from the page cache
and do not truly have to block. The result is that libuv is significantly
slower than blocking I/O for most common cases, a price paid for the sake of
scalable socket servers.

On modern systems with flash memory, including mobile, there is a *consistent*
and relatively small worst-case latency for accessing data on the disk so
blocking is essentially a non-issue. Memory mapped I/O is also an incredibly
important feature for I/O performance, and there's almost no reason to use
traditional I/O on 64-bit. However, it's a no-go with M:N scheduling because
the page faults block the thread.

# Overview

Advantages:

* lack of preemptive/fair scheduling, leading to higher throughput
* very fast context switches to other tasks on the same scheduler thread

Disadvantages:

* lack of preemptive/fair scheduling (lower-level model)
* poor profiler/debugger support
* async I/O stack is much slower for the common case; for example stat is 35x
   slower when run in a loop for an mlocate-like utility
* true blocking code will still block a scheduler thread
* most existing libraries use blocking I/O and OS threads
* cannot directly use the fast, easy-to-use linker-supported thread-local data
* many existing libraries rely on thread-local storage, so there's a need to be
   wary of hidden yields in Rust function calls, and it's very difficult to
   expose a safe interface to these libraries
* every CPU architecture revision that adds registers needs explicit support
   from Rust, and the right variant must be selected at runtime when not
   targeting a specific CPU (this is currently not done correctly)

# User-mode scheduling

Windows 7 introduced user-mode scheduling[1] to replace fibers on 64-bit.
Google implemented the same thing for Linux (perhaps even before Windows 7 was
released), and plans on pushing for it upstream.[2] The linked video does a
better job of covering this than I can.

User-mode scheduling provides a 1:1 threading model including full support for
normal thread-local data and existing debuggers/profilers. It can yield to the
scheduler on system calls and page faults. The operating system is responsible
for details like context switching, so a large maintenance and portability
burden is taken off of Rust. It narrows the above disadvantage list down to
just the point about the lack of preemptive/fair scheduling, and doesn't
introduce any new ones.

I hope this is where concurrency is headed, and I hope Rust doesn't miss this
boat by concentrating too much on libuv. I think it would allow us to simply
drop support for pseudo-blocking I/O in the Go style and ignore asynchronous
I/O and non-blocking sockets in the standard library. It may be useful to have
the scheduler use them, but it wouldn't be essential.

[1] 
http://msdn.microsoft.com/en-us/library/windows/desktop/dd627187(v=vs.85).aspx
[2] http://www.youtube.com/watch?v=KXuZi9aeGTw
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev