hi Weston,

We've discussed some of these problems in the past -- I was
enumerating some of these issues to highlight the problems that are
resulting from an absence of a richer programming model for
hierarchical work scheduling. Parallel tasks originating in each
workload are submitted to a global thread pool where they are
commingled with the tasks coming from other workloads.

As an example of how this can go wrong, suppose we have a static
thread pool with 4 executors. If we submit 4 long-running tasks to the
pool, and then each of these tasks spawn additional tasks that go into
the thread pool, a deadlock can occur, because the thread pool thinks
that it's executing tasks when in fact those tasks are waiting on
their dependent tasks to complete.

A similar resource underutilization occurs when we do
pool->Submit(ReadFile), where ReadFile needs to do some IO -- from the
thread pool's perspective, the task is "working" even though it may
wait for one or more IO calls to complete.

In the Datasets API in C++ we have both of these problems: file scan
tasks are being pushed onto the global thread pool, and so to prevent
deadlocks multithreaded file parsing has been disabled. Additionally,
the scan tasks do IO, resulting in suboptimal performance (the
problems caused by this will be especially exacerbated when running
against slower filesystems like Amazon S3)

Hopefully the issues are more clear.

Thanks
Wes

On Tue, Sep 15, 2020 at 2:57 PM Weston Pace <weston.p...@gmail.com> wrote:
>
> It sounds like you are describing two problems.
>
> 1) Idleness - Tasks are holding threads in the thread pool while they
> wait for IO or some long running non-CPU task to complete.  These
> threads are often in a "wait" state or something similar.
> 2) Fairness - The ordering of tasks is causing short tasks that could
> be completed quickly from being stuck behind longer term tasks.
> Fairness can be an issue even if all tasks are always in the active
> state consuming CPU time.
>
> Are both of these issues a problem?  Are you looking to address both of them?
>
> I doubt it's much help as it is probably a more substantial change
> than what you were looking for but the popular solution to #1 these
> days seems to be moving toward non blocking IO with
> promises/callbacks/async.  That way threads are never in the waiting
> state (unless sitting idle in the pool).
>
> -Weston
>
> On Tue, Sep 15, 2020 at 7:00 AM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > In light of ARROW-9924, I wanted to rekindle the discussion about our
> > approach to multithreading (especially the _programming model_) in
> > C++. We had some discussions about this about 6 months ago and there
> > were more discussions as I recall in summer 2019.
> >
> > Realistically, we are going to be consistently dealing with
> > independent concurrent in-process workloads that each respectively can
> > go faster by multithreading. These could be things like:
> >
> > * Reading file formats (CSV, Parquet, etc.) that benefit from
> > multithreaded parsing/decoding
> > * Reading one or more files in parallel using the Datasets API
> > * Executing any number of multithreaded analytical workloads
> >
> > One obvious issue with our thread scheduling is the FIFO nature of the
> > global thread pool. If a new independent multithreaded workload shows
> > up, it has to wait for other workloads to complete before the new work
> > will be scheduled. Think about a Flight server serving queries to
> > users -- is it fair for one query to "hog" the thread pool and force
> > other requests to wait until they can get access to some CPU
> > resources? You could imagine a workload that spawns 10 minutes worth
> > of CPU work, where a new workload has to wait for all of that work to
> > complete before having any tasks scheduled for execution.
> >
> > The approach that's been taken in the Datasets API to avoid problems
> > with nested parallelism (file-specific operations spawning multiple
> > tasks onto the global thread pool) is simply to disable multithreading
> > at the level of a single file. This is clearly suboptimal.
> >
> > We have additional problems in that some file-loading related tasks do
> > a mixture of CPU work and IO work, and once a thread has been
> > dispatched to execute one of these tasks, when IO takes place, a CPU
> > core may sit underutilized while the IO is waiting.
> >
> > There's more aspects we can discuss, but in general I think we need to
> > come up with a programming model for building our C++ system
> > components with the following requirements:
> >
> > * Deadlocks not possible by design
> > * Any component can safely use "nested parallelism" without the
> > programmer having to worry about deadlocks or one task "hogging" the
> > thread pool. So in other words, if there's only a single
> > multithreading-capable workload running, we "let it rip"
> > * Resources can be reasonably fairly allocated amongst concurrent
> > workloads (think: independent requests coming in through Flight, or
> > scan tasks on different Parquet files in the Datasets API). Limit
> > scenarios where a new workload is blocked altogether on the completion
> > of other workloads
> > * A well-defined programming pattern for tasks that do a mixture of
> > CPU work and IO work that allows CPU cores to be used when a task is
> > waiting on IO
> >
> > We can't be the only project that has these problems, so I'm
> > interested to see what solutions have been successfully employed by
> > others. For example, it strikes me as similar to concurrency issues
> > inside an analytic database. How are they preventing concurrent
> > workload starvation problems or handling CPU/IO task scheduling to
> > avoid CPU underutilization?
> >
> > Choices of which threading libraries we might use to implement a
> > viable solution (e.g. TBB) seem secondary to the programming model
> > that we use to implement our components.
> >
> > Thanks,
> > Wes

Reply via email to