There are actually three separate issues here, but as (a) is already
known and (b) is not a bug, I am treating this bug report as being
about (c) only.

To understand them, it is necessary to know that OpenCL computations are
asynchronous: a clmath expression like "aCL=bCL+cCL" places this
operation in a CommandQueue and returns without waiting for it to
finish.  (This is to allow the CPU to do other work during the GPU
computation.)
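
As a minimal sketch of what "asynchronous" means here (assuming
pyopencl and a working OpenCL device; the function name async_add_demo
and the array names are only for illustration):

```python
import numpy as np


def async_add_demo(n=1_000_000):
    # Imported here so the file can be loaded on a machine without pyopencl.
    import pyopencl as cl
    import pyopencl.array as cl_array

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)

    b_cl = cl_array.to_device(queue, np.ones(n, dtype=np.float32))
    c_cl = cl_array.to_device(queue, np.full(n, 2, dtype=np.float32))

    # Enqueues the addition and returns; the sum may not be computed yet.
    a_cl = b_cl + c_cl

    # Block until everything placed in this queue has finished.
    queue.finish()

    # Copy the result back to host memory.
    return a_cl.get()
```

Calling async_add_demo() should return a NumPy array; the CPU is free to
do other work between the addition and queue.finish().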

(a) Running out of memory can hang the entire system, rather than ending
just the OpenCL application with CL_OUT_OF_RESOURCES.

This is probably the same long-standing issue (e.g. bug 620074, bug
1504914, bug 1592813) that makes Linux out-of-memory conditions in
general do this.  (The integrated GPUs supported by beignet share the
host's memory.)

(b) In both beignet and pocl (probably all ICDs), a long sequence of
allocate/deallocate operations (e.g. clmath creating a new array for
each operation) *without* waiting for results uses up memory, but
regularly waiting for results avoids this.

This is because allocating memory (clCreateBuffer) happens immediately,
but the actual computations are queued, and memory can't be freed until
the computations using it have finished.  Hence, if many operations are
queued without waiting for a result, memory allocation can run far ahead
of computation, filling up the memory.
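
The effect can be illustrated with a toy model in plain Python. This is
not OpenCL code: the function name, the "ten enqueues per device
operation" ratio and the tick-based timing are all assumptions chosen
only to show the shape of the problem.

```python
def peak_live_buffers(n_ops, host_ops_per_tick=10, wait_every=None):
    """Return the peak number of buffers alive at once in the toy model.

    Each operation allocates one buffer when it is enqueued, and that
    buffer is released only when the (slower) device has executed it.
    """
    queued = 0    # operations enqueued but not yet executed
    enqueued = 0  # operations enqueued so far
    peak = 0
    while enqueued < n_ops:
        # Host side: enqueue a burst; allocation happens immediately.
        for _ in range(host_ops_per_tick):
            if enqueued == n_ops:
                break
            queued += 1
            enqueued += 1
            peak = max(peak, queued)
            if wait_every and enqueued % wait_every == 0:
                # Blocking wait: the device drains the whole queue.
                queued = 0
        # Device side: one operation finishes per tick, freeing its buffer.
        if queued:
            queued -= 1
    return peak
```

In this model, peak_live_buffers(1000) grows with the number of
operations, while peak_live_buffers(1000, wait_every=50) stays bounded
by roughly the wait interval.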

This is not a bug: don't do that.  Either wait for results often enough
that this doesn't build up to the point of running out of memory, or
(better for performance) re-use existing memory objects instead of
allocating/deallocating.  (To do the latter with clmath, use
pyopencl.tools.MemoryPool.)
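
A sketch of both workarounds, assuming pyopencl and a working OpenCL
device. The function name run_many_ops and its parameters are made up
for this example, and I am assuming that arrays produced by arithmetic
on a pooled array inherit its allocator:

```python
import numpy as np


def run_many_ops(n_ops=10_000, n=1_000_000, wait_every=100, use_pool=True):
    # Imported here so the file can be loaded on a machine without pyopencl.
    import pyopencl as cl
    import pyopencl.array as cl_array
    import pyopencl.tools as cl_tools

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)

    # Workaround 2: hand freed buffers back to a pool for re-use,
    # instead of releasing and re-allocating them every time.
    allocator = None
    if use_pool:
        allocator = cl_tools.MemoryPool(cl_tools.ImmediateAllocator(queue))

    a_cl = cl_array.to_device(
        queue, np.ones(n, dtype=np.float32), allocator=allocator)
    b_cl = cl_array.to_device(
        queue, np.zeros(n, dtype=np.float32), allocator=allocator)

    for i in range(1, n_ops + 1):
        # Creates a new result array each iteration
        # (assumed to use the same allocator as its operands).
        b_cl = a_cl + b_cl
        # Workaround 1: wait regularly so allocation cannot run far ahead.
        if wait_every and i % wait_every == 0:
            queue.finish()

    queue.finish()
    return b_cl.get()
```

Setting wait_every=None and use_pool=False gives the pattern that (b)
says not to use.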

While investigating this I discovered that all beignet queues execute
out of order even if the user requested in-order execution.  That is a
bug, but it is not the cause of this issue.

(c) In beignet but not pocl, a long sequence of clmath operations leaks
memory, even with regular waits.

To ensure that intermediate results are calculated before they are used,
clmath arrays use Event objects to track dependencies.  A beignet event
includes references to the event(s) it depends on
(https://sources.debian.org/src/beignet/1.3.2-2/src/cl_event.h/?hl=47#L40),
and continues to hold these as long as the event object exists, even if
it has completed and been waited for.  As OpenCL objects are freed by
reference counting, this means that as long as the last event in a
dependency tree exists, the whole tree of (recursive) dependencies also
exists, taking up memory (~20kB per event).
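
The mechanism can be reproduced with a toy model in plain Python. This
is not beignet's code: ToyEvent and live_events_after_chain are made-up
names, and the example relies on CPython's reference counting to stand
in for OpenCL's.

```python
import weakref


class ToyEvent:
    """Stand-in for an OpenCL event that references its dependencies."""

    def __init__(self, depends_on=()):
        self.depends_on = list(depends_on)
        self.complete = False

    def mark_complete(self, drop_deps):
        self.complete = True
        if drop_deps:
            # Dependencies are no longer needed once this event is done.
            self.depends_on.clear()


def live_events_after_chain(length, drop_deps):
    """Run a chain of dependent events, keeping only the last one.

    Returns how many events from the chain are still alive at the end.
    """
    refs = []
    last = None
    for _ in range(length):
        ev = ToyEvent([last] if last is not None else [])
        refs.append(weakref.ref(ev))  # observe without keeping alive
        ev.mark_complete(drop_deps)
        last = ev  # the caller only holds on to the newest event
    return sum(1 for r in refs if r() is not None)
```

With drop_deps=False every event in the chain stays alive as long as the
last one does; with drop_deps=True only the last event survives.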

pocl avoids this by dropping these references after completion (
https://sources.debian.org/src/pocl/1.1-5/lib/CL/devices/common.c/?hl=722#L714
); the attached patch makes beignet do so.  Checking the source suggests
mesa is also affected (
https://sources.debian.org/src/mesa/18.1.3-1/src/gallium/state_trackers/clover/core/event.hpp/?hl=84#L34
), but I don't have the hardware to try it.  (The OpenCL part of mesa is
AMD/Radeon only.)


** Also affects: mesa (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: pyopencl (Ubuntu)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to mesa in Ubuntu.
https://bugs.launchpad.net/bugs/1354086

Title:
  [i5-3230] Tight pyopencl.clmath loops cause out-of-memory system hang

Status in beignet package in Ubuntu:
  In Progress
Status in mesa package in Ubuntu:
  New
Status in pyopencl package in Ubuntu:
  Invalid

Bug description:
  In beignet (not pocl), tight loops involving OpenCL array creation and
  destruction, e.g. repeated bCL=aCL+bCL (or other pyopencl.clmath
  operations) or repeated pyopencl.enqueue_copy(cq0,bCL.data,aCL.data),
  often hang the whole system, after a number of operations consistent
  with memory exhaustion.

  Waiting for queued operations to finish
  (pyopencl.enqueue_barrier(cq0).wait()) before attempting more avoids
  the bug, but dependencies between the operations (as in the
  bCL=aCL+bCL example) do not.  This is probably because the "allocate
  memory" step is separate from, and faster than, the "do the
  operation" step, so it can run ahead until it uses up all the memory.

  (Note that while the above wait() can be used as a workaround for this
  bug, it is usually faster to avoid frequent memory allocation
  altogether, by reusing existing arrays; for pyopencl.clmath, this
  means using pyopencl.tools.MemoryPool.)
