Hi Grigori, Grigori Goronzy <g...@chown.ath.cx> writes:
> Wrap MapBuffer and MapImage as hard_event actions, like other > operations. This enables correct profiling. Also make sure to wait > for events to finish when blocking is requested by the caller. > --- > src/gallium/state_trackers/clover/api/transfer.cpp | 50 > ++++++++++++++++++++-- > 1 file changed, 46 insertions(+), 4 deletions(-) > > diff --git a/src/gallium/state_trackers/clover/api/transfer.cpp > b/src/gallium/state_trackers/clover/api/transfer.cpp > index fdb9405..4986f53 100644 > --- a/src/gallium/state_trackers/clover/api/transfer.cpp > +++ b/src/gallium/state_trackers/clover/api/transfer.cpp > @@ -270,6 +270,18 @@ namespace { > src_obj->resource(q), src_orig); > }; > } > + > + /// > + /// Resource mapping > + /// > + template<typename T> > + std::function<void (event &)> > + map_resource_op(command_queue &q, T obj, cl_map_flags flags, bool > blocking, > + const vector_t &origin, const vector_t ®ion, void > **map) { > + return [=, &q](event &) { > + *map = obj->resource(q).add_map(q, flags, blocking, origin, region); > + }; > + } > } > > CLOVER_API cl_int > @@ -363,6 +375,10 @@ clEnqueueReadBufferRect(cl_command_queue d_q, cl_mem > d_mem, cl_bool blocking, > region)); > > ret_object(rd_ev, hev); > + > + if (blocking) > + hev().wait(); > + hard_event::wait() may fail, so this should probably be done before the ret_object() call to avoid leaks. Is there any reason you didn't make the same change in clEnqueueReadBuffer() and clEnqueueWriteBuffer()? > return CL_SUCCESS; > > } catch (error &e) { > @@ -400,6 +416,10 @@ clEnqueueWriteBufferRect(cl_command_queue d_q, cl_mem > d_mem, cl_bool blocking, > region)); > > ret_object(rd_ev, hev); > + > + if (blocking) > + hev().wait(); > + Same comment as above. Also note that this is being more strict than the spec requires (which I believe is what Tom was referring to). From the CL 1.2 spec: | If blocking_write is CL_TRUE, the OpenCL implementation copies the data | referred to by ptr and enqueues the write operation in the | command-queue. The memory pointed to by ptr can be reused by the | application after the clEnqueueWriteBufferRect call returns. The spec is giving you no guarantee that the write to the actual memory object will be complete by the time the clEnqueueWriteBufferRect call returns -- Only that your data will have been buffered somewhere and the memory pointed to by the argument can be reused immediately by the application. The reason why I was reluctant to make this change last time it came up was that it's likely to hurt performance unnecessarily because the wait() call blocks until *all* previous commands in the same queue have completed execution, even though in the most common case the copy is performed synchronously using soft_copy_op(), so the wait() call is redundant even for blocking copies. The case with blocking reads is similar, the copy is handled synchronously using soft_copy_op() when no user events are present in the list of dependencies, so calling wait() on the event is unnecessary to guarantee that the execution of the read has completed, and will cause a pipe_context flush and wait until the most recent fence is signalled. Ideally we would have a weaker variant of event::wait() (e.g. wait_signalled()) that doesn't flush and just waits for the associated action call-back to have been executed without giving any guarantees about the corresponding GPU command. The event interface doesn't expose such a functionality right now, I'm attaching two (completely untested) patches implementing it, you should be able to use them as starting point to fix blocking transfers. > return CL_SUCCESS; > > } catch (error &e) { > @@ -505,6 +525,10 @@ clEnqueueReadImage(cl_command_queue d_q, cl_mem d_mem, > cl_bool blocking, > region)); > > ret_object(rd_ev, hev); > + > + if (blocking) > + hev().wait(); > + > return CL_SUCCESS; > > } catch (error &e) { > @@ -539,6 +563,10 @@ clEnqueueWriteImage(cl_command_queue d_q, cl_mem d_mem, > cl_bool blocking, > region)); > > ret_object(rd_ev, hev); > + > + if (blocking) > + hev().wait(); > + Same comment as above for image reads and writes. > return CL_SUCCESS; > > } catch (error &e) { > @@ -665,10 +693,17 @@ clEnqueueMapBuffer(cl_command_queue d_q, cl_mem d_mem, > cl_bool blocking, > validate_object(q, mem, obj_origin, obj_pitch, region); > validate_map_flags(mem, flags); > > - void *map = mem.resource(q).add_map(q, flags, blocking, obj_origin, > region); > + void *map = nullptr; > + auto hev = create<hard_event>( > + q, CL_COMMAND_MAP_BUFFER, deps, > + map_resource_op(q, &mem, flags, blocking, obj_origin, region, &map)); > > - ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_BUFFER, deps)); > + ret_object(rd_ev, hev); > ret_error(r_errcode, CL_SUCCESS); > + > + if (blocking) > + hev().wait(); > + This doesn't look right for non-blocking maps, if you wrap it in an event you need to make sure that the action has been executed (e.g. by using event::wait_signalled()) before you return, even if the map is non-blocking, otherwise the "map" variable won't be initialized in time and you'll just return nullptr. > return map; > > } catch (error &e) { > @@ -693,10 +728,17 @@ clEnqueueMapImage(cl_command_queue d_q, cl_mem d_mem, > cl_bool blocking, > validate_object(q, img, origin, region); > validate_map_flags(img, flags); > > - void *map = img.resource(q).add_map(q, flags, blocking, origin, region); > + void *map = nullptr; > + auto hev = create<hard_event>( > + q, CL_COMMAND_MAP_IMAGE, deps, > + map_resource_op(q, &img, flags, blocking, origin, region, &map)); > > - ret_object(rd_ev, create<hard_event>(q, CL_COMMAND_MAP_IMAGE, deps)); > + ret_object(rd_ev, hev); > ret_error(r_errcode, CL_SUCCESS); > + > + if (blocking) > + hev().wait(); > + Same here. > return map; > > } catch (error &e) { > -- > 1.9.1 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev
From 162047763a83a57764c5784c95c3f5474da897b1 Mon Sep 17 00:00:00 2001 From: Francisco Jerez <curroje...@riseup.net> Date: Tue, 9 Jun 2015 22:52:25 +0300 Subject: [PATCH 1/2] clover: Wrap event::wait_count in a method taking care of the required locking. --- src/gallium/state_trackers/clover/core/event.cpp | 19 ++++++++++++------- src/gallium/state_trackers/clover/core/event.hpp | 3 ++- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/src/gallium/state_trackers/clover/core/event.cpp b/src/gallium/state_trackers/clover/core/event.cpp index e1f9de0..fdd9209 100644 --- a/src/gallium/state_trackers/clover/core/event.cpp +++ b/src/gallium/state_trackers/clover/core/event.cpp @@ -27,7 +27,7 @@ using namespace clover; event::event(clover::context &ctx, const ref_vector<event> &deps, action action_ok, action action_fail) : - context(ctx), wait_count(1), _status(0), + context(ctx), _wait_count(1), _status(0), action_ok(action_ok), action_fail(action_fail) { for (auto &ev : deps) ev.chain(*this); @@ -41,7 +41,7 @@ event::trigger_self() { std::lock_guard<std::mutex> lock(mutex); std::vector<intrusive_ref<event>> evs; - if (!--wait_count) + if (!--_wait_count) std::swap(_chain, evs); return evs; @@ -81,10 +81,15 @@ event::abort(cl_int status) { ev.abort(status); } +unsigned +event::wait_count() const { + std::lock_guard<std::mutex> lock(mutex); + return _wait_count; +} + bool event::signalled() const { - std::lock_guard<std::mutex> lock(mutex); - return !wait_count; + return !wait_count(); } cl_int @@ -99,8 +104,8 @@ event::chain(event &ev) { std::unique_lock<std::mutex> lock_ev(ev.mutex, std::defer_lock); std::lock(lock, lock_ev); - if (wait_count) { - ev.wait_count++; + if (_wait_count) { + ev._wait_count++; _chain.push_back(ev); } ev.deps.push_back(*this); @@ -112,7 +117,7 @@ event::wait() const { ev.wait(); std::unique_lock<std::mutex> lock(mutex); - cv.wait(lock, [=]{ return !wait_count; }); + cv.wait(lock, [=]{ return !_wait_count; }); } hard_event::hard_event(command_queue &q, cl_command_type command, diff --git a/src/gallium/state_trackers/clover/core/event.hpp b/src/gallium/state_trackers/clover/core/event.hpp index 6469e48..53dac68 100644 --- a/src/gallium/state_trackers/clover/core/event.hpp +++ b/src/gallium/state_trackers/clover/core/event.hpp @@ -85,8 +85,9 @@ namespace clover { private: std::vector<intrusive_ref<event>> trigger_self(); std::vector<intrusive_ref<event>> abort_self(cl_int status); + unsigned wait_count() const; - unsigned wait_count; + unsigned _wait_count; cl_int _status; action action_ok; action action_fail; -- 2.3.5
From 9789fa01f013b3f451cc8f8bcfa315edad1865a0 Mon Sep 17 00:00:00 2001 From: Francisco Jerez <curroje...@riseup.net> Date: Tue, 9 Jun 2015 22:59:43 +0300 Subject: [PATCH 2/2] clover: Run the associated action before an event is signalled. And define a method for other threads to wait until the action function associated with an event has been executed to completion. For hard events, this will mean waiting until the corresponding command has been submitted to the pipe driver, without necessarily flushing the pipe_context and waiting for the actual command to be processed by the GPU (which is what hard_event::wait() already does). This weaker kind of event wait will allow implementing blocking memory transfers efficiently. --- src/gallium/state_trackers/clover/core/event.cpp | 22 +++++++++++----------- src/gallium/state_trackers/clover/core/event.hpp | 1 + 2 files changed, 12 insertions(+), 11 deletions(-) diff --git a/src/gallium/state_trackers/clover/core/event.cpp b/src/gallium/state_trackers/clover/core/event.cpp index fdd9209..1f0db1b 100644 --- a/src/gallium/state_trackers/clover/core/event.cpp +++ b/src/gallium/state_trackers/clover/core/event.cpp @@ -44,19 +44,16 @@ event::trigger_self() { if (!--_wait_count) std::swap(_chain, evs); + cv.notify_all(); return evs; } void event::trigger() { - auto evs = trigger_self(); - - if (signalled()) { + if (wait_count() == 1) action_ok(*this); - cv.notify_all(); - } - for (event &ev : evs) + for (event &ev : trigger_self()) ev.trigger(); } @@ -73,11 +70,9 @@ event::abort_self(cl_int status) { void event::abort(cl_int status) { - auto evs = abort_self(status); - action_fail(*this); - for (event &ev : evs) + for (event &ev : abort_self(status)) ev.abort(status); } @@ -112,12 +107,17 @@ event::chain(event &ev) { } void +event::wait_signalled() const { + std::unique_lock<std::mutex> lock(mutex); + cv.wait(lock, [=]{ return !_wait_count; }); +} + +void event::wait() const { for (event &ev : deps) ev.wait(); - std::unique_lock<std::mutex> lock(mutex); - cv.wait(lock, [=]{ return !_wait_count; }); + wait_signalled(); } hard_event::hard_event(command_queue &q, cl_command_type command, diff --git a/src/gallium/state_trackers/clover/core/event.hpp b/src/gallium/state_trackers/clover/core/event.hpp index 53dac68..03c97bc 100644 --- a/src/gallium/state_trackers/clover/core/event.hpp +++ b/src/gallium/state_trackers/clover/core/event.hpp @@ -69,6 +69,7 @@ namespace clover { virtual cl_int status() const; virtual command_queue *queue() const = 0; virtual cl_command_type command() const = 0; + void wait_signalled() const; virtual void wait() const; virtual struct pipe_fence_handle *fence() const { -- 2.3.5
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev