Emil Velikov <emil.l.veli...@gmail.com> writes: > On 28 November 2015 at 15:10, Francisco Jerez <curroje...@riseup.net> wrote: >> Emil Velikov <emil.l.veli...@gmail.com> writes: >> >>> Hi Francisco, >>> >>> In all due respect, I am surprised with the volume of justification >>> you put into some changes. >>> >> And apparently it wasn't enough? :P >> >>> On 28 November 2015 at 11:52, Francisco Jerez <curroje...@riseup.net> wrote: >>>> Francisco Jerez <curroje...@riseup.net> writes: >>>> >>>>> Tom Stellard <thomas.stell...@amd.com> writes: >>>>> >>>>>> When probing for devices, clover will call pipe_loader_probe() twice. >>>>>> The first time to retrieve the number of devices, and then second time >>>>>> to retrieve the device structures. >>>>>> >>>>>> We currently assume that the return value of both calls will be the >>>>>> same, but this will not be the case if a device happens to disappear >>>>>> between the two calls. >>>>>> >>>>> >>>>> Assuming that the two sequential calls will give the same result is bad, >>>>> but the likelihood of a device disappearing between the two calls is so >>>>> small it's unlikely to happen more than once in a million years. If >>> Err... have to love this estimate :-P With some recent chances that >>> probability increases ever so slightly, as we push the "can we find >>> the module, does it have the entry point" from .create_screen and >>> .configure further up to probe screen. >>> >> >> I was referring to the probability of a device actually disappearing >> From the system between the two calls. >> >>>>> this problem happens deterministically to you there's almost certainly a >>>>> more serious bug in pipe-loader that will cause it to return an >>>>> incorrect device count during the first call. >>>>> >>> Care to give an example. I'm not sure where/how this is possible. >>> >> The immediately following paragraph explained how and where it's >> possible: pipe_loader_sw_probe() returns one regardless of whether the >> software device is actually available. >> > That's why we have the "two stage" approach > 1) Query the total count, allocate memory and 2) populate the memory > with information. > > The alternative is to build a separate pipe-loader for each target, > which ... doesn't sound optimal. Neither is forcing every target to > have be with the same set of drivers. > >>>>> Indeed, pipe_loader_sw_probe() seems to return 1 unconditionally if ndev >>>>> is zero, so it will give a different device count depending on ndev if >>>>> pipe_loader_sw_probe_null() happens to fail. Seems like a bug we don't >>>>> want to simply paper over. >>>>> >>> Can you elaborate what exactly the problem here ? If we ask "how many >>> we have" sure we can provide that [fixed] number. On the other hand, >>> if we want more information about it (them), there is no guarantee >>> that we'll succeed. >> >> The return value of pipe_loader_probe() is supposed to be the number of >> devices actually available on the system, it doesn't depend on the devs >> pointer or ndev parameter -- See the doxygen comment in pipe_loader.h >> for the expected behaviour and pipe_loader_drm_probe() for a correct >> implementation. >> > True, the documentation does say that. Yet the EGL Devices approach is > what I perceive as better (clearer, more intuitive) approach - if your > storage pointer is null return the total count. Otherwise upon > information retrieval cap up-to the specified device count. >
That's a valid yet somewhat inconsistent alternative approach. In pipe-loader the return value of pipe_loader_probe() always has the same semantics. I don't think I see why you consider it more intuitive for the return value to have inconsistent semantics depending on what the function arguments are, but you're free to propose such a change as a separate series as long as you fix the documentation, all users and back-ends of pipe-loader, if you consider it worth doing -- I personally don't. > Admittedly the EGL spec came long after the pipe-loader. > >>> >>>>> Anyway I'm not saying that clover's assumption shouldn't be fixed too -- >>>>> See comment below. >>>>> >>>>>> This patch removes this assumption and checks the return value of the >>>>>> second pipe_loader_probe() call to ensure it does not try to initialize >>>>>> devices that no longer exits. >>>>>> >>>>>> CC: <mesa-sta...@lists.freedesktop.org> >>>>>> --- >>>>>> src/gallium/state_trackers/clover/core/platform.cpp | 5 +++-- >>>>>> 1 file changed, 3 insertions(+), 2 deletions(-) >>>>>> >>>>>> diff --git a/src/gallium/state_trackers/clover/core/platform.cpp >>>>>> b/src/gallium/state_trackers/clover/core/platform.cpp >>>>>> index 328b71c..689d692 100644 >>>>>> --- a/src/gallium/state_trackers/clover/core/platform.cpp >>>>>> +++ b/src/gallium/state_trackers/clover/core/platform.cpp >>>>>> @@ -28,9 +28,10 @@ platform::platform() : adaptor_range(evals(), devs) { >>>>>> int n = pipe_loader_probe(NULL, 0); >>>>>> std::vector<pipe_loader_device *> ldevs(n); >>>>>> >>>>>> - pipe_loader_probe(&ldevs.front(), n); >>>>>> + n = pipe_loader_probe(&ldevs.front(), n); >>>>>> >>>>>> - for (pipe_loader_device *ldev : ldevs) { >>>>>> + for (int i = 0; i < n; ++i) { >>>>>> + pipe_loader_device *ldev = ldevs[i]; >>>>> >>>>> There are many reasons to prefer range loops to induction variables >>>>> (including iterators), and there's no need to change it here: Because >>>>> ldevs is zero-initialized you can just wrap the push_back(create()) call >>>>> around an 'if (ldev)' to make this a one-liner. >>>>> >>>> Let me be more explicit about what could go wrong with this change in >>>> its current form: The return value of pipe_loader_probe() is the number >>>> of devices available on the system, so basically the same race condition >>>> is still possible if 'n' increases from the first call to the second, >>>> and in that case you'll end up reading uninitialized memory past the end >>>> of the array -- Worse than a null dereference. Range loops make this >>>> kind of situation impossible. >>>> >>> If I understood you correctly, you're saying that on the second call >>> can return count greater than the initial one. >>> >>> Errm... this looks like a bug in the pipe-loader. I don't think it's >>> normal to claim that X devices are available only to return >>> information for fewer devices. OK, calling it a bug might not be >>> politically correct, although it is a misleading design decision, at >>> the very least. >>> >> What I was saying is that if the rationale of this patch is correct and >> clover may not make any assumptions about the number of available >> devices remaining the same between calls to pipe_loader_probe(), this >> patch doesn't fix the problem at all, and in fact introduces a worse >> kind of bug (read from uninitialized memory). The alternative I >> suggested would be simpler and more robust, so it's worth changing >> regardless of whether or not it's actually likely that the device count >> will increase or decrease between calls. >> > So the difference boils down to what is/should be the behaviour of > pipe_loader_probe(). I take it that you're adamant that pipe-loader's > approach is superior, or there is a sliver of hope that one can sway > your view ? > I don't think it boils down to that. I consider the current idiom used by pipe_loader_probe() satisfactory, but that doesn't mean I'm against fixing this assumption in Clover in addition, since Tom is right that there currently is a theoretical race condition, regardless of what the semantics of the return value of pipe_loader_probe() are -- Even though the race condition is extremely unlikely with the currently documented semantics (as implemented by the DRM back-end). That's why I didn't NAK the patch, I simply asked for an improvement. > Thanks > Emil
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev