geom

Alexander Motin Thu, 03 Nov 2011 01:35:01 -0700

On 03.11.2011 09:51, Pawel Jakub Dawidek wrote:
> On Wed, Nov 02, 2011 at 05:09:42PM +0200, Alexander Motin wrote:
>> On 11/02/11 15:42, Pawel Jakub Dawidek wrote:
>>> On Wed, Nov 02, 2011 at 09:24:59AM +0000, Alexander Motin wrote:
>>>> Author: mav
>>>> Date: Wed Nov  2 09:24:59 2011
>>>> New Revision: 227015
>>>> URL: http://svn.freebsd.org/changeset/base/227015
>>>>
>>>> Log:
>>>>   Add mutex and two flags to make orphan() call properly asynchronous:
>>>>    - delay consumer closing and detaching on orphan() until all I/Os 
>>>> complete;
>>>>    - prevent new I/Os submission after orphan() called.
>>>>   Previous implementation could destroy consumers still having active
>>>>   requests and worked only because of global workaround made on GEOM level.
>>>
>>> Alexander, I'm not sure I agree with your recent changes to address
>>> this. The checks in GEOM were there to avoid the need for counting
>>> outstanding I/O requests in every single GEOM class.
>>
>> Sorry, I've sent you letter last week asking for your opinion on this
>> problem, but got no response. :(
> 
> I did not receive it, sorry.
> 
>>> Why do you think the checks in GEOM are not good enough?
>>
>> Mostly because nstart increment and request submission are not locked.
>> There are race windows between start() call, request submission and
>> consumer detach: start() method may get provider pointer that will not
>> be valid in time of the request submission.
>>
>> According geom(4), that race should be closed by assumption that
>> provider should not be closed until all active requests are completed.
>> Kind of reference counting, done by top consumers, such as geom_dev or
>> geom_vfs. That is what I am trying to fix with my changes.
>>
>> Also I don't very like idea of periodic polling, trying to catch moment
>> when nstart == nend. As soon as at that time we haven't called orphan()
>> method yet, requests may go infinitely, even though each will be aborted
>> quickly. It looks dirty at least.
>>
>> Counting of outstanding I/O requests needed only for classes that for
>> some reason can't follow that assumption. For example, geom_vfs releases
>> provider as soon as it orphans, not waiting for close() call from file
>> system. Another example is gmirror -- we want to drop single
>> disconnected disk while upstream provider is still working and won't be
>> closed.
> 
> Ok, I can see your point now. You are right. The check that nstart is
> equal to nend I added was not to fix races within one class. IIRC I was
> seeing panics there with simple classes that only forward orphan events,
> so even if provider's error was set and no new I/O requests were coming,
> we could destroy orphaned provider even if there were still in-flight
> requests. I was a bit confused, because you were refering to the check I
> added and I don't think it is really related.


Your check does good job to the original problem, reducing the race
window from huge start() -- bio_done() to two small start() -- nstart++
and nend++ -- bio_done(). It protects geoms below the broken one (panics
I see without check usually happen on random level below the real
source), but it does not protect the broken geom itself. Proper orphan()
methods protect both.

-- 
Alexander Motin
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Re: svn commit: r227015 - head/sys/geom

Reply via email to