Vladimir Ozerov wrote
> This is how we fixed it for now. But it requires paths locking starting
> from the root. Otherwise they can become parent-child in a moment after
> successful check.

I am not sure we should be concerned about locking the root or the path
here. The lock is held only for the duration of the directory structure change.
I don't think Hadoop/Spark jobs will be affected if we add an extra 10ms
somewhere during the execution.
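
A minimal sketch of the idea in Python, not IGFS code: the Tree class, its
single lock (standing in for "lock the path starting from the root") and the
entry names are all assumptions made up for illustration. The point is only
that the parent-child check runs while the lock is held, so it sees the
structure as it is at the moment of the move.

```python
import threading


class Tree:
    """Toy directory tree: each entry stores only its parent ID (hypothetical model)."""

    def __init__(self):
        self.parent = {"root": None}      # entry ID -> parent ID
        self.lock = threading.Lock()      # stands in for locking the path from the root

    def mkdir(self, name, parent):
        with self.lock:
            self.parent[name] = parent

    def is_ancestor(self, anc, node):
        """True if `anc` is `node` itself or lies on the way from `node` up to the root."""
        while node is not None:
            if node == anc:
                return True
            node = self.parent[node]
        return False

    def move(self, src, dst):
        """Move `src` under `dst`; the parent-child check runs inside the lock."""
        with self.lock:
            if self.is_ancestor(src, dst):
                return False              # dst is inside src's subtree: reject the move
            self.parent[src] = dst
            return True


tree = Tree()
for name, parent in [("A", "root"), ("B", "A"), ("C", "B"), ("D", "A")]:
    tree.mkdir(name, parent)

first = tree.move("B", "D")    # T1: move(/A/B, /A/D)
second = tree.move("D", "C")   # T2: move(/A/D, /A/B/C), checked against the new structure
```

Here the second move is rejected because, once B sits under D, C is already
inside D's subtree. The lock is held only for the walk up to the root and a
single parent update.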


Vladimir Ozerov wrote
> On Thu, Sep 24, 2015 at 11:28 AM, Sergi Vladykin <sergi.vladykin@> wrote:
> 
>> Maybe just check that they are not parent-child within the tx?
>>
>> Sergi
>> Igniters,
>>
>> We found a concurrency problem in IGFS and I would like to discuss
>> possible solutions to it.
>>
>> Consider the following file system structure:
>> root
>> |-- A
>> |   |-- B
>> |   |   |-- C
>> |   |-- D
>>
>> ... two concurrent operations in different threads:
>> T1: move(/A/B, /A/D);
>> T2: move(/A/D, /A/B/C);
>>
>> ... and how IGFS handles it now:
>> T1: verify that "/A/B" and "/A/D" exist, they are not child-parent to
>> each other, etc. -> OK.
>> T2: do the same for "/A/D" and "/A/B/C" -> OK.
>> T1: get IDs of "/A", "/A/B" and "/A/D" to lock them later inside tx.
>> T2: get IDs of "/A", "/A/D", "/A/B" and "/A/B/C" to lock them later
>> inside tx.
>>
>> T1: Start pessimistic tx, lock IDs of "/A", "/A/B", "/A/D", perform move
>> -> OK.
>> root
>> |-- A
>> |   |-- D
>> |   |   |-- B
>> |   |   |   |-- C
>>
>> T2: Start pessimistic tx, lock IDs of "/A", "/A/D", "/A/B" and
>> "/A/B/C" (*directory structure already changed at this time!*),
>> perform move -> OK.
>> root
>> |-- A
>> B
>> |-- D
>> |   |-- C
>> |   |   |-- B (loop!)
>>
>> File system is corrupted. Folders B, C and D are not reachable from
>> root.
>>
>> To fix this we now additionally check whether the directory structure is
>> still valid *inside the transaction*. It works, no more corruption. But it
>> requires taking locks on the whole paths *including root*. So move, delete
>> and mkdirs operations *can no longer be concurrent*.
>>
>> Probably there is a way to relax this while still ensuring consistency,
>> but I do not see how. One idea is to store the real path inside each entry.
>> This way we will be able to ensure that it is still at a valid location
>> without blocking parents, so concurrency will be restored. But we will have
>> to propagate structural changes to children. E.g. a move of a folder with
>> 100 items will lead to an update of >100 cache entries. Not so good.
>>
>> Any other ideas?
>>
>> Vladimir.
>>
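
The interleaving described in the quoted message can be replayed with a small
Python sketch. This is a toy model, not IGFS code: the parent map, the helper
names and the use of folder names as entry IDs are assumptions for
illustration. Both checks run against the original structure, and the two
moves are then applied by ID one after the other.

```python
# Toy model (hypothetical, not IGFS code): each entry ID maps to its parent ID.
parent = {"root": None, "A": "root", "B": "A", "C": "B", "D": "A"}


def is_ancestor(anc, node):
    """True if `anc` is `node` or an ancestor of `node`; stops if it hits a loop."""
    seen = set()
    while node is not None and node not in seen:
        if node == anc:
            return True
        seen.add(node)
        node = parent[node]
    return False


def reachable_from_root(node):
    return is_ancestor("root", node)


# Both threads validate against the original structure, before any tx starts.
t1_ok = not is_ancestor("B", "D")   # T1: move(/A/B, /A/D)
t2_ok = not is_ancestor("D", "C")   # T2: move(/A/D, /A/B/C)

# The transactions then run one after another, each trusting its stale check.
if t1_ok:
    parent["B"] = "D"
if t2_ok:
    parent["D"] = "C"

# Entries that can no longer be reached by walking up to the root.
orphaned = sorted(n for n in parent if not reachable_from_root(n))
print(orphaned)
```

Both checks pass, yet after the second move B, C and D point at each other in
a loop and none of them leads back to root.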





--
View this message in context: 
http://apache-ignite-developers.2346864.n4.nabble.com/IGFS-concurrency-issue-tp3449p3734.html
Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
