Thanks Vladimir. In the case of HDFS, the global lock for conflicting simultaneous changes ie
mkdir /a/b/c together with mv /a/b /a/d will trigger the NN global lock on the file system so the latter operation won't start until the former is completed. Hence, the point of my confusion is why IGFS is adding an additional lock on top of what the secondary file system does anyway. Sorry, if I haven't expressed myself in a better way from the beginning. Cos On Thu, Oct 08, 2015 at 11:33AM, Vladimir Ozerov wrote: > Cos, > > Initially IGFS was designed to support concurrent structural updates. E.g., > creation of directories /A/C/D and /E/F/G can be performed concurrently. > Now we revealed that it might cause concurrency issues in case of > conflicting updates. > > To fix this problem we now perform every update holding a kind of global > file system lock. This doesn't affect file read/write operations - they > still can be performed concurrently of course. The question was whether > this could cause severe performance degradation or not. If we assume that > in average Hadoop jobs file operations dominate over structural operations > on directories, then it should not cause any significant performance issues. > > Vladimir. > > On Wed, Oct 7, 2015 at 10:38 PM, Konstantin Boudnik <c...@apache.org> wrote: > > > On Wed, Oct 07, 2015 at 09:11AM, Vladimir Ozerov wrote: > > > Cos, > > > Yes, no long-time locking is expected here. > > > > Sorry, I musta be still dense from the jet-lag: could you put in a bit more > > details? Thanks in advance! > > > > Cos > > > > > On Wed, Oct 7, 2015 at 6:57 AM, Konstantin Boudnik <c...@apache.org> > > wrote: > > > > > > > IIRC NN should be locking on these ops anyway, shouldn't it? The > > situation > > > > is > > > > no different if multiple clients are doing these operations > > > > near-simultaneously. Unless I missed something here... > > > > > > > > On Thu, Sep 24, 2015 at 11:28AM, Sergi Vladykin wrote: > > > > > May be just check that they are not parent-child within the tx? > > > > > > > > > > Sergi > > > > > Igniters, > > > > > > > > > > We revealed concurrency problem in IGFS and I would like to discuss > > > > > possible solutions to it. > > > > > > > > > > Consider the following file system structure: > > > > > root > > > > > |-- A > > > > > | |-- B > > > > > | | |-- C > > > > > | |-- D > > > > > > > > > > ... two concurrent operations in different threads: > > > > > T1: move(/A/B, /A/D); > > > > > T2: move(/A/D, /A/B/C); > > > > > > > > > > ... and how IGFS handles it now: > > > > > T1: verify that "/A/B" and "/A/D" exist, they are not child-parent to > > > > each > > > > > other, etc. -> OK. > > > > > T2: do the same for "A/D" and "A/B/C" -> OK. > > > > > T1: get IDs of "/A", "/A/B" and "/A/D" to lock them later inside tx. > > > > > T2: get IDs of "/A", "/A/D", "/A/B" and "/A/B/C" to lock them later > > > > inside > > > > > tx. > > > > > > > > > > T1: Start pessimistic tx, lock IDs of "/A", "/A/B", "/A/D", perform > > move > > > > -> > > > > > OK. > > > > > root > > > > > |-- A > > > > > | |-- D > > > > > | | |-- B > > > > > | | | |-- C > > > > > > > > > > T2: Start pessimistic tx, lock IDs of "/A", "/A/D", "/A/B" and > > > > > "/A/B/C" (*directory > > > > > structure already changed at this time!*), perform move -> OK. > > > > > root > > > > > |-- A > > > > > B > > > > > |-- D > > > > > | |-- C > > > > > | | |-- B (loop!) > > > > > > > > > > File system is corrupted. Folders B, C and D are not reacheable from > > > > root. > > > > > > > > > > To fix this now we additionaly check if directory structure is still > > > > > valid *inside > > > > > transaction*. It works, no more corruptions. But it requres taking > > locks > > > > on > > > > > the whole paths *including root*. So move, delete and mkdirs > > opeartions > > > > *can > > > > > no longer be concurrent*. > > > > > > > > > > Probably there is a way to relax this while still ensuring > > consistency, > > > > but > > > > > I do not see how. One idea is to store real path inside each entry. > > This > > > > > way we will be able to ensure that it is still at a valid location > > > > without > > > > > blocking parents, so concurrnecy will be restored. But we will have > > to > > > > > propagate strucutral changes to children. E.g. move of a folder with > > 100 > > > > > items will lead to update of >100 cache entries. Not so good. > > > > > > > > > > Any other ideas? > > > > > > > > > > Vladimir. > > > > > >
signature.asc
Description: Digital signature