I'd like to summarize the IRC discussion Mark and I had: The lock file's existence in the directory should not fail obtain() from retrieving obtaining a lock. That's the whole difference between Simple and Native. So we should make a best-effort to delete it. If the delete fails on release(), then ok. On obtain(), we won't return false if the lock exists, but attempt to really obtain it and fail appropriately.
While the previously proposed fix (add "&& path.exists()" to release()) might work most of the times, it will only work "most of the times". I.e., between release() and delete(), an external process, like AntiVirus, might lock the file, and delete will fail, but the file will still be there, and we'll throw an exception still. So, the proposed changes are: * release() is allowed to fail to delete the lock file. * obtain() should not return false if the lock file exists - it should really attempt to obtain it. * in acquireTestLock(), if after release() is called, the lock file still exists, we'll retry the delete few ms later, and if that fails, call deleteOnExit. How's that sound? Shai On Wed, Apr 28, 2010 at 5:58 PM, Mark Miller <[email protected]> wrote: > I don't follow. The simple lock impl must delete the file, but the native > impl should not have to. The file has nothing to do with the lock - its just > the medium to ask for and release the lock. If it already exists, you don't > have to create it - you can just use it to try and get a native lock. > Likewise, it doesn't need to be removed to release a native lock - you > simply call unlock on it. > > > On 4/28/10 10:34 AM, Shai Erera wrote: > >> But this method is called also for the regular lock file - if release() >> won't delete the file, then the next l.obtain() will return false. >> >> Shai >> >> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <[email protected] >> <mailto:[email protected]>> wrote: >> >> It shouldn't need too though - the native lock file is simply a >> dummy file to apply the lock too - shouldn't matter if it already >> exists or not (though it seems to in the current code). >> >> >> On 4/28/10 10:22 AM, Shai Erera wrote: >> >> If you won't delete the file, the next obtain will fail? >> >> On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller >> <[email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>>> >> >> wrote: >> >> I wonder if not being able to delete the file should throw a >> release >> failed exception at all. You have actually released the >> native lock >> - you where just not able to clean up - but that seems more >> like a >> warning situation than a failure. >> >> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> On 4/28/10 9:53 AM, Shai Erera wrote: >> >> I've hit it again and here's the full stacktrace (at least >> what's printed): >> >> [junit] Exception in thread "main" >> java.lang.RuntimeException: >> Failed to acquire random test lock; please verify >> filesystem for >> lock >> directory >> 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' >> supports >> locking >> [junit] at >> >> >> >> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88) >> [junit] at >> >> >> >> org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127) >> [junit] at >> >> >> >> org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74) >> [junit] at >> java.lang.J9VMInternals.newInstanceImpl(Native Method) >> [junit] at >> java.lang.Class.newInstance(Class.java:1325) >> [junit] at >> >> >> >> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248) >> [junit] at >> >> >> >> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214) >> [junit] at >> >> >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819) >> [junit] at >> >> >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909) >> [junit] at >> >> >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743) >> [junit] Caused by: >> org.apache.lucene.store.LockReleaseFailedException: >> failed to delete >> >> >> C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock >> [junit] at >> >> >> org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311) >> [junit] at >> >> >> >> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86) >> [junit] ... 9 more >> >> The exception is thrown from NativeFSLock.release() b/c >> it fails to >> delete the lock file. I think I know what the problem is >> - and >> it must >> be related to the large number of JVMs that are created >> w/ the >> parallel >> tests: >> * Suppose that JVM1 draws the number '1' for the test >> lock file - it >> thus creates lock1. >> * Now suppose that JVM2 draws the same number, magically >> somehow >> - it >> thus creates lock1 as well. >> * The code of acquireTestLock in NativeFSLockFactory >> looks like >> this: >> Lock l = makeLock(randomLockName); >> try { >> l.obtain(); >> l.release(); >> --> both will create the same test Lock file. Then >> l.obtain() >> probably >> returns false for one of them, but it's not checked. >> * Then in release there are a couple of things to note: >> 1) the method is synced on the instance, which does not >> affect >> the two JVMs. >> 2) suppose that both JVMs pass through the if (exists()) >> check. Then >> JVM1 releases the lock, and deletes the file. >> 3) Now JVM2 kicks in, calls lock.release() which has no >> effect >> (from the >> jdoc: "If this lock object is invalid then invoking this >> method >> has no >> effect." ). Then when it comes to path.delete(), the >> file isn't >> there, >> the method returns false and thus an exception is thrown >> ... >> >> This situation is extremely unlikely to happen, but >> still, it >> happens on >> my machine quite frequently since the parallel tests. I'm >> thinking that >> acquireTestLock should be less strict, but perhaps we >> can fix it >> if we >> replace the line: >> if (!path.delete()) (line 310) >> with this >> if (!path.delete() && path.exists()) >> >> I.e., if the lock file fails to delete but is still >> there, throw the >> exception ... >> >> What do you think? >> >> Shai >> >> On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir >> <[email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>> >> <mailto:[email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>>>> wrote: >> >> >> >> On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda >> <[email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>> >> <mailto:[email protected] <mailto:[email protected]> >> >> <mailto:[email protected] >> <mailto:[email protected]>>>> wrote: >> >> >> I've had similar random failures on Mac OS X >> 10.6. They >> started >> happening recently, about two weeks ago. >> >> >> Thats just too randomly close to when i last worked >> on this >> build >> system stuff for LUCENE-1709... perhaps I made it worse >> instead of >> better. >> >> -- >> Robert Muir >> [email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>> >> <mailto:[email protected] <mailto:[email protected]> >> <mailto:[email protected] <mailto:[email protected]>>> >> >> >> >> >> >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> <mailto:[email protected]> >> <mailto:[email protected] >> <mailto:[email protected]>> >> >> For additional commands, e-mail: [email protected] >> <mailto:[email protected]> >> <mailto:[email protected] >> <mailto:[email protected]>> >> >> >> >> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> <mailto:[email protected]> >> For additional commands, e-mail: [email protected] >> <mailto:[email protected]> >> >> >> > > -- > - Mark > > http://www.lucidimagination.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
