If you won't delete the file, the next obtain will fail? On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <[email protected]> wrote:
> I wonder if not being able to delete the file should throw a release failed > exception at all. You have actually released the native lock - you where > just not able to clean up - but that seems more like a warning situation > than a failure. > > > -- > - Mark > > http://www.lucidimagination.com > > On 4/28/10 9:53 AM, Shai Erera wrote: > >> I've hit it again and here's the full stacktrace (at least what's >> printed): >> >> [junit] Exception in thread "main" java.lang.RuntimeException: >> Failed to acquire random test lock; please verify filesystem for lock >> directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports >> locking >> [junit] at >> >> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88) >> [junit] at >> >> org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127) >> [junit] at >> >> org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74) >> [junit] at java.lang.J9VMInternals.newInstanceImpl(Native Method) >> [junit] at java.lang.Class.newInstance(Class.java:1325) >> [junit] at >> >> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248) >> [junit] at >> >> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214) >> [junit] at >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819) >> [junit] at >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909) >> [junit] at >> >> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743) >> [junit] Caused by: >> org.apache.lucene.store.LockReleaseFailedException: failed to delete >> C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock >> [junit] at >> org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311) >> [junit] at >> >> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86) >> [junit] ... 9 more >> >> The exception is thrown from NativeFSLock.release() b/c it fails to >> delete the lock file. I think I know what the problem is - and it must >> be related to the large number of JVMs that are created w/ the parallel >> tests: >> * Suppose that JVM1 draws the number '1' for the test lock file - it >> thus creates lock1. >> * Now suppose that JVM2 draws the same number, magically somehow - it >> thus creates lock1 as well. >> * The code of acquireTestLock in NativeFSLockFactory looks like this: >> Lock l = makeLock(randomLockName); >> try { >> l.obtain(); >> l.release(); >> --> both will create the same test Lock file. Then l.obtain() probably >> returns false for one of them, but it's not checked. >> * Then in release there are a couple of things to note: >> 1) the method is synced on the instance, which does not affect the two >> JVMs. >> 2) suppose that both JVMs pass through the if (exists()) check. Then >> JVM1 releases the lock, and deletes the file. >> 3) Now JVM2 kicks in, calls lock.release() which has no effect (from the >> jdoc: "If this lock object is invalid then invoking this method has no >> effect." ). Then when it comes to path.delete(), the file isn't there, >> the method returns false and thus an exception is thrown ... >> >> This situation is extremely unlikely to happen, but still, it happens on >> my machine quite frequently since the parallel tests. I'm thinking that >> acquireTestLock should be less strict, but perhaps we can fix it if we >> replace the line: >> if (!path.delete()) (line 310) >> with this >> if (!path.delete() && path.exists()) >> >> I.e., if the lock file fails to delete but is still there, throw the >> exception ... >> >> What do you think? >> >> Shai >> >> On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> >> On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda <[email protected] >> <mailto:[email protected]>> wrote: >> >> >> I've had similar random failures on Mac OS X 10.6. They started >> happening recently, about two weeks ago. >> >> >> Thats just too randomly close to when i last worked on this build >> system stuff for LUCENE-1709... perhaps I made it worse instead of >> better. >> >> -- >> Robert Muir >> [email protected] <mailto:[email protected]> >> >> >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [email protected] > For additional commands, e-mail: [email protected] > >
