I've hit it again; here's the full stack trace (at least what's printed):
[junit] Exception in thread "main" java.lang.RuntimeException: Failed to acquire random test lock; please verify filesystem for lock directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports locking
[junit] at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
[junit] at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
[junit] at org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
[junit] at java.lang.J9VMInternals.newInstanceImpl(Native Method)
[junit] at java.lang.Class.newInstance(Class.java:1325)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
[junit] at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
[junit] Caused by: org.apache.lucene.store.LockReleaseFailedException: failed to delete C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
[junit] at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
[junit] at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
[junit] ... 9 more
The exception is thrown from NativeFSLock.release() because it fails to delete
the lock file. I think I know what the problem is, and it must be related
to the large number of JVMs that are created with the parallel tests:
* Suppose that JVM1 draws the number '1' for the test lock file - it thus
creates lock1.
* Now suppose that JVM2 somehow draws the same number - it thus
creates lock1 as well.
* The code of acquireTestLock in NativeFSLockFactory looks like this:
Lock l = makeLock(randomLockName);
try {
l.obtain();
l.release();
--> both JVMs create the same test lock file. l.obtain() then probably
returns false for one of them, but the return value isn't checked.
* Then in release there are a couple of things to note:
1) the method is synchronized on the instance, which does nothing across two separate JVMs.
2) suppose that both JVMs pass through the if (exists()) check. Then JVM1
releases the lock, and deletes the file.
3) Now JVM2 kicks in and calls lock.release(), which has no effect (from the
javadoc: "If this lock object is invalid then invoking this method has no
effect."). Then when it comes to path.delete(), the file isn't there, the
method returns false and thus an exception is thrown ...
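To make step 3 concrete, here is a minimal single-JVM sketch of the double delete. It is not Lucene code: the class name DoubleDeleteDemo and the temp file name are made up for illustration, and two File objects pointing at the same path stand in for the two JVMs. The only real API it relies on is java.io.File.delete(), which returns false when there is nothing left to delete.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class, not part of Lucene.
public class DoubleDeleteDemo {

    // Returns { result of first delete, result of second delete, exists afterwards }.
    static boolean[] run() throws IOException {
        // Stand-in for the shared test lock file (the name is an assumption).
        File created = File.createTempFile("lucene-demo", "-test.lock");

        // Two independent handles on the same path, playing JVM1 and JVM2.
        File jvm1View = new File(created.getPath());
        File jvm2View = new File(created.getPath());

        boolean first = jvm1View.delete();   // JVM1 removes the file
        boolean second = jvm2View.delete();  // JVM2 finds nothing to remove

        return new boolean[] { first, second, jvm2View.exists() };
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = run();
        System.out.println("first delete:  " + r[0]);
        System.out.println("second delete: " + r[1]);
        System.out.println("still exists:  " + r[2]);
    }
}
```

The second delete() reports failure even though the file is gone, which is exactly the state release() treats as an error.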
In theory this collision should be extremely unlikely, yet it happens on my
machine quite frequently since the parallel tests went in. I'm thinking that
acquireTestLock should be less strict, but perhaps we can fix it if we
replace the line:
if (!path.delete()) (line 310)
with this
if (!path.delete() && path.exists())
I.e., throw the exception only if the delete fails and the lock file is
still there ...
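A rough standalone sketch of that check, so it can be tried outside Lucene. The class and method names (TolerantLockDelete, deleteLockFile) are hypothetical, and a plain IOException stands in for LockReleaseFailedException; the real change would just be the one-line condition in NativeFSLock.release().

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of Lucene.
public class TolerantLockDelete {

    // Fails only when the delete did not succeed AND the file is still present.
    // A file that somebody else already removed is treated as released.
    static void deleteLockFile(File path) throws IOException {
        if (!path.delete() && path.exists()) {
            // Stand-in for LockReleaseFailedException in the real code.
            throw new IOException("failed to delete " + path);
        }
    }

    public static void main(String[] args) throws IOException {
        File lock = File.createTempFile("lucene-demo", "-test.lock");
        deleteLockFile(lock); // removes the file
        deleteLockFile(lock); // file already gone: no exception
        System.out.println("exists after two deletes: " + lock.exists());
    }
}
```

There is still a small window between delete() and exists() in which another JVM could recreate the file, so this narrows the race rather than removing it.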
What do you think?
Shai
On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <[email protected]> wrote:
>
>
> On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda <[email protected]> wrote:
>
>>
>> I've had similar random failures on Mac OS X 10.6. They started happening
>> recently, about two weeks ago.
>>
>>
> That's just too close to when I last worked on this build system
> stuff for LUCENE-1709... perhaps I made it worse instead of better.
>
> --
> Robert Muir
> [email protected]
>