If the file isn't deleted, won't the next obtain fail?

On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller <[email protected]> wrote:

> I wonder if not being able to delete the file should throw a release failed
> exception at all. You have actually released the native lock - you were
> just not able to clean up - but that seems more like a warning situation
> than a failure.
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
> On 4/28/10 9:53 AM, Shai Erera wrote:
>
>> I've hit it again and here's the full stacktrace (at least what's
>> printed):
>>
>>     [junit] Exception in thread "main" java.lang.RuntimeException: Failed to acquire random test lock; please verify filesystem for lock directory 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' supports locking
>>     [junit]     at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88)
>>     [junit]     at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127)
>>     [junit]     at org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74)
>>     [junit]     at java.lang.J9VMInternals.newInstanceImpl(Native Method)
>>     [junit]     at java.lang.Class.newInstance(Class.java:1325)
>>     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248)
>>     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214)
>>     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819)
>>     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909)
>>     [junit]     at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743)
>>     [junit] Caused by: org.apache.lucene.store.LockReleaseFailedException: failed to delete C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock
>>     [junit]     at org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311)
>>     [junit]     at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86)
>>     [junit]     ... 9 more
>>
>> The exception is thrown from NativeFSLock.release() b/c it fails to
>> delete the lock file. I think I know what the problem is - and it must
>> be related to the large number of JVMs that are created w/ the parallel
>> tests:
>> * Suppose that JVM1 draws the number '1' for the test lock file - it
>> thus creates lock1.
>> * Now suppose that JVM2 draws the same number, magically somehow - it
>> thus creates lock1 as well.
>> * The code of acquireTestLock in NativeFSLockFactory looks like this:
>>     Lock l = makeLock(randomLockName);
>>     try {
>>       l.obtain();
>>       l.release();
>> --> both will create the same test Lock file. Then l.obtain() probably
>> returns false for one of them, but it's not checked.
>> * Then in release there are a couple of things to note:
>> 1) the method is synced on the instance, which does not affect the two
>> JVMs.
>> 2) suppose that both JVMs pass through the if (exists()) check. Then
>> JVM1 releases the lock, and deletes the file.
>> 3) Now JVM2 kicks in, calls lock.release() which has no effect (from the
>> jdoc: "If this lock object is invalid then invoking this method has no
>> effect." ). Then when it comes to path.delete(), the file isn't there,
>> the method returns false and thus an exception is thrown ...
>>
>> This situation should be extremely unlikely, but it still happens on my
>> machine quite frequently since the parallel tests were introduced. I'm
>> thinking that acquireTestLock should be less strict, but perhaps we can
>> fix it if we replace the line:
>>      if (!path.delete()) (line 310)
>> with this
>>      if (!path.delete() && path.exists())
>>
>> I.e., if the lock file fails to delete but is still there, throw the
>> exception ...
>>
>> What do you think?
>>
>> Shai
>>
>> On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir <[email protected]> wrote:
>>
>>
>>
>>    On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda <[email protected]> wrote:
>>
>>
>>        I've had similar random failures on Mac OS X 10.6. They started
>>        happening recently, about two weeks ago.
>>
>>
>>    That's just too randomly close to when I last worked on this build
>>    system stuff for LUCENE-1709... perhaps I made it worse instead of
>>    better.
>>
>>    --
>>    Robert Muir
>>    [email protected]
>>
>>
>>
>
>

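For reference, the check Shai proposes in the quoted message (`!path.delete() && path.exists()`) can be sketched in isolation as below. This is only an illustration under stated assumptions, not the actual NativeFSLock code: the class name `LenientLockCleanup`, the method `deleteLockFile`, and the exception `ReleaseFailedException` (a stand-in for Lucene's LockReleaseFailedException) are all hypothetical, and the two calls in `main` merely imitate two JVMs cleaning up the same test lock file one after the other.

```java
import java.io.File;
import java.io.IOException;

public class LenientLockCleanup {

    /** Hypothetical stand-in for Lucene's LockReleaseFailedException. */
    static class ReleaseFailedException extends IOException {
        ReleaseFailedException(String message) {
            super(message);
        }
    }

    /**
     * Deletes the lock file and treats "already gone" as success.
     * Throws only when the delete failed AND the file is still present.
     */
    static void deleteLockFile(File path) throws ReleaseFailedException {
        if (!path.delete() && path.exists()) {
            throw new ReleaseFailedException("failed to delete " + path);
        }
    }

    public static void main(String[] args) throws IOException {
        File lock = File.createTempFile("lucene-sketch-", "-test.lock");
        deleteLockFile(lock); // first caller: the file exists and is removed
        deleteLockFile(lock); // second caller: file already gone, no exception
        System.out.println("lock file still exists: " + lock.exists());
    }
}
```

Note that `delete()` and `exists()` are still two separate calls, so another process could in principle recreate the file between them; the sketch narrows the window Shai describes rather than removing every possible race.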