Nice! Mike
On Wed, Apr 28, 2010 at 12:54 PM, Robert Muir <[email protected]> wrote: > As far as the build system goes, I implemented the two ideas mentioned > earlier in this message (not creating a new Formatter for each test, and not > spawning 26 jvms for each batch) > > Jira is down, but if you want to help test you can try a patch here: > http://pastebin.com/iqwb73H2 (click Raw/Download) > > Additionally this cuts 1:20 off the total Solrcene 'ant clean test' for me. > > before: > BUILD SUCCESSFUL > Total time: 7 minutes 42 seconds > > after: > BUILD SUCCESSFUL > Total time: 6 minutes 23 seconds > > On Wed, Apr 28, 2010 at 12:25 PM, Michael McCandless > <[email protected]> wrote: >> >> I think this are good changes to NativeFSLockFactory. >> >> But: the chances that N JVMs launched at once would conflict on the >> randomly generated lock file name should be miniscule... though it >> does depend on how good new Random() is at seeding itself. Do we >> really think this explains your exceptions Shai? (And, if so, even w/ >> these changes, the conflict could still happen?) Maybe we should >> explicitly seed it? >> >> Mike >> >> On Wed, Apr 28, 2010 at 11:22 AM, Shai Erera <[email protected]> wrote: >> > I'd like to summarize the IRC discussion Mark and I had: >> > >> > The lock file's existence in the directory should not fail obtain() from >> > retrieving obtaining a lock. That's the whole difference between Simple >> > and >> > Native. So we should make a best-effort to delete it. If the delete >> > fails on >> > release(), then ok. On obtain(), we won't return false if the lock >> > exists, >> > but attempt to really obtain it and fail appropriately. >> > >> > While the previously proposed fix (add "&& path.exists()" to release()) >> > might work most of the times, it will only work "most of the times". >> > I.e., >> > between release() and delete(), an external process, like AntiVirus, >> > might >> > lock the file, and delete will fail, but the file will still be there, >> > and >> > we'll throw an exception still. >> > >> > So, the proposed changes are: >> > * release() is allowed to fail to delete the lock file. >> > * obtain() should not return false if the lock file exists - it should >> > really attempt to obtain it. >> > * in acquireTestLock(), if after release() is called, the lock file >> > still >> > exists, we'll retry the delete few ms later, and if that fails, call >> > deleteOnExit. >> > >> > How's that sound? >> > >> > Shai >> > >> > On Wed, Apr 28, 2010 at 5:58 PM, Mark Miller <[email protected]> >> > wrote: >> >> >> >> I don't follow. The simple lock impl must delete the file, but the >> >> native >> >> impl should not have to. The file has nothing to do with the lock - its >> >> just >> >> the medium to ask for and release the lock. If it already exists, you >> >> don't >> >> have to create it - you can just use it to try and get a native lock. >> >> Likewise, it doesn't need to be removed to release a native lock - you >> >> simply call unlock on it. >> >> >> >> On 4/28/10 10:34 AM, Shai Erera wrote: >> >>> >> >>> But this method is called also for the regular lock file - if >> >>> release() >> >>> won't delete the file, then the next l.obtain() will return false. >> >>> >> >>> Shai >> >>> >> >>> On Wed, Apr 28, 2010 at 5:31 PM, Mark Miller <[email protected] >> >>> <mailto:[email protected]>> wrote: >> >>> >> >>> It shouldn't need too though - the native lock file is simply a >> >>> dummy file to apply the lock too - shouldn't matter if it already >> >>> exists or not (though it seems to in the current code). >> >>> >> >>> >> >>> On 4/28/10 10:22 AM, Shai Erera wrote: >> >>> >> >>> If you won't delete the file, the next obtain will fail? >> >>> >> >>> On Wed, Apr 28, 2010 at 5:12 PM, Mark Miller >> >>> <[email protected] <mailto:[email protected]> >> >>> <mailto:[email protected] <mailto:[email protected]>>> >> >>> wrote: >> >>> >> >>> I wonder if not being able to delete the file should throw >> >>> a >> >>> release >> >>> failed exception at all. You have actually released the >> >>> native lock >> >>> - you where just not able to clean up - but that seems more >> >>> like a >> >>> warning situation than a failure. >> >>> >> >>> >> >>> -- >> >>> - Mark >> >>> >> >>> http://www.lucidimagination.com >> >>> >> >>> On 4/28/10 9:53 AM, Shai Erera wrote: >> >>> >> >>> I've hit it again and here's the full stacktrace (at >> >>> least >> >>> what's printed): >> >>> >> >>> [junit] Exception in thread "main" >> >>> java.lang.RuntimeException: >> >>> Failed to acquire random test lock; please verify >> >>> filesystem for >> >>> lock >> >>> directory >> >>> 'C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock' >> >>> supports >> >>> locking >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:88) >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:127) >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.lucene.util.LuceneJUnitResultFormatter.<init>(LuceneJUnitResultFormatter.java:74) >> >>> [junit] at >> >>> java.lang.J9VMInternals.newInstanceImpl(Native Method) >> >>> [junit] at >> >>> java.lang.Class.newInstance(Class.java:1325) >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:248) >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.tools.ant.taskdefs.optional.junit.FormatterElement.createFormatter(FormatterElement.java:214) >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.transferFormatters(JUnitTestRunner.java:819) >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:909) >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:743) >> >>> [junit] Caused by: >> >>> org.apache.lucene.store.LockReleaseFailedException: >> >>> failed to delete >> >>> >> >>> >> >>> >> >>> C:\DOCUME~1\shaie\LOCALS~1\Temp\lucene_junit_lock\lucene-wn1v4z-test.lock >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.lucene.store.NativeFSLock.release(NativeFSLockFactory.java:311) >> >>> [junit] at >> >>> >> >>> >> >>> >> >>> org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:86) >> >>> [junit] ... 9 more >> >>> >> >>> The exception is thrown from NativeFSLock.release() b/c >> >>> it fails to >> >>> delete the lock file. I think I know what the problem >> >>> is >> >>> - and >> >>> it must >> >>> be related to the large number of JVMs that are created >> >>> w/ the >> >>> parallel >> >>> tests: >> >>> * Suppose that JVM1 draws the number '1' for the test >> >>> lock file - it >> >>> thus creates lock1. >> >>> * Now suppose that JVM2 draws the same number, >> >>> magically >> >>> somehow >> >>> - it >> >>> thus creates lock1 as well. >> >>> * The code of acquireTestLock in NativeFSLockFactory >> >>> looks like >> >>> this: >> >>> Lock l = makeLock(randomLockName); >> >>> try { >> >>> l.obtain(); >> >>> l.release(); >> >>> --> both will create the same test Lock file. Then >> >>> l.obtain() >> >>> probably >> >>> returns false for one of them, but it's not checked. >> >>> * Then in release there are a couple of things to note: >> >>> 1) the method is synced on the instance, which does not >> >>> affect >> >>> the two JVMs. >> >>> 2) suppose that both JVMs pass through the if >> >>> (exists()) >> >>> check. Then >> >>> JVM1 releases the lock, and deletes the file. >> >>> 3) Now JVM2 kicks in, calls lock.release() which has no >> >>> effect >> >>> (from the >> >>> jdoc: "If this lock object is invalid then invoking >> >>> this >> >>> method >> >>> has no >> >>> effect." ). Then when it comes to path.delete(), the >> >>> file isn't >> >>> there, >> >>> the method returns false and thus an exception is >> >>> thrown >> >>> ... >> >>> >> >>> This situation is extremely unlikely to happen, but >> >>> still, it >> >>> happens on >> >>> my machine quite frequently since the parallel tests. >> >>> I'm >> >>> thinking that >> >>> acquireTestLock should be less strict, but perhaps we >> >>> can fix it >> >>> if we >> >>> replace the line: >> >>> if (!path.delete()) (line 310) >> >>> with this >> >>> if (!path.delete() && path.exists()) >> >>> >> >>> I.e., if the lock file fails to delete but is still >> >>> there, throw the >> >>> exception ... >> >>> >> >>> What do you think? >> >>> >> >>> Shai >> >>> >> >>> On Tue, Apr 27, 2010 at 10:21 PM, Robert Muir >> >>> <[email protected] <mailto:[email protected]> >> >>> <mailto:[email protected] <mailto:[email protected]>> >> >>> <mailto:[email protected] <mailto:[email protected]> >> >>> <mailto:[email protected] <mailto:[email protected]>>>> wrote: >> >>> >> >>> >> >>> >> >>> On Tue, Apr 27, 2010 at 3:06 PM, Andi Vajda >> >>> <[email protected] <mailto:[email protected]> >> >>> <mailto:[email protected] >> >>> <mailto:[email protected]>> >> >>> <mailto:[email protected] >> >>> <mailto:[email protected]> >> >>> >> >>> <mailto:[email protected] >> >>> <mailto:[email protected]>>>> wrote: >> >>> >> >>> >> >>> I've had similar random failures on Mac OS X >> >>> 10.6. They >> >>> started >> >>> happening recently, about two weeks ago. >> >>> >> >>> >> >>> Thats just too randomly close to when i last worked >> >>> on this >> >>> build >> >>> system stuff for LUCENE-1709... perhaps I made it >> >>> worse >> >>> instead of >> >>> better. >> >>> >> >>> -- >> >>> Robert Muir >> >>> [email protected] <mailto:[email protected]> >> >>> <mailto:[email protected] <mailto:[email protected]>> >> >>> <mailto:[email protected] <mailto:[email protected]> >> >>> <mailto:[email protected] <mailto:[email protected]>>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> --------------------------------------------------------------------- >> >>> To unsubscribe, e-mail: [email protected] >> >>> <mailto:[email protected]> >> >>> <mailto:[email protected] >> >>> <mailto:[email protected]>> >> >>> >> >>> For additional commands, e-mail: [email protected] >> >>> <mailto:[email protected]> >> >>> <mailto:[email protected] >> >>> <mailto:[email protected]>> >> >>> >> >>> >> >>> >> >>> >> >>> -- >> >>> - Mark >> >>> >> >>> http://www.lucidimagination.com >> >>> >> >>> >> >>> --------------------------------------------------------------------- >> >>> To unsubscribe, e-mail: [email protected] >> >>> <mailto:[email protected]> >> >>> For additional commands, e-mail: [email protected] >> >>> <mailto:[email protected]> >> >>> >> >>> >> >> >> >> >> >> -- >> >> - Mark >> >> >> >> http://www.lucidimagination.com >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: [email protected] >> >> For additional commands, e-mail: [email protected] >> >> >> > >> > >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [email protected] >> For additional commands, e-mail: [email protected] >> > > > > -- > Robert Muir > [email protected] > --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
