Re: Is there a way to share IndexReader data sensibly across independent callers?
Can't you just call ReaderManager.close?

All in-flight operations with that RM will keep working, and the underlying reader will only finally close once they have all finished.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Feb 9, 2016 at 12:12 AM, Trejkaz wrote:
> On Tue, Feb 9, 2016 at 2:10 AM, Sanne Grinovero wrote:
>> Hi,
>> you should really try to reuse the same opened Directory, like you
>> suggest without closing it until your application is "done" with it in
>> all its threads (normally on application shutdown).
>> Keeping a Directory open will not lead to have open files, that is
>> probably caused by not closing the instances of IndexReader.
>>
>> I'd highly recommend to use the ReaderManager for these reasons,
>> especially because handling these details across different threads
>> both correctly and efficiently can be tricky - I've learned that
>> myself when implementing similar things before the ReaderManager was
>> created.
>
> I'm already using ReaderManager, but there are issues.
>
> I want to close it when the last acquired index has been released, but
> no thread knows anything about what indexes other threads could be
> using, yet we still want indexes to be closed once nobody is using
> them. So I end up having to reference count the ReaderManager, which
> seems to defeat the purpose of using it in the first place since I
> could just reference count the reader itself. I wish it could handle
> automatically closing and reopening the index by itself, but I don't
> think it can.
>
> At the moment I have bolted this additional level of reference
> counting around ReaderManager and it just creates a new ReaderManager
> when the reference count goes back up to 1 and closes it when it goes
> back to 0. But this blob has to be synchronised to implement it safely
> and the map for looking these things up can never clean out entries,
> because I couldn't find a safe way to do that even using
> ConcurrentMap.
>
> TX

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Is there a way to share IndexReader data sensibly across independent callers?
On Tue, Feb 9, 2016 at 7:59 PM, Michael McCandless wrote:
> Can't you just call ReaderManager.close?
>
> All in-flight operations with that RM will keep working, and the
> underlying reader will only finally close once they have all finished.

I guess that has the caveat that it would be possible to have two readers open on the same directory, which is mostly what I was trying to avoid. My current solution absolutely prevents that, at the cost of having to synchronise when acquiring or releasing, although I can probably use double-checked locking to reduce the impact of that.

Really what would be handy is something that resembles ReaderManager but takes Path for every method and also opens and closes the Directory...

TX
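The wish above (something shaped like ReaderManager, but keyed by Path, with no map entry left behind once the last user is gone) can be sketched with `ConcurrentHashMap.compute`, which runs its remapping function atomically per key and removes the mapping when that function returns null. This is only a sketch under assumptions, not anything Lucene provides: `SharedRegistry`, `acquire`, `release`, `openCount` and the `opener` function are hypothetical names, and `V` stands in for whatever object wraps the Directory and ReaderManager for one index.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-key registry: one shared resource per key, closed when the last user releases it. */
final class SharedRegistry<K, V extends AutoCloseable> {

    /** A resource plus its user count; only touched inside compute(), so no extra locking is needed. */
    private static final class Entry<T> {
        final T value;
        int refs;
        Entry(T value) { this.value = value; }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> opener; // assumption: opens the resource for a key, e.g. a Path

    SharedRegistry(Function<K, V> opener) { this.opener = opener; }

    /** Returns the shared resource for the key, opening it if no one currently holds it. */
    V acquire(K key) {
        return entries.compute(key, (k, existing) -> {
            Entry<V> entry = (existing != null) ? existing : new Entry<V>(opener.apply(k));
            entry.refs++;
            return entry;
        }).value;
    }

    /** Drops one reference; the last release closes the resource and removes the map entry. */
    void release(K key) {
        Exception[] failure = new Exception[1];
        entries.compute(key, (k, entry) -> {
            if (entry == null) {
                throw new IllegalStateException("release without acquire: " + k);
            }
            if (--entry.refs > 0) {
                return entry; // still in use elsewhere
            }
            try {
                entry.value.close();
            } catch (Exception ex) {
                failure[0] = ex; // rethrown after the entry has been removed
            }
            return null; // returning null removes the mapping atomically
        });
        if (failure[0] != null) {
            throw new IllegalStateException("close failed for " + key, failure[0]);
        }
    }

    /** Number of keys that currently have an open resource. */
    int openCount() { return entries.size(); }
}
```

Because opening and closing happen inside `compute`, a second resource is never opened for a key while the first is still being closed, which matches the "never two readers on the same directory" goal. The cost is that slow I/O inside the remapping function can delay other updates that land in the same hash bin, and the function must not touch the same map again.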
Generate Lucene segments_N file
Hello,

First, I don't know if this is the right mailing list to ask for your help; if not, please accept my apologies for the inconvenience.

While moving Lucene (5.3) index files from one server to another, I forgot to move the segments_N file (because I used the pattern *.*).

Unfortunately I've erased the original folder, and I only have these files in my directory now:

    _1rpt.fdt
    _1rpt.fdx
    _1rpt.fnm
    _1rpt.nvd
    _1rpt.nvm
    _1rpt.si
    _1rpt_Lucene50_0.doc
    _1rpt_Lucene50_0.dvd
    _1rpt_Lucene50_0.dvm
    _1rpt_Lucene50_0.pos
    _1rpt_Lucene50_0.tim
    _1rpt_Lucene50_0.tip
    write.lock

I am missing the segments_42u file, and without it I cannot even run org.apache.lucene.index.CheckIndex:

    Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: no segments* file found in MMapDirectory@/solr-5.3.1/nodes/node1/core/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@119d7047: files: [write.lock, _1rpt.fdt, _1rpt.fdx, _1rpt.fnm, _1rpt.nvd, _1rpt.nvm, _1rpt.si, _1rpt_Lucene50_0.doc, _1rpt_Lucene50_0.dvd, _1rpt_Lucene50_0.dvm, _1rpt_Lucene50_0.pos, _1rpt_Lucene50_0.tim, _1rpt_Lucene50_0.tip]
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:483)
        at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)

The index is pretty huge (> 800GB) and it will take weeks to rebuild it. Is there a way to generate this missing segment info file?

Thanks a lot for your help.

Khanh-Lam Mai
khanh-lam@bnf.fr
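As a side note on how the file went missing: a `*.*` pattern requires a literal dot in the name, so it skips `segments_42u`. The sketch below checks a pattern against a list of file names before a copy, using the glob matcher from `java.nio.file`. `GlobCheck` and `unmatched` are hypothetical names, and the sketch assumes the copy tool follows the same shell-style glob rules as Java's matcher, which may not hold for every tool.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which file names a glob pattern would leave behind. */
final class GlobCheck {

    /** Returns the names that do NOT match the glob, in their original order. */
    static List<String> unmatched(String glob, List<String> names) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> missed = new ArrayList<>();
        for (String name : names) {
            if (!matcher.matches(Paths.get(name))) {
                missed.add(name);
            }
        }
        return missed;
    }
}
```

Calling `GlobCheck.unmatched("*.*", names)` on a listing that contains `segments_42u` reports that file as left behind, while the pattern `*` leaves nothing behind.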
Re: Is there a way to share IndexReader data sensibly across independent callers?
Why do you need to close the Directory? It should be light weight. But if you really do need it, can't you subclass ReaderManager and override afterClose to close the directory?

So you essentially need to "lazy close" your ReaderManager, when there are no searches currently needing it?

Why not have a sync'd block, with a reference to the ReaderManager. Inside that block, if the reference is null, that means it's closed, and you open a new one. Else, use the existing one. Won't that do what you need w/o requiring full fledged reference counts? Yes, it is a sync'd block around acquire/release, but I don't see how that can be avoided, and it'd be fast when the ReaderManager is already opened.

Mike McCandless
http://blog.mikemccandless.com

On Tue, Feb 9, 2016 at 6:39 AM, Trejkaz wrote:
> On Tue, Feb 9, 2016 at 7:59 PM, Michael McCandless wrote:
>> Can't you just call ReaderManager.close?
>>
>> All in-flight operations with that RM will keep working, and the
>> underlying reader will only finally close once they have all finished.
>
> I guess that has the caveat that it would be possible to have two
> readers open on the same directory, which is mostly what I was trying
> to avoid. My current solution absolutely prevents that, at the cost of
> having to synchronise when acquiring or releasing, although I can
> probably use double-checked locking to reduce the impact of that.
>
> Really what would be handy is something that resembles ReaderManager
> but takes Path for every method and also opens and closes the
> Directory...
>
> TX
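The sync'd block described above might look roughly like the following sketch. It is written against plain `java.io.Closeable` so that it stands alone; `LazyHolder`, `get`, `closeCurrent`, `isOpen` and the `opener` supplier are hypothetical names, and `T` stands in for a ReaderManager. It also assumes, as stated earlier in the thread for ReaderManager.close, that closing the resource does not break callers who are still using it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical holder: at most one open resource; a null reference means "currently closed". */
final class LazyHolder<T extends Closeable> {
    private final Supplier<T> opener; // assumption: builds a fresh resource on demand
    private T current;                // guarded by "this"

    LazyHolder(Supplier<T> opener) { this.opener = opener; }

    /** Returns the open resource, opening a new one if the previous one was closed. */
    synchronized T get() {
        if (current == null) {
            current = opener.get();
        }
        return current;
    }

    /** Closes the current resource, if any; the next get() opens a fresh one. */
    synchronized void closeCurrent() {
        T old = current;
        current = null;
        if (old != null) {
            try {
                old.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** True while a resource is held open. */
    synchronized boolean isOpen() { return current != null; }
}
```

When the resource is already open, `get()` only takes the monitor and does a null check, so the fast path stays cheap. Deciding when to call `closeCurrent()` is left to the caller in this sketch.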
Re: Is there a way to share IndexReader data sensibly across independent callers?
On Wed, Feb 10, 2016 at 3:17 AM, Michael McCandless wrote:
> Why do you need to close the Directory? It should be light weight.
> But if you really do need it, can't you subclass ReaderManager and
> override afterClose to close the directory?

I guess that's the next thing I'll try out. I am already subclassing ReaderManager to wrap the readers after discovering that reader caches don't work if I wrap from outside it.

As for the need to close it, it's true, in production we don't really need to. But we happen to be using a Directory implementation that checks that callers close things, and that check is only triggered when we close the Directory at the moment. So it's just something that adds diagnostics to help resolve other warnings about files not being closed.

> So you essentially need to "lazy close" your ReaderManager, when there
> are no searches currently needing it?

That's more or less the right way to think about it. Actually each session might make more than one search using the same reader and reuse the same one for those, but when no sessions are running for that index anymore I wanted to close it because Windows has annoying file locking for read operations. If it weren't for Windows...

> Why not have a sync'd block, with a reference to the ReaderManager.
> Inside that block, if the reference is null, that means it's closed,
> and you open a new one. Else, use the existing one. Won't that do
> what you need w/o requiring full fledged reference counts? Yes, it is
> a sync'd block around acquire/release, but I don't see how that can be
> avoided, and it'd be fast when the ReaderManager is already opened.

This is roughly what I currently have. At the moment I lock my entire acquire() / release() methods, which maybe it's possible to reduce, although I'm not entirely sure. Concurrency is fiddly...

TX
JTRES 2016 First Call For Papers
==

CALL FOR PAPERS

The 14th Workshop on Java Technologies for Real-Time and Embedded Systems
JTRES 2016

Part of the Managed Languages & Runtimes Week 2016
29 August - 2 September 2016
Lugano, Switzerland

http://jtres2016.compute.dtu.dk/

==

Submission deadline: 12 June, 2016
Submission site: https://easychair.org/conferences/?conf=jtres2016

==

Over 90% of all microprocessors are now used for real-time and embedded applications. Embedded devices are deployed on a broad diversity of distinct processor architectures and operating systems. The application software for many embedded devices is custom tailored if not written entirely from scratch. The size of typical embedded system software applications is growing exponentially from year to year, with many of today's embedded systems comprised of multiple millions of lines of code. For all of these reasons, the software portability, reuse, and modular composability benefits offered by Java are especially valuable to developers of embedded systems.

Both embedded and general purpose software frequently need to comply with real-time constraints. Higher-level programming languages and middleware are needed to robustly and productively design, implement, compose, integrate, validate, and enforce memory and real-time constraints along with conventional functional requirements for reusable software components. The Java programming language has become an attractive choice because of its safety, productivity, its relatively low maintenance costs, and the availability of well trained developers.

::Goal::

Interest in real-time Java by both the academic research community and commercial industry has been motivated by the need to manage the complexity and costs associated with continually expanding embedded real-time software systems. The goal of the workshop is to gather researchers working on real-time and embedded Java to identify the challenging problems that still need to be solved in order to assure the success of real-time Java as a technology and to report results and experience gained by researchers.

The Java ecosystem has outgrown the combination of Java as programming language and the JVM. For example, Android uses Java as source language and the Dalvik virtual machine for execution. Languages such as Scala are compiled to Java bytecode and executed on the JVM. JTRES welcomes submissions that apply such approaches to embedded and/or real-time systems.

::Submission Requirements::

Participants are expected to submit a paper of at most 10 pages (ACM Conference Format, i.e., two-columns, 10 point font). Accepted papers will be published in the ACM International Conference Proceedings Series via the ACM Digital Library and have to be presented by one author at the JTRES.

LaTeX and Word templates can be found at:
http://www.acm.org/sigs/pubs/proceed/template.html

The ISBN number for JTRES 2016 is TBD.

Papers describing open source projects shall include a description how to obtain the source and how to run the experiments in the appendix. The source version for the published paper will be hosted at the JTRES web site.

Papers should be submitted through EasyChair. Please use the submission link:
https://easychair.org/conferences/?conf=jtres2016

Selected papers will be invited for submission to a special issue of the TBD.

Topics of interest to this workshop include, but are not limited to:

  New real-time programming paradigms and language features
  Industrial experience and practitioner reports
  Open source solutions for real-time Java
  Real-time design patterns and programming idioms
  High-integrity and safety critical system support
  Java-based real-time operating systems and processors
  Extensions to the RTSJ and SCJ
  Real-time and embedded virtual machines and execution environments
  Memory management and real-time garbage collection
  Multiprocessor and distributed real-time Java
  Real-time solutions for Android
  Languages other than Java on real-time or embedded JVMs
  Benchmarks and Open Source applications using real-time Java

::Important Dates::

  Paper Submission: 12 June, 2016
  Notification of Acceptance: 20 July, 2016
  Camera Ready Paper Due: 15 August, 2016
  Workshop: 29 August - 2 September, 2016

::Program Chair::

  Martin Schoeberl, Technical University of Denmark

::Workshop Chair::

  Walter Binder, University of Lugano (USI), Switzerland

::Program Committee Members::

  Ethan Blanton, Fiji Systems Inc
  Ana Cavalcanti, University of York
  Peter Dibble, RTSJ
  M. Teres