Re: Is there a way to share IndexReader data sensibly across independent callers?

2016-02-09 Thread Michael McCandless
Can't you just call ReaderManager.close?

All in-flight operations with that RM will keep working, and the
underlying reader will only finally close once they have all finished.

Mike McCandless

http://blog.mikemccandless.com


On Tue, Feb 9, 2016 at 12:12 AM, Trejkaz  wrote:
> On Tue, Feb 9, 2016 at 2:10 AM, Sanne Grinovero
>  wrote:
>> Hi,
>> you should really try to reuse the same opened Directory, as you
>> suggest, without closing it until your application is "done" with it in
>> all its threads (normally at application shutdown).
>> Keeping a Directory open does not by itself keep files open; open files
>> are more likely caused by IndexReader instances that were never closed.
>>
>> I'd highly recommend using ReaderManager for these reasons,
>> especially because handling these details across different threads
>> both correctly and efficiently can be tricky. I learned that
>> myself when implementing similar things before ReaderManager was
>> created.
>
> I'm already using ReaderManager, but there are issues.
>
> I want to close it when the last acquired index has been released, but
> no thread knows anything about what indexes other threads could be
> using, yet we still want indexes to be closed once nobody is using
> them. So I end up having to reference count the ReaderManager, which
> seems to defeat the purpose of using it in the first place since I
> could just reference count the reader itself. I wish it could handle
> automatically closing and reopening the index by itself, but I don't
> think it can.
>
> At the moment I have bolted this additional level of reference
> counting around ReaderManager and it just creates a new ReaderManager
> when the reference count goes back up to 1 and closes it when it goes
> back to 0. But this blob has to be synchronised to implement it safely
> and the map for looking these things up can never clean out entries,
> because I couldn't find a safe way to do that even using
> ConcurrentMap.
>
> TX
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>




Re: Is there a way to share IndexReader data sensibly across independent callers?

2016-02-09 Thread Trejkaz
On Tue, Feb 9, 2016 at 7:59 PM, Michael McCandless
 wrote:
> Can't you just call ReaderManager.close?
>
> All in-flight operations with that RM will keep working, and the
> underlying reader will only finally close once they have all finished.

I guess that has the caveat that it would be possible to have two
readers open on the same directory, which is mostly what I was trying
to avoid. My current solution absolutely prevents that, at the cost of
having to synchronise when acquiring or releasing, although I can
probably use double-checked locking to reduce the impact of that.

Really what would be handy is something that resembles ReaderManager
but takes Path for every method and also opens and closes the
Directory...
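
The Path-keyed, reference-counted wrapper described above can be sketched
with JDK classes only. This is a minimal sketch under stated assumptions,
not Lucene API: SharedResourceRegistry, acquire, release and openCount are
hypothetical names, and the opener function stands in for whatever opens the
Directory and ReaderManager for a path. It relies on
ConcurrentHashMap.compute, which runs atomically per key and removes the
mapping when the function returns null, so entries can be cleaned out once
the count drops to zero.

```java
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical registry (not a Lucene class): shares one resource per Path
 * and closes it when the last user releases it.
 */
final class SharedResourceRegistry<R extends AutoCloseable> {

    /** A resource plus the number of callers currently using it. */
    private static final class Entry<R> {
        final R resource;
        int refCount; // only touched inside compute(), which is atomic per key
        Entry(R resource) { this.resource = resource; }
    }

    private final ConcurrentHashMap<Path, Entry<R>> entries = new ConcurrentHashMap<>();
    private final Function<Path, R> opener; // assumption: opens the index for a path

    SharedResourceRegistry(Function<Path, R> opener) {
        this.opener = opener;
    }

    /** Opens the resource on first use, otherwise reuses the open one. */
    R acquire(Path path) {
        Entry<R> entry = entries.compute(path, (p, existing) -> {
            Entry<R> e = (existing != null) ? existing : new Entry<R>(opener.apply(p));
            e.refCount++;
            return e;
        });
        return entry.resource;
    }

    /** Closes the resource and drops the map entry when the count reaches zero. */
    void release(Path path) {
        entries.compute(path, (p, existing) -> {
            if (existing == null) {
                throw new IllegalStateException("release without acquire: " + p);
            }
            if (--existing.refCount > 0) {
                return existing; // still in use, keep the mapping
            }
            try {
                existing.resource.close();
            } catch (Exception ex) {
                // Sketch only: the mapping is left in place if close() fails.
                throw new IllegalStateException("failed to close " + p, ex);
            }
            return null; // returning null removes the mapping
        });
    }

    /** Number of paths that currently have an open resource. */
    int openCount() {
        return entries.size();
    }
}
```

Because open and close run inside compute, a second resource is never opened
for a path while the first is still open or closing. The trade-off is that
compute is documented as meant for short computations, and a slow open may
block some other updates to the map while it runs.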

TX




Generate Lucene segments_N file

2016-02-09 Thread khanh-lam . mai
Hello,

First, I'm not sure this is the right mailing list for this question;
if not, please accept my apologies for the inconvenience.

While moving Lucene (5.3) index files from one server to another, I forgot
to move the segments_N file (I used the pattern *.*, which does not match it).
Unfortunately, I have since erased the original folder, and these are the
only files left in my directory:

_1rpt.fdt
_1rpt.fdx
_1rpt.fnm
_1rpt.nvd
_1rpt.nvm
_1rpt.si
_1rpt_Lucene50_0.doc
_1rpt_Lucene50_0.dvd
_1rpt_Lucene50_0.dvm
_1rpt_Lucene50_0.pos
_1rpt_Lucene50_0.tim
_1rpt_Lucene50_0.tip
write.lock

I am missing the segments_42u file, and without it I cannot even run
org.apache.lucene.index.CheckIndex:

Exception in thread "main" org.apache.lucene.index.IndexNotFoundException:
no segments* file found in
MMapDirectory@/solr-5.3.1/nodes/node1/core/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@119d7047:
files: [write.lock, _1rpt.fdt, _1rpt.fdx, _1rpt.fnm, _1rpt.nvd, _1rpt.nvm,
_1rpt.si, _1rpt_Lucene50_0.doc, _1rpt_Lucene50_0.dvd, _1rpt_Lucene50_0.dvm,
_1rpt_Lucene50_0.pos, _1rpt_Lucene50_0.tim, _1rpt_Lucene50_0.tip]
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:483)
    at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2354)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2237)

The index is very large (> 800GB), and rebuilding it would take weeks.
Is there a way to regenerate the missing segments_N file?
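
To illustrate why the *.* pattern skipped the file, here is a small sketch
using the JDK's own glob matcher. It assumes the copy tool used glob
semantics similar to Java's, where *.* requires a literal dot in the name;
other tools may behave differently. GlobCheck and unmatched are hypothetical
names introduced for this example.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports which file names a glob pattern would skip. */
final class GlobCheck {

    /** Returns the names that the given glob does NOT match, in input order. */
    static List<String> unmatched(String glob, List<String> names) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> missed = new ArrayList<>();
        for (String name : names) {
            if (!matcher.matches(Paths.get(name))) {
                missed.add(name);
            }
        }
        return missed;
    }
}
```

With this matcher, segments_42u has no dot and is not matched by *.*, while
names such as _1rpt.si and write.lock are; the pattern * matches all of them.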

Thanks a lot for your help.


Khanh-Lam Mai
khanh-lam@bnf.fr


Re: Is there a way to share IndexReader data sensibly across independent callers?

2016-02-09 Thread Michael McCandless
Why do you need to close the Directory?  It should be lightweight.
But if you really do need it, can't you subclass ReaderManager and
override afterClose to close the directory?

So you essentially need to "lazy close" your ReaderManager, when there
are no searches currently needing it?

Why not have a sync'd block, with a reference to the ReaderManager.
Inside that block, if the reference is null, that means it's closed,
and you open a new one.  Else, use the existing one.  Won't that do
what you need w/o requiring full fledged reference counts?  Yes, it is
a sync'd block around acquire/release, but I don't see how that can be
avoided, and it'd be fast when the ReaderManager is already opened.
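
The sync'd block described above might look roughly like the following.
This is only a sketch using JDK classes: LazyManagerHolder, ManagerAction,
withManager, closeManager and isOpen are hypothetical names, and the opener
stands in for code that creates a ReaderManager. A null reference means
"closed", and the action (for example, a call to acquire) runs under the
same lock so that it cannot race with a close.

```java
import java.util.function.Supplier;

/** Hypothetical holder (not a Lucene class) for a lazily opened manager. */
final class LazyManagerHolder<M extends AutoCloseable> {

    /** Action run while the lock is held, e.g. a stand-in for acquire(). */
    interface ManagerAction<M, T> {
        T apply(M manager) throws Exception;
    }

    private final Object lock = new Object();
    private final Supplier<M> opener; // assumption: creates a fresh manager
    private M manager;                // null means "currently closed"

    LazyManagerHolder(Supplier<M> opener) {
        this.opener = opener;
    }

    /** Opens the manager if needed, then runs the action under the same lock. */
    <T> T withManager(ManagerAction<M, T> action) throws Exception {
        synchronized (lock) {
            if (manager == null) {
                manager = opener.get();
            }
            return action.apply(manager);
        }
    }

    /** Closes the current manager, if any; the next withManager call reopens. */
    void closeManager() throws Exception {
        synchronized (lock) {
            if (manager != null) {
                M closing = manager;
                manager = null; // cleared first so a failed close is not reused
                closing.close();
            }
        }
    }

    /** True while a manager is open. */
    boolean isOpen() {
        synchronized (lock) {
            return manager != null;
        }
    }
}
```

With Lucene, a caller might pass ReaderManager::acquire as the action and
release the reader on the manager it came from; when the already-open path
is taken, the lock is held only for the null check and the acquire call.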

Mike McCandless

http://blog.mikemccandless.com






Re: Is there a way to share IndexReader data sensibly across independent callers?

2016-02-09 Thread Trejkaz
On Wed, Feb 10, 2016 at 3:17 AM, Michael McCandless
 wrote:
> Why do you need to close the Directory?  It should be light weight.
> But if you really do need it, can't you subclass ReaderManager and
> override afterClose to close the directory?

I guess that's the next thing I'll try out. I am already subclassing
ReaderManager to wrap the readers, after discovering that reader caches
don't work if I wrap the readers from outside it.

As for the need to close it, it's true, in production we don't really
need to. But we happen to be using a Directory implementation that
checks that callers close things, and at the moment that check is only
triggered when we close the Directory. So it's just something that
adds diagnostics to help resolve other warnings about files not being
closed.

> So you essentially need to "lazy close" your ReaderManager, when there
> are no searches currently needing it?

That's more or less the right way to think about it. Actually each
session might make more than one search using the same reader and
reuse the same one for those, but when no sessions are running for
that index anymore I wanted to close it because Windows has annoying
file locking for read operations. If it weren't for Windows...

> Why not have a sync'd block, with a reference to the ReaderManager.
> Inside that block, if the reference is null, that means it's closed,
> and you open a new one.  Else, use the existing one.  Won't that do
> what you need w/o requiring full fledged reference counts?  Yes, it is
> a sync'd block around acquire/release, but I don't see how that can be
> avoided, and it'd be fast when the ReaderManager is already opened.

This is roughly what I currently have. At the moment I lock my entire
acquire() and release() methods; it may be possible to narrow the locked
region, although I'm not entirely sure.

Concurrency is fiddly...

TX




JTRES 2016 First Call For Papers

2016-02-09 Thread ma...@dtu.dk

==

CALL FOR PAPERS

The 14th Workshop on
Java Technologies for Real-Time and Embedded Systems
JTRES 2016

Part of the
Managed Languages & Runtimes Week 2016
29 August - 2 September 2016
Lugano, Switzerland

http://jtres2016.compute.dtu.dk/


==

Submission deadline: 12 June, 2016
Submission site: https://easychair.org/conferences/?conf=jtres2016

==

Over 90% of all microprocessors are now used for real-time and embedded 
applications. Embedded devices are deployed on a broad diversity of distinct 
processor architectures and operating systems. The application software for 
many embedded devices is custom tailored if not written entirely from scratch. 
The size of typical embedded system software applications is growing 
exponentially from year to year, with many of today's embedded systems 
comprised of multiple millions of lines of code. For all of these reasons, the 
software portability, reuse, and modular composability benefits offered by Java 
are especially valuable to developers of embedded systems.

Both embedded and general purpose software frequently need to comply with 
real-time constraints. Higher-level programming languages and middleware are 
needed to robustly and productively design, implement, compose, integrate, 
validate, and enforce memory and real-time constraints along with conventional 
functional requirements for reusable software components. The Java programming 
language has become an attractive choice because of its safety, productivity, 
its relatively low maintenance costs, and the availability of well trained 
developers.

::Goal::

Interest in real-time Java by both the academic research community and 
commercial industry has been motivated by the need to manage the complexity and 
costs associated with continually expanding embedded real-time software 
systems. The goal of the workshop is to gather researchers working on real-time 
and embedded Java to identify the challenging problems that still need to be 
solved in order to assure the success of real-time Java as a technology and to 
report results and experience gained by researchers.

The Java ecosystem has outgrown the combination of Java as programming language 
and the JVM. For example, Android uses Java as source language and the Dalvik 
virtual machine for execution. Languages such as Scala are compiled to Java 
bytecode and executed on the JVM. JTRES welcomes submissions that apply such 
approaches to embedded and/or real-time systems.

::Submission Requirements::

Participants are expected to submit a paper of at most 10 pages (ACM Conference 
Format, i.e., two-columns, 10 point font). Accepted papers will be published in 
the ACM International Conference Proceedings Series via the ACM Digital Library 
and have to be presented by one author at the JTRES.

LaTeX and Word templates can be found at: 
http://www.acm.org/sigs/pubs/proceed/template.html

The ISBN number for JTRES 2016 is TBD.

Papers describing open source projects shall include, in the appendix, a 
description of how to obtain the source and how to run the experiments. The 
source version for the published paper will be hosted at the JTRES web site.
Papers should be submitted through EasyChair. Please use the submission link:
https://easychair.org/conferences/?conf=jtres2016

Selected papers will be invited for submission to a special issue of the TBD.

Topics of interest to this workshop include, but are not limited to:

New real-time programming paradigms and language features
Industrial experience and practitioner reports
Open source solutions for real-time Java
Real-time design patterns and programming idioms
High-integrity and safety critical system support
Java-based real-time operating systems and processors
Extensions to the RTSJ and SCJ
Real-time and embedded virtual machines and execution environments
Memory management and real-time garbage collection
Multiprocessor and distributed real-time Java
Real-time solutions for Android
Languages other than Java on real-time or embedded JVMs
Benchmarks and Open Source applications using real-time Java

::Important Dates::

Paper Submission: 12 June, 2016
Notification of Acceptance: 20 July, 2016
Camera Ready Paper Due: 15 August, 2016
Workshop: 29 August - 2 September, 2016

::Program Chair::

Martin Schoeberl, Technical University of Denmark

::Workshop Chair::

Walter Binder, University of Lugano (USI), Switzerland

::Program Committee Members::

Ethan Blanton, Fiji Systems Inc
Ana Cavalcanti, University of York
Peter Dibble, RTSJ
M. Teres