RE: How best to handle a reasonable amount to data (25TB+)

2012-02-06 Thread Peter Miller
Thanks for the response. Actually, I am more concerned with trying to use an Object Store for the indexes. The next concern is the use of a local index versus the sharded ones, but I'm more relaxed about that now after thinking about it. I see that index shards could be up to 100 million documen

Re: Why read past EOF

2012-02-06 Thread superruiye
OK, thanks. I modified my program as you suggested, but another problem appeared: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.lucene.index.TermInfosReader.seekEnum(TermInfosReader.java:203) at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:273) at

Re: Need to enforce logging of Lucene queries

2012-02-06 Thread Erick Erickson
Solr already logs the queries themselves, although I don't know of any way to associate them with a user. In Solr land, though, whatever servlet container you use for Solr should be able to log all the URLs that hit the server. Best Erick On Mon, Feb 6, 2012 a

RE: recording a universal ID from DocID in a CustomScoreQuery

2012-02-06 Thread Paul Allan Hill
To complete this thread, I read the document itself with a 1-field fieldSelector, so as not to bother with anything but exactly what I needed at this point in the code (particularly not the text body). Then I saved the primary key (the path) of documents that visited this CustomScoreQuery (functi

Need to enforce logging of Lucene queries

2012-02-06 Thread Charles Bearden
I have a set of Lucene indexes for which I need to log all accesses and possibly queries. I can use kernel-level auditing to record file accesses, but what would be the best approach to logging the strings for all queries against these indexes? What comes to mind is a Lucene analogy to a databa

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Will do. On Tue, Feb 7, 2012 at 12:52 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > You tell NRTCachingDirectory how much RAM it's allowed to use, and it > then caches newly flushed segments in a private RAMDirectory. > > But you should first test performance w/o it (after removing

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Michael McCandless
You tell NRTCachingDirectory how much RAM it's allowed to use, and it then caches newly flushed segments in a private RAMDirectory. But you should first test performance w/o it (after removing the commit calls). NRT is very fast... Mike McCandless http://blog.mikemccandless.com On Mon, Feb 6,

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Good point. I should remove the commits. Is there any difference between NRTCachingDirectory and RAMDirectory? How do you define "small"? On Tue, Feb 7, 2012 at 12:42 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > You shouldn't call IW.commit when using NRT; that's the point of NRT > (makin

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Michael McCandless
You shouldn't call IW.commit when using NRT; that's the point of NRT (making changes visible w/o calling commit). Only call commit when you require that all changes be durable (survive OS / JVM crash, power loss, etc.) on disk. Also, you can use NRTCachingDirectory which acts like RAMDirectory for

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Agree. On Mon, Feb 6, 2012 at 11:53 PM, Uwe Schindler wrote: > Hi Cheng, > > all pros and cons are explained in those articles written by Mike! As soon > as there are harddisks in the game, there is a slowdown, what do you > expect? > If you need it faster, buy SSDs! :-) > > Uwe > > - > Uwe

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
My original question was whether there is a way to configure when the writer writes to FSDirectory. I think there may be something in IndexWriterConfig that can help. On Mon, Feb 6, 2012 at 11:50 PM, Ian Lea wrote: > Well, yes. What would you expect? From the javadocs for > IndexWriter.commi

RE: Configure writer to write to FSDirectory?

2012-02-06 Thread Uwe Schindler
Hi Cheng, all pros and cons are explained in those articles written by Mike! As soon as there are harddisks in the game, there is a slowdown, what do you expect? If you need it faster, buy SSDs! :-) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@t

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Ian Lea
Well, yes. What would you expect? From the javadocs for IndexWriter.commit() Commits all pending changes (added & deleted documents, segment merges, added indexes, etc.) to the index, and syncs all referenced index files ... This may be a costly operation, so you should test the cost in your app

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
I meant that when I use NRTManager and call commit(), the speed is slower than when I use RAMDirectory. In my case, the NRTManager instance not only performs searches but also updates/modifies indexes, and those changes should be visible to other threads. In RAMDirectory, the commit() doesn't synchronize indexes with the FSDi

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Uwe, when I said the speed is slow, I wasn't referring to instant visibility of changes, but to the fact that the changes may be synchronized with FSDirectory when I use writer.commit(). When I use RAMDirectory, writer.commit() seems much faster than when using NRTManager built upon FSDirectory. So, I am guessing the

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Ian Lea
What exactly do you mean by the "speed is slower"? Time taken to update the index? Time taken for updates to become visible in search results? Time taken for searches to run on the IndexSearcher returned from SearcherManager? Something else? -- Ian. On Mon, Feb 6, 2012 at 3:27 PM, Cheng wr

RE: Configure writer to write to FSDirectory?

2012-02-06 Thread Uwe Schindler
Please review the following articles about NRT. Absolutely instant updates that are visible as they are done are almost impossible (even with RAMDirectory): http://goo.gl/mzAHt http://goo.gl/5RoPx http://goo.gl/vSJ7x Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetap

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
Ian, I've run into an issue: I need to update the index frequently. NRTManager doesn't seem very helpful on this front, as it is slower than when RAMDirectory is used. Any advice on improving this? On Mon, Feb 6, 2012 at 10:24 PM, Cheng wrote: > That really helps! I will try it out. > > Than

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
That really helps! I will try it out. Thanks. On Mon, Feb 6, 2012 at 10:12 PM, Ian Lea wrote: > You would use NRTManagerReopenThread as a standalone thread, not > plugged into your Executor stuff. It is a utility class which you > don't have to use. See the javadocs. > > But in your case I'd

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Ian Lea
You would use NRTManagerReopenThread as a standalone thread, not plugged into your Executor stuff. It is a utility class which you don't have to use. See the javadocs. But in your case I'd use it, to start with anyway. Fire it up with suitable settings and forget about it, except to call close(

Re: Custom Payload Analyzer and Query

2012-02-06 Thread Ian Lea
Not sure if you got an answer to this or not. Don't recall seeing one and gmail threading says not. > Is the use of payloads I've described appropriate? Sounds OK to me, although I'm not sure why you can't store the metadata as a Document Field. > Can I exclude/filter the matching terms based o

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Cheng
I don't understand the following portion: IndexWriter iw = new IndexWriter(whatever - some standard disk index); NRTManager nrtm = new NRTManager(iw, null); NRTManagerReopenThread ropt = new NRTManagerReopenThread(nrtm, ...); ropt.setXxx(...); ropt.start(); I have a java ExecutorServices in

Re: Configure writer to write to FSDirectory?

2012-02-06 Thread Ian Lea
If you can use NRTManager and SearcherManager things should be easy and blazingly fast rather than unbearably slow. The latter phrase is not one often associated with lucene. IndexWriter iw = new IndexWriter(whatever - some standard disk index); NRTManager nrtm = new NRTManager(iw, null); NRTMana

Re: recording a universal ID from DocID in a CustomScoreQuery

2012-02-06 Thread Ian Lea
int doc will be for the subreader, not for the entire index. oal.search.Collector has setNextReader(IndexReader reader, int docBase) which you might somehow be able to use. Failing that I'd go for FieldCache, or store the docids in a Set in a Map keyed by current Reader, if that would give you wha
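
The docBase arithmetic mentioned above can be sketched in plain Java, without any Lucene classes. This is only an illustration, assuming each segment's docBase is the sum of the document counts of the segments before it; the class and method names (DocBaseSketch, docBases, toGlobalDocId) and the segment sizes are hypothetical and not part of the Lucene API.

```java
public class DocBaseSketch {

    // Hypothetical helper: turns a segment-local doc ID into an index-wide doc ID
    // by adding the docBase of the segment (the value passed to setNextReader).
    public static int toGlobalDocId(int docBase, int segmentDoc) {
        return docBase + segmentDoc;
    }

    // Assumption for illustration: the docBase of a segment is the total number
    // of documents in all segments that come before it.
    public static int[] docBases(int[] segmentMaxDocs) {
        int[] bases = new int[segmentMaxDocs.length];
        int next = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = next;
            next += segmentMaxDocs[i];
        }
        return bases;
    }

    public static void main(String[] args) {
        // Three made-up segments holding 100, 40 and 250 documents.
        int[] bases = docBases(new int[] {100, 40, 250});
        // Local doc 7 in the third segment maps to 140 + 7.
        System.out.println(toGlobalDocId(bases[2], 7)); // prints 147
    }
}
```

In a real Collector the docBase would be remembered in setNextReader and added to each int doc seen in collect; the sketch only shows the addition itself.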

Re: weightage of each word according to precedence in document

2012-02-06 Thread Ian Lea
At least it doesn't give the same score for a doc which doesn't have all the terms, which I think you claimed at one point. So to try and simplify this, you've got one field called content and doc1: pqrst uvwx abcd doc2: abcd pqrst uvwx and the query "abcd^10.0 content:pqrst^5.0" gives the same s

Re: Apache Lucene file search

2012-02-06 Thread Dheeraj Kv
Hi, the issue of searching by file name is resolved with some modifications in SearchFiles.java. A field named path has been added in the code: String field = "path"; I also added parser.setAllowLeadingWildcard(true) for searching leading wildcard strings, which was not available