Aw: RE: RE: Re: Performance StringCoding.decode

2014-08-07 Thread Sascha Janz
We use JDK 1.7.55 and Lucene 4.9.0. Sascha Sent: Wednesday, 06 August 2014 at 18:11 From: "Uwe Schindler" To: java-user@lucene.apache.org Subject: RE: RE: Re: Performance StringCoding.decode What Java version are you using? In Java 7, decoding of bytes to strings should be fast. Uwe

improve indexing speed with nomergepolicy

2014-08-07 Thread Sascha Janz
Hi, I am trying to speed up our indexing process. We use SearcherManager with applydeletes to get a near-real-time reader. We do not really have "many" incoming documents, but the documents must be updated from time to time, and the number of documents to be updated could be quite large. i tried some tes

EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
Hi, I am new to the list and I have been working on a problem for some time already. I would like to know if someone has any idea of how I can solve it. Given a term, I want to get the term frequency in a Lucene document. When I use the WhiteSpaceAnalyzer my code works properly but when I use

Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Shai Erera
Using NoMergePolicy for online indexes is usually not recommended. You want to use NoMP in cases where you build an index in a batch job and then, before the index is "published", run a forceMerge or maybeMerge (with a real MergePolicy). For online indexes, i.e. indexes that are being sea

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Jack Krupansky
Generally, the standard analyzer will be a better choice, unless you have some special need. A language-specific analyzer will include stemming. The English analyzer includes the Porter stemmer. Generally, you need to apply a compatible analyzer to query terms to match the index, or you need

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
Hi Jack, Thank you very much. I just changed to the StandardAnalyzer and it is working as I would like. But there is something I still cannot understand. If I use the same analyzer for indexing and for searching, the same term should be parsed in the same way in both cases, shouldn't it? It i

Aw: Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Sascha Janz
Many thanks for the tip about NRTCachingDirectory. I didn't know that. I will try it. Sascha Sent: Thursday, 07 August 2014 at 13:37 From: "Shai Erera" To: "java-user@lucene.apache.org" Subject: Re: improve indexing speed with nomergepolicy Using NoMergePolicy for online indexes i

RE: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Uwe Schindler
Hi, if you create the term yourself, it is not going through the analyzer: public int getTermFrequency(String term, String id) (you create a BytesRef out of it). So you have to also let the term go through the analyzer. The stemming analyzers change the terms, so you won't find them without als
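The mismatch described above can be reproduced without Lucene. The following is only a minimal sketch under stated assumptions: `ToyAnalysisDemo`, `analyze`, `indexDocument`, `rawTermFrequency` and `analyzedTermFrequency` are hypothetical names, and the "stemmer" merely lowercases and strips a trailing "s". It is not the EnglishAnalyzer or the Porter stemmer, and it does not use the Lucene API. It only illustrates why a hand-built lookup term has to pass through the same analysis as the indexed text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy stand-in for an analyzer chain; NOT Lucene's EnglishAnalyzer or Porter stemmer.
class ToyAnalysisDemo {

    // Hypothetical "stemming analyzer": splits on whitespace, lowercases,
    // and strips one trailing "s" from tokens longer than three characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input produces one empty piece; skip it
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (token.length() > 3 && token.endsWith("s")) {
                token = token.substring(0, token.length() - 1);
            }
            tokens.add(token);
        }
        return tokens;
    }

    // Index time: count the analyzed tokens of a single document.
    static Map<String, Integer> indexDocument(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : analyze(text)) {
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    // Lookup WITHOUT analysis: the raw string is compared to the indexed tokens.
    static int rawTermFrequency(Map<String, Integer> freqs, String term) {
        return freqs.getOrDefault(term, 0);
    }

    // Lookup WITH analysis: the term goes through the same analyzer first.
    static int analyzedTermFrequency(Map<String, Integer> freqs, String term) {
        List<String> tokens = analyze(term);
        if (tokens.size() != 1) {
            return 0; // this sketch only handles single-token terms
        }
        return freqs.getOrDefault(tokens.get(0), 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = indexDocument("Cats chase cats and dogs");
        System.out.println(rawTermFrequency(freqs, "Cats"));      // raw term misses the stemmed token
        System.out.println(analyzedTermFrequency(freqs, "Cats")); // analyzed term finds it
    }
}
```

With a whitespace-only analyzer the raw lookup happens to work because indexed tokens equal the input words; once the analyzer rewrites tokens, only the analyzed lookup matches.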

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Jack Krupansky
Also, usually query-time analysis is done by a "query parser", so if you aren't going through a query parser, you have to call the analyzer yourself. The stemming is very likely the culprit here. -- Jack Krupansky -Original Message- From: Uwe Schindler Sent: Thursday, August 7, 2014

Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term Frequency

2014-08-07 Thread Bianca Pereira
Hi Uwe, Jack, Thank you very much for your answers. I will work on it. Regards, Bianca 2014-08-07 14:04 GMT+01:00 Jack Krupansky: > Also, usually query-time analysis is done by a "query parser", so if you > aren't going through a query parser, you have to call the analyzer > yourself.

Aw: Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Sascha Janz
Many thanks again. This was a good tip. After switching from FSDirectory to NRTCachingDirectory, queries run at double speed. Sascha Sent: Thursday, 07 August 2014 at 14:54 From: "Sascha Janz" To: java-user@lucene.apache.org Subject: Aw: Re: improve indexing speed with nomergepol

Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Jon Stewart
Related, how does one change the MergePolicy on an IndexWriter (e.g., use NoMergePolicy during batch indexing, then change to something better once finished with batch)? It looks like the MergePolicy is set through IndexWriterConfig but I don't see a way to update an IWC on an IW. Thanks, Jon O

Kmob

2014-08-07 Thread craiglang44
Don't take the piss ur all at it bb n all of youze!!! Pisstaking weirdos!

Aw: Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Sascha Janz
It can only be set when opening the IndexWriter: IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46)); iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND); iwc.setRAMBufferSizeMB(250); iwc.setMergePolicy(NoMergePolicy.INSTANCE); Directo

Re: improve indexing speed with nomergepolicy

2014-08-07 Thread Shai Erera
Yes, currently an MP isn't a "live" setting on IndexWriter, meaning you pass it at construction time and don't change it afterwards. I wonder if after LUCENE-5711 we can move MergePolicy to LiveIndexWriterConfig and fix IndexWriter to not hold on to it, but rather pull it from the config. Not sure

RE: improve indexing speed with nomergepolicy

2014-08-07 Thread Uwe Schindler
This is a good idea, because sometimes it's nice to change the MergePolicy on the fly without reopening! One example is https://issues.apache.org/jira/browse/LUCENE-5526 In my case, I would like to open an IndexWriter, set its merge policy to IndexUpdaterMergePolicy, force a merge to upgrade all

Determining Relevancy/Cutoff

2014-08-07 Thread dfl
What is the best way to determine relevancy and the cutoff of results to show? So the system I'm working on right now involves searching the inventory and returning the results. Each result must be reviewed by an employee to determine whether it is a true match. Obviously, we want to minimize the

Lucene installation

2014-08-07 Thread Fatemeh Lashkari
I want to install lucene-4.9.0. I created a new Java project in Eclipse and imported core/src/java into my project, then imported demo/src/java. But I got the following error; what should I do? import org.apache.lucene.analysis.standard.StandardAnalyzer; not recognized import org.apache.lucene.queryparser.

RE: Lucene installation

2014-08-07 Thread #LI JUN#
Hi Fatemeh, There are multiple libraries in Lucene; you have to import the ones you need. As far as I know, org.apache.lucene.analysis.standard.StandardAnalyzer is in the analysis project. Try adding that one. Jun From: Fatemeh La

BooleanWeight.scorer() gives a TermScorer

2014-08-07 Thread Christian Reuschling
Hello, I try to get the scorer for a result document, for further computation. List<AtomicReaderContext> leafContexts = indexReader.leaves(); int n = ReaderUtil.subIndex(scoreDoc.doc, leafContexts); AtomicReaderContext ctx = leafContexts.get(n); Scorer scorer = weight.sc

Re: BooleanWeight.scorer() gives a TermScorer

2014-08-07 Thread Robert Muir
This can happen in some cases: for example, if you are doing a disjunction of "foo" and "bar" with the coordination factor disabled, and the segment has no postings for "bar". In this case the optimal scorer to return is just a TermScorer for "foo". On Thu, Aug 7, 2014 at 12:42 PM, Christian Reuschlin
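The simplification described above can be sketched in plain Java. This is an illustration only, under stated assumptions: `ToyDisjunctionDemo`, `ToyScorer`, `ToyTermScorer`, `ToyDisjunctionScorer` and `scorer` are hypothetical names, not Lucene classes. The sketch ignores scoring and the coordination factor entirely, and returning `null` when no clause has postings is a choice made for this sketch. It only shows the shape of the decision: per segment, clauses without postings are dropped, and a disjunction with a single remaining clause collapses to that clause's scorer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-segment scorer selection; none of these types are Lucene classes.
class ToyDisjunctionDemo {

    interface ToyScorer {
        String describe();
    }

    // Stand-in for a scorer over a single term's postings.
    static final class ToyTermScorer implements ToyScorer {
        final String term;

        ToyTermScorer(String term) {
            this.term = term;
        }

        public String describe() {
            return "TermScorer(" + term + ")";
        }
    }

    // Stand-in for a scorer that combines several sub-scorers with OR semantics.
    static final class ToyDisjunctionScorer implements ToyScorer {
        final List<ToyScorer> subScorers;

        ToyDisjunctionScorer(List<ToyScorer> subScorers) {
            this.subScorers = subScorers;
        }

        public String describe() {
            List<String> parts = new ArrayList<>();
            for (ToyScorer sub : subScorers) {
                parts.add(sub.describe());
            }
            return "DisjunctionScorer" + parts;
        }
    }

    // segmentPostings maps each term to the doc ids that contain it in ONE segment.
    // Terms without postings in this segment contribute no sub-scorer.
    static ToyScorer scorer(List<String> terms, Map<String, int[]> segmentPostings) {
        List<ToyScorer> subScorers = new ArrayList<>();
        for (String term : terms) {
            int[] docs = segmentPostings.get(term);
            if (docs != null && docs.length > 0) {
                subScorers.add(new ToyTermScorer(term));
            }
        }
        if (subScorers.isEmpty()) {
            return null; // sketch-only choice: nothing can match in this segment
        }
        if (subScorers.size() == 1) {
            return subScorers.get(0); // one live clause: no wrapper needed
        }
        return new ToyDisjunctionScorer(subScorers);
    }

    public static void main(String[] args) {
        Map<String, int[]> segment = new HashMap<>();
        segment.put("foo", new int[] {1, 4});
        // "bar" has no postings in this segment, so only the "foo" scorer remains.
        System.out.println(scorer(Arrays.asList("foo", "bar"), segment).describe());
    }
}
```

Code that inspects the returned scorer therefore should not assume a particular scorer class, since the type can differ from segment to segment.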