Re: Re-creating IndexSearcher after update
Luc, I tried adding your DelayCloseIndexSearcher to my project (a Tomcat app where the index is repeatedly searched and frequently updated), and as soon as an index modification occurs (in a separate thread) and I call closeWhenDone() in the main thread, I get IllegalStateException("closeWhenDone() already called"). The exception is then thrown for every subsequent search attempt. Any ideas?

Thanks, Nick.

Vanlerberghe, Luc wrote:
> Yep,
>
> I created DelayCloseIndexSearcher just for this scenario and it's been
> running in production for about half a year now...
>
> There's a usage example in the javadoc, but it can be optimised even
> more (without touching the code that does the searches, handles the
> hits, etc.).
>
> In my production environment, isCurrent() is called in a separate
> thread. If it returns false, a new DelayCloseIndexSearcher instance is
> created, some warming up is done, and only then is the existing one
> replaced and closeWhenDone() called on it.
>
> Luc
>
> -----Original Message-----
> From: Koji Sekiguchi [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 21 March 2006 9:24
> To: java-user@lucene.apache.org
> Subject: RE: Re-creating IndexSearcher after update
>
> Hi Steve,
>
> DelayCloseIndexSearcher may suit your requirement.
>
> Please check:
> http://issues.apache.org/jira/browse/LUCENE-445
>
> Hope this helps.
>
> Koji
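[Editor's note: for reference, a minimal sketch of the warm-then-swap pattern Luc describes, written against the plain Lucene 1.9-era IndexSearcher API since DelayCloseIndexSearcher only exists in the LUCENE-445 patch; the class name, the warming query, and the field name are assumptions, not code from the patch.]

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

// Runs in a background thread: reopen only when the index has changed,
// warm the new searcher, then swap it in and retire the old one.
public class SearcherSwapper {
  private final String indexDir;
  private volatile IndexSearcher current;

  public SearcherSwapper(String indexDir) throws IOException {
    this.indexDir = indexDir;
    this.current = new IndexSearcher(indexDir);
  }

  public IndexSearcher getCurrent() {
    return current;
  }

  public void maybeReopen() throws IOException {
    if (current.getIndexReader().isCurrent()) {
      return;                                   // nothing changed since the last open
    }
    IndexSearcher fresh = new IndexSearcher(indexDir);
    // "Warming up": run a representative query so caches are populated
    // before real traffic hits the new searcher (field and term are hypothetical).
    fresh.search(new TermQuery(new Term("contents", "lucene")));
    IndexSearcher old = current;
    current = fresh;                            // searches started after this line use the new searcher
    // With the LUCENE-445 patch, this is where old.closeWhenDone() would go; with a
    // plain IndexSearcher you must not close until all in-flight searches have finished.
  }
}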
PhraseQuery with synonyms or having n tokens at the same token position.
Hi,

PhraseQuery is not working the way I want when I index with synonyms. For example, I have indexed the name "sony dsc-d cybershot" as the following tokens, with the given token positions:

1: [sony:0->4]
2: [dsc:5->10]
3: [dscd:5->10]
4: [d:5->10]
5: [cybershot:11->20]

So "dsc-d" is tokenized into three tokens, "dsc", "dscd" and "d", at the same token location. The indexing part is OK, but the problem is with searching: the PhraseQuery "dsc cybershot" is not returning any results, because "dsc" and "cybershot" are not adjacent (although I intended them to be). I could increase the slop at search time, but that does not fit our needs well: it is hard to decide on a maximum slop in our case, and the returned results are different, which I don't want.

Thanks in advance,
Jelda

"Impossible is Nothing"
add word filtering?
Hi all,

I'm really new to Lucene. In fact I just found it when I googled a few days ago. I never thought Java had this kind of excellent library for free.

I have a few questions: where do I hook in if I want to filter certain text from being searched, filter certain results from being displayed, or display an alternative result in place of a filtered one when using Lucene? Is there a better way than just editing the results .jsp page (from the demo)?

Any information is greatly appreciated.

Thanks in advance
RE: add word filtering?
Are you asking that common words not be searched? For this, you can use StopFilter to prevent words from being indexed and searched. Alternatively, you can use StandardAnalyzer, which in addition to removing stop words also does more sophisticated tokenizing.

Venu

-----Original Message-----
From: abdul muhaimin [mailto:[EMAIL PROTECTED]
Sent: Monday, March 27, 2006 3:13 PM
To: java-user@lucene.apache.org
Subject: add word filtering?

> Hi all,
>
> I'm really new to Lucene. In fact I just found it when I googled a few
> days ago. I never thought Java had this kind of excellent library for free.
>
> I have a few questions: where do I hook in if I want to filter certain
> text from being searched, filter certain results from being displayed,
> or display an alternative result in place of a filtered one when using
> Lucene? Is there a better way than just editing the results .jsp page
> (from the demo)?
>
> Any information is greatly appreciated.
>
> Thanks in advance
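[Editor's note: a minimal sketch of Venu's two suggestions, assuming the Lucene 1.9-era analysis API; the stop-word list is made up for the example.]

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class FilteringExample {
  // Words that should never make it into the index or a query (hypothetical list).
  private static final String[] STOP_WORDS = {"the", "a", "an", "of"};

  // Option 1: StandardAnalyzer already removes stop words; you can pass your own list.
  public static Analyzer standard() {
    return new StandardAnalyzer(STOP_WORDS);
  }

  // Option 2: build your own analyzer and wrap its tokenizer in a StopFilter.
  public static Analyzer custom() {
    return new Analyzer() {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        return new StopFilter(new LowerCaseTokenizer(reader), STOP_WORDS);
      }
    };
  }
}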
Get All Entries
Hello everyone,

I have 6000 entries in my Lucene index, and if I search for entries with "00*" in the number field it works fine. But in addition I need all entries, no matter which number they have. A term like "*" doesn't work. How can I get all entries?

The code of my search is:

IndexSearcher is = new IndexSearcher( INDEX_DIR );
QueryParser parser = new QueryParser( "number", analyzer );
Query query = parser.parse( "00*" );
Hits hits = is.search( query, new Sort( "number" ) );

Thanks for your help,
Stefan H
RE: Get All Entries
I believe there's a MatchAllDocsQuery class from Lucene 1.9 onwards. You can run this query to get all documents.

If you are not using 1.9, to my knowledge you would have to add a redundant field that is true for all documents and query on that field. Something like Field.Keyword("AllDocsTrue", "true") added to each doc; you can then run the query AllDocsTrue:true to get all your docs.

Venu

-----Original Message-----
From: StefanH [mailto:[EMAIL PROTECTED]
Sent: Monday, March 27, 2006 3:24 PM
To: java-user@lucene.apache.org
Subject: Get All Entries

> Hello everyone,
>
> I have 6000 entries in my Lucene index, and if I search for entries with
> "00*" in the number field it works fine. But in addition I need all
> entries, no matter which number they have. A term like "*" doesn't work.
> How can I get all entries?
>
> The code of my search is:
>
> IndexSearcher is = new IndexSearcher( INDEX_DIR );
> QueryParser parser = new QueryParser( "number", analyzer );
> Query query = parser.parse( "00*" );
> Hits hits = is.search( query, new Sort( "number" ) );
>
> Thanks for your help,
> Stefan H
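[Editor's note: a minimal sketch of both approaches, assuming the Lucene 1.9 MatchAllDocsQuery and the 1.4-style Field.Keyword helper; the "AllDocsTrue" field name is the one Venu suggests.]

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;

public class AllEntriesExample {

  // Lucene 1.9+: one query that matches every document, sorted by the number field.
  public static Hits allDocs(IndexSearcher is) throws IOException {
    return is.search(new MatchAllDocsQuery(), new Sort("number"));
  }

  // Pre-1.9 fallback, part 1: at index time, add a constant keyword field to every document.
  public static void addMarker(Document doc) {
    doc.add(Field.Keyword("AllDocsTrue", "true"));
  }

  // Pre-1.9 fallback, part 2: at search time, query that constant field.
  public static Hits allDocsViaMarker(IndexSearcher is) throws IOException {
    return is.search(new TermQuery(new Term("AllDocsTrue", "true")), new Sort("number"));
  }
}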
RE: Get All Entries
It works perfectly. After installing 1.9 I have MatchAllDocsQuery. Thanks!
Phrase Query query
Hi,

I'm using PhraseQuery in conjunction with WhitespaceAnalyzer, but it's giving me slightly unusual results. If I have a text file containing the text (quotes are just for clarity):

"Hello this is some text"

I don't find any results when I search. But if I put spaces before and after the phrase:

" Hello this is some text "

then it does work. I'm breaking the phrase down into Terms and setting the slop to 0, by the way.

I can kind of see that this makes sense, given the name WhitespaceAnalyzer, but aren't newlines, carriage returns etc. also treated as whitespace?

Thanks for your help!

Regards,
Richard Gundersen
Honda UK - ISD
Tel: +44 (0)1753 590681
Re: span query scoring vs boolean query scoring
Vincent Le Maout wrote:
> Am I missing something? Is it intended or is it a bug?

Looks like a bug. Can you submit a patch?

Doug
Re: span query scoring vs boolean query scoring
Vincent Le Maout wrote:
> Am I missing something? Is it intended or is it a bug?

Looks like a bug. Can you please submit a bug report and, ideally, attach a patch?

Thanks,
Doug
Re: Lucene indexing on Hadoop distributed file system
Igor Bolotin wrote:
> If somebody is interested, I can post our changes in TermInfosWriter and
> SegmentTermEnum code, although they are pretty trivial.

Please submit this as a patch attached to a bug report.

I contemplated making this change to Lucene myself when writing Nutch's FsDirectory, but thought that no one else would ever be interested in using it. Now that's been proven wrong!

Note that any change to the file format must be back-compatible.

Doug
Re: Does Optimize preserve index order?
On 3/24/06, chan kang <[EMAIL PROTECTED]> wrote:
> What I want to do is to show the results in
> chronological order. (By the way, the index contains the time field.)
> One solution I have thought up is:
> 1. index the whole set
> 2. read in all the time field values
> 3. re-index the whole set according to time
>    (I heard that the index order is the same as the insertion order)
> 4. optimize
>
> However, although I think step 3 would result
> in a sorted index, isn't there a possibility that
> step 4 might ruin the sortedness? Wouldn't optimizing
> break the order in which the documents were indexed?

Index order is retained, so your plan should work fine.

How long is sorting actually taking? FYI, the first time you sort on a field will take much longer because a FieldCache entry must be populated.

-Yonik
http://incubator.apache.org/solr
Solr, The Open Source Lucene Search Server
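[Editor's note: for comparison, sorting at search time instead of re-indexing is a one-liner; a minimal sketch assuming the Lucene 1.9 Sort API and a hypothetical "time" field indexed as an un-tokenized value.]

import java.io.IOException;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class TimeSortedSearch {
  // Sort results by the "time" field, newest first; the first call per reader
  // pays the FieldCache warm-up cost Yonik mentions.
  public static Hits search(IndexSearcher searcher, Query query) throws IOException {
    Sort byTime = new Sort(new SortField("time", true));   // true = reverse (descending)
    return searcher.search(query, byTime);
  }
}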
Re: Phrase Query query
Richard,

WhitespaceTokenizer (the tokenizer that WhitespaceAnalyzer uses) really just tokenizes on whitespace characters:

/** Collects only characters which do not satisfy
 *  {@link Character#isWhitespace(char)}. */
protected boolean isTokenChar(char c) {
  return !Character.isWhitespace(c);
}

Otis

----- Original Message -----
From: Richard Gunderson <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Monday, March 27, 2006 10:56:18 AM
Subject: Phrase Query query

> I'm using PhraseQuery in conjunction with WhitespaceAnalyzer, but it's giving
> me slightly unusual results. If I have a text file containing the text
> (quotes are just for clarity):
>
> "Hello this is some text"
>
> I don't find any results when I search. But if I put spaces before and after
> the phrase:
>
> " Hello this is some text "
>
> then it does work. I'm breaking the phrase down into Terms and setting the
> slop to 0, by the way.
>
> I can kind of see that this makes sense, given the name WhitespaceAnalyzer,
> but aren't newlines, carriage returns etc. also treated as whitespace?
>
> Thanks for your help!
>
> Regards,
> Richard Gundersen
> Honda UK - ISD
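[Editor's note: if it helps to verify what the analyzer actually emits, here is a small sketch that prints the tokens, assuming the Lucene 1.9-era TokenStream API; it should show that newlines and carriage returns split tokens just like spaces do.]

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;

public class PrintTokens {
  public static void main(String[] args) throws IOException {
    String text = "Hello this is\nsome text\r\n";   // newlines included on purpose
    TokenStream ts = new WhitespaceAnalyzer().tokenStream("f", new StringReader(text));
    for (Token t = ts.next(); t != null; t = ts.next()) {
      System.out.println(t.termText() + " [" + t.startOffset() + "," + t.endOffset() + "]");
    }
  }
}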
Re: Lucene indexing on Hadoop distributed file system
Doug Cutting wrote:
> Igor Bolotin wrote:
>> If somebody is interested, I can post our changes in TermInfosWriter and
>> SegmentTermEnum code, although they are pretty trivial.
>
> Please submit this as a patch attached to a bug report.
>
> I contemplated making this change to Lucene myself when writing Nutch's
> FsDirectory, but thought that no one else would ever be interested in
> using it. Now that's been proven wrong!
>
> Note that any change to the file format must be back-compatible.

This could be solved by putting a marker value in the first 8 bytes (== -1L), which would indicate that the real length is at the end. This way the new implementation will be able to read old indexes.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
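[Editor's note: to make Andrzej's suggestion concrete, an illustrative read-side sketch assuming the Lucene 1.9 IndexInput API; the method name, constant name, and exact file layout are hypothetical, not taken from the actual patch.]

import java.io.IOException;
import org.apache.lucene.store.IndexInput;

public class TermCountReader {
  // Hypothetical marker: -1 in the first 8 bytes means "the real count is stored at the end".
  private static final long SIZE_AT_END_MARKER = -1L;

  public static long readTermCount(IndexInput in) throws IOException {
    long first = in.readLong();
    if (first != SIZE_AT_END_MARKER) {
      return first;                         // old format: count is at the beginning
    }
    long pos = in.getFilePointer();
    in.seek(in.length() - 8);               // new format: count was appended at close time
    long count = in.readLong();
    in.seek(pos);                           // continue reading where we left off
    return count;
  }
}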
Re: PhraseQuery with synonyms or having n tokens at the same tokenposition.
On Monday, 27 March 2006 11:17, Ramana Jelda wrote:
> I have indexed the name "sony dsc-d cybershot" as the following tokens,
> with the given token positions:
> 1: [sony:0->4]
> 2: [dsc:5->10]
> 3: [dscd:5->10]
> 4: [d:5->10]
> 5: [cybershot:11->20]

If the first number is the token position, the tokens "dsc", "dscd", and "d" are obviously *not* at the same position. You need to call setPositionIncrement(0) to add a token at the same position during indexing. If that doesn't help, please provide a small test case that shows the problem.

Regards,
Daniel

--
http://www.danielnaber.de
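[Editor's note: to illustrate Daniel's point, a minimal sketch of a filter that emits extra tokens with a position increment of 0, written against the Lucene 1.9-era Token/TokenFilter API; the class name and the hard-coded "dsc-d" expansion are made up for the example.]

import java.io.IOException;
import java.util.LinkedList;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class SameSpotSynonymFilter extends TokenFilter {
  private final LinkedList pending = new LinkedList();

  public SameSpotSynonymFilter(TokenStream input) {
    super(input);
  }

  public Token next() throws IOException {
    if (!pending.isEmpty()) {
      return (Token) pending.removeFirst();
    }
    Token t = input.next();
    if (t == null) {
      return null;
    }
    // Hypothetical expansion: when "dsc-d" appears, also emit "dscd" and "d"
    // at the same position so a phrase like "dsc-d cybershot" and its variants match.
    if ("dsc-d".equals(t.termText())) {
      Token syn1 = new Token("dscd", t.startOffset(), t.endOffset());
      syn1.setPositionIncrement(0);          // same position as the original token
      Token syn2 = new Token("d", t.startOffset(), t.endOffset());
      syn2.setPositionIncrement(0);
      pending.add(syn1);
      pending.add(syn2);
    }
    return t;
  }
}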
Re: delete documents into index
> On Saturday, 25 March 2006 00:39, Tom Hill wrote:
>> IndexModifier won't work in a multithreaded scenario, at least as far as I can tell.
>
> Yes it does, but you need to use one IndexModifier object from all classes (see the javadoc).
>
> Regards,
> Daniel

I stand corrected (after going back and reading the code more carefully ;-).

Thanks,
Tom
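[Editor's note: a minimal sketch of the "one IndexModifier object" advice, assuming the Lucene 1.9 IndexModifier API; the holder class, the index path handling, and the "id" field are hypothetical.]

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexModifier;
import org.apache.lucene.index.Term;

/** Holder that shares a single IndexModifier across all threads of the application. */
public class SharedModifier {
  private static IndexModifier modifier;

  public static synchronized IndexModifier get(String indexDir) throws IOException {
    if (modifier == null) {
      modifier = new IndexModifier(indexDir, new StandardAnalyzer(), false);
    }
    return modifier;
  }

  /** Example update: every thread goes through the same instance. */
  public static void replaceDocument(String indexDir, String id, Document doc) throws IOException {
    IndexModifier m = get(indexDir);
    m.deleteDocuments(new Term("id", id));   // remove the old version by its unique id
    m.addDocument(doc);                       // then add the new one
  }
}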
Re: Lucene indexing on Hadoop distributed file system
Does it make sense to change TermInfosWriter.FORMAT in the patch?

Igor

On 3/27/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Igor Bolotin wrote:
>> If somebody is interested, I can post our changes in TermInfosWriter and
>> SegmentTermEnum code, although they are pretty trivial.
>
> Please submit this as a patch attached to a bug report.
>
> I contemplated making this change to Lucene myself, when writing Nutch's
> FsDirectory, but thought that no one else would ever be interested in
> using it. Now that's been proven wrong!
>
> Note that any change to the file format must be back-compatible.
>
> Doug
to OR or not
Hi everybody,

I am using Lucene in almost every web application I am working on. It's simply great software.

I have developed an advanced search with Lucene 1.4. Now I am looking at developing a fuzzy search, i.e. take one search string from the user and search across all fields of the member documents. I can think of two options:

- form an OR query over all fields using the given search string
- add one more field (say "keyword") to the member document containing all the information about the user

Are there any other options? Which would be the better option for a system with around one million documents, each having 20 fields, where performance is a major concern?

Thanks,
Amol
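[Editor's note: a minimal sketch of both options, assuming the Lucene 1.4-era API the poster mentions; the field names and the catch-all field are hypothetical.]

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class MemberSearch {

  // Option 1: OR the user's string across several fields at query time.
  public static Hits searchAcrossFields(IndexSearcher searcher, String userInput)
      throws IOException, ParseException {
    String[] fields = {"name", "address", "email"};    // hypothetical field names
    Query q = MultiFieldQueryParser.parse(userInput, fields, new StandardAnalyzer());
    return searcher.search(q);
  }

  // Option 2: build one catch-all field at index time and query only that field.
  public static void addCatchAll(Document doc, String name, String address, String email) {
    doc.add(Field.Text("keyword", name + " " + address + " " + email));
  }
}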
Re: Lucene indexing on Hadoop distributed file system
Igor Bolotin wrote:
> Does it make sense to change TermInfosWriter.FORMAT in the patch?

Yes. This should be updated for any change to the format of the file, and this certainly constitutes a format change.

This discussion should move to [EMAIL PROTECTED]

Doug
How to write to and read from the same index
I'm using Lucene running on Tomcat to index a large amount of email data. As the indexer runs through the mailbox creating, merging and deleting documents, it also does lots of searches to check whether a document already exists. All my modification operations are done in batches, every x seconds or so.

This seems to cause me lots of problems. I believe it is not possible to keep a single Searcher open while the index is being modified, so the only way is to detect the index changes, close the old searcher and create a new one. However, doing this causes the number of file handles to grow beyond the maximum allowed by the system.

I have tried using Luc's DelayCloseIndexSearcher with his Factory example, but as my index is modified frequently this creates lots of new DelayCloseIndexSearcher objects. The way it calls close on them when there are no more usages doesn't seem to keep the number of file handles down; they just grow. I would expect close to release file handles to the system when nothing is using the object (I even set it explicitly to null), but this does not happen.

If this problem makes sense, has anyone else faced it, and does anyone have a solution?

Cheers,
Nick.
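[Editor's note: whatever wrapper is used, the file handles only go back to the OS when IndexSearcher.close() is actually reached. An illustrative reference-counting sketch of that idea follows (not the LUCENE-445 code): the close is deferred until the last in-flight search has released the searcher.]

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

/** Illustrative wrapper: close() releases file handles only after the last user is done. */
public class RefCountedSearcher {
  private final IndexSearcher searcher;
  private int refCount = 1;              // the "owner" reference held by the factory
  private boolean closeRequested = false;

  public RefCountedSearcher(String indexDir) throws IOException {
    this.searcher = new IndexSearcher(indexDir);
  }

  public synchronized IndexSearcher acquire() {
    refCount++;
    return searcher;
  }

  public synchronized void release() throws IOException {
    refCount--;
    maybeClose();
  }

  /** Called by the code that swaps in a new searcher. */
  public synchronized void closeWhenDone() throws IOException {
    closeRequested = true;
    refCount--;                          // drop the owner reference
    maybeClose();
  }

  private void maybeClose() throws IOException {
    if (closeRequested && refCount == 0) {
      searcher.close();                  // this is what actually gives the handles back
    }
  }
}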
Re: add word filtering?
No, sorry, I didn't convey my question very well. Anyway, thanks a lot for the info.

What I really meant is that I want to filter out some words, for example "violence" and "hatred", from the search engine results. Lucene would then display some alternative result for the attempted search, such as "Peace to the world." instead of results for "violence". How can I do that?

On 3/27/06, Satuluri, Venu_Madhav <[EMAIL PROTECTED]> wrote:
> Are you asking that common words not be searched? For this, you can use
> StopFilter to prevent words from being indexed and searched.
> Alternatively, you can use StandardAnalyzer, which in addition to
> removing stop words also does more sophisticated tokenizing.
>
> Venu
>
> -----Original Message-----
> From: abdul muhaimin [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 27, 2006 3:13 PM
> To: java-user@lucene.apache.org
> Subject: add word filtering?
>
> Hi all,
>
> I'm really new to Lucene. In fact I just found it when I googled a few
> days ago. I never thought Java had this kind of excellent library for free.
>
> I have a few questions: where do I hook in if I want to filter certain
> text from being searched, filter certain results from being displayed,
> or display an alternative result in place of a filtered one when using
> Lucene? Is there a better way than just editing the results .jsp page
> (from the demo)?
>
> Any information is greatly appreciated.
>
> Thanks in advance
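[Editor's note: since this is about intercepting the query before Lucene ever sees it rather than a Lucene feature, here is a minimal sketch of one way to do it; the blocked-word list, the substitute message, and the method names are all hypothetical.]

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class BlockedWordCheck {
  // Hypothetical list of words whose searches should be redirected.
  private static final Set BLOCKED =
      new HashSet(Arrays.asList(new String[] {"violence", "hatred"}));

  /** Returns the substitute text to display, or null if the query is fine
   *  and the normal Lucene search should run. */
  public static String substituteFor(String userQuery) {
    StringTokenizer st = new StringTokenizer(userQuery.toLowerCase());
    while (st.hasMoreTokens()) {
      if (BLOCKED.contains(st.nextToken())) {
        return "Peace to the world.";
      }
    }
    return null;   // not blocked: go ahead and search the index as usual
  }
}

In the demo's results .jsp you would call substituteFor(queryString) before building the Lucene Query, and render the returned message instead of the hit list when it is non-null.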