hit exception flushing segment _0 - IndexWriter configuration
Hi,

I am currently building an application in which there is a remote index server (yes, it probably does sound like Solr :)) and users use my API to send documents to the indexing server for indexing. The two methods primarily used are add and commit: a user sends requests for documents to be added to the index and then calls commit. I ran a test where I simulated a user calling the add method 10 times and then, in a separate method call, invoked commit. With the verbose setting turned on for the IndexWriter, I noticed this line:

hit exception flushing segment _0

It may be worth mentioning the settings I have for my index writer:

mergeFactor = "100"
maxMergeDocs = "999"

When I use my API to add 102 documents and then, in a separate method call, invoke a commit, I get no exception. So I was wondering: what is the best setting for the mergeFactor, and should I be seeing this exception after requesting a commit after adding only 10 documents to the index?

Any help would be appreciated.

Thanks
Amin
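For reference, a minimal sketch of the writer configuration being described, assuming a Lucene 3.0-era API; the directory path and the use of StandardAnalyzer are illustrative assumptions, not taken from the thread:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class WriterConfigSketch {
        public static void main(String[] args) throws Exception {
            Directory dir = FSDirectory.open(new File("/tmp/index")); // hypothetical location
            IndexWriter writer = new IndexWriter(dir,
                    new StandardAnalyzer(Version.LUCENE_30),
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setMergeFactor(100);       // the poster's mergeFactor
            writer.setMaxMergeDocs(999);      // the poster's maxMergeDocs
            writer.setInfoStream(System.out); // verbose diagnostics, where a line like
                                              // "hit exception flushing segment _0" would appear
            writer.close();
        }
    }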
Re: hit exception flushing segment _0 - IndexWriter configuration
Hi,

Apologies for re-sending this email, but I was just wondering if anyone might be able to advise on the below. I'm not sure if I've provided enough info.

Again, any help would be appreciated.

Amin

Sent from my iPhone

On 1 Aug 2010, at 20:00, Amin Mohammed-Coleman wrote:

> I am currently building an application in which there is a remote index server [...] So I was wondering: what is the best setting for the mergeFactor, and should I be seeing this exception after requesting a commit after adding only 10 documents to the index?
Re: hit exception flushing segment _0 - IndexWriter configuration
Somewhat embarrassingly, I can't seem to reproduce the problem anymore! I've been trying to reproduce it for the last hour now with no luck. Sorry about that. If it happens again I'll post back to the list. Thanks for your time.

Amin

On 3 Aug 2010, at 22:35, Michael McCandless wrote:

> Can you post the full exception? And also the log output from
> IndexWriter.setInfoStream.
>
> Mike
>
> On Tue, Aug 3, 2010 at 5:28 PM, Amin Mohammed-Coleman wrote:
>> Apologies for re-sending this email, but I was just wondering if anyone
>> might be able to advise on the below. [...]
Batch Operation and Commit
Hi,

I have a list of batch tasks that need to be executed. Each batch contains 1000 documents. I use a RAMDirectory-based index writer, and at the end of adding 1000 documents to memory I perform the following:

    ramWriter.commit();
    indexWriter.addIndexesNoOptimize(ramWriter.getDirectory());
    ramWriter.close();

Do I then need to do an explicit indexWriter.commit()? It seems as though if I don't, the documents aren't added to the index (I've inspected it via Luke). I would have thought that indexWriter.addIndexesNoOptimize would not require me to call commit explicitly. Is this a correct assumption, or should I call commit explicitly on my disk-based index writer?

The main idea behind this is that each batch can be executed in a separate thread, with only one shared index writer.

Any help would be appreciated.

Thanks
Amin
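A minimal sketch of the flow being described, with an explicit commit on the disk-based writer at the end; addIndexesNoOptimize copies the segments, but, like addDocument, the change is not visible to readers until the writer commits or closes. The writer construction and batch variable are illustrative assumptions (pre-3.1 API, where addIndexesNoOptimize still exists):

    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    public class BatchSketch {
        static void indexBatch(IndexWriter indexWriter, Analyzer analyzer,
                               List<Document> batch) throws Exception {
            RAMDirectory ramDir = new RAMDirectory();
            IndexWriter ramWriter = new IndexWriter(ramDir, analyzer,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            for (Document doc : batch) {
                ramWriter.addDocument(doc);
            }
            ramWriter.close(); // close() commits the in-memory segment

            indexWriter.addIndexesNoOptimize(new Directory[] { ramDir });
            indexWriter.commit(); // without this, readers (and Luke) won't see the new docs
        }
    }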
Re: Batch Operation and Commit
Hi Erick,

Thanks for your response. I used Lucene in Action (1st edition) as a reference for batch indexing. I've just got my copy of the 2nd edition, which mentions that there is no point in using a RAMDirectory, not that I don't trust you :). I'll update my code to use a normal FS directory for batches.

Thanks
Amin

On 26 Aug 2010, at 19:33, Erick Erickson wrote:

> I'm going to sidestep your question and ask why you're using
> a RAMDirectory in the first place. People often think it'll
> speed up their indexing because it's in RAM, but the
> normal FS-based indexing caches in RAM too, and you
> can use various settings governing segments, RAM usage
> etc. to control how often you flush to disk. So unless you're
> certain you need to, I'd just forget the whole RAM thing.
>
> You must close your IndexWriter OR commit the changes
> before you can see your changes; see IndexWriter.close/commit.
>
> Best
> Erick
>
> On Thu, Aug 26, 2010 at 10:42 AM, Amin Mohammed-Coleman wrote:
>> I have a list of batch tasks that need to be executed. [...]
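A sketch of the FS-based alternative Erick describes, under the same pre-3.1 API assumption; the path and buffer size are illustrative, and setRAMBufferSizeMB is the kind of setting he alludes to for controlling how often buffered documents are flushed to disk:

    import java.io.File;
    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    public class FsBatchSketch {
        static void indexAll(Analyzer analyzer, List<Document> docs) throws Exception {
            IndexWriter writer = new IndexWriter(
                    FSDirectory.open(new File("/tmp/index")), // hypothetical path
                    analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
            writer.setRAMBufferSizeMB(64.0); // buffer ~64 MB of documents before flushing
            try {
                for (Document doc : docs) {
                    writer.addDocument(doc); // IndexWriter is thread-safe; threads can share it
                }
                writer.commit(); // one commit makes everything visible
            } finally {
                writer.close();
            }
        }
    }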
TermRangeQuery
Hi All,

I was wondering whether I can use TermRangeQuery for my use case. I have a collection of ids (represented as XDF-123) and I would like to do a search for all the ids (there might be in the range of 1,000 of them), and for each matching id I want to get the corresponding data that is stored in the index (for example, the document contains an id and a string value). I am using a custom collector to collect the string value for each match. Is it OK to use a TermRangeQuery for the ids rather than creating a massive query string?

Thanks
Amin
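A minimal sketch of the TermRangeQuery being asked about, assuming a Lucene 2.9/3.0-era API; the field name and bounds are illustrative assumptions:

    import org.apache.lucene.search.TermRangeQuery;

    public class RangeSketch {
        public static void main(String[] args) {
            // matches all terms lexicographically between the bounds, inclusive
            TermRangeQuery query = new TermRangeQuery(
                    "dataId",  // hypothetical field holding the ids
                    "XDF-100", "XDF-199",
                    true, true);
            System.out.println(query);
        }
    }

Note that the comparison is lexicographic, which is what trips up the ids later in this thread: "XDF-99" sorts after "XDF-100".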
Re: TermRangeQuery
Hi,

Basically my test ids look like:

AAA-231
AAD-234
ADD-123

I didn't know about the collator; I was going to do a custom sort based on the number part of the id.

Thanks
Amin

On 26 Nov 2010, at 14:39, Ian Lea wrote:

> Absolutely, as long as your ids will sort as you expect.
>
> I'm not clear what you mean by XDF-123 but if you've got
>
> AAA-123
> AAA-124
> ...
> ABC-123
> ABC-234
> etc.
>
> then you'll be fine. If they don't sort so neatly you can use the
> TermRangeQuery constructor that takes a Collator but note the
> performance warning in the javadocs.
>
> --
> Ian.
>
> On Fri, Nov 26, 2010 at 2:18 PM, Amin Mohammed-Coleman wrote:
>> I was wondering whether I can use TermRangeQuery for my use case. [...]
Re: TermRangeQuery
Hi,

Unfortunately my range query approach did not work. It seems to be related to the ids themselves. The list has ids that look like this:

ID-NYC-1234
ID-LND-1234
TX-NYC-1334
TX-NYC-BBC-123

The ids may range from 90 to 1000. Is there another approach I could take? I tried building a string with all the ids and setting them against a field, for example:

dataId: ID-NYC-123 dataId: ID-NYC-1234

but that's not a great approach, I know...

Any help would be appreciated.

Thanks
Amin

On 26 Nov 2010, at 14:39, Ian Lea wrote:

> Absolutely, as long as your ids will sort as you expect. [...] If they
> don't sort so neatly you can use the TermRangeQuery constructor that takes
> a Collator but note the performance warning in the javadocs.
Re: TermRangeQuery
Essentially I'd like to construct a query which is almost like a SQL IN clause. The Lucene document contains the id and a string value; I'd like to get the string value based on the id key. There may be up to around 1000 ids. Is this possible to do?

Thanks
Amin

Sent from my iPhone

On 26 Nov 2010, at 20:18, Ian Lea wrote:

> What sort of ranges are you trying to use? Maybe you could store a
> separate field, just for these queries, with some normalized form of
> the ids, with all numbers padded out to the same length etc.
>
> --
> Ian.
>
> On Fri, Nov 26, 2010 at 4:34 PM, Amin Mohammed-Coleman wrote:
>> Unfortunately my range query approach did not work. It seems to be
>> related to the ids themselves. [...]
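A sketch of the normalization Ian suggests: derive a second, query-only field whose numeric suffix is zero-padded so that lexicographic term order matches numeric order. The method name and pad width are illustrative assumptions:

    public class IdNormalizer {
        // "ID-NYC-90" -> "ID-NYC-0090": pad the trailing number so that
        // lexicographic ordering of terms matches numeric ordering
        static String normalize(String id) {
            int dash = id.lastIndexOf('-');
            int number = Integer.parseInt(id.substring(dash + 1));
            return id.substring(0, dash + 1) + String.format("%04d", number);
        }

        public static void main(String[] args) {
            System.out.println(normalize("ID-NYC-90"));   // ID-NYC-0090
            System.out.println(normalize("TX-NYC-1334")); // TX-NYC-1334
        }
    }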
Re: TermRangeQuery
Hi,

I'll explain my use case further and then describe the outcome of my implementation. I have Lucene documents that look like this:

    Field name   Field value
    dataId       TYX-CC-124
    category     CATEGORY A

For a given collection of dataIds, I'd like to get each one's corresponding category. The collection of ids passed into my method will vary, and the id prefix (TYX-CC) will differ for different groups invoking my method. For example, I may have ids that look like:

TX-CC-124
AVC-FF-124

and so on. I tried sorting the list of ids before creating the range query, based on the numeric part of the id. This did not work: the number of ids returned by the query was greater than the number of input ids. You mentioned padding the numeric part of the ids, but will that work in a case like the following?

aa-01
bb-01

If I pass in aa-01 as the lower range, the query will return the result for bb-01 as well (unless I have misunderstood the usage of the range query).

To get things moving I decided to create a boolean query, batching the queries to avoid hitting the too-many-clauses exception: for each 1000 ids I create a boolean query with those ids passed in (see the sketch below). This may not be the best approach, but I can't get my head around how the range query can be used, considering the numeric part of the id is essentially not unique.

Thanks
Amin

On 28 Nov 2010, at 18:19, Erick Erickson wrote:

> Why won't Ian's suggestion work? You haven't really given us a clue what
> it is about your attempt that didn't work. The expected and actual output
> would be useful...
>
> But Ian's point is the well-known issue that lexical and numeric sorting
> aren't at all the same. You'd get reasonable results if you left-padded
> the number portion of the IDs with 0 out to 4 places, thus:
>
> aa-90   -> aa-0090
> aa-123  -> aa-0123
> aa-1000 -> aa-1000
>
> and your range queries should work. You might have to transform them
> back when displayed. Or you could add them to your document twice:
> once in a "hidden" field, the one you search against in your range query,
> and the other to display. This latter wouldn't bloat your index (much)
> since you would store one and index the other.
>
> Best
> Erick
>
> On Fri, Nov 26, 2010 at 5:01 PM, Amin Mohammed-Coleman wrote:
>> Essentially I'd like to construct a query which is almost like a SQL IN
>> clause. [...]
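A sketch of the batched boolean-query approach described above, assuming a 3.0-era API; the field name and batch size follow the thread, everything else is illustrative:

    import java.util.List;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class InClauseSketch {
        // one query per batch of ids; batches of 1000 stay under
        // BooleanQuery's default maxClauseCount of 1024
        static BooleanQuery batchQuery(List<String> idBatch) {
            BooleanQuery query = new BooleanQuery();
            for (String id : idBatch) {
                query.add(new TermQuery(new Term("dataId", id)),
                          BooleanClause.Occur.SHOULD);
            }
            return query;
        }
    }

(BooleanQuery.setMaxClauseCount can raise the limit instead of batching, at the cost of memory.)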
Wildcard Case Sensitivity
Hi,

Apologies up front if this question has been asked before. I have a document which contains a field that stores an untokenized value such as TEST_TYPE. The analyzer used is StandardAnalyzer, and I pass the same analyzer into the query. I perform the following query: fieldName:TEST_*. However, this does not return any results. Is this the expected behaviour? Can I use capital letters in my wildcard query, or do I need to do some processing before passing it to the query parser?

Any help would be appreciated.

Thanks
Amin
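One plausible culprit, sketched here under the assumption that the field was indexed NOT_ANALYZED (so the term is stored verbatim as TEST_TYPE): QueryParser lowercases wildcard terms by default, turning TEST_* into test_*, which can never match the uppercase term. A programmatic WildcardQuery bypasses that, or the parser's lowercasing can be switched off (2.4-era API):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.WildcardQuery;

    public class WildcardCaseSketch {
        public static void main(String[] args) throws Exception {
            // option 1: build the wildcard query directly; no analysis, case preserved
            Query direct = new WildcardQuery(new Term("fieldName", "TEST_*"));

            // option 2: keep the parser but stop it lowercasing expanded terms
            QueryParser parser = new QueryParser("fieldName", new StandardAnalyzer());
            parser.setLowercaseExpandedTerms(false);
            Query parsed = parser.parse("fieldName:TEST_*");

            System.out.println(direct + " | " + parsed);
        }
    }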
Field Not Present In Document
Hi,

I have the following situation:

    Document document = new Document();
    String body = "This is a body of document";
    Field field = new Field("body", body, Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
    String id = "1234";
    Field idField = new Field("id", id, Field.Store.YES, Field.Index.ANALYZED);
    document.add(idField);
    rtfIndexer.add(document);
    System.out.println(document.getFields());

When I print the fields of the document I get the following:

    [stored/uncompressed,indexed,tokenized<body:This is a body of document>, stored/uncompressed,indexed,tokenized<id:1234>]

The RtfIndexer looks like this:

    public void add(Document document) {
        IndexWriter rtfIndexWriter = IndexWriterFactory.createIndexWriter(rtfDirectory, analyzer);
        try {
            rtfIndexWriter.addDocument(document);
            LOGGER.debug("Added Document: " + document + " to index");
            commitAndOptimise(rtfIndexWriter);
        } catch (CorruptIndexException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    private void commitAndOptimise(IndexWriter rtfIndexWriter) throws CorruptIndexException, IOException {
        LOGGER.debug("Committing document and closing index writer");
        rtfIndexWriter.optimize();
        rtfIndexWriter.commit();
        rtfIndexWriter.close();
    }

However, when I load the Document using the code below:

    Directory directory = ((RtfIndexer) rtfIndexer).getDirectory();
    IndexReader indexReader = IndexReader.open(directory);
    Document documentFromIndex = indexReader.document(1);
    System.out.println(documentFromIndex.getFields());

I get:

    [stored/uncompressed,indexed,tokenized<body:This is a body of document>]

It seems as though the id field is not being stored in the index. I can't understand why not, as I have added it to the document. I would be grateful if anyone could help!

Cheers
Amin

P.S. Merry Christmas!
Re: Field Not Present In Document
Hi,

Thanks for your reply. It turns out you were correct and I was not loading the correct document. User error!

Cheers
Amin

On 28 Dec 2008, at 19:50, Grant Ingersoll wrote:

> How do you know that the document in question has an id of 1, as in when
> you do:
>
> Document documentFromIndex = indexReader.document(1)
>
> I would fire up Luke (http://www.getopt.org/luke) against your index and
> see what is inside of it.
>
> On Dec 26, 2008, at 3:19 PM, Amin Mohammed-Coleman wrote:
>> I have the following situation: [...] It seems as though the id field is
>> not being stored in the index. [...]
>
> --
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
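Internal document numbers are assigned by Lucene and can shift as segments merge, so looking a document up via indexReader.document(n) is fragile. A safer pattern, sketched here against a 2.4-era API (the directory and field names follow the thread; the rest is illustrative), is to search for the application-level id and load the hit:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;

    public class LookupSketch {
        static Document findById(Directory directory, String id) throws Exception {
            IndexSearcher searcher = new IndexSearcher(IndexReader.open(directory));
            try {
                TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
                return hits.totalHits == 0 ? null
                        : searcher.doc(hits.scoreDocs[0].doc);
            } finally {
                searcher.close();
            }
        }
    }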
Fwd: Search Problem
Hi,

I have created an RTFHandler which takes an RTF file and creates a Lucene Document which is indexed. The RTFHandler looks something like this:

    if (bodyText != null) {
        Document document = new Document();
        Field field = new Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(),
                Field.Store.YES, Field.Index.ANALYZED);
        document.add(field);
    }

I am using Java's built-in RTF text extraction. When I run my test to verify that the document contains text that I expect, this works fine. I get the following when I print the document:

    Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf document that will be indexed. Amin Mohammed-Coleman> stored/uncompressed,indexed<...> stored/uncompressed,indexed<...> stored/uncompressed,indexed<...> stored/uncompressed,indexed<...>>

The problem is that when I use the following to search, I get no result:

    MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
    Term t = new Term("body", "Amin");
    TermQuery termQuery = new TermQuery(t);
    TopDocs topDocs = multiSearcher.search(termQuery, 1);
    System.out.println(topDocs.totalHits);
    multiSearcher.close();

RtfIndexSearcher is configured with the directory that holds RTF documents. I have used Luke to look at the document, and what I find in the Overview tab is the following for the document:

    1  body     test
    1  id       1234
    1  name     rtfDocumentToIndex.rtf
    1  path     rtfDocumentToIndex.rtf
    1  summary  This is a
    1  type     RTF_INDEXER
    1  body     rtf

However, on the Document tab I am getting (in the body field):

    This is a test rtf document that will be indexed. Amin Mohammed-Coleman

I would expect to get a hit using "Amin" or even "document". I am not sure whether the line

    TopDocs topDocs = multiSearcher.search(termQuery, 1);

is incorrect, as I am not too sure of the meaning of "Finds the top n hits for query" for search(Query query, int n) according to the javadocs.

I would be grateful if someone could advise on what I may be doing wrong. I am using Lucene 2.4.0.

Cheers
Amin
Re: Search Problem
Hi,

Sorry, I was using the StandardAnalyzer in this instance.

Cheers

On 2 Jan 2009, at 00:55, Chris Lu wrote:

> You need to let us know the analyzer you are using.
>
> --
> Chris Lu
> Instant Scalable Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
>
> On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman wrote:
>> I have created an RTFHandler which takes an RTF file and creates a Lucene
>> Document which is indexed. [...] I would expect to get a hit using "Amin"
>> or even "document". [...]
Re: Search Problem
Hi,

I have tried this and it doesn't work. I don't understand why using "amin" instead of "Amin" would work; is it not case-insensitive? I tried "test" for the field "body" and this works. Other terms don't work, for example "document" and "indexed", which are tokens that were extracted when creating the Lucene document.

Thanks for your reply.

Cheers
Amin

On 2 Jan 2009, at 10:36, Chris Lu wrote:

> Basically Lucene stores analyzed tokens, and looks up the matches based on
> the tokens. "Amin" after StandardAnalyzer is "amin", so you need to use
> new Term("body", "amin"), instead of new Term("body", "Amin"), to search.
>
> --
> Chris Lu
>
> On Thu, Jan 1, 2009 at 11:30 PM, Amin Mohammed-Coleman wrote:
>> Sorry, I was using the StandardAnalyzer in this instance. [...]
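A sketch of the point Chris is making, on the Lucene 2.4 API the thread says it uses: running the query text through the same analyzer (here via QueryParser) produces the lowercased token that actually exists in the index, where a hand-built TermQuery does not. The method wrapper is illustrative:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.TopDocs;

    public class AnalyzedQuerySketch {
        static TopDocs searchBody(Searcher searcher, String text) throws Exception {
            // a hand-built TermQuery bypasses analysis, so "Amin" can never match
            // the indexed, lowercased token "amin"; parsing with the same analyzer
            // applies the same lowercasing the indexer did
            QueryParser parser = new QueryParser("body", new StandardAnalyzer());
            Query query = parser.parse(text); // "Amin" becomes body:amin
            return searcher.search(query, 10);
        }
    }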
Re: Search Problem
Hi Erick,

Thanks for your reply. I have used Luke to inspect the document and I am somewhat confused. When I view the index using the Overview tab of Luke I get the following:

    1  body     test
    1  id       1234
    1  name     rtfDocumentToIndex.rtf
    1  path     rtfDocumentToIndex.rtf
    1  summary  This is a
    1  type     RTF_INDEXER
    1  body     rtf

However, when I view the document in the Document tab I get the full text that was extracted from the RTF document (field: body), which is:

    This is a test rtf document that will be indexed. Amin Mohammed-Coleman

I am using the StandardAnalyzer, so I wouldn't expect the words "document", "indexed", "Amin Mohammed-Coleman" to be removed. I have referenced the Lucene in Action book and I can't see what I may be doing wrong. I would be happy to provide a test case should it be required. When adding the body field to the document I am doing:

    Document document = new Document();
    Field field = new Field(FieldNameEnum.BODY.getDescription(), bodyText.trim(),
            Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);

When I run the search code, the string "test" is the only word that returns a result (TopDocs); the others do not (e.g. "amin", "document", "indexed").

Thanks again for your help and advice.

Cheers
Amin

On 2 Jan 2009, at 21:20, Erick Erickson wrote:

> Casing is usually handled by the analyzer. Since you construct the term
> query programmatically, it doesn't go through any analyzers, thus is not
> converted into lower case for searching, as was done automatically for you
> when you indexed using StandardAnalyzer.
>
> As for why you aren't getting hits, it's unclear to me. But what I'd do is
> get a copy of Luke and examine your index to see what's *really* there.
> This will often give you clues, usually pointing to some kind of analyzer
> behavior that you weren't expecting.
>
> Best
> Erick
>
> On Fri, Jan 2, 2009 at 6:39 AM, Amin Mohammed-Coleman wrote:
>> I have tried this and it doesn't work. I don't understand why using
>> "amin" instead of "Amin" would work [...]
Re: Search Problem
Hi again!

I think I may have found the problem, but I was wondering if you could verify. I have the following for my indexer:

    public void add(Document document) {
        IndexWriter indexWriter = IndexWriterFactory.createIndexWriter(getDirectory(), getAnalyzer());
        try {
            indexWriter.addDocument(document);
            LOGGER.debug("Added Document: " + document + " to index");
            commitAndOptimise(indexWriter);
        } catch (CorruptIndexException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

The commitAndOptimise(indexWriter) looks like this:

    private void commitAndOptimise(IndexWriter indexWriter) throws CorruptIndexException, IOException {
        LOGGER.debug("Committing document and closing index writer");
        indexWriter.optimize();
        indexWriter.commit();
        indexWriter.close();
    }

It seems that if I comment out optimize, then the Overview tab in Luke for the RTF document looks like:

    5  id       1234
    3  body     document
    3  body     body
    1  body     test
    1  body     rtf
    1  name     rtfDocumentToIndex.rtf
    1  body     new
    1  path     rtfDocumentToIndex.rtf
    1  summary  This is a
    1  type     RTF_INDEXER
    1  body     content

This is more what I expected, although "Amin Mohammed-Coleman" hasn't been stored in the index. Should I not be using indexWriter.optimize()? I tried using the search function in Luke and got the following results:

    body:test      ---> returns result
    body:document  ---> no result
    body:content   ---> no result
    body:rtf       ---> returns result

Thanks again... sorry to be sending so many emails about this. I am in the process of designing and developing a prototype of a document and domain indexing/searching component, and I would like to demo it to the rest of my team.

Cheers
Amin

On 3 Jan 2009, at 01:23, Erick Erickson wrote:

> Well, your query results are consistent with what Luke is reporting. So
> I'd go back and test your assumptions. I suspect that you're not indexing
> what you think you are.
>
> For your test document, I'd just print out what you're indexing and the
> field it's going into, *for each field*. That is, every time you do a
> document.add(), print out that data. I'm pretty sure you'll find that
> you're not getting what you expect. For instance, the call to
> MetaDataEnum.BODY.getDescription() may be returning some nonsense. Or
> bodyText.trim() isn't doing what you expect.
>
> Lucene is used by many folks, and errors of the magnitude you're
> experiencing would be seen by many people, and the user list would be
> flooded with complaints if it were a Lucene issue at root. That leaves the
> code you wrote as the most likely culprit. So try a very simple test case
> with lots of debugging println's. I'm pretty sure you'll find the
> underlying issue with some of your assumptions pretty quickly.
>
> Sorry I can't be more specific, but we'd have to see all of your code and
> the test cases to do that.
>
> Best
> Erick
>
> On Fri, Jan 2, 2009 at 6:13 PM, Amin Mohammed-Coleman wrote:
>> I have used Luke to inspect the document and I am somewhat confused. [...]
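For contrast with the add() method above, a sketch of the lifecycle the replies below recommend: one writer, many adds, a single commit, and no per-document optimize. The method wrapper and documents collection are illustrative (2.4-era API):

    import java.util.List;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;

    public class BulkAddSketch {
        static void addAll(Directory directory, Analyzer analyzer,
                           List<Document> documents) throws Exception {
            IndexWriter writer = new IndexWriter(directory, analyzer,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            try {
                for (Document document : documents) {
                    writer.addDocument(document);
                }
                writer.commit();      // one commit for the whole batch
                // writer.optimize(); // optional and expensive; never per document
            } finally {
                writer.close();       // close() also commits pending changes
            }
        }
    }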
Re: Search Problem
Hi,

I am currently doing this because the indexer will be called from an upload action; there is no bulk file processing functionality at the moment.

Cheers

Sent from my iPhone

On 3 Jan 2009, at 13:48, Shashi Kant wrote:

> Amin, are you calling close & optimize after every addDocument? I would
> suggest something like this:
>
>     try {
>         while (...) { // e.g. looping through a data reader
>             indexWriter.addDocument(document);
>         }
>     } finally {
>         commitAndOptimise();
>     }
>
> HTH
> Shashi
>
> ----- Original Message ----
> From: Amin Mohammed-Coleman
> To: java-user@lucene.apache.org
> Sent: Saturday, January 3, 2009 4:02:52 AM
> Subject: Re: Search Problem
>
>> Hi again! I think I may have found the problem, but I was wondering if
>> you could verify. [...]
Re: Search Problem
Hi,

Please find attached a standalone test (inner classes for the RTF handler, indexing, etc.) that shows search not returning expected results. I am using Lucene 2.4.

Thanks again for the help!

Cheers
Amin

On 3 Jan 2009, at 14:02, Grant Ingersoll wrote:

> You shouldn't need to call close and optimize after each document. You
> also don't need the commit if you are going to immediately close.
>
> Also, can you send a standalone test that shows the RTF extraction, the
> document creation and the indexing code that demonstrates your issue?
>
> FWIW, and as a complete aside to save you some time after you get this
> figured out: instead of re-inventing RTF extraction and PDF extraction (as
> you appear to be doing), have a look at Tika (http://lucene.apache.org/tika).
>
> On Jan 3, 2009, at 8:48 AM, Shashi Kant wrote:
>> Amin, are you calling close & optimize after every addDocument? [...]
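For reference, a minimal sketch of the Tika suggestion, assuming a Tika release that provides the org.apache.tika.Tika facade (the file name follows the thread; nothing here is code from it):

    import java.io.File;
    import org.apache.tika.Tika;

    public class ExtractSketch {
        public static void main(String[] args) throws Exception {
            Tika tika = new Tika(); // auto-detects RTF, PDF, Word, etc.
            String bodyText = tika.parseToString(new File("rtfDocumentToIndex.rtf"));
            System.out.println(bodyText.trim());
        }
    }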
Re: Search Problem
Hi again,

Sorry, I didn't include the WorkItem class! Here is the final test case. Apologies!

On 3 Jan 2009, at 14:02, Grant Ingersoll wrote:

> You shouldn't need to call close and optimize after each document. You
> also don't need the commit if you are going to immediately close.
>
> Also, can you send a standalone test that shows the RTF extraction, the
> document creation and the indexing code that demonstrates your issue? [...]
Re: Search Problem
Hi I have uploaded to google docs: url: http://docs.google.com/Doc?id=d77xf5q_0n6hb38fx Hope this works. Cheers Amin On 3 Jan 2009, at 19:53, Grant Ingersoll wrote: The mailing list often strips attachments (in fact, I'm surprised your earlier ones made it through). Perhaps you can put them up somewhere for download. On Jan 3, 2009, at 1:07 PM, Amin Mohammed-Coleman wrote: [...]
Re: Search Test file
package com.amin.app.lucene.search.impl;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertNotSame;
import static org.junit.Assert.assertTrue;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

import org.apache.commons.lang.StringUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.ant.DocumentHandler;
import org.apache.lucene.ant.DocumentHandlerException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import com.amin.app.lucene.util.WorkItem.IndexerType;

public class SearchTest {

    private File rtfFile = null;
    private static final String RTF_FILE_NAME = "rtfDocumentToIndex.rtf";

    @Before
    public void setUp() throws Exception {
        InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream(RTF_FILE_NAME);
        rtfFile = new File(RTF_FILE_NAME);
        convertInputStreamToFile(inputStream, rtfFile);
    }

    @Test
    public void testCanCreateLuceneDocumentForRTFDocument() throws Exception {
        JavaBuiltInRTFHandler builtInRTFHandler = new JavaBuiltInRTFHandler();
        Document document = builtInRTFHandler.getDocument(rtfFile);
        assertNotNull(document);
        String value = document.get(FieldNameEnum.BODY.getDescription());
        assertNotNull(value);
        assertNotSame("", value);
        assertTrue(value.contains("Amin Mohammed-Coleman"));
        assertTrue(value.contains("This is a test rtf document that will be indexed."));
        String path = document.get(FieldNameEnum.PATH.getDescription());
        assertNotNull(path);
        assertTrue(path.contains(".rtf"));
        String fileName = document.get(FieldNameEnum.NAME.getDescription());
        assertNotNull(fileName);
        assertEquals(RTF_FILE_NAME, fileName);
        assertEquals(WorkItem.IndexerType.RTF_INDEXER.name(), document.get(FieldNameEnum.TYPE.getDescription()));
    }

    @Test
    public void testCanSearchRtfDocument() throws Exception {
        JavaBuiltInRTFHandler builtInRTFHandler = new JavaBuiltInRTFHandler();
        Document document = builtInRTFHandler.getDocument(rtfFile);
        IndexWriter indexWriter = new IndexWriter(getDirectory(), getAnalyzer(), new IndexWriter.MaxFieldLength(2));
        try {
            indexWriter.addDocument(document);
            commitAndCloseWriter(indexWriter);
        } catch (CorruptIndexException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        // I plan to use other searchers later
        IndexSearcher indexSearcher = new IndexSearcher(getDirectory());
        MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] { indexSearcher });
        QueryParser queryParser = new MultiFieldQueryParser(new String[] { FieldNameEnum.BODY.getDescription() }, new StandardAnalyzer());
        Query query = queryParser.parse("amin");
        TopDocs topDocs = multiSearcher.search(query, BooleanQuery.getMaxClauseCount());
        assertNotNull(topDocs);
        assertEquals(1, topDocs.totalHits);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            Document documentFromSearch = indexSearcher.doc(scoreDoc.doc);
            assertNotNull(documentFromSearch);
            String bodyText = documentFromSearch.get(FieldNameEnum.BODY.getDescription());
            assertNotNull(bodyText);
            assertNotSame("", bodyText);
            assertTrue(bodyText.contains("Amin Mohammed-Coleman"));
            assertTrue(bodyText.contains("This is a test rtf document that will be indexed."));
        }
        multiSearcher.close();
    }

    @After
    public void tearDown() throws Exception {
        rtfFile.delete();
        if (getDirectory().list() != null && getDirectory().list().length > 0) {
            IndexReader reader = IndexReader.open(getDirectory());
            for (int i = 0; i < reader.maxDoc(); i++) {
                reader.deleteDocument(i);
            }
            reader.close();
        }
    }

    private void commitAndCloseWriter(IndexWriter indexWriter) throws CorruptIndexException, IOException {
        indexWriter.commit();
        indexWriter.close();
    }

    public Directory getDirectory() throws IOException {
        return FSDirectory.getDirectory("/tmp/lucene/rtf");
    }

    public Analyzer getAnalyzer() {
        return new StandardAnalyzer();
    }

    private static void convertInputStreamToFile(InputStrea
Re: Search Test file
Hi, Please ignore my last email. Just woke up and wrote the email. After looking at Luke further it looks like the token is being stored as "indexed.amin", which is why "amin" wasn't working. Making those changes that you recommended worked. I will investigate further why the "amin" token is being stored as "indexed.amin". Thanks again for all the help. Cheers Amin On 4 Jan 2009, at 02:23, Grant Ingersoll wrote: Begin forwarded message: From: Grant Ingersoll Date: January 3, 2009 8:19:14 PM EST To: java-...@lucene.apache.org Subject: Fwd: Search Test file Reply-To: java-...@lucene.apache.org Hi Amin, I see a couple of issues with your program below, and one that is the cause of the problem of not finding "amin" as a query term. When you construct your IndexWriter, you are doing:

IndexWriter indexWriter = new IndexWriter(getDirectory(), getAnalyzer(), new IndexWriter.MaxFieldLength(2));

The MaxFieldLength parameter specifies the maximum number of tokens allowed in a Field. Everything else after that is dropped. See http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/IndexWriter.html#IndexWriter(org.apache.lucene.store.Directory,%20org.apache.lucene.analysis.Analyzer,%20org.apache.lucene.index.IndexWriter.MaxFieldLength) and http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/IndexWriter.MaxFieldLength.html Also,

TopDocs topDocs = multiSearcher.search(query, BooleanQuery.getMaxClauseCount());

strikes me as really odd. Why are you passing in the max clause count as the number of results you want returned? Cheers, Grant Begin forwarded message: From: "ami...@gmail.com" Date: January 3, 2009 3:24:52 PM EST To: gsing...@apache.org Subject: Search Test file I've shared a document with you called "Search Test file": http://docs.google.com/Doc?id=d77xf5q_0n6hb38fx&invite=cjq79zj It's not an attachment -- it's stored online at Google Docs. To open this document, just click the link above. --- Hi I have uploaded the test file at google docs. It is currently a txt file but if you change the extension to .java it should work.
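A quick way to confirm a suspicion like the "indexed.amin" one above is to run the raw text through the analyzer and print the tokens it actually produces. A minimal sketch against the 2.4-era analysis API; the sample string here is invented. StandardAnalyzer's tokenizer keeps dotted runs with no whitespace, such as "indexed.Amin", together as a single lowercased token, which would explain a stored term of "indexed.amin".

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class AnalyzerDebug {
    public static void main(String[] args) throws IOException {
        // "indexed.Amin" has no whitespace after the full stop, so the
        // tokenizer treats it like a host name and emits one token.
        String text = "This document will be indexed.Amin Mohammed-Coleman";
        TokenStream stream = new StandardAnalyzer().tokenStream("body", new StringReader(text));
        Token token;
        while ((token = stream.next()) != null) {
            System.out.println(token.termText());
        }
    }
}

Adding a space after the full stop in the source text (or a different analyzer for the body field) makes "amin" searchable again.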
Re: Search Test file
Hi Test case passing now. Thanks for your help. I kind of thought it was probably something I was doing wrong! Cheers Amin On 4 Jan 2009, at 16:59, Grant Ingersoll wrote: On Jan 4, 2009, at 2:49 AM, Amin Mohammed-Coleman wrote: Hi Grant Thank you for looking at the test case. I have updated the IndexWriter to use UNLIMITED for MaxFieldLength. I tried using Integer.MAX_VALUE for the number of results:

> Also, TopDocs topDocs = multiSearcher.search(query, BooleanQuery.getMaxClauseCount()); strikes me as really odd. Why are you passing in the max clause count as the number of results you want returned? Just pass in something like "10".

However I get the following exception:

java.lang.NegativeArraySizeException
    at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:41)
    at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:24)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:200)
    at org.apache.lucene.search.Searcher.search(Searcher.java:136)
    at org.apache.lucene.search.Searcher.search(Searcher.java:146)
    at com.amin.app.lucene.search.impl.SearchTest.testCanSearchRtfDocument(SearchTest.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
    at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
    at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
    at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
    at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
    at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
    at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
    at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
    at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
    at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
    at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
    at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)

I know that this is an issue (not being able to use Integer.MAX_VALUE). I tried using 100 and my test still doesn't pass. Cheers Amin On 4 Jan 2009, at 02:23, Grant Ingersoll wrote: [...]
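Putting Grant's two fixes together (lift the token cap on the writer, and ask for a small fixed number of hits), here is a minimal sketch under the same 2.4-era API; the class and method names are illustrative, not from the thread.

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public final class SearchFix {

    private SearchFix() {
    }

    // Index with no per-field token cap, so the whole body is searchable
    // (MaxFieldLength(2) kept only the first two tokens of each field).
    public static void index(Directory directory, Analyzer analyzer, Document document)
            throws IOException {
        IndexWriter writer = new IndexWriter(directory, analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            writer.addDocument(document);
        } finally {
            writer.close();
        }
    }

    // Ask for a small, fixed number of hits. Huge values such as
    // Integer.MAX_VALUE make MultiSearcher pre-size its internal hit queue,
    // which is what blew up with NegativeArraySizeException in the trace above.
    public static TopDocs searchTopTen(MultiSearcher searcher, Query query)
            throws IOException {
        return searcher.search(query, 10);
    }
}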
MultiSearcher: close()
Hi I have a class that uses the MultiSearcher in order to perform searches across several other searchers. Here is a snippet of the class:

MultiSearcher multiSearcher = null;
try {
    multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[] {}));
    QueryParser queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzer);
    Query query = queryParser.parse(searchRequest.getSearchTerm());

    //TODO: Sort and Filters

    TopDocs topDocs = multiSearcher.search(query, 100);
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    LOGGER.debug("total number of hits for [" + query.toString() + " ] = " + topDocs.totalHits);

    for (ScoreDoc scoreDoc : scoreDocs) {
        final Document doc = multiSearcher.doc(scoreDoc.doc);
        float score = scoreDoc.score;
        final BaseDocument baseDocument = new BaseDocument(doc, score);
        Summary documentSummary = new DocumentSummaryImpl(baseDocument);
        summaryList.add(documentSummary);
    }
} catch (Exception e) {
    throw new IllegalStateException(e);
} finally {
    if (multiSearcher != null) {
        try {
            multiSearcher.close();
        } catch (IOException e) {
            LOGGER.error("Could not close multisearcher. Need to investigate why.", e);
        }
    }
}

This class is injected with dependencies using Spring. Do I need to explicitly close the MultiSearcher? If I call the method the first time it is ok, but any subsequent calls generate the following:

org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed

What is the best practice for this? I had a look at the Lucene in Action book and the example doesn't close the MultiSearcher. Any help would be highly appreciated. Cheers
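The behaviour described here follows from MultiSearcher.close() cascading to the sub-searchers it wraps: closing it also closes the Spring-injected IndexSearchers and their IndexReaders, so the second request runs against closed readers. A minimal sketch of one way out, assuming long-lived searchers owned by the container; the class and method names are illustrative, not from the original post.

import java.io.IOException;
import java.util.List;

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;

public class SharedSearcherService {

    private final List<IndexSearcher> searchers; // long-lived, Spring-injected

    public SharedSearcherService(List<IndexSearcher> searchers) {
        this.searchers = searchers;
    }

    public TopDocs search(Query query) throws IOException {
        // Wrapping is cheap: the MultiSearcher holds no resources of its own,
        // so build one per request and deliberately do not close it here.
        // close() would cascade to the shared IndexSearchers and their
        // readers, producing the AlreadyClosedException on the next call.
        MultiSearcher multiSearcher =
                new MultiSearcher(searchers.toArray(new Searchable[] {}));
        return multiSearcher.search(query, 100);
    }

    // Close the underlying searchers exactly once, at application shutdown
    // (for example, wired as a Spring destroy-method).
    public void close() throws IOException {
        for (IndexSearcher searcher : searchers) {
            searcher.close();
        }
    }
}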
Re: clustering with compass & terracotta
I've been working on integrating hibernate search and Gigaspaces XAP. It's been raised as an openspaces project and is awaiting approval. The aim is to place indexes on the space and use gigaspaces middleware support for clustering, replication and other services. Sent from my iPhone On 15 Jan 2009, at 20:05, Glen Newton wrote: There is a discussion here: http://www.terracotta.org/web/display/orgsite/Lucene+Integration Also of interest: "Katta - distribute lucene indexes in a grid" http://katta.wiki.sourceforge.net/ -glen http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html http://zzzoot.blogspot.com/2008/11/software-announcement-lusql-database-to.html http://zzzoot.blogspot.com/2008/09/katta-released-lucene-on-grid.html http://zzzoot.blogspot.com/2008/06/lucene-concurrent-search-performance.html http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html 2009/1/15 Angel, Eric : I just ran into this http://www.compass-project.org/docs/2.0.0/reference/html/needle-terracotta.html and was wondering if any of you had tried anything like this and if so, what your experience was like. Eric - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Indexing and Searching Web Application
Hi I have recently worked on developing an application which allows you to upload a file (which is indexed so you can search it later). I have numerous tests to show that you can index and search documents (in some instances within the same test), however when I perform the operation on the site:

1) Upload File and Index
2) Search

I don't get any hits. When I restart the application and then make another search I can find the results. It seems as though indexes aren't being committed when I do the initial upload. This is strange. I explicitly call commit in my code when I upload the file. Has anyone experienced this before? Any help would be appreciated. Kind Regards Amin - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Indexing and Searching Web Application
I make a call to my search class which looks like this:

public Summary[] search(SearchRequest searchRequest) {
    List<Summary> summaryList = new ArrayList<Summary>();
    StopWatch stopWatch = new StopWatch("searchStopWatch");
    stopWatch.start();
    MultiSearcher multiSearcher = null;
    try {
        multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[] {}));
        QueryParser queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzer);
        Query query = queryParser.parse(searchRequest.getSearchTerm());

        //TODO: Sort and Filters

        TopDocs topDocs = multiSearcher.search(query, 100);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        LOGGER.debug("total number of hits for [" + query.toString() + " ] = " + topDocs.totalHits);

        for (ScoreDoc scoreDoc : scoreDocs) {
            final Document doc = multiSearcher.doc(scoreDoc.doc);
            float score = scoreDoc.score;
            final BaseDocument baseDocument = new BaseDocument(doc, score);
            Summary documentSummary = new DocumentSummaryImpl(baseDocument);
            summaryList.add(documentSummary);
        }
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }
    stopWatch.stop();
    LOGGER.debug("total time taken for search: " + stopWatch.getTotalTimeMillis() + " ms");
    return summaryList.toArray(new Summary[] {});
}

Do I need to do this explicitly? Cheers Amin On 19 Jan 2009, at 20:48, Greg Shackles wrote: After you make the commit to the index, are you reloading the index in the searchers? - Greg On Mon, Jan 19, 2009 at 3:29 PM, Amin Mohammed-Coleman wrote: [...]
Re: Indexing and Searching Web Application
Sent from my iPhone On 19 Jan 2009, at 23:23, Greg Shackles wrote: I just quickly skimmed the code since I don't have much time right now but it looks like you are keeping an array of IndexSearchers open that you re-use in this search function, right? If that's the case, you need to tell those IndexSearchers to re-open the indexes because they have changed since they were first opened. That should solve your problem. - Greg On Mon, Jan 19, 2009 at 4:45 PM, Amin Mohammed-Coleman wrote: [...]
Re: Indexing and Searching Web Application
Hi Thanks for your reply. I originally closed the multisearcher explicitly but this caused a problem in that the first search would work and then subsequent searches would cause an AlreadyClosedException. I sent an email to the mailing group on what the best practice is on whether to close the multisearcher or leave it open. I couldn't see in the docs how to reopen. Would it be possible to get some advice on how to do this. Thanks again for your help. On 19 Jan 2009, at 23:23, Greg Shackles wrote: I just quickly skimmed the code since I don't have much time right now but it looks like you are keeping an array of IndexSearchers open that you re-use in this search function, right? If that's the case, you need to tell those IndexSearchers to re-open the indexes because they have changed since they were first opened. That should solve your problem. - Greg On Mon, Jan 19, 2009 at 4:45 PM, Amin Mohammed-Coleman wrote: [...]
Re: Indexing and Searching Web Application
Hi After your email I had a look around and came up with the solution below (I'm not sure if this is the right approach or whether there is a performance implication to doing this):

public Summary[] search(SearchRequest searchRequest) {
    List<Summary> summaryList = new ArrayList<Summary>();
    StopWatch stopWatch = new StopWatch("searchStopWatch");
    stopWatch.start();
    MultiSearcher multiSearcher = null;
    List<IndexSearcher> newIndexSearchers = new ArrayList<IndexSearcher>();
    try {
        for (IndexSearcher indexSearcher : searchers) {
            IndexReader indexReader = indexSearcher.getIndexReader().reopen();
            IndexSearcher indexSearch = new IndexSearcher(indexReader);
            newIndexSearchers.add(indexSearch);
        }

        multiSearcher = new MultiSearcher(newIndexSearchers.toArray(new IndexSearcher[] {}));
        QueryParser queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzer);
        Query query = queryParser.parse(searchRequest.getSearchTerm());

        //TODO: Sort and Filters

        TopDocs topDocs = multiSearcher.search(query, 100);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        LOGGER.debug("total number of hits for [" + query.toString() + " ] = " + topDocs.totalHits);

        for (ScoreDoc scoreDoc : scoreDocs) {
            final Document doc = multiSearcher.doc(scoreDoc.doc);
            float score = scoreDoc.score;
            final BaseDocument baseDocument = new BaseDocument(doc, score);
            Summary documentSummary = new DocumentSummaryImpl(baseDocument);
            summaryList.add(documentSummary);
        }
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }
    stopWatch.stop();
    LOGGER.debug("total time taken for search: " + stopWatch.getTotalTimeMillis() + " ms");
    return summaryList.toArray(new Summary[] {});
}

The searchers are configured in Spring using a bean definition which looks like this:

<bean class="org.apache.lucene.search.IndexSearcher" scope="prototype" lazy-init="true">
    <constructor-arg ref="rtfDirectory"/>
</bean>

I set the dependencies on the DocumentSearcher class. Cheers Amin On 19 Jan 2009, at 21:45, Amin Mohammed-Coleman wrote: [...]
Re: Indexing and Searching Web Application
Am I supposed to close the oldIndexReader? I just tried this and I get an exception stating that the IndexReader is closed. Cheers On Tue, Jan 20, 2009 at 9:33 AM, Ganesh wrote:
> Reopen the reader, only if it is modified.
>
> IndexReader oldIndexReader = indexSearcher.getIndexReader();
> if (!oldIndexReader.isCurrent()) {
>     IndexReader newIndexReader = oldIndexReader.reopen();
>     oldIndexReader.close();
>     indexSearcher.close();
>     IndexSearcher indexSearch = new IndexSearcher(newIndexReader);
> }
>
> Regards
> Ganesh
>
> ----- Original Message ----- From: "Amin Mohammed-Coleman" < ami...@gmail.com>
> To:
> Sent: Tuesday, January 20, 2009 1:38 PM
> Subject: Re: Indexing and Searching Web Application
>
> [...]
Re: Indexing and Searching Web Application
Hi Yes I am using the reopen method on the IndexReader. I am not closing the old reader as per Ganesh's instruction. It seems to be working correctly so I presume it's ok not to close. Thanks Amin On 20 Jan 2009, at 19:27, "Angel, Eric" wrote: There's a reopen() method in the IndexReader class. You can use that. -----Original Message----- From: Amin Mohammed-Coleman [mailto:ami...@gmail.com] Sent: Tuesday, January 20, 2009 5:02 AM To: java-user@lucene.apache.org Subject: Re: Indexing and Searching Web Application [...]
Re: Indexing and Searching Web Application
Hi Will give that a go. Thanks Sent from my iPhone On 21 Jan 2009, at 12:26, "Ganesh" wrote: I am closing the old reader and it is working fine for me. Refer to the IndexReader.reopen javadoc.

// Below is the code snippet from the IndexReader.reopen javadoc
IndexReader reader = ...
...
IndexReader newReader = reader.reopen();
if (newReader != reader) {
    ...              // reader was reopened
    reader.close();  // old reader is closed
}
reader = newReader;

Regards Ganesh ----- Original Message ----- From: "Amin Mohammed-Coleman" To: Cc: Sent: Wednesday, January 21, 2009 1:07 AM Subject: Re: Indexing and Searching Web Application [...]
Re: Indexing and Searching Web Application
Hi I did the following according to the javadocs:

for (IndexSearcher indexSearcher : searchers) {
    IndexReader reader = indexSearcher.getIndexReader();
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        reader.close();
    }
    reader = newReader;
    IndexSearcher indexSearch = new IndexSearcher(reader);
    indexSearchers.add(indexSearch);
}

First search works ok, subsequent searches result in:

org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed

Cheers On Wed, Jan 21, 2009 at 1:47 PM, Amin Mohammed-Coleman wrote: [...]
Re: Indexing and Searching Web Application
Hi, That is what I am doing with the line:

indexSearchers.add(indexSearch);

indexSearchers is an ArrayList that is constructed before the for loop:

List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();

I then pass the indexSearchers to:

multiSearcher = new MultiSearcher(indexSearchers.toArray(new IndexSearcher[] {}));

Cheers On 21 Jan 2009, at 20:19, Ian Lea wrote: I haven't been following this thread, but shouldn't you be replacing the old searcher in your list of searchers rather than just adding the new one on the end? Could be wrong - I find the names in your code snippet rather confusing. -- Ian. On Wed, Jan 21, 2009 at 6:59 PM, Amin Mohammed-Coleman wrote: [...]
Re: Indexing and Searching Web Application
Hi I am trying to get an understanding of what the best practice is. I am not saying that I am right; it may well be that my code is wrong, which is why I am posting this. The original list that I am iterating over is a Spring-injected dependency. I don't reuse that in the multisearcher. I create a new list (local variable) when I invoke the search method. So I'm not sure how I can be adding to an existing list. I presume it's a bad idea not to close the IndexReader in this case. Cheers On 21 Jan 2009, at 20:43, Ian Lea wrote: Oh well, it's your code so I guess you know what it does. But I still think you're wrong. If your list contains 3 searchers at the top of the loop and all 3 need to be reopened then the list will contain 6 searchers at the end of the loop, and the first 3 will be for readers that you've just closed. Hence the already closed exception when you try to use them. -- Ian. On Wed, Jan 21, 2009 at 8:24 PM, Amin Mohammed-Coleman wrote: [...]
IndexReader oldIndexReader = indexSearcher.getIndexReader(); if (!oldIndexReader.isCurrent()) { IndexReader newIndexReader = oldIndexReader.reOpen(); oldIndexReader.close(); indexSearcher.close(); IndexSearcher indexSearch = new IndexSearcher(newIndexReader); } Regards Ganesh - Original Message - From: "Amin Mohammed-Coleman" < ami...@gmail.com> To: Sent: Tuesday, January 20, 2009 1:38 PM Subject: Re: Indexing and Searching Web Application Hi After your email I had a look around and came up with the below solution (I'm not sure if this is the right approach or there is a performance implication to doing this) public Summary[] search(SearchRequest searchRequest) { List summaryList = new ArrayList(); StopWatch stopWatch = new StopWatch("searchStopWatch"); stopWatch.start(); MultiSearcher multiSearcher = null; List newIndexSearchers = new ArrayList(); try { for (IndexSearcher indexSearcher: searchers) { IndexReader indexReader = indexSearcher.getIndexReader().reopen(); IndexSearcher indexSearch = new IndexSearcher(indexReader); newIndexSearchers.add(indexSearch); } multiSearcher = new MultiSearcher(newIndexSearchers.toArray(new IndexSearcher[] {})); QueryParser
Re: Indexing and Searching Web Application
Hi

Thanks for your reply. You're right: it looks like the original list is the problem. The list I loop over is configured in Spring to return a list of index searchers; each index searcher looks at a different index. I would like to inject the list of index searchers because we may have a requirement to add new searchers, which would only mean changing the Spring config file. Trying to remove the old index searcher gives me a ConcurrentModificationException. Hmmm. Is there another approach I can take?

Cheers

Sent from my iPhone

On 21 Jan 2009, at 22:32, Erick Erickson wrote:

NOTE: you're iterating over 'searchers' and adding to indexSearchers. Is that a typo?

Assuming that it's not, and your 'searchers' is the copy you talk about (so you can freely add?), you never delete from the underlying indexSearchers. But you do close elements, because you're closing a reference to the searcher that points to the same underlying object.

Assuming that's not the problem, here's what I'd suggest. Log the count of searchers just above your loop... print searchers.size(); (or whatever).

    for (IndexSearcher indexSearcher : searchers) {

Ian claims that the first time you'll see some number X, and the second time you'll see X + Y. You have to be getting this list of searchers from someplace. Wherever it is, it's (probably) persisted across calls, because if it weren't you wouldn't have any open readers to close. Are you sure your local variable isn't just a reference to the underlying (permanent) list?

See inline comments:

    for (IndexSearcher indexSearcher : searchers) {
        IndexReader reader = indexSearcher.getIndexReader();
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();
        }

[EOE] You have not removed the instance of the searcher from searchers (the var in your for loop), but you have closed it. So the next time your code tries to use it, you've already closed it.

        reader = newReader;
        IndexSearcher indexSearch = new IndexSearcher(reader);

[EOE] This adds the newly opened searcher to the end of your array. The original (closed) one is still there.

        indexSearchers.add(indexSearch);
    }

[EOE] So if you use searchers anywhere from here on, it's got closed readers in it if you closed any of them.

Best
Erick
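A minimal sketch of the fix Ian and Erick are describing: replace the stale searcher in the same list slot instead of appending, which also sidesteps the ConcurrentModificationException mentioned above. This assumes a mutable List of IndexSearchers named searchers (a Spring-injected list may need to be copied into a mutable one first) and the 2.x-era API used in this thread:

[code]
// Refresh the shared list in place: after this loop the list never
// contains a searcher whose reader has been closed.
for (int i = 0; i < searchers.size(); i++) {
    IndexReader reader = searchers.get(i).getIndexReader();
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        reader.close();                                  // nothing references the old reader now
        searchers.set(i, new IndexSearcher(newReader));  // replace, don't append
    }
}
[/code]

Using an indexed for loop (rather than the enhanced for) is what makes the in-place set(i, ...) legal while iterating.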
Re: Indexing and Searching Web Application
Hi

Please ignore my last email. I have managed to work out how to fix the problem. Sent reply without morning coffee!

Thanks
Amin

Sent from my iPhone
Field.Store.YES Question
Hi

I'm probably going to get shot down for asking this simple question. Although I think I understand the basic concept of a Field, I feel there is something I am missing, and I was wondering if someone might help to clarify.

You can store a field's value in the index using Field.Store.YES, or, if the content is too large, you can exclude it from being stored using Field.Store.NO. How does Lucene know how to search for a term in an index if the value hasn't been stored? I can understand that if you don't store the field then you can't get the field and its value back via the Document API.

Is there a separate part of the Lucene index where the tokenised strings are kept, and is that where Lucene knows to look?

Again, I do apologise for asking this question... I just feel like I'm missing something (knew I shouldn't have had those tequila shots!).

Thanks
Amin
Re: Field.Store.YES Question
Thanks guys for your replies! It's helped a lot!

Cheers
Amin

On Thu, Feb 5, 2009 at 9:28 AM, Ganesh wrote:

> Field.Store.YES stores the field data as it is, so that it can be
> retrieved to display results.
> Field.Index.ANALYZED tokenizes the field and indexes the tokenized content.
>
> Regards
> Ganesh
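A self-contained sketch of the distinction (assuming the 2.x-era API used elsewhere in these threads; the field names and text are made up): the unstored field is still fully searchable, because its tokens live in the inverted index, but doc.get() returns null for it.

[code]
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class StoredVsIndexed {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        // Indexed AND stored: searchable, and retrievable from a hit.
        doc.add(new Field("title", "tequila sunrise",
                Field.Store.YES, Field.Index.ANALYZED));
        // Indexed but NOT stored: the tokens go into the inverted index,
        // the original text is discarded after analysis.
        doc.add(new Field("contents", "a very long body of text",
                Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        TopDocs hits = searcher.search(new TermQuery(new Term("contents", "text")), 10);
        System.out.println(hits.totalHits);       // 1 -- the unstored field still matches
        Document hit = searcher.doc(hits.scoreDocs[0].doc);
        System.out.println(hit.get("title"));     // "tequila sunrise"
        System.out.println(hit.get("contents"));  // null -- indexed, but not stored
        searcher.close();
    }
}
[/code]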
Faceted Search using Lucene
Hi

I am looking at building a faceted search using Lucene. I know that Solr comes with this built in; however, I would like to try this myself (something to add to my CV!). I have been looking around and I found that you can use the IndexReader and TermVectors. This looks OK, but I'm not sure how to filter the results so that a particular user can only see a subset of results. The next option I was looking at was something like:

    Term term1 = new Term("brand", "ford");
    Term term2 = new Term("brand", "vw");
    Term[] termsArray = new Term[] { term1, term2 };
    int[] docFreqs = indexSearcher.docFreqs(termsArray);

The only problem here is that I have to provide the brand type each time a new brand is created. Again, I'm not sure how I can filter the results here. It may be that I'm using the wrong API methods to do this.

I would be grateful if I could get some advice on this.

Cheers
Amin

P.S. I am basically trying to do something that displays the following:

    Personal Contact (23)
    Business Contact (45)

and so on...
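For the "each time a new brand is created" concern, the facet values don't have to be hard-coded: the index's own term dictionary can be enumerated at runtime. A sketch with the 2.x TermEnum API, reusing the "brand" field from the example above (indexReader here is an assumed open IndexReader):

[code]
// Collect every distinct value of the "brand" field from the index itself,
// so newly indexed brands show up without any code or config change.
List brands = new ArrayList();
TermEnum termEnum = indexReader.terms(new Term("brand", ""));
try {
    do {
        Term t = termEnum.term();
        if (t == null || !"brand".equals(t.field())) {
            break;  // ran past the end of the "brand" field
        }
        brands.add(t.text());
    } while (termEnum.next());
} finally {
    termEnum.close();
}
[/code]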
Re: Faceted Search using Lucene
Hi

Sorry to re-send this email, but I was wondering if I could get some advice on this.

Cheers

Amin
Re: Faceted Search using Lucene
Hi

Thanks, just what I needed!

Cheers
Amin

On 22 Feb 2009, at 16:11, Marcelo Ochoa wrote:

Hi Amin:

Please take a look at this blog post:

http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html

Best regards, Marcelo.

--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
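The core idea in that post, reduced to a sketch (2.x-era API; searcher, baseQuery, and the "type" field and values here are placeholders): intersect the user's base query with one TermQuery per facet value and read off the hit counts.

[code]
// Count facet hits by AND-ing the base query with each facet term.
String[] docTypes = { "personal_contact", "business_contact" };  // hypothetical facet values
for (String type : docTypes) {
    BooleanQuery combined = new BooleanQuery();
    combined.add(baseQuery, BooleanClause.Occur.MUST);
    combined.add(new TermQuery(new Term("type", type)), BooleanClause.Occur.MUST);
    int count = searcher.search(combined, 1).totalHits;
    System.out.println(type + " (" + count + ")");  // e.g. "personal_contact (23)"
}
[/code]

The post itself does essentially the same intersection with cached bit sets, which avoids re-running the base query once per facet; the BooleanQuery version is just the shortest way to see the idea.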
Re: Faceted Search using Lucene
Hi

I have been able to get the code working for my scenario; however, I have a question and I was wondering if I could get some help. I have a list of IndexSearchers which are used in a MultiSearcher class. I use the index searchers to get each IndexReader and put them into a MultiReader:

    IndexReader[] readers = new IndexReader[searchables.length];
    for (int i = 0; i < searchables.length; i++) {
        IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
        readers[i] = indexSearcher.getIndexReader();
        IndexReader newReader = readers[i].reopen();
        if (newReader != readers[i]) {
            readers[i].close();
        }
        readers[i] = newReader;
    }
    multiReader = new MultiReader(readers);
    OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();
    IndexSearcher indexSearcher = new IndexSearcher(multiReader);

I then use the IndexSearcher to do the facet stuff. I end the code by closing the MultiReader. This is causing problems in another method where I do some other search, as the underlying IndexReaders are closed. Is it OK not to close the MultiReader, or should I do some additional checks in the other method to see if the IndexReader is closed?

Cheers

P.S. Hope that made sense...!
Re: Faceted Search using Lucene
The reason for the IndexReader.reopen is because I have a webapp which enables users to upload files and then search for the documents. If I don't reopen, I'm concerned that the facet hit counter won't be updated.
Re: Faceted Search using Lucene
Hi

Thanks for your reply. I have modified the code to the following:

    public Map getFacetHitCount(String searchTerm) {
        QueryParser queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzer);
        Query baseQuery = null;
        try {
            if (!StringUtils.isBlank(searchTerm)) {
                baseQuery = queryParser.parse(searchTerm);
                LOGGER.debug("Search Term '" + searchTerm + "' > Lucene Query '" + baseQuery.toString() + "'");
            } else {
                LOGGER.debug("No base query. Using default, which is going to check for all documents of every type.");
            }
        } catch (ParseException e1) {
            throw new RuntimeException(e1);
        }
        Map subQueries = constructDocTypeSubQueriesMap();
        Map facetHitCount = new HashMap();
        MultiReader multiReader = null;
        try {
            Searchable[] searchables = this.searchers.toArray(new Searchable[] {}).clone();
            IndexReader[] readers = new IndexReader[searchables.length];
            for (int i = 0; i < searchables.length; i++) {
                IndexSearcher indexSearcher = (IndexSearcher) searchables[i];
                readers[i] = indexSearcher.getIndexReader();
                Directory directory = readers[i].directory();
                IndexReader indexReader = IndexReader.open(directory);
                readers[i] = indexReader;
            }
            multiReader = new MultiReader(readers);
            OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();
            IndexSearcher indexSearcher = new IndexSearcher(multiReader);
            if (baseQuery != null) {
                facetHitCounter.setBaseQuery(baseQuery);
            }
            facetHitCounter.setSearcher(indexSearcher);
            facetHitCounter.setSubQueries(subQueries);
            facetHitCount = facetHitCounter.getFacetHitCounts();
            LOGGER.debug("Document Type Facet Hit Count '" + facetHitCount + "'");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            try {
                multiReader.close();
                LOGGER.debug("Closed multi reader.");
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }
        return facetHitCount;
    }

Does this make sense? I am new to Lucene and working on a complete search solution, so I would be grateful for any advice on what is best practice.

Cheers

On Thu, Feb 26, 2009 at 7:55 AM, Michael Stoppelman wrote:

> If another thread is executing a query with a handle to one of readers[i],
> you're going to kill it, since the IndexReader is now closed. Just don't
> call the IndexReader#close() method: if nothing is pointing at the readers,
> they should be garbage collected. Also, you might want to warm up your new
> IndexSearcher before you switch to it, meaning run a few queries on it
> before you swap the old one out.
>
> M
Re: Faceted Search using Lucene
Hi

Thanks for your reply. Without sounding completely... silly... how do I go about using the methods you mentioned?

Cheers
Amin

On Thu, Feb 26, 2009 at 10:24 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Actually, it's best to use IndexReader.incRef/decRef to track the
> IndexReader.
>
> You should not rely on GC to close your IndexReader, since this can easily
> tie up resources (eg open file descriptors) for too long.
>
> Mike
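The pattern Mike is pointing at, in miniature (a sketch only; his full SearcherManager class follows in the next message):

[code]
IndexReader reader = currentReader;  // shared reader, possibly swapped by another thread
reader.incRef();                     // pin it: a concurrent swap can't close it under us
try {
    // ... run the search against this reader ...
} finally {
    reader.decRef();                 // unpin: the reader is actually closed only when
                                     // the last reference is released
}
[/code]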
Re: Faceted Search using Lucene
Hi

Thanks for your help. I will modify my facet search and my other code to use the recommendations. Would it be OK to get a review of the completed code? I just want to make sure that I'm not doing anything that may cause any problems (threading, memory).

Cheers

On Thu, Feb 26, 2009 at 1:10 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> See below -- this is an excerpt from the upcoming Lucene in Action
> revision (chapter 10).
>
> It's a simple class. Use it like this for searching:
>
>   IndexSearcher searcher = manager.get();
>   try {
>     searcher.search(...);
>     ...render results...
>   } finally {
>     manager.release(searcher);
>     searcher = null;
>   }
>
> When you want to reopen (application dependent), call maybeReopen.
> Subclass and define the warm() method if needed.
>
> NOTE: this hasn't yet been heavily tested (I just quickly revised it to use
> incRef/decRef).
>
> Mike
>
> import java.io.IOException;
> import java.util.HashMap;
>
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.store.Directory;
>
> /** Utility class to get/refresh searchers when you are
>  *  using multiple threads. */
>
> public class SearcherManager {
>
>   private IndexSearcher currentSearcher;                            //A
>   private Directory dir;
>
>   public SearcherManager(Directory dir) throws IOException {
>     this.dir = dir;
>     currentSearcher = new IndexSearcher(IndexReader.open(dir));     //B
>   }
>
>   public void warm(IndexSearcher searcher) {}                       //C
>
>   public void maybeReopen() throws IOException {                    //D
>     long currentVersion = currentSearcher.getIndexReader().getVersion();
>     if (IndexReader.getCurrentVersion(dir) != currentVersion) {
>       IndexReader newReader = currentSearcher.getIndexReader().reopen();
>       assert newReader != currentSearcher.getIndexReader();
>       IndexSearcher newSearcher = new IndexSearcher(newReader);
>       warm(newSearcher);
>       swapSearcher(newSearcher);
>     }
>   }
>
>   public synchronized IndexSearcher get() {                         //E
>     currentSearcher.getIndexReader().incRef();
>     return currentSearcher;
>   }
>
>   public synchronized void release(IndexSearcher searcher)          //F
>     throws IOException {
>     searcher.getIndexReader().decRef();
>   }
>
>   private synchronized void swapSearcher(IndexSearcher newSearcher) //G
>     throws IOException {
>     release(currentSearcher);
>     currentSearcher = newSearcher;
>   }
> }
>
> /*
> #A Current IndexSearcher
> #B Create initial searcher
> #C Implement in subclass to warm new searcher
> #D Call this to reopen searcher if index changed
> #E Returns current searcher
> #F Release searcher
> #G Swaps currentSearcher to new searcher
> */
>
> Mike
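For the warm() hook, a hypothetical subclass might look like this (a sketch; the warm-up query, field, and value are made up, and anything that exercises the new reader before it is swapped in will do):

[code]
SearcherManager manager = new SearcherManager(dir) {
    public void warm(IndexSearcher searcher) {
        try {
            // Run a representative query so the first real user
            // doesn't pay the warm-up cost.
            searcher.search(new TermQuery(new Term("type", "document")), 10);
        } catch (IOException e) {
            // Warming is best-effort: log it and carry on with the cold searcher.
        }
    }
};
[/code]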
Re: Faceted Search using Lucene
Hi

I have modified my search code. Here is the following:

[code]
public Summary[] search(SearchRequest searchRequest) throws SearchExecutionException {
    String searchTerm = searchRequest.getSearchTerm();
    if (StringUtils.isBlank(searchTerm)) {
        throw new SearchExecutionException("Search string cannot be empty. There will be too many results to process.");
    }
    List summaryList = new ArrayList();
    StopWatch stopWatch = new StopWatch("searchStopWatch");
    stopWatch.start();
    MultiSearcher multiSearcher = null;
    List indexSearchers = new ArrayList();
    boolean refreshSearchers = false;
    try {
        LOGGER.debug("Ensuring all index readers are up to date...");
        for (IndexSearcher indexSearcher : searchers) {
            IndexReader reader = indexSearcher.getIndexReader();
            reader.incRef();
            Directory directory = reader.directory();
            long currentVersion = reader.getVersion();
            if (IndexReader.getCurrentVersion(directory) != currentVersion) {
                IndexReader newReader = reader.reopen();
                if (newReader != reader) {
                    reader.decRef();
                    refreshSearchers = true;
                }
                reader = newReader;
            }
            IndexSearcher indexSearch = new IndexSearcher(reader);
            indexSearchers.add(indexSearch);
        }
        if (refreshSearchers) {
            searchers.clear();
            searchers = new ArrayList(indexSearchers);
        }
        LOGGER.debug("All Index Searchers are up to date. No of index searchers '" + indexSearchers.size() + "'");
        multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[] {}));
        PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(analyzer);
        analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(), new KeywordAnalyzer());
        QueryParser queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzerWrapper);
        Query query = queryParser.parse(searchTerm);
        LOGGER.debug("Search Term '" + searchTerm + "' > Lucene Query '" + query.toString() + "'");
        Sort sort = null;
        sort = applySortIfApplicable(searchRequest);
        Filter[] filters = applyFiltersIfApplicable(searchRequest);
        ChainedFilter chainedFilter = null;
        if (filters != null) {
            chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
        }
        TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        LOGGER.debug("total number of hits for [" + query.toString() + " ] = " + topDocs.totalHits);
        for (ScoreDoc scoreDoc : scoreDocs) {
            final Document doc = multiSearcher.doc(scoreDoc.doc);
            float score = scoreDoc.score;
            final BaseDocument baseDocument = new BaseDocument(doc, score);
            Summary documentSummary = new DocumentSummaryImpl(baseDocument);
            summaryList.add(documentSummary);
        }
        multiSearcher.close();
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }
    stopWatch.stop();
    LOGGER.debug("total time taken for document search: " + stopWatch.getTotalTimeMillis() + " ms");
    return summaryList.toArray(new Summary[] {});
}
[/code]

Just some background: there is a list of index searchers that are injected via Spring; these searchers are themselves configured by Spring. As you can see, the MultiSearcher is a local variable. I then have a variable that records whether an IndexReader was out of date; when it is set to true, the index searchers are refreshed.

I would be grateful for your thoughts.
Re: Faceted Search using Lucene
Forgot to mention that the previous code I sent was related to the facet search. This is a general search method I have implemented (they can probably be combined...).
Re: Faceted Search using Lucene
Hi

Thanks for your input. I would like to have a go at doing this myself first; Solr may be an option later.

* You are creating a new Analyzer & QueryParser every time, also creating unnecessary garbage; instead, they should be created once & reused.

-- I can move the code out so that they are only created once and reused.

* You always make a new IndexSearcher and a new MultiSearcher even when nothing has changed. This just generates unnecessary garbage which GC then must sweep up.

-- This was something I thought about. I could move it out so that it's created once. However, I presume that inside my code I need to check whether the index readers are up to date. This needs to be synchronized as well, I guess(?)

* I don't see any synchronization -- it looks like two search requests are allowed into this method at the same time? Which is dangerous... eg both (or more) will wastefully reopen the readers.

-- So I need to extract the logic for reopening and provide a synchronisation mechanism.

OK. So I have some work to do. I'll refactor the code and see if I can get in line with your recommendations.

On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> On a quick look, I think there are a few problems with the code:
>
> * I don't see any synchronization -- it looks like two search
>   requests are allowed into this method at the same time? Which is
>   dangerous... eg both (or, more) will wastefully reopen the
>   readers.
>
> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>   don't see a corresponding decRef.
>
> * You reopen and warm your searchers "live" (vs with a BG thread),
>   meaning the unlucky search request that hits a reopen pays the
>   cost. This might be OK if the index is small enough that
>   reopening & warming takes very little time. But if the index gets
>   large, making a random search pay that warming cost is not nice to
>   the end user. It erodes their trust in you.
>
> * You always make a new IndexSearcher and a new MultiSearcher even
>   when nothing has changed. This just generates unnecessary garbage
>   which GC then must sweep up.
>
> * You are creating a new Analyzer & QueryParser every time, also
>   creating unnecessary garbage; instead, they should be created once
>   & reused.
>
> You should consider simply using Solr -- it handles all this logic for
> you and has been well debugged with time...
>
> Mike
Re: Faceted Search using Lucene
Thanks. I will rewrite... in between giving my baby her feed, playing with the other child, and my wife wanting me to do several other things!

On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Amin Mohammed-Coleman wrote:
>
>> -- This was something I thought about. I could move it out so that it's
>> created once. However I presume inside my code I need to check whether
>> the index readers are up to date. This needs to be synchronized as well I
>> guess(?)
>
> Yes, you should synchronize the check for whether the IndexReader is
> current.
>
>> -- So I need to extract the logic for reopening and provide a
>> synchronisation mechanism.
>
> Yes.
Re: Faceted Search using Lucene
just a quick point:

public void maybeReopen() throws IOException { //D
  long currentVersion = currentSearcher.getIndexReader().getVersion();
  if (IndexReader.getCurrentVersion(dir) != currentVersion) {
    IndexReader newReader = currentSearcher.getIndexReader().reopen();
    assert newReader != currentSearcher.getIndexReader();
    IndexSearcher newSearcher = new IndexSearcher(newReader);
    warm(newSearcher);
    swapSearcher(newSearcher);
  }
}

should the above be synchronised?

On Sun, Mar 1, 2009 at 1:25 PM, Amin Mohammed-Coleman wrote:
> thanks. I will rewrite... in between giving my baby her feed and playing with the other child, and my wife who wants me to do several other things!
>
> On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>
>> Amin Mohammed-Coleman wrote:
>>
>>> Hi
>>> Thanks for your input. I would like to have a go at doing this myself first; Solr may be an option.
>>>
>>> * You are creating a new Analyzer & QueryParser every time, also
>>>   creating unnecessary garbage; instead, they should be created once
>>>   & reused.
>>>
>>> -- I can move the code out so that it is only created once and reused.
>>>
>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>   when nothing has changed. This just generates unnecessary garbage
>>>   which GC then must sweep up.
>>>
>>> -- This was something I thought about. I could move it out so that it's created once. However, I presume inside my code I need to check whether the index readers are up to date. This needs to be synchronized as well, I guess(?)
>>
>> Yes, you should synchronize the check for whether the IndexReader is current.
>>
>>> * I don't see any synchronization -- it looks like two search
>>>   requests are allowed into this method at the same time? Which is
>>>   dangerous... eg both (or, more) will wastefully reopen the
>>>   readers.
>>>
>>> -- So I need to extract the logic for reopening and provide a synchronisation mechanism.
>>
>> Yes.
>>
>>> Ok. So I have some work to do. I'll refactor the code and see if I can get in line with your recommendations.
>>>
>>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>>>
>>>> On a quick look, I think there are a few problems with the code:
>>>>
>>>> [...]
>>>>
>>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>>   don't see a corresponding decRef.
>>>>
>>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>>   meaning the unlucky search request that hits a reopen pays the
>>>>   cost. This might be OK if the index is small enough that
>>>>   reopening & warming takes very little time. But if the index gets
>>>>   large, making a random search pay that warming cost is not nice to
>>>>   the end user. It erodes their trust in you.
>>>>
>>>> [...]
>>>>
>>>> You should consider simply using Solr -- it handles all this logic for you and has been well debugged with time...
>>>>
>>>> Mike
>>>>
>>>> Amin Mohammed-Coleman wrote:
>>>>
>>>>> The reason for the indexreader.reopen is because I have a webapp which enables users to upload files and then search for the documents. If I don't reopen I'm concerned that the facet hit counter won't be updated.
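A minimal sketch of the synchronisation being asked about, assuming the currentSearcher/dir fields of the SearcherManager class quoted later in this thread. Making maybeReopen() itself synchronized serialises the version check and the swap (swapSearcher() is synchronized on the same object, and Java monitors are re-entrant, so the nested call is safe); Mike's eventual version further down achieves the same with startReopen()/doneReopen() so searches never queue behind a reopen.

// Sketch only: a coarse alternative to the startReopen()/doneReopen()
// handshake that Mike posts later in the thread.
public synchronized void maybeReopen() throws IOException {
  long currentVersion = currentSearcher.getIndexReader().getVersion();
  if (IndexReader.getCurrentVersion(dir) != currentVersion) {
    IndexReader newReader = currentSearcher.getIndexReader().reopen();
    assert newReader != currentSearcher.getIndexReader();
    IndexSearcher newSearcher = new IndexSearcher(newReader);
    warm(newSearcher);        // may be slow: every caller now waits on this lock
    swapSearcher(newSearcher);
  }
}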
Re: Faceted Search using Lucene
Hi

I've now done the following:

public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
  final String searchTerm = searchRequest.getSearchTerm();
  if (StringUtils.isBlank(searchTerm)) {
    throw new SearchExecutionException("Search string cannot be empty. There will be too many results to process.");
  }
  List<Summary> summaryList = new ArrayList<Summary>();
  StopWatch stopWatch = new StopWatch("searchStopWatch");
  stopWatch.start();
  List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
  try {
    LOGGER.debug("Ensuring all index readers are up to date...");
    maybeReopen();
    LOGGER.debug("All Index Searchers are up to date. No of index searchers '" + indexSearchers.size() + "'");
    Query query = queryParser.parse(searchTerm);
    LOGGER.debug("Search Term '" + searchTerm + "' > Lucene Query '" + query.toString() + "'");
    Sort sort = applySortIfApplicable(searchRequest);
    Filter[] filters = applyFiltersIfApplicable(searchRequest);
    ChainedFilter chainedFilter = null;
    if (filters != null) {
      chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
    }
    TopDocs topDocs = get().search(query, chainedFilter, 100, sort);
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : scoreDocs) {
      final Document doc = multiSearcher.doc(scoreDoc.doc);
      float score = scoreDoc.score;
      final BaseDocument baseDocument = new BaseDocument(doc, score);
      Summary documentSummary = new DocumentSummaryImpl(baseDocument);
      summaryList.add(documentSummary);
    }
    multiSearcher.close();
  } catch (Exception e) {
    throw new IllegalStateException(e);
  }
  stopWatch.stop();
  LOGGER.debug("total time taken for document search: " + stopWatch.getTotalTimeMillis() + " ms");
  return summaryList.toArray(new Summary[] {});
}

And have the following methods:

@PostConstruct
public void initialiseQueryParser() {
  PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(analyzer);
  analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(), new KeywordAnalyzer());
  queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzerWrapper);
  try {
    LOGGER.debug("Initialising multi searcher ");
    this.multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[] {}));
    LOGGER.debug("multi searcher initialised");
  } catch (IOException e) {
    throw new IllegalStateException(e);
  }
}

This initialises the multisearcher when the class is created by Spring.
private synchronized void swapMultiSearcher(MultiSearcher newMultiSearcher) {
  try {
    release(multiSearcher);
  } catch (IOException e) {
    throw new IllegalStateException(e);
  }
  multiSearcher = newMultiSearcher;
}

public void maybeReopen() throws IOException {
  MultiSearcher newMultiSearcher = null;
  boolean refreshMultiSearcher = false;
  List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
  synchronized (searchers) {
    for (IndexSearcher indexSearcher : searchers) {
      IndexReader reader = indexSearcher.getIndexReader();
      reader.incRef();
      Directory directory = reader.directory();
      long currentVersion = reader.getVersion();
      if (IndexReader.getCurrentVersion(directory) != currentVersion) {
        IndexReader newReader = indexSearcher.getIndexReader().reopen();
        if (newReader != reader) {
          reader.decRef();
          refreshMultiSearcher = true;
        }
        reader = newReader;
        IndexSearcher newSearcher = new IndexSearcher(newReader);
        indexSearchers.add(newSearcher);
      }
    }
  }
  if (refreshMultiSearcher) {
    newMultiSearcher = new MultiSearcher(indexSearchers.toArray(new IndexSearcher[] {}));
    warm(newMultiSearcher);
    swapMultiSearcher(newMultiSearcher);
  }
}

private void warm(MultiSearcher newMultiSearcher) {
}

private synchronized MultiSearcher get() {
  for (IndexSearcher indexSearcher : searchers) {
    indexSearcher.getIndexReader().incRef();
  }
  return multiSearcher;
}

private synchronized void release(MultiSearcher multiSearcher) throws IOException {
  for (IndexSearcher indexSearcher : searchers) {
    indexSearcher.getIndexReader().decRef();
  }
}

However I am now getting

java.lang.IllegalStateException: org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed

on the call:

private synchronized MultiSearcher get() {
  for (IndexSearcher indexSearcher : searchers) {
    indexSearcher.getIndexReader().incRef();
  }
  return multiSearcher;
}

I'm doing something wrong... obviously... not sure where though.

Cheers

On Sun, Mar 1, 2009 at 1:36 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> I was wondering the same thing ;)
>
> It's best to call this method from a single BG "warming" thread, in which case it would not need its own synchronization.
>
> But, to be safe, I'll add internal synchronization.
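The AlreadyClosedException above is the reference-counting contract biting: decRef() really closes an IndexReader once its refCount reaches zero, and any later use (including incRef()) throws. A small illustrative sketch of the contract, assuming Lucene 2.4-era APIs; this is not code from the thread:

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.Directory;

// Every incRef() must be balanced by exactly one decRef(); open() counts
// as the first reference, and close() gives it back.
void refCountDemo(Directory dir) throws Exception {
  IndexReader reader = IndexReader.open(dir); // refCount == 1
  reader.incRef();                            // refCount == 2: we hold it
  try {
    System.out.println("docs: " + reader.numDocs());
  } finally {
    reader.decRef();                          // refCount back to 1
  }
  reader.close();                             // refCount == 0: really closed
  // any further reader.numDocs() or reader.incRef() would now throw
  // AlreadyClosedException -- the symptom an unbalanced decRef produces
}

In the code above, release() blanket-decRefs every searcher's reader on each call, including readers that maybeReopen() already decRef'ed when replacing them; Mike's diagnosis further down is that maybeReopen can close a reader before get() acquires it, since the two synchronize on different objects.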
Re: Faceted Search using Lucene
Sorry -- I added

release(multiSearcher);

instead of multiSearcher.close();

On Sun, Mar 1, 2009 at 2:17 PM, Amin Mohammed-Coleman wrote:
> Hi
> I've now done the following:
> [...]
Re: Faceted Search using Lucene
Hi again...

Thanks for your patience. I modified the code to do the following:

private void maybeReopen() throws Exception {
  startReopen();
  try {
    MultiSearcher newMultiSearcher = get();
    boolean refreshMultiSearcher = false;
    List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
    synchronized (searchers) {
      for (IndexSearcher indexSearcher : searchers) {
        IndexReader reader = indexSearcher.getIndexReader();
        reader.incRef();
        Directory directory = reader.directory();
        long currentVersion = reader.getVersion();
        if (IndexReader.getCurrentVersion(directory) != currentVersion) {
          IndexReader newReader = indexSearcher.getIndexReader().reopen();
          if (newReader != reader) {
            reader.decRef();
            refreshMultiSearcher = true;
          }
          reader = newReader;
          IndexSearcher newSearcher = new IndexSearcher(reader);
          indexSearchers.add(newSearcher);
        }
      }
    }
    if (refreshMultiSearcher) {
      try {
        newMultiSearcher = new MultiSearcher(indexSearchers.toArray(new IndexSearcher[] {}));
        warm(newMultiSearcher);
        swapMultiSearcher(newMultiSearcher);
      } finally {
        release(multiSearcher);
      }
    }
  } finally {
    doneReopen();
  }
}

But I'm still getting an AlreadyClosedException; it occurs when I call the get() method in the main search code.

Cheers

On Sun, Mar 1, 2009 at 2:24 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> OK new version of SearcherManager, that fixes maybeReopen() so that it can be called from multiple threads.
>
> NOTE: it's still untested!
>
> Mike
>
> package lia.admin;
>
> import java.io.IOException;
> import java.util.HashMap;
>
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.store.Directory;
>
> /** Utility class to get/refresh searchers when you are
>  *  using multiple threads. */
>
> public class SearcherManager {
>
>   private IndexSearcher currentSearcher;  //A
>   private Directory dir;
>
>   public SearcherManager(Directory dir) throws IOException {
>     this.dir = dir;
>     currentSearcher = new IndexSearcher(IndexReader.open(dir));  //B
>   }
>
>   public void warm(IndexSearcher searcher) {}  //C
>
>   private boolean reopening;
>
>   private synchronized void startReopen()  //D
>       throws InterruptedException {
>     while (reopening) {
>       wait();
>     }
>     reopening = true;
>   }
>
>   private synchronized void doneReopen() {  //E
>     reopening = false;
>     notifyAll();
>   }
>
>   public void maybeReopen() throws InterruptedException, IOException {  //F
>     startReopen();
>     try {
>       final IndexSearcher searcher = get();
>       try {
>         long currentVersion = currentSearcher.getIndexReader().getVersion();   //G
>         if (IndexReader.getCurrentVersion(dir) != currentVersion) {            //G
>           IndexReader newReader = currentSearcher.getIndexReader().reopen();   //G
>           assert newReader != currentSearcher.getIndexReader();                //G
>           IndexSearcher newSearcher = new IndexSearcher(newReader);            //G
>           warm(newSearcher);                                                   //G
>           swapSearcher(newSearcher);                                           //G
>         }
>       } finally {
>         release(searcher);
>       }
>     } finally {
>       doneReopen();
>     }
>   }
>
>   public synchronized IndexSearcher get() {  //H
>     currentSearcher.getIndexReader().incRef();
>     return currentSearcher;
>   }
>
>   public synchronized void release(IndexSearcher searcher)  //I
>       throws IOException {
>     searcher.getIndexReader().decRef();
>   }
>
>   private synchronized void swapSearcher(IndexSearcher newSearcher)  //J
>       throws IOException {
>     release(currentSearcher);
>     currentSearcher = newSearcher;
>   }
> }
>
> /*
> #A Current IndexSearcher
> #B Create initial searcher
> #C Implement in subclass to warm new searcher
> #D Pauses until no other thread is reopening
> #E Finish reopen and notify other threads
> #F Reopen searcher if there are changes
> #G Check index version and reopen, warm, swap if needed
> #H Returns current searcher
> #I Release searcher
> #J Swaps currentSearcher to new searcher
> */
>
> Mike
>
> On Mar 1, 2009, at 8:27 AM, Amin Mohammed-Coleman wrote:
>
>> just a quick point:
>> [...]
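To make the get()/release() contract concrete, a hedged usage sketch (the SearchService and runSearch names are illustrative, not from the thread): every get() is balanced by a release() in a finally clause, and maybeReopen() may be called beforehand or from a background warming thread.

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

// Hypothetical caller of the SearcherManager above.
public class SearchService {
  private final SearcherManager manager;

  public SearchService(SearcherManager manager) {
    this.manager = manager;
  }

  public TopDocs runSearch(Query query) throws Exception {
    manager.maybeReopen();                  // optional here; a BG thread can do it
    IndexSearcher searcher = manager.get(); // incRefs the underlying reader
    try {
      return searcher.search(query, null, 10);
    } finally {
      manager.release(searcher);            // always matches the get()
    }
  }
}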
Re: Faceted Search using Lucene
Hi

Thanks again for helping on a Sunday!

I have now modified my maybeReopen() to do the following:

private void maybeReopen() throws Exception {
  LOGGER.debug("Initiating reopening of index readers...");
  IndexSearcher[] indexSearchers = (IndexSearcher[]) multiSearcher.getSearchables();
  for (IndexSearcher indexSearcher : indexSearchers) {
    IndexReader indexReader = indexSearcher.getIndexReader();
    SearcherManager documentSearcherManager = new SearcherManager(indexReader.directory());
    documentSearcherManager.maybeReopen();
  }
}

And get() to:

private synchronized MultiSearcher get() {
  IndexSearcher[] indexSearchers = (IndexSearcher[]) multiSearcher.getSearchables();
  List<IndexSearcher> indexSearchersList = new ArrayList<IndexSearcher>();
  for (IndexSearcher indexSearcher : indexSearchers) {
    IndexReader indexReader = indexSearcher.getIndexReader();
    SearcherManager documentSearcherManager = null;
    try {
      documentSearcherManager = new SearcherManager(indexReader.directory());
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
    indexSearchersList.add(documentSearcherManager.get());
  }
  try {
    multiSearcher = new MultiSearcher(indexSearchersList.toArray(new IndexSearcher[] {}));
  } catch (IOException e) {
    throw new IllegalStateException(e);
  }
  return multiSearcher;
}

This makes all my tests pass. I am using the SearcherManager that you recommended. Does this look OK?

On Sun, Mar 1, 2009 at 2:38 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Your maybeReopen has an excess incRef().
>
> I'm not sure how you open the searchers in the first place? The list starts as empty, and nothing populates it?
>
> When you do the initial population, you need an incRef.
>
> I think you're hitting IllegalStateException because maybeReopen is closing a reader before get() can get it (since they synchronize on different objects).
>
> I'd recommend switching to the SearcherManager class. Instantiate one for each of your searchers. On each search request, go through them and call maybeReopen(), and then call get() and gather each IndexSearcher instance into a new array. Then, make a new MultiSearcher (opposite of what I said before): while that creates a small amount of garbage, it'll keep your code simpler (good tradeoff).
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Sorry -- I added
>>
>> release(multiSearcher);
>>
>> instead of multiSearcher.close();
>>
>> [...]
Re: Faceted Search using Lucene
Sorry... I'm getting slightly confused.

I have a PostConstruct which is where I should create an array of SearcherManagers (one per IndexSearcher). From there I initialise the MultiSearcher using get(). After which I need to call maybeReopen for each IndexSearcher. So I'll do the following:

@PostConstruct
public void initialiseDocumentSearcher() {
  PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(analyzer);
  analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(), new KeywordAnalyzer());
  queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzerWrapper);
  try {
    LOGGER.debug("Initialising multi searcher ");
    documentSearcherManagers = new DocumentSearcherManager[searchers.size()];
    for (int i = 0; i < searchers.size(); i++) {
      IndexSearcher indexSearcher = searchers.get(i);
      Directory directory = indexSearcher.getIndexReader().directory();
      DocumentSearcherManager documentSearcherManager = new DocumentSearcherManager(directory);
      documentSearcherManagers[i] = documentSearcherManager;
    }
    LOGGER.debug("multi searcher initialised");
  } catch (IOException e) {
    throw new IllegalStateException(e);
  }
}

This initialises the search managers. I then have these methods:

private void maybeReopen() throws Exception {
  LOGGER.debug("Initiating reopening of index readers...");
  for (DocumentSearcherManager documentSearcherManager : documentSearcherManagers) {
    documentSearcherManager.maybeReopen();
  }
}

private void release() throws Exception {
  for (DocumentSearcherManager documentSearcherManager : documentSearcherManagers) {
    documentSearcherManager.release(documentSearcherManager.get());
  }
}

private MultiSearcher get() {
  List<IndexSearcher> listOfIndexSearchers = new ArrayList<IndexSearcher>();
  for (DocumentSearcherManager documentSearcherManager : documentSearcherManagers) {
    listOfIndexSearchers.add(documentSearcherManager.get());
  }
  try {
    multiSearcher = new MultiSearcher(listOfIndexSearchers.toArray(new IndexSearcher[] {}));
  } catch (IOException e) {
    throw new IllegalStateException(e);
  }
  return multiSearcher;
}

These methods are used in the following manner in the search code:

public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
  final String searchTerm = searchRequest.getSearchTerm();
  if (StringUtils.isBlank(searchTerm)) {
    throw new SearchExecutionException("Search string cannot be empty. There will be too many results to process.");
  }
  List<Summary> summaryList = new ArrayList<Summary>();
  StopWatch stopWatch = new StopWatch("searchStopWatch");
  stopWatch.start();
  List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
  try {
    LOGGER.debug("Ensuring all index readers are up to date...");
    maybeReopen();
    LOGGER.debug("All Index Searchers are up to date. No of index searchers '" + indexSearchers.size() + "'");
    Query query = queryParser.parse(searchTerm);
    LOGGER.debug("Search Term '" + searchTerm + "' > Lucene Query '" + query.toString() + "'");
    Sort sort = applySortIfApplicable(searchRequest);
    Filter[] filters = applyFiltersIfApplicable(searchRequest);
    ChainedFilter chainedFilter = null;
    if (filters != null) {
      chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
    }
    TopDocs topDocs = get().search(query, chainedFilter, 100, sort);
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : scoreDocs) {
      final Document doc = get().doc(scoreDoc.doc);
      float score = scoreDoc.score;
      final BaseDocument baseDocument = new BaseDocument(doc, score);
      Summary documentSummary = new DocumentSummaryImpl(baseDocument);
      summaryList.add(documentSummary);
    }
    release();
  } catch (Exception e) {
    throw new IllegalStateException(e);
  }
  stopWatch.stop();
  LOGGER.debug("total time taken for document search: " + stopWatch.getTotalTimeMillis() + " ms");
  return summaryList.toArray(new Summary[] {});
}

Does this look better? Again, I really really appreciate your help!

On Sun, Mar 1, 2009 at 4:18 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> This is not quite right -- you should only create SearcherManager once (per Directory) at startup/app load, not with every search request.
>
> And I don't see release -- it must call SearcherManager.release of each of the IndexSearchers previously returned from get().
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Hi
>> Thanks again for helping on a Sunday!
>>
>> I have now modified my maybeReopen() to do the following:
>> [...]
Re: Faceted Search using Lucene
Hi

The searchers are injected into the class via Spring, so when a client calls the class it is fully configured with a list of index searchers. However, I have removed this list and am instead injecting a list of directories which are passed to the DocumentSearcherManager. DocumentSearcherManager is SearcherManager (should've mentioned that earlier). So finally I have modified my release code to do the following:

private void release(MultiSearcher multiSearcher) throws Exception {
  IndexSearcher[] indexSearchers = (IndexSearcher[]) multiSearcher.getSearchables();
  for (int i = 0; i < indexSearchers.length; i++) {
    documentSearcherManagers[i].release(indexSearchers[i]);
  }
}

and its use looks like this:

public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
  final String searchTerm = searchRequest.getSearchTerm();
  if (StringUtils.isBlank(searchTerm)) {
    throw new SearchExecutionException("Search string cannot be empty. There will be too many results to process.");
  }
  List<Summary> summaryList = new ArrayList<Summary>();
  StopWatch stopWatch = new StopWatch("searchStopWatch");
  stopWatch.start();
  List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
  try {
    LOGGER.debug("Ensuring all index readers are up to date...");
    maybeReopen();
    LOGGER.debug("All Index Searchers are up to date. No of index searchers '" + indexSearchers.size() + "'");
    Query query = queryParser.parse(searchTerm);
    LOGGER.debug("Search Term '" + searchTerm + "' > Lucene Query '" + query.toString() + "'");
    Sort sort = applySortIfApplicable(searchRequest);
    Filter[] filters = applyFiltersIfApplicable(searchRequest);
    ChainedFilter chainedFilter = null;
    if (filters != null) {
      chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
    }
    TopDocs topDocs = get().search(query, chainedFilter, 100, sort);
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : scoreDocs) {
      final Document doc = get().doc(scoreDoc.doc);
      float score = scoreDoc.score;
      final BaseDocument baseDocument = new BaseDocument(doc, score);
      Summary documentSummary = new DocumentSummaryImpl(baseDocument);
      summaryList.add(documentSummary);
    }
  } catch (Exception e) {
    throw new IllegalStateException(e);
  } finally {
    release(get());
  }
  stopWatch.stop();
  LOGGER.debug("total time taken for document search: " + stopWatch.getTotalTimeMillis() + " ms");
  return summaryList.toArray(new Summary[] {});
}

So the final post construct constructs the DocumentSearcherManagers with the list of directories, looking like this:

@PostConstruct
public void initialiseDocumentSearcher() {
  PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(analyzer);
  analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(), new KeywordAnalyzer());
  queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzerWrapper);
  try {
    LOGGER.debug("Initialising multi searcher ");
    documentSearcherManagers = new DocumentSearcherManager[directories.size()];
    for (int i = 0; i < directories.size(); i++) {
      Directory directory = directories.get(i);
      DocumentSearcherManager documentSearcherManager = new DocumentSearcherManager(directory);
      documentSearcherManagers[i] = documentSearcherManager;
    }
    LOGGER.debug("multi searcher initialised");
  } catch (IOException e) {
    throw new IllegalStateException(e);
  }
}

Cheers
Amin

On Sun, Mar 1, 2009 at 6:15 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> I don't understand where searchers comes from, prior to initializeDocumentSearcher? You should, instead, simply create the SearcherManager (from your Directory instances). You don't need any searchers during initialize.
>
> Is DocumentSearcherManager the same as SearcherManager (just renamed)?
>
> The release method is wrong -- you're calling .get() and then immediately release. Instead, you should step through the searchers from your MultiSearcher and release them to each SearcherManager.
>
> You should call your release() in a finally clause.
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Sorry... I'm getting slightly confused.
>> [...]
Re: Faceted Search using Lucene
Hi there

Good morning! Here is the final search code:

public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
  final String searchTerm = searchRequest.getSearchTerm();
  if (StringUtils.isBlank(searchTerm)) {
    throw new SearchExecutionException("Search string cannot be empty. There will be too many results to process.");
  }
  List<Summary> summaryList = new ArrayList<Summary>();
  StopWatch stopWatch = new StopWatch("searchStopWatch");
  stopWatch.start();
  MultiSearcher multiSearcher = null;
  try {
    LOGGER.debug("Ensuring all index readers are up to date...");
    maybeReopen();
    Query query = queryParser.parse(searchTerm);
    LOGGER.debug("Search Term '" + searchTerm + "' > Lucene Query '" + query.toString() + "'");
    Sort sort = applySortIfApplicable(searchRequest);
    Filter[] filters = applyFiltersIfApplicable(searchRequest);
    ChainedFilter chainedFilter = null;
    if (filters != null) {
      chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
    }
    multiSearcher = get();
    TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);
    for (ScoreDoc scoreDoc : scoreDocs) {
      final Document doc = multiSearcher.doc(scoreDoc.doc);
      float score = scoreDoc.score;
      final BaseDocument baseDocument = new BaseDocument(doc, score);
      Summary documentSummary = new DocumentSummaryImpl(baseDocument);
      summaryList.add(documentSummary);
    }
  } catch (Exception e) {
    throw new IllegalStateException(e);
  } finally {
    if (multiSearcher != null) {
      release(multiSearcher);
    }
  }
  stopWatch.stop();
  LOGGER.debug("total time taken for document search: " + stopWatch.getTotalTimeMillis() + " ms");
  return summaryList.toArray(new Summary[] {});
}

I hope this makes sense... thanks again!

Cheers
Amin

On Sun, Mar 1, 2009 at 8:09 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> You're calling get() too many times. For every call to get() you must match with a call to release().
>
> So, once at the front of your search method you should:
>
>   MultiSearcher searcher = get();
>
> then use that searcher to do searching, retrieve docs, etc.
>
> Then in the finally clause, pass that searcher to release.
>
> So, only one call to get() and one matching call to release().
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Hi
>> The searchers are injected into the class via Spring.
>> [...]
Re: Faceted Search using Lucene
I noticed that if I do the get() before the maybeReopen then I get no results. But otherwise I can change it further.

On Mon, Mar 2, 2009 at 11:46 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> There is no such thing as final code -- code is alive and is always changing ;)
>
> It looks good to me.
>
> Though one trivial thing is: I would move the code in the try clause up to and including the multiSearcher = get() out above the try. I always attempt to "shrink wrap" what's inside a try clause to the minimum that needs to be there. Ie, your code that creates a query, finds the right sort & filter to use, etc, can all happen outside the try, because you have not yet acquired the multiSearcher.
>
> If you do that, you also don't need the null check in the finally clause, because multiSearcher must be non-null on entering the try.
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Hi there
>> Good morning! Here is the final search code:
>> [...]
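A sketch of the "shrink-wrapped" structure Mike describes, reusing the helper names from the thread; only the code that runs between get() and release() sits inside the try:

// Query construction cannot leak a searcher, so it stays outside the try.
Query query = queryParser.parse(searchTerm);
Sort sort = applySortIfApplicable(searchRequest);
Filter[] filters = applyFiltersIfApplicable(searchRequest);
ChainedFilter chainedFilter =
    (filters != null) ? new ChainedFilter(filters, ChainedFilter.OR) : null;

MultiSearcher multiSearcher = get();  // acquired last, immediately before try
try {
  TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);
  // ... collect summaries from topDocs.scoreDocs ...
} finally {
  release(multiSearcher);             // no null check needed: get() succeeded
}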
Re: Faceted Search using Lucene
Nope. If I remove the maybeReopen the search doesn't work. It only works when I call maybeReopen followed by get().

Cheers
Amin

On Mon, Mar 2, 2009 at 12:56 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> That's not right; something must be wrong.
>
> get() before maybeReopen() should simply let you search based on the searcher before reopening.
>
> If you just do get() and don't call maybeReopen() does it work?
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> I noticed that if I do the get() before the maybeReopen then I get no results. But otherwise I can change it further.
>> [...]
Re: Faceted Search using Lucene
In my test case I have a set up method that should populate the indexes before I start using the document searcher. I will start adding some more debug statements. So basically I should be able to do: get() followed by maybeReopen.

I will let you know what the outcome is.

Cheers
Amin

On Mon, Mar 2, 2009 at 1:39 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> Is it possible that when you first create the SearcherManager, there is no index in each Directory?
>
> If not... you better start adding diagnostics. EG inside your get(), print out the numDocs() of each IndexReader you get from the SearcherManager?
>
> Something is wrong and it's best to explain it...
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Nope. If I remove the maybeReopen the search doesn't work. It only works when I call maybeReopen followed by get().
>> [...]
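A hedged sketch of the diagnostic Mike suggests, dropped into the thread's own get(): log numDocs() for each reader as it is acquired, so an empty index at startup shows up immediately.

// Variant of get() with per-reader diagnostics.
private MultiSearcher get() {
  List<IndexSearcher> list = new ArrayList<IndexSearcher>();
  for (DocumentSearcherManager manager : documentSearcherManagers) {
    IndexSearcher searcher = manager.get();
    LOGGER.debug("directory " + searcher.getIndexReader().directory()
        + " numDocs=" + searcher.getIndexReader().numDocs());
    list.add(searcher);
  }
  try {
    multiSearcher = new MultiSearcher(list.toArray(new IndexSearcher[] {}));
  } catch (IOException e) {
    throw new IllegalStateException(e);
  }
  return multiSearcher;
}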
Re: Faceted Search using Lucene
Hi

Just out of curiosity, does it not make sense to call maybeReopen and then call get()? If I call get() then I have a new MultiSearcher, so a call to maybeReopen won't reinitialise the multi searcher. Unless I pass the MultiSearcher into the maybeReopen method. But somehow that doesn't make sense. I may be missing something here.

Cheers
Amin

On 2 Mar 2009, at 15:48, Amin Mohammed-Coleman wrote:

I'm seeing some interesting behaviour: when I do get() first followed by maybeReopen, there are no documents in the directory (the directory that I am interested in). When I do the maybeReopen and then get(), the doc count is correct. I can post stats later.

Weird...

On Mon, Mar 2, 2009 at 2:17 PM, Amin Mohammed-Coleman wrote:

oh dear... I think I may cry... I'll debug.

On Mon, Mar 2, 2009 at 2:15 PM, Michael McCandless wrote:

Or even just get() with no call to maybeReopen(). That should work fine as well.

Mike

Amin Mohammed-Coleman wrote:

[...]
Re: Faceted Search using Lucene
> ... queries pay the reopen/warming cost).
>
> If you call maybeReopen() after get(), then that search will not see the newly opened readers, but the next search will.
>
> I'm just thinking that since you see no results with get() alone, debug that case first. Then put back the maybeReopen().
>
> Can you post your full code at this point?
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Hi
>>
>> Just out of curiosity, does it not make sense to call maybeReopen and then call get()? If I call get() then I have a new MultiSearcher, so a call to maybeReopen won't reinitialise the multi searcher. Unless I pass the MultiSearcher into the maybeReopen method. But somehow that doesn't make sense. I may be missing something here.
>> [...]
Re: Faceted Search using Lucene
I think that is the case. When my SearcherManager is initialised the directories are empty, so when I do a get() nothing is present. Subsequent calls seem to work. Is there something I can do? Or do I accept this, or just do a maybeReopen and then a get()? As you mentioned it depends on timing, but I would be keen to know what the best practice would be in this situation...

Cheers

On Mon, Mar 2, 2009 at 8:43 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> Well the code looks fine.
>
> I can't explain why you see no search results if you don't call maybeReopen() in get, unless at the time you first create SearcherManager the Directories each have an empty index in them.
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Hi
>> Here is the code that I am using; I've modified the get() method to include the maybeReopen() call. Again, I'm not sure if this is a good idea.
>>
>> public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
>>   final String searchTerm = searchRequest.getSearchTerm();
>>   if (StringUtils.isBlank(searchTerm)) {
>>     throw new SearchExecutionException("Search string cannot be empty. There will be too many results to process.");
>>   }
>>   List<Summary> summaryList = new ArrayList<Summary>();
>>   StopWatch stopWatch = new StopWatch("searchStopWatch");
>>   stopWatch.start();
>>   MultiSearcher multiSearcher = get();
>>   try {
>>     LOGGER.debug("Ensuring all index readers are up to date...");
>>     Query query = queryParser.parse(searchTerm);
>>     LOGGER.debug("Search Term '" + searchTerm + "' > Lucene Query '" + query.toString() + "'");
>>     Sort sort = applySortIfApplicable(searchRequest);
>>     Filter[] filters = applyFiltersIfApplicable(searchRequest);
>>     ChainedFilter chainedFilter = null;
>>     if (filters != null) {
>>       chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
>>     }
>>     TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);
>>     ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>>     LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);
>>     for (ScoreDoc scoreDoc : scoreDocs) {
>>       final Document doc = multiSearcher.doc(scoreDoc.doc);
>>       float score = scoreDoc.score;
>>       final BaseDocument baseDocument = new BaseDocument(doc, score);
>>       Summary documentSummary = new DocumentSummaryImpl(baseDocument);
>>       summaryList.add(documentSummary);
>>     }
>>   } catch (Exception e) {
>>     throw new IllegalStateException(e);
>>   } finally {
>>     if (multiSearcher != null) {
>>       release(multiSearcher);
>>     }
>>   }
>>   stopWatch.stop();
>>   LOGGER.debug("total time taken for document search: " + stopWatch.getTotalTimeMillis() + " ms");
>>   return summaryList.toArray(new Summary[] {});
>> }
>>
>> @Autowired
>> public void setDirectories(@Qualifier("directories") ListFactoryBean listFactoryBean) throws Exception {
>>   this.directories = (List) listFactoryBean.getObject();
>> }
>>
>> @PostConstruct
>> public void initialiseDocumentSearcher() {
>>   StopWatch stopWatch = new StopWatch("document-search-initialiser");
>>   stopWatch.start();
>>   PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(analyzer);
>>   analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(), new KeywordAnalyzer());
>>   queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzerWrapper);
>>   try {
>>     LOGGER.debug("Initialising document searcher ");
>>     documentSearcherManagers = new DocumentSearcherManager[directories.size()];
>>     for (int i = 0; i < directories.size(); i++) {
>>       Directory directory = directories.get(i);
>>       DocumentSearcherManager documentSearcherManager = new DocumentSearcherManager(directory);
>>       documentSearcherManagers[i] = documentSearcherManager;
>>     }
>> [...]
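Mike's diagnosis points at the usual fix: make sure each Directory contains at least an empty index before any SearcherManager opens a reader on it. A hedged sketch, assuming Lucene 2.4-era APIs (ensureIndexExists is an illustrative name, not from the thread):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

// Opening an IndexWriter with create=true and closing it commits an
// empty segments file, so IndexReader.open() in the manager's
// constructor succeeds and the first get() sees a valid (empty) reader.
static void ensureIndexExists(Directory dir) throws Exception {
  if (!IndexReader.indexExists(dir)) {
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(),
        true, IndexWriter.MaxFieldLength.UNLIMITED);
    writer.close();
  }
}

Calling this for every injected Directory in the @PostConstruct method, before building the DocumentSearcherManagers, would remove the dependence on reopen timing.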
Lucene Highlighting and Dynamic Summaries
Hi

I am currently indexing documents (PDF, MS Word, etc.) that are uploaded; these documents can be searched, and what the search returns to the user are summaries of the documents. Currently the summaries are extracted when indexing the file (the summary is constructed by taking the first 10 lines of the document and stored in the index as a field). This is not ideal (a static summary), and I was wondering if it would be possible to create a dynamic summary when a hit is found and highlight the terms found. The content of the document is not stored in the index.

So basically what I'm looking to do is:

1) PDF indexed
2) PDF body contains the word "search"
3) Do a search and return the hit
4) Construct a summary with the term "search" included.

I'm not sure how to go about doing this (I presume it is possible). I would be grateful for any advice.

Cheers
Amin
Re: Lucene Highlighting and Dynamic Summaries
Hi

That's what I was thinking about: I would need to get the file and extract the text again and then pass it through the highlighter. The other option is storing the content in the index, the downside being that the index is going to be large. Which would be the recommended approach?

Cheers
Amin

On Sat, Mar 7, 2009 at 10:50 AM, Erik Hatcher wrote:
> With the caveat that if you're not storing the text you want highlighted, you'll have to retrieve it somehow and send it into the Highlighter yourself.
>
>    Erik
>
> On Mar 7, 2009, at 5:40 AM, Michael McCandless wrote:
>
>> You should look at contrib/highlighter, which does exactly this.
>>
>> Mike
>>
>> Amin Mohammed-Coleman wrote:
>>
>>> Hi
>>> I am currently indexing documents (PDF, MS Word, etc.) that are uploaded...
>>> [...]
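A hedged sketch of what the contrib Highlighter usage could look like (Lucene 2.4-era contrib APIs assumed); text is the document body, obtained either from a stored field or by re-extracting the uploaded file:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

// Builds a dynamic, query-dependent summary: the best fragments of the
// text with matching terms wrapped in <b>...</b>.
static String summarize(Query query, Analyzer analyzer, String text) throws Exception {
  Highlighter highlighter = new Highlighter(
      new SimpleHTMLFormatter("<b>", "</b>"), new QueryScorer(query));
  TokenStream tokens = analyzer.tokenStream("content", new StringReader(text));
  return highlighter.getBestFragments(tokens, text, 3, "...");  // up to 3 fragments
}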
Re: Lucene Highlighting and Dynamic Summaries
Cool, I will use compression and store the content in the index. Is there anything special I need to do for decompressing the text? I presume I can just do doc.get("content")?

Thanks for your advice all!

On Sat, Mar 7, 2009 at 11:50 AM, Uwe Schindler wrote:
> You could store the text contents compressed; I think extracting text from
> PDF files is much more time-intensive than decompressing a stored field. And
> text-only contents often compress very well. In my opinion, if the
> (uncompressed) contents of the docs are not very large (so I mean several
> megabytes each), I would prefer storing it in the index.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
Re: Lucene Highlighting and Dynamic Summaries
Thanks! The final piece that I needed to do for the project! Cheers Amin On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler wrote: > > cool. i will use compression and store in index. is there anything > > special > > i need to for decompressing the text? i presume i can just do > > doc.get("content")? > > thanks for your advice all! > > No just use Field.Store.COMPRESS when adding to index and Document.get() > when fetching. The decompression is automatically done. > > You may think, why not enable compression for all fields? The case is, that > this is an overhead for very small and short fields. So you should only use > it for large contents (it's the same like compressing very small files as > ZIP/GZIP: These files mostly get larger than without compression). > > Uwe > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >
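A short sketch of Uwe's suggestion, assuming Lucene 2.x field constants (Field.Store.COMPRESS was removed in later versions); the "content" field name and the method names are illustrative:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;

    public class CompressedBodyExample {

        // Indexing side: COMPRESS keeps the stored copy of the text compressed
        // on disk, while the field is still analyzed for searching as usual.
        static void addDocument(IndexWriter writer, String extractedText) throws Exception {
            Document doc = new Document();
            doc.add(new Field("content", extractedText,
                    Field.Store.COMPRESS, Field.Index.ANALYZED));
            writer.addDocument(doc);
        }

        // Search side: decompression is transparent, so a plain get() returns
        // the original text, ready to hand to the Highlighter.
        static String bodyOf(IndexSearcher searcher, int docId) throws Exception {
            return searcher.doc(docId).get("content");
        }
    }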
Re: Lucene Highlighting and Dynamic Summaries
Hi

Got it working! Thanks again for your help!

Amin

On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman wrote:
> Thanks! The final piece that I needed to do for the project!
>
> Cheers
>
> Amin
Re: Lucene Highlighting and Dynamic Summaries
Hi

I am seeing some strange behaviour with the highlighter and I'm wondering if anyone else is experiencing this. In certain instances I don't get a summary being generated. I perform the search and the search returns the correct document, and I can see that the Lucene document contains the text in the field. However, after doing:

    SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>"); // required for highlighting
    Query query2 = multiSearcher.rewrite(query);
    Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
    ...
    String text = doc.get(FieldNameEnum.BODY.getDescription());
    TokenStream tokenStream = analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new StringReader(text));
    String result = highlighter.getBestFragments(tokenStream, text, 3, "...");

the string result is empty. This is very strange: if I try a different term that exists in the document then I get a summary. For example, I have a Word document that contains the terms "document" and "aspectj". If I search for "document" I get the correct document but no highlighted summary. However, if I search using "aspectj" I get the same document with a highlighted summary.

Just to mention, I do rewrite the original query before performing the highlighting.

I'm not sure what I'm missing here. Any help would be appreciated.

Cheers
Amin
Re: Lucene Highlighting and Dynamic Summaries
Hi

Apologies for re-sending this mail, but I was just wondering if anyone has experienced the below. I'm not sure if this could happen due to the nature of the document. It does seem strange that one term search returns a summary while another does not, even though the same document is being returned.

I'm asking this so I can code around it if this is normal.

Apologies again for re-sending this mail.

Cheers
Amin

Sent from my iPhone

On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman wrote:
> I am seeing some strange behaviour with the highlighter and I'm wondering
> if anyone else is experiencing this. In certain instances I don't get a
> summary being generated.
Re: Lucene Highlighting and Dynamic Summaries
Hi

Please find attached a test case plus a document. Just to mention, this occurs sometimes for other files too.

Cheers
Amin

On Wed, Mar 11, 2009 at 6:11 PM, markharw00d wrote:
> If you can supply a JUnit test that recreates the problem I think we can
> start to make progress on this.
Re: Lucene Highlighting and Dynamic Summaries
Hi

Did both attachments not come through?

Cheers
Amin

On Thu, Mar 12, 2009 at 9:52 AM, mark harwood wrote:
> The attachment didn't make it through here. Can you add it as an attachment
> to a new JIRA issue?
>
> Thanks,
> Mark
Re: Lucene Highlighting and Dynamic Summaries
JIRA raised: https://issues.apache.org/jira/browse/LUCENE-1559

Thanks

On Thu, Mar 12, 2009 at 11:29 AM, Amin Mohammed-Coleman wrote:
> Hi
>
> Did both attachments not come through?
>
> Cheers
> Amin
Re: Lucene Highlighting and Dynamic Summaries
Hi

I have found that it is not an issue with POI. I extracted the text using POI differently and the term is extracted properly. When I store the text and retrieve it the term exists. However, running the text through the highlighter doesn't work.

I will post a test case with a plain text file on JIRA. Currently on a cramped train!

Cheers

On 11 Mar 2009, at 18:11, markharw00d wrote:
> If you can supply a JUnit test that recreates the problem I think we can
> start to make progress on this.
Re: Lucene Highlighting and Dynamic Summaries
JIRA updated. Includes a new test case which shows the highlighter not working as expected.

On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman wrote:
> Hi
>
> I have found that it is not an issue with POI. I extracted the text using POI
> differently and the term is extracted properly. When I store the text and
> retrieve it the term exists. However, running the text through the
> highlighter doesn't work.
>
> I will post a test case with a plain text file on JIRA. Currently on a
> cramped train!
>
> Cheers
Re: Lucene Highlighting and Dynamic Summaries
I did the following:

    highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);

which works.

On Thu, Mar 12, 2009 at 6:41 PM, Amin Mohammed-Coleman wrote:
> JIRA updated. Includes a new test case which shows the highlighter not
> working as expected.
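For context: the contrib Highlighter only analyzes a limited prefix of the supplied text by default (on the order of the first 50K characters), so terms that occur later in a large document yield no fragments. A hedged sketch of the fix, with an illustrative factory method:

    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

    public class HighlighterFactory {

        // Builds a highlighter that scans the whole document text rather than
        // just the default leading chunk.
        static Highlighter forQuery(Query rewrittenQuery) {
            Highlighter highlighter = new Highlighter(
                    new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>"),
                    new QueryScorer(rewrittenQuery));
            highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
            return highlighter;
        }
    }

The cost of removing the limit is that highlighting time grows with document size, which is the performance question raised below.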
Re: Lucene Highlighting and Dynamic Summaries
Hi

I think that would be good. Probably a silly thing to ask, but I guess there is a performance implication in setting it to the max value. Is there a general setting that other developers use?

Cheers
Amin

On 12 Mar 2009, at 22:03, Michael McCandless wrote:
> IndexWriter has such behavior too, and because it was such a common trap
> (developers could not understand why their content was being truncated),
> we made that setting explicit, up front, so you were aware of it. I think
> this in general is a reasonable approach for settings that "lose" stuff
> (content, highlighted terms, etc.).
>
> Maybe we should do the same for the highlighter?
>
> Mike
Re: Lucene Highlighting and Dynamic Summaries
Sweet! When will this highlighter be available? Can I use this now? Cheers! On Fri, Mar 13, 2009 at 10:10 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > > Amin Mohammed-Coleman wrote: > > I think that would be good. >> > > I'll open an issue. > > Probably a silly thing to ask but I guess there is a performance >> implication by setting it to max value. >> > > Right. And it's tough choosing a default in situations like this -- > performance vs losing stuff. > > However, there's a new highlighter: > >https://issues.apache.org/jira/browse/LUCENE-1522 > > which looks like it may have promising performance and no default "loses > highlighted terms" limit, I think. > > Mike > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >
Re: Lucene Highlighting and Dynamic Summaries
Absolutely! I have received considerable help from the community and there is so much more I want to ask!

Cheers!

Amin

On Fri, Mar 13, 2009 at 10:41 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Well, it's not yet committed.
>
> You can use it now by pulling the patch attached to the issue & testing it
> yourself. If you do so, please report back! This is how Lucene improves.
>
> I'm hoping we can include it in 2.9...
>
> Mike
Re: Lucene Highlighting and Dynamic Summaries
OK, I tried to apply the patch(es) and completely messed it up (user error). Is there a full example of the highlighter available that I can apply and test?

Cheers
Amin

On Fri, Mar 13, 2009 at 12:09 PM, Amin Mohammed-Coleman wrote:
> Absolutely! I have received considerable help from the community and there
> is so much more I want to ask!
>
> Cheers!
>
> Amin
Pagination with MultiSearcher
Hi

I'm looking at trying to implement pagination for my search project. I've been googling for a solution, so far with no luck. I've seen implementations of HitCollector, which look promising; however, my search method would have to change completely. For example, I'm currently using the following:

    search(query, filter, int, sort)

If I use a HitCollector there isn't a search method that takes a query, hit collector, sort and filter, unless I'm supposed to apply the sort and filter in the hit collector myself. I would be grateful if anyone could advise me on which approach to take.

On a side note, I just want to thank you all for helping me with many of my issues. I'm hoping this is my last question! Thanks for your patience!

Cheers

Amin
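One way to page without writing a HitCollector, keeping the search(query, filter, n, sort) signature used above: ask the searcher for enough top hits to cover the requested page and slice the ScoreDoc array. A sketch under that assumption; the Pager class and parameter names are illustrative:

    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TopFieldDocs;

    public class Pager {

        // Fetches one page of hits; topDocs.totalHits tells the caller how
        // many matches exist overall, for rendering page links.
        static ScoreDoc[] page(Searcher searcher, Query query, Filter filter,
                               Sort sort, int pageIndex, int pageSize) throws Exception {
            // Collect every hit up to the end of the requested page...
            TopFieldDocs topDocs = searcher.search(query, filter,
                    (pageIndex + 1) * pageSize, sort);
            // ...then keep only the slice belonging to this page.
            int start = pageIndex * pageSize;
            int end = Math.min(topDocs.scoreDocs.length, start + pageSize);
            if (start >= end) {
                return new ScoreDoc[0]; // page lies beyond the last hit
            }
            ScoreDoc[] pageDocs = new ScoreDoc[end - start];
            System.arraycopy(topDocs.scoreDocs, start, pageDocs, 0, end - start);
            return pageDocs;
        }
    }

This works with MultiSearcher as well, since it extends Searcher; the trade-off is that deep pages re-collect all preceding hits.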
Re: how to index keyword and value
Why don't you create a Lucene document that represents a person and then index fields for the name, age, phone number, etc.? Search on the name and then get the corresponding phone number from the matching document.

Cheers
Amin

On Sun, Mar 15, 2009 at 10:56 AM, Seid Mohammed wrote:
> I want to Index Person_Name and associated phone number.
> Example: Abebe ===> +2519112332
> later, When I search for Abebe, it should display +2519112332
> any hint
>
> seid M
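A minimal sketch of that suggestion, assuming Lucene 2.4-era field constants; the PersonIndexer class and field names are illustrative:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class PersonIndexer {

        // One Lucene document per person: the name is analyzed so it can be
        // searched; the phone number is stored verbatim so it can be read
        // back from a hit without needing to be searchable itself.
        static void index(IndexWriter writer, String name, String phone) throws Exception {
            Document doc = new Document();
            doc.add(new Field("name", name, Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("phone", phone, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
    }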
Re: Pagination with MultiSearcher
Hi Erick

Thanks for your reply; glad to see I'm not the only person working/developing on a Sunday! I'm not sure how the FieldSortedHitQueue works and how it can be applied to the search method exposed by MultiSearcher. Would it be possible to clarify a bit more, or even point to some reference documentation?

Cheers
Amin

On Sun, Mar 15, 2009 at 1:08 PM, Erick Erickson wrote:
> You could do something with FieldSortedHitQueue as a post-search
> sort, but I wonder if this would work for you...
>
>     public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
>         throws IOException
>
> Best
> Erick
Re: how to index keyword and value
When you create a query for the searcher you can specify which field to search on, for example:

    Query query = queryParser.parse(searchTerm);

where the QueryParser is constructed like this:

    QueryParser queryParser = new AnalyzingQueryParser("name", new StandardAnalyzer());

Pass the query to the IndexSearcher and you get hits. From the hits you can get the documents, and from each matching document you can get the phone number field (if you store the number in the index).

HTH

On Sun, Mar 15, 2009 at 1:32 PM, Seid Mohammed wrote:
> dear Erick, that one I have tried at the very beginning of playing with lucene.
> I know how to create documents, but my question is I want to create
> documents with fields such as person-name and phone-number and so on.
> while searching, i will submit a person name so that it will return me
> the phone number of that person.
>
> hope you get my problem
>
> Thanks a lot
>
> Seid M
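Putting the two halves together, a hedged sketch of the lookup side; the PhoneLookup class is illustrative, and a plain QueryParser stands in for AnalyzingQueryParser, which is used the same way here:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;

    public class PhoneLookup {

        // Parses the name against the "name" field and reads the stored
        // phone number back from the best match; null when nothing matches.
        static String phoneFor(IndexSearcher searcher, String name) throws Exception {
            QueryParser parser = new QueryParser("name", new StandardAnalyzer());
            Query query = parser.parse(name);
            TopDocs hits = searcher.search(query, null, 1);
            if (hits.totalHits == 0) {
                return null;
            }
            Document doc = searcher.doc(hits.scoreDocs[0].doc);
            return doc.get("phone");
        }
    }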
Re: Pagination with MultiSearcher
Hi Erick

I've seen the following:

    TopDocCollector collector = new TopDocCollector(hitsPerPage);

and then you pass the collector to the searcher. But I'm not sure how I increment hitsPerPage. Also, how do I get the total number of results returned?

In relation to sorting, I could basically use Collections.sort(..) or something similar. My search returns a collection of summary objects which I could sort at that stage rather than passing a Sort to the search code. This would mean I could use a collector to do this.

Cheers
Amin

On Mon, Mar 16, 2009 at 1:42 AM, Erick Erickson wrote:
> Basically, the FieldSortedHitQueue is just a sorting mechanism you
> implement yourself. But I can't help but think that there's an easier
> way, although I'll have to admit I haven't used MultiSearcher enough
> to offer much guidance. That'll teach me to send something off
> on Sunday that I don't really understand well enough
>
> Sorry 'bout that
> Erick
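On the two questions above: the collector's size isn't incremented between requests; it is created large enough to reach the end of the page being served, and getTotalHits() reports the full match count regardless of that size. A sketch under those assumptions, with illustrative names, using the 2.4-era TopDocCollector:

    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searcher;
    import org.apache.lucene.search.TopDocCollector;
    import org.apache.lucene.search.TopDocs;

    public class CollectorPager {

        // Collects enough hits to cover the requested page and reports the
        // overall match count for rendering pagination controls.
        static TopDocs collectPage(Searcher searcher, Query query, Filter filter,
                                   int pageIndex, int pageSize) throws Exception {
            TopDocCollector collector = new TopDocCollector((pageIndex + 1) * pageSize);
            searcher.search(query, filter, collector);
            System.out.println("total matches: " + collector.getTotalHits());
            // topDocs() holds at most (pageIndex + 1) * pageSize entries;
            // slice off the last pageSize of them for the current page.
            return collector.topDocs();
        }
    }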
Re: Pagination with MultiSearcher
Hi

I've come across the PageHitCollector class from:

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200707.mbox/%3c070320071521.6119.468a6964000b3e7517e72205884484070a9c0701030...@comcast.net%3e

I'm looking at using this with the MultiSearcher class, i.e.:

search(query, filter, pageHitCollector)

I intend to do the sorting with comparators and Collections.sort().

I would be grateful for any feedback on whether this is a good approach.

Cheers
Amin

On Mon, Mar 16, 2009 at 8:03 AM, Amin Mohammed-Coleman wrote:
> [snip - earlier messages in this thread, quoted above]
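A minimal sketch of the post-search sort described above (the Summary class and its title field are hypothetical stand-ins for whatever the search actually returns):

import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class PostSearchSort {
    // Hypothetical result type.
    static class Summary {
        String title;
        Summary(String title) { this.title = title; }
    }

    // Sort the collected summary objects in application code instead of
    // passing a Sort into the searcher.
    static void sortByTitle(List<Summary> summaries) {
        Collections.sort(summaries, new Comparator<Summary>() {
            public int compare(Summary a, Summary b) {
                return a.title.compareTo(b.title);
            }
        });
    }
}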
Re: Pagination with MultiSearcher
Hi

I've implemented the solution using the PageHitCollector from the link, and I have noticed that in certain instances I get a 0 score for queries like "document OR aspectj". Has anyone else experienced this?

Cheers
Amin

On Mon, Mar 16, 2009 at 8:07 PM, Amin Mohammed-Coleman wrote:
> [snip - earlier messages in this thread, quoted above]
Re: Pagination with MultiSearcher
Hi

Please ignore the problem I raised. User error!

Sorry
Amin

On 19 Mar 2009, at 09:41, Amin Mohammed-Coleman wrote:
> [snip - earlier messages in this thread, quoted above]
Similarity and Lucene
Hi

If I choose to subclass the default Similarity, do I need to apply the same subclassed Similarity to the IndexReader, IndexWriter and IndexSearcher?

I am interested in doing the below:

Similarity sim = new DefaultSimilarity() {
    public float lengthNorm(String field, int numTerms) {
        if (field.equals("body"))
            return (float) (0.1 * Math.log(numTerms));
        else
            return super.lengthNorm(field, numTerms);
    }
};

[taken from http://www.lucenetutorial.com/advanced-topics/scoring.html]

Is this approach advisable?

Cheers
Amin
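A minimal sketch of the wiring (Lucene 2.4-era API; the index path and analyzer are illustrative). lengthNorm is folded into the norms at index time, so the writer needs the custom Similarity; the searcher needs it for query-time scoring; IndexReader itself does not take one:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Similarity;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class CustomSimilarityWiring {
    public static void main(String[] args) throws Exception {
        Similarity sim = new DefaultSimilarity() {
            public float lengthNorm(String field, int numTerms) {
                if (field.equals("body"))
                    return (float) (0.1 * Math.log(numTerms));
                return super.lengthNorm(field, numTerms);
            }
        };

        Directory dir = FSDirectory.getDirectory("/path/to/index"); // illustrative path

        // Index time: lengthNorm is baked into the stored norms here.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setSimilarity(sim);
        // ... writer.addDocument(...) calls ...
        writer.close();

        // Search time: tf/idf/coord/queryNorm come from the searcher's Similarity.
        IndexSearcher searcher = new IndexSearcher(dir);
        searcher.setSimilarity(sim);
    }
}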
Re: Similarity and Lucene
Although (I could be wrong) I'm wondering if lengthNorm is the correct method I should be overriding. I'm interested in the number of times a term occurs in a document (the more occurrences, the higher the score), which I believe is coord. I may well be barking up the wrong tree.

Cheers
Amin

On Fri, Mar 20, 2009 at 4:20 PM, Amin Mohammed-Coleman wrote:
> Hi
>
> If I choose to subclass the default Similarity, do I need to apply the
> same subclassed Similarity to the IndexReader, IndexWriter and IndexSearcher?
>
> I am interested in doing the below:
>
> Similarity sim = new DefaultSimilarity() {
>     public float lengthNorm(String field, int numTerms) {
>         if (field.equals("body"))
>             return (float) (0.1 * Math.log(numTerms));
>         else
>             return super.lengthNorm(field, numTerms);
>     }
> };
>
> [taken from http://www.lucenetutorial.com/advanced-topics/scoring.html]
>
> Is this approach advisable?
>
> Cheers
> Amin
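A sketch, assuming the goal really is "more occurrences of a term => a higher score": in DefaultSimilarity the per-document term-frequency contribution comes from tf(), while coord() rewards documents that match more of the query's distinct terms. The steeper curve below is purely illustrative:

import org.apache.lucene.search.DefaultSimilarity;

public class FrequencyBoostSimilarity extends DefaultSimilarity {
    // The default implementation returns (float) Math.sqrt(freq); returning
    // the raw frequency weights repeated occurrences more heavily.
    public float tf(float freq) {
        return freq;
    }
}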
Re: Performance tips on searching
Hi

How do you expose pagination without a customized hit collector? The MultiSearcher does not expose a method taking both a hit collector and a sort. Maybe this is not an issue for other people...

Cheers
Amin

On 20 Mar 2009, at 17:25, "Uwe Schindler" wrote:

Why not use a MultiSearcher on all single searchers? Or a Searcher on a MultiReader consisting of all IndexReaders? With that you do not need to merge the results.

By the way: instead of creating a TopDocCollector, you could also call directly:

Searcher.search(Query query, Filter filter, int n, Sort sort)
Searcher.search(Query query, Filter filter, int n)

Filter can be null. It's shorter and, if sorting is also involved, simpler to handle (you do not need to switch between TopDocCollector and TopFieldDocCollector). Important: with Lucene 2.9, searches will be faster using this API (because each index segment then uses its own collector).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Paul Taylor [mailto:paul_t...@fastmail.fm]
Sent: Friday, March 20, 2009 6:02 PM
To: java-user@lucene.apache.org
Subject: Performance tips on searching

Hi, my code receives a search query from the web. There are 5 different indexes that can be searched; each index is searched with a single IndexSearcher referenced in a map. The code parses the query, performs the search and returns the best 10 results, with scores rescaled so that the best score is 1.0. Am I using the optimal search methods to do what I want?

thanks Paul

IndexSearcher searcher = searchers.get(indexName);
QueryParser parser = new QueryParser(indexName, analyzer);
TopDocCollector collector = new TopDocCollector(10);
try {
    searcher.search(parser.parse(query), collector);
} catch (ParseException e) {
}
Results results = new Results();
results.totalHits = collector.getTotalHits();
TopDocs topDocs = collector.topDocs();
ScoreDoc docs[] = topDocs.scoreDocs;
float maxScore = topDocs.getMaxScore();
for (int i = 0; i < docs.length; i++) {
    Result result = new Result();
    result.score = docs[i].score / maxScore;
    result.doc = searcher.doc(docs[i].doc);
    results.results.add(result);
}
return results;

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
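A minimal sketch of Uwe's suggestion (Lucene 2.4-era API; the index paths, query and sort field are illustrative):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;

public class MultiSearcherSortedSearch {
    public static void main(String[] args) throws Exception {
        // One MultiSearcher over all the single searchers; it merges results.
        Searchable[] searchables = {
            new IndexSearcher("/indexes/docs"),
            new IndexSearcher("/indexes/mail")
        };
        MultiSearcher searcher = new MultiSearcher(searchables);

        Query query = new TermQuery(new Term("body", "lucene"));

        // Sorted top-n search with no custom HitCollector; the Filter may be null.
        Sort sort = new Sort(new SortField("title", SortField.STRING));
        TopFieldDocs top = searcher.search(query, null, 10, sort);
        System.out.println("total hits: " + top.totalHits);
        searcher.close();
    }
}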
Re: Performance tips on searching
Hi

I wrote last week about the best way to paginate. I will reply to that email, if that's OK. This isn't my thread and I don't want to deviate from the original topic.

Cheers
Amin

On 20 Mar 2009, at 17:50, "Uwe Schindler" wrote:

No, the MultiSearcher also exposes all the methods IndexSearcher/Searcher exposes (it inherits them from the superclass Searcher). And a call with a collector is never sortable, because the sorting is done *inside* the hit collector.

Where is your problem with pagination? Normally you choose n to be offset+count and then display the ScoreDocs between offset .. offset+count-1. There is no TopDocCollector that can collect only results 100 to 109. To display results 100 to 109, you need to collect all results up to 109, so call with n=110 and then display scoreDocs[100]..scoreDocs[109]. This is exactly how the old Hits worked.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Amin Mohammed-Coleman [mailto:ami...@gmail.com]
Sent: Friday, March 20, 2009 6:43 PM
To: java-user@lucene.apache.org
Subject: Re: Performance tips on searching

[snip - earlier messages in this thread, quoted above]

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
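A minimal sketch of the slicing Uwe describes (the "title" field is illustrative): to show results offset..offset+count-1, collect n = offset + count hits and display only the tail slice:

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class PageSlice {
    static void showSlice(IndexSearcher searcher, Query query,
                          int offset, int count) throws Exception {
        // Collect everything up to the end of the requested page...
        TopDocs top = searcher.search(query, null, offset + count);
        ScoreDoc[] docs = top.scoreDocs;
        // ...then display only scoreDocs[offset]..scoreDocs[offset+count-1].
        for (int i = offset; i < Math.min(offset + count, docs.length); i++) {
            System.out.println(searcher.doc(docs[i].doc).get("title"));
        }
    }
}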
Re: question about grouping text
Hi

I was wondering if something like LingPipe or GATE (for text extraction) might be an idea? I've started looking at them and I'm thinking they may be applicable (I may be wrong).

Cheers
Amin

On Wed, Mar 25, 2009 at 4:18 PM, Grant Ingersoll wrote:
> Hi MFM,
>
> This comes down to a preprocessing step that you would have to do before
> putting the content into Lucene, although I suppose you might be able to identify
> it during analysis and use the TeeTokenFilter and the SinkTokenizer. Once you
> do this, then you can add the pieces as fields on a Document. I know that's not
> a great help, but there's not much Lucene can do because it is application-specific.
>
> Document/field wise, I would probably have:
>
> Document
>   question
>   answer
>
> Then, when you search in the question field, you can also retrieve the
> answer.
>
> -Grant
>
> On Mar 24, 2009, at 4:04 PM, MFM wrote:
>
>> I have been able to successfully index and search text from structured
>> documents like PDF and MS Word. I am having a real hard time trying to
>> figure out how to group the indexed strings together, e.g. if my document
>> had a question and answer in a table, the search will produce the text with
>> the question based on the keyword. How would I group or associate the
>> question and answer as part of the indexing? I have tried using POI to read
>> through the MS Word file and group them, but then it gets really intense in
>> terms of pattern matching.
>>
>> Thanks
>> MFM
>> --
>> View this message in context:
>> http://www.nabble.com/question-about-grouping-text-tp22682433p22682433.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
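A minimal sketch of the Document layout Grant describes (the field values are made up; the Field.Store/Field.Index flags are one reasonable choice, not prescribed by the thread):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class QADocument {
    // Pair each question with its answer in a single Document, so a hit on
    // the question field can return the answer via doc.get("answer").
    static Document makeQADoc(String question, String answer) {
        Document doc = new Document();
        doc.add(new Field("question", question, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("answer", answer, Field.Store.YES, Field.Index.ANALYZED));
        return doc;
    }
}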
Re: Syncing lucene index with a database
Hi

I was going to suggest looking at Hibernate Search. It comes with event listeners that modify your indexes when a persistent entity changes. It uses Lucene under the hood, so if you need to access Lucene directly, you can. Indexing can be done synchronously or asynchronously, and the documentation shows how to set up JMS. There are other benefits of Hibernate Search which you can find on the site and in the documentation.

HTH
Amin

On 27 Mar 2009, at 00:03, Tim Williams wrote:

On Thu, Mar 26, 2009 at 6:28 PM, Matt Schraeder wrote:
> I'm new to Lucene and just beginning my project of adding it to our web
> app. We are indexing data from an MS SQL 2000 database and building
> full-text search from it.
>
> Everything I have read says that building the index is a resource-heavy
> operation, so we should use it sparingly. For the most part the database
> table we are working from is updated once a day, so as soon as the table
> itself is updated we can rebuild our Lucene indexes. However, there are a
> few fields that get updated by a cron job every 15 minutes.
>
> In terms of speed and efficiency, what would be a better system for
> keeping our data synced between the database and Lucene?
>
> One option would be to rebuild the Lucene index each time the cron job
> runs. We could either return the entire database table, loop through the
> rows, get each row's document in Lucene and remove/re-add it; or, after we
> update the main table, return just the rows that were changed and
> remove/re-add only those.
>
> Alternatively, I have thought of using Lucene purely for search, returning
> just the primary key of items from our database table, then querying the
> database for those items to display the most up-to-date data in the search
> results. This would let us use Lucene's superior searching capabilities
> and speed, but would still require us to pull the displayed data from the
> database. Another option is to do the same, but only return from the
> database the 2 or 3 fields that change frequently, using Lucene to store
> and index the majority of what is displayed on a search results page.
>
> I'm honestly not sure what the "proper" choice should be, or if it really
> depends on our own test cases. Is it perfectly okay to run an index update
> every 15 minutes? How much difference would it make in terms of search
> time to search with Lucene AND pull from the database? My main concern
> with searching with Lucene but getting the actual data from the database
> is that it seems like that would make our current, entirely
> database-driven search run slower.

Not sure what ORM framework, if any, you might be using, but some colleagues have had some success using Hibernate Search[1] for this sort of thing. I've not used it, just a pointer in case you haven't come across it... seems that it would keep you above some low-level details if it fits...

--tim

[1] - http://www.hibernate.org/410.html

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
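A minimal sketch of the Hibernate Search mapping this refers to (annotations from Hibernate Search 3.x; the entity and its fields are hypothetical). Once an entity is mapped like this, the registered event listeners update the Lucene index whenever instances are inserted, updated or deleted through Hibernate:

import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
public class Product {
    @Id @DocumentId
    private Long id;

    @Field // analyzed and indexed into the backing Lucene document
    private String title;

    @Field
    private String description;

    // getters/setters omitted
}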