hit exception flushing segment _0 - IndexWriter configuration

2010-08-01 Thread Amin Mohammed-Coleman
Hi

I am currently building an application whereby there is a remote index server 
(yes, it probably does sound like Solr :)) and users use my API to send 
documents to the indexing server for indexing.  The two methods primarily used 
are add and commit. So the user can send requests for documents to be added to 
the index and then can call commit.  I did a test where I simulated a user 
calling the add method 10 times and then in a separate method call invoked 
commit.  The thing I noticed when I turned on the verbose setting for the 
IndexWriter was:

hit exception flushing segment _0

It may be worth mentioning the settings I have for my index writer:

mergeFactor ="100" 
maxMergeDocs = "999" 


When I use my API to add 102 documents and then in a separate method call 
invoke a commit, I get no exception.  So I was wondering: what is the best 
setting for the mergeFactor, and should I be experiencing this exception after 
requesting a commit after adding 10 documents to the index?
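
For reference, a minimal sketch (Lucene 3.0-era API) of the writer settings 
described above, together with the info stream that produced the "hit 
exception flushing segment" message; the directory and analyzer here are 
assumptions, not my actual setup:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

Directory directory = new RAMDirectory();
IndexWriter writer = new IndexWriter(directory,
        new StandardAnalyzer(Version.LUCENE_30),
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setMergeFactor(100);       // merge 100 segments at a time
writer.setMaxMergeDocs(999);      // cap merged segments at 999 docs
writer.setInfoStream(System.out); // verbose diagnostics, incl. flush errors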


Any help would be appreciated.


Thanks
Amin
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: hit exception flushing segment _0 - IndexWriter configuration

2010-08-03 Thread Amin Mohammed-Coleman
Hi

Apologies for resending this email, but I was just wondering if anyone might 
be able to advise on the below. I'm not sure if I've provided enough info. 

Again any help would be appreciated. 

Amin

Sent from my iPhone

On 1 Aug 2010, at 20:00, Amin Mohammed-Coleman  wrote:

> Hi
> 
> I am currently building an application whereby there is a remote index server 
> (yes, it probably does sound like Solr :)) and users use my API to send 
> documents to the indexing server for indexing.  The two methods primarily used 
> are add and commit. So the user can send requests for documents to be added to 
> the index and then can call commit.  I did a test where I simulated a user 
> calling the add method 10 times and then in a separate method call invoked 
> commit.  The thing I noticed when I turned on the verbose setting for the 
> IndexWriter was:
> 
> hit exception flushing segment _0
> 
> It may be worth mentioning the settings I have for my index writer:
> 
> mergeFactor ="100" 
> maxMergeDocs = "999" 
> 
> 
> When I use my API to add 102 documents and then in a separate method call 
> invoke a commit, I get no exception.  So I was wondering: what is the best 
> setting for the mergeFactor, and should I be experiencing this exception 
> after requesting a commit after adding 10 documents to the index? 
> 
> 
> Any help would be appreciated.
> 
> 
> Thanks
> Amin

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: hit exception flushing segment _0 - IndexWriter configuration

2010-08-03 Thread Amin Mohammed-Coleman
Somewhat embarrassingly, I can't seem to reproduce the problem anymore!  I've 
been trying to reproduce it for the last hour now with no luck.  Sorry about 
that. If it happens again then I'll post back to the list.

Thanks for your time.

Amin


On 3 Aug 2010, at 22:35, Michael McCandless wrote:

> Can you post the full exception?  And also the log output from
> IndexWriter.setInfoStream.
> 
> Mike
> 
> On Tue, Aug 3, 2010 at 5:28 PM, Amin Mohammed-Coleman  
> wrote:
>> Hi
>> 
>> Apologies for resending this email, but I was just wondering if anyone 
>> might be able to advise on the below. I'm not sure if I've provided enough 
>> info.
>> 
>> Again any help would be appreciated.
>> 
>> Amin
>> 
>> Sent from my iPhone
>> 
>> On 1 Aug 2010, at 20:00, Amin Mohammed-Coleman  wrote:
>> 
>>> Hi
>>> 
>>> I am currently building an application whereby there is a remote index 
>>> server (yes, it probably does sound like Solr :)) and users use my API to 
>>> send documents to the indexing server for indexing.  The two methods 
>>> primarily used are add and commit. So the user can send requests for 
>>> documents to be added to the index and then can call commit.  I did a test 
>>> where I simulated a user calling the add method 10 times and then in a 
>>> separate method call invoked commit.  The thing I noticed when I turned on 
>>> the verbose setting for the IndexWriter was:
>>> 
>>> hit exception flushing segment _0
>>> 
>>> It may be worth mentioning the settings I have for my index writer:
>>> 
>>> mergeFactor ="100"
>>> maxMergeDocs = "999"
>>> 
>>> 
>>> When I use my API to add 102 documents and then in a separate method call 
>>> invoke a commit, I get no exception.  So I was wondering: what is the best 
>>> setting for the mergeFactor, and should I be experiencing this exception 
>>> after requesting a commit after adding 10 documents to the index?
>>> 
>>> 
>>> Any help would be appreciated.
>>> 
>>> 
>>> Thanks
>>> Amin
>> 
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Batch Operation and Commit

2010-08-26 Thread Amin Mohammed-Coleman
Hi


I have a list of batch tasks that need to be executed.  Each batch contains 
1,000 documents; I use a RAMDirectory-based index writer, and at the end of 
adding the 1,000 documents to memory I perform the following:

ramWriter.commit();
indexWriter.addIndexesNoOptimize(ramWriter.getDirectory());
ramWriter.close();



Do I then need to explicitly do an indexWriter.commit()?  It seems as though if 
I don't do an explicit commit, the documents aren't added to the index (I've 
inspected via Luke).  I would've thought that 
indexWriter.addIndexesNoOptimize would not require me to call commit 
explicitly.  Is this a correct assumption, or should I call commit 
explicitly for my disk-based index writer?

The main idea behind this is that each batch can be executed in a separate 
thread and there is only one shared index writer.
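
A hedged sketch (assuming Lucene 3.x) of the behaviour I'm seeing: 
addIndexesNoOptimize merges the RAM segments into the main writer, but a 
reader only sees them after an explicit commit() (or close()) on the 
destination writer:

ramWriter.commit();                                       // flush RAM segments
indexWriter.addIndexesNoOptimize(ramWriter.getDirectory());
ramWriter.close();
indexWriter.commit(); // without this, the merged docs stay invisible to readers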

Any help would be appreciated.


Thanks
Amin
  
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Batch Operation and Commit

2010-08-26 Thread Amin Mohammed-Coleman
Hi Erick

Thanks for your response.  I used Lucene in Action (1st edition) as a 
reference for batch indexing. I've just got my copy of the 2nd edition, which 
mentions that there is no point in using a RAMDirectory.  Not saying I don't 
trust you :).

I'll update my code to use a normal FSDirectory for batching.
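
For the record, a sketch of what I'll move to (assuming Lucene 3.x; the index 
path is made up): the FS-based writer buffers added documents in RAM itself, 
and setRAMBufferSizeMB controls how much accumulates before a segment is 
flushed to disk:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

Directory dir = FSDirectory.open(new File("/path/to/index"));
IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_30),
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setRAMBufferSizeMB(64.0); // flush a segment roughly every 64 MB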


Thanks
Amin

On 26 Aug 2010, at 19:33, Erick Erickson wrote:

> I'm going to sidestep your question and ask why you're using
> a RAMDirectory in the first place. People often think it'll
> speed up their indexing because it's in RAM, but the
> normal FS-based indexing caches in RAM too, and you
> can use various settings governing segments, RAM usage,
> etc. to control how often you flush to disk. So unless you're
> certain you need to, I'd just forget the whole RAM thing.
> 
> You must close your IndexWriter OR commit the changes
> before you can see them; see IndexWriter.close/commit.
> 
> Best
> Erick
> 
> 
> 
> On Thu, Aug 26, 2010 at 10:42 AM, Amin Mohammed-Coleman 
> wrote:
> 
>> Hi
>> 
>> 
>> I have a list of batch tasks that need to be executed.  Each batch contains
>> 1,000 documents; I use a RAMDirectory-based index writer, and at the end of
>> adding the 1,000 documents to memory I perform the following:
>> 
>>   ramWriter.commit();
>>   indexWriter.addIndexesNoOptimize(ramWriter.getDirectory());
>>   ramWriter.close();
>> 
>> 
>> 
>> Do I then need to explicitly do an indexWriter.commit()?  It seems as
>> though if I don't do an explicit commit, the documents aren't added to the
>> index (I've inspected via Luke).  I would've thought that
>> indexWriter.addIndexesNoOptimize would not require me to call commit
>> explicitly.  Is this a correct assumption, or should I call commit
>> explicitly for my disk-based index writer?
>> 
>> The main idea behind this is that each batch can be executed in a separate
>> thread and there is only one shared index writer.
>> 
>> Any help would be appreciated.
>> 
>> 
>> Thanks
>> Amin
>> 
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



TermRangeQuery

2010-11-26 Thread Amin Mohammed-Coleman
Hi All

I was wondering whether I can use TermRangeQuery for my use case.  I have a 
collection of ids (represented as XDF-123) and I would like to do a search for 
all the ids (might be in the range of 1) and for each matching id I want to 
get the corresponding data that is stored in the index (for example, the 
document contains an id and a string value).  I am using a custom collector to 
collect that string value for each match.  Is it OK to use a TermRangeQuery for 
the ids rather than creating a massive query string?
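
A minimal sketch of what I have in mind (Lucene 2.9+/3.x; the field name and 
bounds are made-up examples), assuming the ids sort lexicographically:

import org.apache.lucene.search.TermRangeQuery;

TermRangeQuery query = new TermRangeQuery("dataId",
        "XDF-100", "XDF-999", true, true); // inclusive lower/upper bounds
searcher.search(query, myCollector); // custom Collector reads the stored values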


Thanks
Amin
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: TermRangeQuery

2010-11-26 Thread Amin Mohammed-Coleman
Hi

Basically, in my test my ids look like:

AAA-231
AAD-234
ADD-123

Didn't know about the collator; I was going to do a custom sort based on the 
number part of the id.


Thanks
Amin

On 26 Nov 2010, at 14:39, Ian Lea wrote:

> Absolutely, as long as your ids will sort as you expect.
> 
> I'm not clear what you mean by XDF-123 but if you've got
> 
> AAA-123
> AAA-124
> ...
> ABC-123
> ABC-234
> etc.
> 
> then you'll be fine.  If they don't sort so neatly you can use the
> TermRangeQuery constructor that takes a Collator but note the
> performance warning in the javadocs.
> 
> 
> --
> Ian.
> 
> 
> On Fri, Nov 26, 2010 at 2:18 PM, Amin Mohammed-Coleman  
> wrote:
>> Hi All
>> 
>> I was wondering whether I can use TermRangeQuery for my use case.  I have a 
>> collection of ids (represented as XDF-123) and I would like to do a search 
>> for all the ids (might be in the range of 1) and for each matching id I 
>> want to get the corresponding data that is stored in the index (for example 
>> the document contains id and string value).  I am using a custom collector 
>> to collect that string value for each match.  Is it ok to use a 
>> TermRangeQuery for the ids rather than creating a massive query string?
>> 
>> 
>> Thanks
>> Amin
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: TermRangeQuery

2010-11-26 Thread Amin Mohammed-Coleman
Hi

Unfortunately my range query approach did not work.  It seems to be related to 
the ids themselves.  The list has ids that look like this:


ID-NYC-1234
ID-LND-1234
TX-NYC-1334
TX-NYC-BBC-123

The ids may range from 90 to 1000.  Is there another approach I could take?  I 
tried building a string with all the ids and setting them against a field, for 
example:

dataId: ID-NYC-123 dataId: ID-NYC-1234

but that's not a great approach I know...

any help would be appreciated.

Thanks
Amin



On 26 Nov 2010, at 14:39, Ian Lea wrote:

> Absolutely, as long as your ids will sort as you expect.
> 
> I'm not clear what you mean by XDF-123 but if you've got
> 
> AAA-123
> AAA-124
> ...
> ABC-123
> ABC-234
> etc.
> 
> then you'll be fine.  If they don't sort so neatly you can use the
> TermRangeQuery constructor that takes a Collator but note the
> performance warning in the javadocs.
> 
> 
> --
> Ian.
> 
> 
> On Fri, Nov 26, 2010 at 2:18 PM, Amin Mohammed-Coleman  
> wrote:
>> Hi All
>> 
>> I was wondering whether I can use TermRangeQuery for my use case.  I have a 
>> collection of ids (represented as XDF-123) and I would like to do a search 
>> for all the ids (might be in the range of 1) and for each matching id I 
>> want to get the corresponding data that is stored in the index (for example 
>> the document contains id and string value).  I am using a custom collector 
>> to collect that string value for each match.  Is it ok to use a 
>> TermRangeQuery for the ids rather than creating a massive query string?
>> 
>> 
>> Thanks
>> Amin
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: TermRangeQuery

2010-11-26 Thread Amin Mohammed-Coleman
Essentially I'd like to construct a query which is almost like a SQL IN 
clause.  The Lucene document contains the id and a string value; I'd like to 
get the string value based on the id key.  There may be on the order of 1,000 
ids. Is this possible to do?

Thanks
Amin

Sent from my iPhone

On 26 Nov 2010, at 20:18, Ian Lea  wrote:

> What sort of ranges are you trying to use?  Maybe you could store a
> separate field, just for these queries, with some normalized form of
> the ids, with all numbers padded out to the same length etc.
> 
> --
> Ian.
> 
> On Fri, Nov 26, 2010 at 4:34 PM, Amin Mohammed-Coleman  
> wrote:
>> Hi
>> 
>> Unfortunately my range query approach did not work.  It seems to be related 
>> to the ids themselves.  The list has ids that look like this:
>> 
>> 
>> ID-NYC-1234
>> ID-LND-1234
>> TX-NYC-1334
>> TX-NYC-BBC-123
>> 
>> The ids may range from 90 to 1000.  Is there another approach I could take?  
>> I tried building a string with all the ids and setting them against a 
>> field, for example:
>> 
>> dataId: ID-NYC-123 dataId: ID-NYC-1234
>> 
>> but that's not a great approach I know...
>> 
>> any help would be appreciated.
>> 
>> Thanks
>> Amin
>> 
>> 
>> 
>> On 26 Nov 2010, at 14:39, Ian Lea wrote:
>> 
>>> Absolutely, as long as your ids will sort as you expect.
>>> 
>>> I'm not clear what you mean by XDF-123 but if you've got
>>> 
>>> AAA-123
>>> AAA-124
>>> ...
>>> ABC-123
>>> ABC-234
>>> etc.
>>> 
>>> then you'll be fine.  If they don't sort so neatly you can use the
>>> TermRangeQuery constructor that takes a Collator but note the
>>> performance warning in the javadocs.
>>> 
>>> 
>>> --
>>> Ian.
>>> 
>>> 
>>> On Fri, Nov 26, 2010 at 2:18 PM, Amin Mohammed-Coleman  
>>> wrote:
>>>> Hi All
>>>> 
>>>> I was wondering whether I can use TermRangeQuery for my use case.  I have 
>>>> a collection of ids (represented as XDF-123) and I would like to do a 
>>>> search for all the ids (might be in the range of 1) and for each 
>>>> matching id I want to get the corresponding data that is stored in the 
>>>> index (for example the document contains id and string value).  I am using 
>>>> a custom collector to collect that string value for each match.  Is it ok 
>>>> to use a TermRangeQuery for the ids rather than creating a massive query 
>>>> string?
>>>> 
>>>> 
>>>> Thanks
>>>> Amin
>>>> -
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>> 
>>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 
>> 
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: TermRangeQuery

2010-11-28 Thread Amin Mohammed-Coleman
Hi

I'll explain my use case more and then describe the outcome of my 
implementation.

I have lucene documents that look like this:


Field name   Field Value
dataId       TYX-CC-124
category     CATEGORY A


What I would like to do is, for a given collection of dataIds, get each one's 
corresponding category.  The collection of ids that will be passed into my 
method will vary.  Also, the id prefix (TYX-CC) will be different for different 
groups invoking my method.  For example, I may have ids that look like below:

TX-CC-124
AVC-FF-124

and so on.

So I tried sorting the list of ids on the numeric part of the id before 
creating the range query. This did not work, as the query returned more ids 
than were passed in.

You mentioned padding the number part of the ids, but will that work in the 
case of the following?

aa-01
bb-01

If I pass in aa-01 as the lower bound, the query will return the result for 
bb-01 as well (unless I have misunderstood the usage of the range query).
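
To illustrate the padding idea (a hypothetical helper, not code from my 
application): left-padding the numeric suffix makes lexicographic order match 
numeric order, but only within one prefix -- a single range starting at 
aa-0001 would indeed also span the bb-* ids, so it would take a separate range 
per prefix:

static String normalizeId(String id) {
    int dash = id.lastIndexOf('-');
    String number = id.substring(dash + 1);
    // e.g. aa-90 -> aa-0090, aa-1000 -> aa-1000
    return id.substring(0, dash + 1) + String.format("%04d", Integer.parseInt(number));
}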

In order to get things moving I decided to create a boolean query and 
essentially batch the queries to avoid hitting the too-many-clauses exception.  
So for every 1,000 ids I create a boolean query with those ids.  This may not 
be the best approach, but I can't get my head around how the range query can 
be used, considering the numeric part of the id is essentially not unique.
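
The batching looks roughly like this (a sketch, assuming Lucene 3.x and a 
"dataId" field): keeping each BooleanQuery at 1,000 SHOULD clauses stays under 
the default 1,024-clause limit that triggers BooleanQuery.TooManyClauses:

import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

for (int start = 0; start < ids.size(); start += 1000) {
    BooleanQuery batch = new BooleanQuery();
    List<String> slice = ids.subList(start, Math.min(start + 1000, ids.size()));
    for (String id : slice) {
        batch.add(new TermQuery(new Term("dataId", id)), BooleanClause.Occur.SHOULD);
    }
    searcher.search(batch, collector); // the collector picks up the stored values
}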


Thanks
Amin



On 28 Nov 2010, at 18:19, Erick Erickson wrote:

> Why won't Ian's suggestion work? You haven't really given us a clue about
> what it is in your attempt that didn't work. The expected and actual output
> would be useful...
> 
> But Ian's notion is the well-known issue that lexical and numeric sorting
> aren't at all the same. You'd get reasonable results if you left-padded the
> number portion of the IDs with 0 out to 4 places, thus
> aa-90   -> aa-0090
> aa-123  -> aa-0123
> aa-1000 -> aa-1000
> 
> and your range queries should work. You might have to transform them
> back when displayed. Or you could add them to your document twice.
> Once in a "hidden" field, the one you searched against in your range query
> and the other to display. This latter wouldn't bloat your index (much) since
> you would store one and index the other
> 
> Best
> Erick
> 
> On Fri, Nov 26, 2010 at 5:01 PM, Amin Mohammed-Coleman 
> wrote:
> 
>> Essentially I'd like to construct a query which is almost like a SQL IN
>> clause.  The Lucene document contains the id and a string value; I'd like to
>> get the string value based on the id key.  There may be on the order of
>> 1,000 ids. Is this possible to do?
>> 
>> Thanks
>> Amin
>> 
>> Sent from my iPhone
>> 
>> On 26 Nov 2010, at 20:18, Ian Lea  wrote:
>> 
>>> What sort of ranges are you trying to use?  Maybe you could store a
>>> separate field, just for these queries, with some normalized form of
>>> the ids, with all numbers padded out to the same length etc.
>>> 
>>> --
>>> Ian.
>>> 
>>> On Fri, Nov 26, 2010 at 4:34 PM, Amin Mohammed-Coleman 
>> wrote:
>>>> Hi
>>>> 
>>>> Unfortunately my range query approach did not work.  It seems to be
>>>> related to the ids themselves.  The list has ids that look like this:
>>>> 
>>>> 
>>>> ID-NYC-1234
>>>> ID-LND-1234
>>>> TX-NYC-1334
>>>> TX-NYC-BBC-123
>>>> 
>>>> The ids may range from 90 to 1000.  Is there another approach I could
>>>> take?  I tried building a string with all the ids and setting them
>>>> against a field, for example:
>>>> 
>>>> dataId: ID-NYC-123 dataId: ID-NYC-1234
>>>> 
>>>> but that's not a great approach I know...
>>>> 
>>>> any help would be appreciated.
>>>> 
>>>> Thanks
>>>> Amin
>>>> 
>>>> 
>>>> 
>>>> On 26 Nov 2010, at 14:39, Ian Lea wrote:
>>>> 
>>>>> Absolutely, as long as your ids will sort as you expect.
>>>>> 
>>>>> I'm not clear what you mean by XDF-123 but if you've got
>>>>> 
>>>>> AAA-123
>>>>> AAA-124
>>>>> ...
>>>>> ABC-123
>>>>> ABC-234
>>>>> etc.
>>>>> 
>>>>> then you'll be fine.  If they don't sort so neatly you can use the
>>>>> TermRangeQuery constructor that takes a Collator but note the
>>>>> performance warning in the javadocs.

Wildcard Case Sensitivity

2011-01-20 Thread Amin Mohammed-Coleman
Hi

Apologies up front if this question has been asked before.

I have a document which contains a field that stores an untokenized value such 
as TEST_TYPE.  The analyzer used is StandardAnalyzer and I pass the same 
analyzer into the query.  I perform the following query: fieldName:TEST_*; 
however, this does not return any results.  Is this the expected behaviour?  
Can I use capital letters in my wildcard query, or do I need to do some 
processing before passing it to the query parser?
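
For context, a sketch of what may be going on (assuming Lucene 3.x and a 
NOT_ANALYZED field, so the indexed term is the literal upper-case TEST_TYPE): 
QueryParser lower-cases wildcard terms by default, so fieldName:TEST_* is 
actually searched as test_* and matches nothing; disabling that behaviour 
keeps the query's original case:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

QueryParser parser = new QueryParser(Version.LUCENE_30, "fieldName",
        new StandardAnalyzer(Version.LUCENE_30));
parser.setLowercaseExpandedTerms(false); // keep TEST_* upper-case
Query query = parser.parse("fieldName:TEST_*");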


Any help would be appreciated.

Thanks
Amin
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Field Not Present In Document

2008-12-26 Thread Amin Mohammed-Coleman

Hi

I have the following situation:

Document document = new Document();
String body = "This is a body of document";
Field field = new Field("body", body, Field.Store.YES, Field.Index.ANALYZED);
document.add(field);

String id = "1234";
Field idField = new Field("id", id, Field.Store.YES, Field.Index.ANALYZED);
document.add(idField);

rtfIndexer.add(document);
System.out.println(document.getFields());



When I print the fields of the document I get the following:

[stored/uncompressed,indexed,tokenized<body:This is a body of document>, 
stored/uncompressed,indexed,tokenized<id:1234>]



The RtfIndexer looks like this:

public void add(Document document) {
    IndexWriter rtfIndexWriter =
        IndexWriterFactory.createIndexWriter(rtfDirectory, analyzer);
    try {
        rtfIndexWriter.addDocument(document);
        LOGGER.debug("Added Document:" + document + " to index");
        commitAndOptimise(rtfIndexWriter);
    } catch (CorruptIndexException e) {
        throw new IllegalStateException(e);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}

private void commitAndOptimise(IndexWriter rtfIndexWriter) throws
        CorruptIndexException, IOException {
    LOGGER.debug("Committing document and closing index writer");
    rtfIndexWriter.optimize();
    rtfIndexWriter.commit();
    rtfIndexWriter.close();
}



However I load the Document using the below code:

Directory directory = ((RtfIndexer)rtfIndexer).getDirectory();
IndexReader indexReader = IndexReader.open(directory);
Document documentFromIndex = indexReader.document(1);
System.out.println(documentFromIndex.getFields());

I get:

[stored/uncompressed,indexed,tokenized<body:This is a body of document>]


It seems as though the id field is not being stored in the index... I can't 
understand why not, as I have added it to the document.
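
One thing worth double-checking (a sketch, Lucene 2.4 API): 
IndexReader.document(n) takes Lucene's internal, zero-based document number, 
not an application id, so with a single document in the index, document(1) is 
not the document just added. Listing every live document shows what was really 
stored:

IndexReader indexReader = IndexReader.open(directory);
for (int i = 0; i < indexReader.maxDoc(); i++) {
    if (!indexReader.isDeleted(i)) { // skip deleted slots
        System.out.println(indexReader.document(i).getFields());
    }
}
indexReader.close();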



I would be grateful if anyone could help!


Cheers
Amin

P.S. Merry Christmas!




Re: Field Not Present In Document

2008-12-29 Thread Amin Mohammed-Coleman

Hi

Thanks for your reply. It turns out you were correct and I was not  
loading the correct document. User error!



Cheers


Amin

On 28 Dec 2008, at 19:50, Grant Ingersoll  wrote:

How do you know that the document in question has an id of 1, as in when you 
do: Document documentFromIndex = indexReader.document(1)?


I would fire up Luke (http://www.getopt.org/luke) against your index  
and see what is inside of it.



On Dec 26, 2008, at 3:19 PM, Amin Mohammed-Coleman wrote:


Hi

I have the following situation:

Document document = new Document();
String body = "This is a body of document";
Field field = new Field("body", body, Field.Store.YES, Field.Index.ANALYZED);
document.add(field);

String id = "1234";
Field idField = new Field("id", id, Field.Store.YES, Field.Index.ANALYZED);
document.add(idField);

rtfIndexer.add(document);
System.out.println(document.getFields());



When I print the fields of the document I get the following:

[stored/uncompressed,indexed,tokenized<body:This is a body of document>, 
stored/uncompressed,indexed,tokenized<id:1234>]



The RtfIndexer looks like this:

public void add(Document document) {
    IndexWriter rtfIndexWriter =
        IndexWriterFactory.createIndexWriter(rtfDirectory, analyzer);
    try {
        rtfIndexWriter.addDocument(document);
        LOGGER.debug("Added Document:" + document + " to index");
        commitAndOptimise(rtfIndexWriter);
    } catch (CorruptIndexException e) {
        throw new IllegalStateException(e);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}

private void commitAndOptimise(IndexWriter rtfIndexWriter) throws
        CorruptIndexException, IOException {
    LOGGER.debug("Committing document and closing index writer");
    rtfIndexWriter.optimize();
    rtfIndexWriter.commit();
    rtfIndexWriter.close();
}



However I load the Document using the below code:

Directory directory = ((RtfIndexer) rtfIndexer).getDirectory();
IndexReader indexReader = IndexReader.open(directory);
Document documentFromIndex = indexReader.document(1);
System.out.println(documentFromIndex.getFields());

I get:

[stored/uncompressed,indexed,tokenized<body:This is a body of document>]



It seems as though the id field is not being stored in the index... I can't 
understand why not, as I have added it to the document.



I would be grateful if anyone could help!


Cheers
Amin

P.S. Merry Christmas!



--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Fwd: Search Problem

2009-01-01 Thread Amin Mohammed-Coleman




Hi

I have created an RTFHandler which takes an RTF file and creates a Lucene 
Document which is indexed.  The RTFHandler looks something like this:


if (bodyText != null) {
    Document document = new Document();
    Field field = new Field(MetaDataEnum.BODY.getDescription(),
            bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
}

I am using Java's built-in RTF text extraction.  When I run my test to verify 
that the document contains the text that I expect, this works fine.  I get the 
following when I print the document:


Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf 
document that will be indexed.

Amin Mohammed-Coleman> stored/uncompressed,indexed<...> 
stored/uncompressed,indexed<...> stored/uncompressed,indexed<...> 
stored/uncompressed,indexed<...>>



The problem is when I use the following to search I get no result:

MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
Term t = new Term("body", "Amin");
TermQuery termQuery = new TermQuery(t);
TopDocs topDocs = multiSearcher.search(termQuery, 1);
System.out.println(topDocs.totalHits);
multiSearcher.close();

rtfIndexSearcher is configured with the directory that holds rtf documents.  I 
have used Luke to look at the document, and what I find in the Overview tab is 
the following for the document:


1   body     test
1   id       1234
1   name     rtfDocumentToIndex.rtf
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     rtf


However on the Document tab I am getting (in the body field):

This is a test rtf document that will be indexed.

Amin Mohammed-Coleman


I would expect to get a hit using "Amin" or even "document".  I am not sure 
whether the line:

TopDocs topDocs = multiSearcher.search(termQuery, 1);

is incorrect, as I am not too sure of the meaning of "Finds the top n hits for 
query." for search(Query query, int n) according to the javadocs.


I would be grateful if someone could advise on what I may be doing wrong.  I 
am using Lucene 2.4.0.



Cheers
Amin








Re: Search Problem

2009-01-01 Thread Amin Mohammed-Coleman

Hi

Sorry I was using the StandardAnalyzer in this instance.

Cheers



On 2 Jan 2009, at 00:55, Chris Lu wrote:


You need to let us know the analyzer you are using.
--  
Chris Lu

-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per  
request) got

2.6 Million Euro funding!

On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman wrote:






Hi

I have created a RTFHandler which takes a RTF file and creates a  
lucene
Document which is indexed.  The RTFHandler looks like something  
like this:


if (bodyText != null) {
    Document document = new Document();
    Field field = new Field(MetaDataEnum.BODY.getDescription(),
            bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
}

I am using Java's built-in RTF text extraction.  When I run my test to verify 
that the document contains the text that I expect, this works fine.  I get the 
following when I print the document:

Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf 
document that will be indexed.

Amin Mohammed-Coleman> stored/uncompressed,indexed<...> 
stored/uncompressed,indexed<...> stored/uncompressed,indexed<...> 
stored/uncompressed,indexed<...>>


The problem is when I use the following to search I get no result:

MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
Term t = new Term("body", "Amin");
TermQuery termQuery = new TermQuery(t);
TopDocs topDocs = multiSearcher.search(termQuery, 1);
System.out.println(topDocs.totalHits);
multiSearcher.close();

rtfIndexSearcher is configured with the directory that holds rtf documents.  I 
have used Luke to look at the document, and what I find in the Overview tab is 
the following for the document:

1   body     test
1   id       1234
1   name     rtfDocumentToIndex.rtf
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     rtf


However on the Document tab I am getting (in the body field):

This is a test rtf document that will be indexed.

Amin Mohammed-Coleman


I would expect to get a hit using "Amin" or even "document".  I am not sure 
whether the line:

TopDocs topDocs = multiSearcher.search(termQuery, 1);

is incorrect, as I am not too sure of the meaning of "Finds the top n hits for 
query." for search(Query query, int n) according to the javadocs.

I would be grateful if someone could advise on what I may be doing wrong.  I 
am using Lucene 2.4.0.


Cheers
Amin











-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Search Problem

2009-01-02 Thread Amin Mohammed-Coleman

Hi

I have tried this and it doesn't work.  I don't understand why using "amin" 
instead of "Amin" would work; is it not case insensitive?

I tried "test" for the field "body" and this works.  Other terms don't work, 
for example:


"document"
"indexed"

these are tokens that were extracted when creating the lucene document.


Thanks for your reply.

Cheers

Amin

On 2 Jan 2009, at 10:36, Chris Lu wrote:

Basically Lucene stores analyzed tokens, and looks up matches based on the 
tokens.  "Amin" after StandardAnalyzer is "amin", so you need to use new 
Term("body", "amin"), instead of new Term("body", "Amin"), to search.

--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per  
request) got

2.6 Million Euro funding!

On Thu, Jan 1, 2009 at 11:30 PM, Amin Mohammed-Coleman wrote:



Hi

Sorry I was using the StandardAnalyzer in this instance.

Cheers




On 2 Jan 2009, at 00:55, Chris Lu wrote:

You need to let us know the analyzer you are using.

-- Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:

http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per  
request) got

2.6 Million Euro funding!

On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman 
wrote:





Hi


I have created an RTFHandler which takes an RTF file and creates a Lucene
Document which is indexed.  The RTFHandler looks something like this:

if (bodyText != null) {
    Document document = new Document();
    Field field = new Field(MetaDataEnum.BODY.getDescription(),
            bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
    document.add(field);
}

I am using Java's built-in RTF text extraction.  When I run my test to verify 
that the document contains the text that I expect, this works fine.  I get the 
following when I print the document:

Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf 
document that will be indexed.

Amin Mohammed-Coleman> stored/uncompressed,indexed<...> 
stored/uncompressed,indexed<...> stored/uncompressed,indexed<...> 
stored/uncompressed,indexed<...>>


The problem is when I use the following to search I get no result:

MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});
Term t = new Term("body", "Amin");
TermQuery termQuery = new TermQuery(t);
TopDocs topDocs = multiSearcher.search(termQuery, 1);
System.out.println(topDocs.totalHits);
multiSearcher.close();

rtfIndexSearcher is configured with the directory that holds rtf documents.  I 
have used Luke to look at the document, and what I find in the Overview tab is 
the following for the document:

1   body     test
1   id       1234
1   name     rtfDocumentToIndex.rtf
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     rtf


However on the Document tab I am getting (in the body field):

This is a test rtf document that will be indexed.

Amin Mohammed-Coleman


I would expect to get a hit using "Amin" or even "document".  I am not sure 
whether the line:

TopDocs topDocs = multiSearcher.search(termQuery, 1);

is incorrect, as I am not too sure of the meaning of "Finds the top n hits for 
query." for search(Query query, int n) according to the javadocs.


I would be grateful if someone could advise on what I may be doing wrong.  I 
am using Lucene 2.4.0.


Cheers
Amin











-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org





-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Search Problem

2009-01-02 Thread Amin Mohammed-Coleman

Hi Erick

Thanks for your reply.

I have used Luke to inspect the document and I am somewhat confused.  For 
example, when I view the index using the Overview tab of Luke I get the 
following:


1   body     test
1   id       1234
1   name     rtfDocumentToIndex.rtf
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     rtf


However, when I view the document in the Document tab I get the full text that 
was extracted from the rtf document (field: body), which is:


This is a test rtf document that will be indexed.
Amin Mohammed-Coleman

I am using the StandardAnalyzer, therefore I wouldn't expect the words 
"document", "indexed", "Amin Mohammed-Coleman" to be removed.


I have referenced the Lucene in Action book and I can't see what I may be 
doing wrong.  I would be happy to provide a test case should it be required.  
When adding the body field to the document I am doing:


Document document = new Document();
Field field = new Field(FieldNameEnum.BODY.getDescription(),
        bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
document.add(field);



When I run the search code, the string "test" is the only word that returns a 
result (TopDocs); the others do not (e.g. "amin", "document", "indexed").


Thanks again for your help and advice.


Cheers
Amin



On 2 Jan 2009, at 21:20, Erick Erickson wrote:


Casing is usually handled by the analyzer. Since you construct
the term query programmatically, it doesn't go through
any analyzers, thus is not converted into lower case for
searching as was done automatically for you when you
indexed using StandardAnalyzer.

As for why you aren't getting hits, it's unclear to me. But
what I'd do is get a copy of Luke and examine your index
to see what's *really* there. This will often give you clues,
usually pointing to some kind of analyzer behavior that you
weren't expecting.

Best
Erick

On Fri, Jan 2, 2009 at 6:39 AM, Amin Mohammed-Coleman wrote:



Hi

I have tried this and it doesn't work.  I don't understand why using "amin"
instead of "Amin" would work; is it not case insensitive?

I tried "test" for the field "body" and this works.  Other terms don't work,
for example:

"document"
"indexed"

these are tokens that were extracted when creating the lucene  
document.



Thanks for your reply.

Cheers

Amin


On 2 Jan 2009, at 10:36, Chris Lu wrote:

Basically Lucene stores analyzed tokens, and looks up matches based on the
tokens.  "Amin" after StandardAnalyzer is "amin", so you need to use new
Term("body", "amin"), instead of new Term("body", "Amin"), to search.

--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:

http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per  
request) got

2.6 Million Euro funding!

On Thu, Jan 1, 2009 at 11:30 PM, Amin Mohammed-Coleman 
wrote:


Hi


Sorry I was using the StandardAnalyzer in this instance.

Cheers




On 2 Jan 2009, at 00:55, Chris Lu wrote:

You need to let us know the analyzer you are using.


-- Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:


http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per  
request)

got
2.6 Million Euro funding!

On Thu, Jan 1, 2009 at 1:11 PM, Amin Mohammed-Coleman 
wrote:






Hi



I have created a RTFHandler which takes a RTF file and creates a
lucene
Document which is indexed.  The RTFHandler looks like  
something like

this:

if (bodyText != null) {
  Document document = new Document();
  Field field = new
Field(MetaDataEnum.BODY.getDescription(), bodyText.trim(),
Field.Store.YES,
Field.Index.ANALYZED);
  document.add(field);


}

I am using Java Built in RTF text extraction.  When I run my  
test to
verify that the document contains text that I expect this  
works fine.

I
get
the following when I print the document:

Document<stored/uncompressed,indexed,tokenized<body:This is a test rtf 
document that will be indexed.

Amin Mohammed-Coleman> stored/uncompressed,indexed<...> 
stored/uncompressed,indexed<...> stored/uncompressed,indexed<...> 
stored/uncompressed,indexed<...>>


The problem is when I use the following to search I get no result:

MultiSearcher multiSearcher = new MultiSearcher(new Searchable[] {rtfIndexSearcher});

Re: Search Problem

2009-01-03 Thread Amin Mohammed-Coleman


Hi again!

I think I may have found the problem but I was wondering if you could  
verify:


I have the following for my indexer:

public void add(Document document) {
    IndexWriter indexWriter =
        IndexWriterFactory.createIndexWriter(getDirectory(), getAnalyzer());
    try {
        indexWriter.addDocument(document);
        LOGGER.debug("Added Document:" + document + " to index");
        commitAndOptimise(indexWriter);
    } catch (CorruptIndexException e) {
        throw new IllegalStateException(e);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}

the commitAndOptimise(indexWriter) looks like this:

private void commitAndOptimise(IndexWriter indexWriter) throws
        CorruptIndexException, IOException {
    LOGGER.debug("Committing document and closing index writer");
    indexWriter.optimize();
    indexWriter.commit();
    indexWriter.close();
}

It seems as though if I comment out optimize then the Overview tab in Luke for 
the rtf document looks like:


5   id       1234
3   body     document
3   body     body
1   body     test
1   body     rtf
1   name     rtfDocumentToIndex.rtf
1   body     new
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     content


This is more what I expected, although "Amin Mohammed-Coleman" hasn't been 
stored in the index.  Should I not be using indexWriter.optimize()?


I tried using the search function in Luke and got the following results:

body:test     ---> returns result
body:document ---> no result
body:content  ---> no result
body:rtf      ---> returns result


Thanks again... sorry to be sending so many emails about this. I am in the 
process of designing and developing a prototype of a document and domain 
indexing/searching component and I would like to demo it to the rest of my 
team.



Cheers
Amin



On 3 Jan 2009, at 01:23, Erick Erickson wrote:


Well, your query results are consistent with what Luke is
reporting. So I'd go back and test your assumptions. I
suspect that you're not indexing what you think you are.

For your test document, I'd just print out what you're indexing
and the field it's going into. *for each field*. that is, every time  
you

do a document.add(), print out that data. I'm
pretty sure you'll find that you're not getting what you expect. For
instance, the call to:

MetaDataEnum.BODY.getDescription()

may be returning some nonsense. Or
bodyText.trim()

isn't doing what you expect.

Lucene is used by many folks, and errors of the magnitude you're
experiencing would be seen by many people and the user list would
be flooded with complaints if it were a Lucene issue at root. That
leaves the code you wrote as the most likely culprit. So try a very  
simple

test case with lots of debugging println's. I'm pretty sure you'll
find the underlying issue with some of your assumptions pretty  
quickly.


Sorry I can't be more specific, but we'd have to see all of your code
and the test cases to do that

Best
Erick

On Fri, Jan 2, 2009 at 6:13 PM, Amin Mohammed-Coleman wrote:



Hi Erick

Thanks for your reply.

I have used Luke to inspect the document and I am somewhat confused.  For
example, when I view the index using the Overview tab of Luke I get the
following:

1   body     test
1   id       1234
1   name     rtfDocumentToIndex.rtf
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     rtf


However, when I view the document in the Document tab I get the full text that
was extracted from the rtf document (field: body), which is:

This is a test rtf document that will be indexed.
Amin Mohammed-Coleman

I am using the StandardAnalyzer, therefore I wouldn't expect the words
"document", "indexed", "Amin Mohammed-Coleman" to be removed.

I have referenced the Lucene in Action book and I can't see what I may be
doing wrong.  I would be happy to provide a test case should it be required.
When adding the body field to the document I am doing:

Document document = new Document();
Field field = new Field(FieldNameEnum.BODY.getDescription(),
        bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
document.add(field);

When I run the search code, the string "test" is the only word that returns a
result (TopDocs); the others do not (e.g. "amin", "document", "indexed").

Thanks again for your help and advice.


Cheers
Amin




On 2 Jan 2009, at 21:20, Erick Erickson wrote:

Casing is usually handled by the analyzer. Since you construct

Re: Search Problem

2009-01-03 Thread Amin Mohammed-Coleman

Hi

I am currently doing this as the indexer will be called from an upload action. 
There is no bulk file processing functionality at the moment.



Cheers

Sent from my iPhone

On 3 Jan 2009, at 13:48, Shashi Kant  wrote:


Amin,

Are you calling Close & Optimize after every addDocument?

I would suggest something like this:

try {
    while (...) { // this could be your looping through a data reader, for example
        indexWriter.addDocument(document);
    }
} finally {
    commitAndOptimise();
}


HTH

Shashi


- Original Message 
From: Amin Mohammed-Coleman 
To: java-user@lucene.apache.org
Sent: Saturday, January 3, 2009 4:02:52 AM
Subject: Re: Search Problem


Hi again!

I think I may have found the problem but I was wondering if you  
could verify:


I have the following for my indexer:

public void add(Document document) {
    IndexWriter indexWriter =
        IndexWriterFactory.createIndexWriter(getDirectory(), getAnalyzer());
    try {
        indexWriter.addDocument(document);
        LOGGER.debug("Added Document:" + document + " to index");
        commitAndOptimise(indexWriter);
    } catch (CorruptIndexException e) {
        throw new IllegalStateException(e);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}

the commitAndOptimise(indexWriter) looks like this:

private void commitAndOptimise(IndexWriter indexWriter) throws
        CorruptIndexException, IOException {
    LOGGER.debug("Committing document and closing index writer");
    indexWriter.optimize();
    indexWriter.commit();
    indexWriter.close();
}

It seems as though if I comment out optimize then the Overview tab in Luke for 
the rtf document looks like:

5   id       1234
3   body     document
3   body     body
1   body     test
1   body     rtf
1   name     rtfDocumentToIndex.rtf
1   body     new
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     content


This is more what I expected, although "Amin Mohammed-Coleman" hasn't been 
stored in the index.  Should I not be using indexWriter.optimize()?


I tried using the search function in Luke and got the following results:

body:test     ---> returns result
body:document ---> no result
body:content  ---> no result
body:rtf      ---> returns result


Thanks again... sorry to be sending so many emails about this. I am in the 
process of designing and developing a prototype of a document and domain 
indexing/searching component and I would like to demo it to the rest of my 
team.



Cheers
Amin



On 3 Jan 2009, at 01:23, Erick Erickson wrote:


Well, your query results are consistent with what Luke is
reporting. So I'd go back and test your assumptions. I
suspect that you're not indexing what you think you are.

For your test document, I'd just print out what you're indexing
and the field it's going into. *for each field*. that is, every  
time you

do a document.add(), print out that data. I'm
pretty sure you'll find that you're not getting what you expect. For
instance, the call to:

MetaDataEnum.BODY.getDescription()

may be returning some nonsense. Or
bodyText.trim()

isn't doing what you expect.

Lucene is used by many folks, and errors of the magnitude you're
experiencing would be seen by many people and the user list would
be flooded with complaints if it were a Lucene issue at root. That
leaves the code you wrote as the most likely culprit. So try a very  
simple

test case with lots of debugging println's. I'm pretty sure you'll
find the underlying issue with some of your assumptions pretty  
quickly.


Sorry I can't be more specific, but we'd have to see all of your code
and the test cases to do that

Best
Erick

On Fri, Jan 2, 2009 at 6:13 PM, Amin Mohammed-Coleman wrote:



Hi Erick

Thanks for your reply.

I have used Luke to inspect the document and I am somewhat confused.  For
example, when I view the index using the Overview tab of Luke I get the
following:

1   body     test
1   id       1234
1   name     rtfDocumentToIndex.rtf
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     rtf


However, when I view the document in the Document tab I get the full text that
was extracted from the rtf document (field: body), which is:

This is a test rtf document that will be indexed.
Amin Mohammed-Coleman

I am using the StandardAnalyzer, therefore I wouldn't expect the words
"document", "indexed", "Amin Mohammed-Coleman" to be removed.

I have referenced the Lucene in Action book and I can't see what I may be
doing wrong.  I would be happy to provide a test case should it be required.
When adding the body field to the document I am doing:

Document document = new Document();
Field field = new Field(FieldNameEnum.BODY.getDescription(),
        bodyText.trim(), Field.Store.YES, Field.Index.ANALYZED);
document.add(field);

Re: Search Problem

2009-01-03 Thread Amin Mohammed-Coleman

Hi

Please find attached a standalone test (inner classes for rtfHandler,  
indexing, etc) that shows search not returning expected results.  I am  
using Lucene 2.4.


Thanks again for the help!

Cheers
Amin



On 3 Jan 2009, at 14:02, Grant Ingersoll wrote:


You shouldn't need to call close and optimize after each document.

You also don't need the commit if you are going to immediately close.

Also, can you send a standalone test that shows the RTF extraction,  
the document creation and the indexing code that demonstrates your  
issue.


FWIW, and as a complete aside to save you some time after you get this figured 
out, instead of re-inventing RTF extraction and PDF extraction (as you appear 
to be doing), have a look at Tika (http://lucene.apache.org/tika).
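
To illustrate the Tika suggestion, a hedged sketch using Tika's facade class 
(from later Tika releases; the exact API depends on the version, and the file 
name simply echoes the test file above):

import java.io.File;
import org.apache.tika.Tika;

Tika tika = new Tika(); // auto-detects the file type (RTF, PDF, ...)
String bodyText = tika.parseToString(new File("rtfDocumentToIndex.rtf"));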


On Jan 3, 2009, at 8:48 AM, Shashi Kant wrote:


Amin,

Are you calling Close & Optimize after every addDocument?

I would suggest something like this:

try {
    while (...) { // this could be your looping through a data reader, for example
        indexWriter.addDocument(document);
    }
} finally {
    commitAndOptimise();
}


HTH

Shashi


- Original Message ----
From: Amin Mohammed-Coleman 
To: java-user@lucene.apache.org
Sent: Saturday, January 3, 2009 4:02:52 AM
Subject: Re: Search Problem


Hi again!

I think I may have found the problem but I was wondering if you  
could verify:


I have the following for my indexer:

public void add(Document document) {
    IndexWriter indexWriter =
        IndexWriterFactory.createIndexWriter(getDirectory(), getAnalyzer());
    try {
        indexWriter.addDocument(document);
        LOGGER.debug("Added Document:" + document + " to index");
        commitAndOptimise(indexWriter);
    } catch (CorruptIndexException e) {
        throw new IllegalStateException(e);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}

the commitAndOptimise(indexWriter) looks like this:

private void commitAndOptimise(IndexWriter indexWriter) throws
        CorruptIndexException, IOException {
    LOGGER.debug("Committing document and closing index writer");
    indexWriter.optimize();
    indexWriter.commit();
    indexWriter.close();
}

It seems as though if I comment out optimize then the Overview tab in Luke for 
the rtf document looks like:

5   id       1234
3   body     document
3   body     body
1   body     test
1   body     rtf
1   name     rtfDocumentToIndex.rtf
1   body     new
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     content


This is more what I expected, although "Amin Mohammed-Coleman" hasn't been 
stored in the index.  Should I not be using indexWriter.optimize()?


I tried using the search function in Luke and got the following results:

body:test     ---> returns result
body:document ---> no result
body:content  ---> no result
body:rtf      ---> returns result


Thanks again... sorry to be sending so many emails about this. I am in the 
process of designing and developing a prototype of a document and domain 
indexing/searching component and I would like to demo it to the rest of my 
team.



Cheers
Amin



On 3 Jan 2009, at 01:23, Erick Erickson wrote:


Well, your query results are consistent with what Luke is
reporting. So I'd go back and test your assumptions. I
suspect that you're not indexing what you think you are.

For your test document, I'd just print out what you're indexing
and the field it's going into. *for each field*. that is, every  
time you

do a document.add(), print out that data. I'm
pretty sure you'll find that you're not getting what you expect. For
instance, the call to:

MetaDataEnum.BODY.getDescription()

may be returning some nonsense. Or
bodyText.trim()

isn't doing what you expect.

Lucene is used by many folks, and errors of the magnitude you're
experiencing would be seen by many people and the user list would
be flooded with complaints if it were a Lucene issue at root. That
leaves the code you wrote as the most likely culprit. So try a  
very simple

test case with lots of debugging println's. I'm pretty sure you'll
find the underlying issue with some of your assumptions pretty  
quickly.


Sorry I can't be more specific, but we'd have to see all of your  
code

and the test cases to do that

Best
Erick

On Fri, Jan 2, 2009 at 6:13 PM, Amin Mohammed-Coleman wrote:



Hi Erick

Thanks for your reply.

I have used Luke to inspect the document and I am somewhat confused.  For
example, when I view the index using the Overview tab of Luke I get the
following:

1   body     test
1   id       1234
1   name     rtfDocumentToIndex.rtf
1   path     rtfDocumentToIndex.rtf
1   summary  This is a
1   type     RTF_INDEXER
1   body     rtf

However, when I view the document in the Document tab I get the full text that
was extracted from the rtf document (field: body).

Re: Search Problem

2009-01-03 Thread Amin Mohammed-Coleman

Hi again

Sorry I didn't include the WorkItem class!  Here is the final test  
case.  Apologies!

On 3 Jan 2009, at 14:02, Grant Ingersoll wrote:


You shouldn't need to call close and optimize after each document.

You also don't need the commit if you are going to immediately close.

Also, can you send a standalone test that shows the RTF extraction,  
the document creation and the indexing code that demonstrates your  
issue.


FWIW, and as a complete aside to save you some time after you get this figured 
out, instead of re-inventing RTF extraction and PDF extraction (as you appear 
to be doing), have a look at Tika (http://lucene.apache.org/tika).


On Jan 3, 2009, at 8:48 AM, Shashi Kant wrote:


Amin,

Are you calling Close & Optimize after every addDocument?

I would suggest something like this:

try {
    while (...) { // this could be your looping through a data reader, for example
        indexWriter.addDocument(document);
    }
} finally {
    commitAndOptimise();
}


HTH

Shashi


- Original Message 
From: Amin Mohammed-Coleman 
To: java-user@lucene.apache.org
Sent: Saturday, January 3, 2009 4:02:52 AM
Subject: Re: Search Problem


Hi again!

I think I may have found the problem but I was wondering if you  
could verify:


I have the following for my indexer:

public void add(Document document) {
    IndexWriter indexWriter =
        IndexWriterFactory.createIndexWriter(getDirectory(), getAnalyzer());
    try {
        indexWriter.addDocument(document);
        LOGGER.debug("Added Document:" + document + " to index");
        commitAndOptimise(indexWriter);
    } catch (CorruptIndexException e) {
        throw new IllegalStateException(e);
    } catch (IOException e) {
        throw new IllegalStateException(e);
    }
}

the commitAndOptimise(indexWriter) looks like this:

private void commitAndOptimise(IndexWriter indexWriter) throws
        CorruptIndexException, IOException {
    LOGGER.debug("Committing document and closing index writer");
    indexWriter.optimize();
    indexWriter.commit();
    indexWriter.close();
}

It seems as though if I comment out optimize then the overview tab  
in Luke  for the rtf document looks like:


5  id       1234
3  body     document
3  body     body
1  body     test
1  body     rtf
1  name     rtfDocumentToIndex.rtf
1  body     new
1  path     rtfDocumentToIndex.rtf
1  summary  This is a
1  type     RTF_INDEXER
1  body     content


This is more what I expected, although "Amin Mohammed-Coleman" hasn't
been stored in the index.  Should I not be using indexWriter.optimize()?


I tried using the search function in luke and got the following  
results:

body:test ---> returns result
body:document ---> no result
body:content ---> no result
body:rtf ---> returns result


Thanks again...sorry to be sending so many emails about this. I am  
in the process of designing and developing a prototype of a  
document and domain indexing/searching component and I would like  
to demo to the rest of my team.



Cheers
Amin




Re: Search Problem

2009-01-03 Thread Amin Mohammed-Coleman

Hi

I have uploaded to google docs:

url: http://docs.google.com/Doc?id=d77xf5q_0n6hb38fx

Hope this works.


Cheers
Amin
On 3 Jan 2009, at 19:53, Grant Ingersoll wrote:

The mailing list often strips attachments (in fact, I'm surprised  
your earlier ones made it through).  Perhaps you can put them up  
somewhere for download.



On Jan 3, 2009, at 1:07 PM, Amin Mohammed-Coleman wrote:


Hi again

Sorry I didn't include the WorkItem class!  Here is the final test  
case.  Apologies!



Re: Search Test file

2009-01-03 Thread Amin Mohammed-Coleman
package com.amin.app.lucene.search.impl;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertNotSame;
import static org.junit.Assert.assertTrue;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

import org.apache.commons.lang.StringUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.ant.DocumentHandler;
import org.apache.lucene.ant.DocumentHandlerException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import com.amin.app.lucene.util.WorkItem.IndexerType;

public class SearchTest {

private File rtfFile = null;
private static final String RTF_FILE_NAME =  
"rtfDocumentToIndex.rtf";


@Before
public void setUp() throws Exception {
InputStream inputStream =  
this.getClass().getClassLoader().getResourceAsStream(RTF_FILE_NAME);

rtfFile = new File(RTF_FILE_NAME);
convertInputStreamToFile(inputStream, rtfFile);
}



@Test
public void testCanCreateLuceneDocumentForRTFDocument() throws  
Exception {
JavaBuiltInRTFHandler builtInRTFHandler = new  
JavaBuiltInRTFHandler();

Document document = builtInRTFHandler.getDocument(rtfFile);
assertNotNull(document);
String value = document.get(FieldNameEnum.BODY.getDescription());
assertNotNull(value);
assertNotSame("", value);
assertTrue(value.contains("Amin Mohammed-Coleman"));
assertTrue(value.contains("This is a test rtf document that will  
be indexed."));

String path = document.get(FieldNameEnum.PATH.getDescription());
assertNotNull(path);
assertTrue(path.contains(".rtf"));
String fileName = document.get(FieldNameEnum.NAME.getDescription());
assertNotNull(fileName);
assertEquals(RTF_FILE_NAME, fileName);
assertEquals(WorkItem.IndexerType.RTF_INDEXER.name(),  
document.get(FieldNameEnum.TYPE.getDescription()));


}



@Test
public void testCanSearchRtfDocument() throws Exception {
JavaBuiltInRTFHandler builtInRTFHandler = new  
JavaBuiltInRTFHandler();

Document document = builtInRTFHandler.getDocument(rtfFile);
IndexWriter indexWriter = new  
IndexWriter(getDirectory(),getAnalyzer(),new  
IndexWriter.MaxFieldLength(2));

try {
indexWriter.addDocument(document);
commitAndCloseWriter(indexWriter);
} catch (CorruptIndexException e) {
throw new IllegalStateException(e);
} catch (IOException e) {
throw new IllegalStateException(e);
}

//I plan to use other searchers later
IndexSearcher indexSearcher = new IndexSearcher(getDirectory());
MultiSearcher multiSearcher = new MultiSearcher(new Searchable[]  
{indexSearcher});
QueryParser queryParser = new MultiFieldQueryParser(new String[]  
{FieldNameEnum.BODY.getDescription()}, new StandardAnalyzer());

Query query = queryParser.parse("amin");
TopDocs topDocs = multiSearcher.search(query,  
BooleanQuery.getMaxClauseCount());

assertNotNull(topDocs);
assertEquals(1, topDocs.totalHits);
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
Document documentFromSearch = indexSearcher.doc(scoreDoc.doc);
assertNotNull(documentFromSearch);
String bodyText =  
documentFromSearch.get(FieldNameEnum.BODY.getDescription());

assertNotNull(bodyText);
assertNotSame("", bodyText);
assertTrue(bodyText.contains("Amin Mohammed-Coleman"));
assertTrue(bodyText.contains("This is a test rtf document that  
will be indexed."));


}
multiSearcher.close();

}

@After
public void tearDown() throws Exception {
rtfFile.delete();
if (getDirectory().list() != null && getDirectory().list().length  
> 0) {

IndexReader reader = IndexReader.open(getDirectory());
for(int i = 0; i < reader.maxDoc();i++) {
reader.deleteDocument(i);
}
reader.close();
}
}

private void commitAndCloseWriter(IndexWriter indexWriter) throws  
CorruptIndexException,IOException {

indexWriter.commit();
indexWriter.close();
}


public Directory getDirectory() throws IOException {
return FSDirectory.getDirectory("/tmp/lucene/rtf");
}

public Analyzer getAnalyzer() {
return new StandardAnalyzer();
}
private static void convertInputStreamToFile(InputStrea

Re: Search Test file

2009-01-03 Thread Amin Mohammed-Coleman

Hi,

Please ignore my last email.  Just woke up and wrote the email.  After
looking at Luke further, it looks like the token is being stored as
"index.amin", which is why "amin" wasn't working.  Making the changes
that you recommended worked.


I will investigate further why "amin" token is being stored as  
"indexed.amin".



Thanks again for all the help.

Cheers

Amin


On 4 Jan 2009, at 02:23, Grant Ingersoll wrote:




Begin forwarded message:


From: Grant Ingersoll 
Date: January 3, 2009 8:19:14 PM EST
To: java-...@lucene.apache.org
Subject: Fwd: Search Test file
Reply-To: java-...@lucene.apache.org

Hi Amin,

I see a couple of issues with your program below, and one that is  
the cause of the problem of not finding "amin" as a query term.


When you construct your IndexWriter, you are doing:
IndexWriter indexWriter = new  
IndexWriter(getDirectory(),getAnalyzer(),new  
IndexWriter.MaxFieldLength(2));


The MaxFieldLength parameter specifies the maximum number of tokens  
allowed in a Field.  Everything else after that is dropped.  See http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/IndexWriter.html#IndexWriter(org.apache.lucene.store.Directory,%20org.apache.lucene.analysis.Analyzer,%20org.apache.lucene.index.IndexWriter.MaxFieldLength) 
 and http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/index/IndexWriter.MaxFieldLength.html
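(That is, for the test above the fix is simply to lift the limit when
constructing the writer, using the helper methods from the test:

IndexWriter indexWriter = new IndexWriter(getDirectory(), getAnalyzer(),
        IndexWriter.MaxFieldLength.UNLIMITED);
)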


Also,
TopDocs topDocs = multiSearcher.search(query,  
BooleanQuery.getMaxClauseCount());


strikes me as really odd.  Why are you passing in the max clause  
count as the number of results you want returned?
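(For example, just ask for a sensible number of top hits:

TopDocs topDocs = multiSearcher.search(query, 10); // the top 10 results
)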


Cheers,
Grant



Begin forwarded message:


From: "ami...@gmail.com" 
Date: January 3, 2009 3:24:52 PM EST
To: gsing...@apache.org
Subject: Search Test file

I've shared a document with you called "Search Test file":
http://docs.google.com/Doc?id=d77xf5q_0n6hb38fx&invite=cjq79zj

It's not an attachment -- it's stored online at Google Docs. To  
open this document, just click the link above.

---

Hi

I have uploaded the test file at google docs. It is currently a  
txt file but if you change the extension to .java it should work.



Re: Search Test file

2009-01-04 Thread Amin Mohammed-Coleman


Hi

Test case passing now. Thanks for your help. I kind of thought it was  
probably something I was doing wrong!


Cheers

Amin

On 4 Jan 2009, at 16:59, Grant Ingersoll  wrote:



On Jan 4, 2009, at 2:49 AM, Amin Mohammed-Coleman wrote:


Hi Grant

Thank you for looking at the test case.  I have updated the  
IndexWriter to use UNLIMITED for MaxFieldLength.   I tried using  
Integer.MAX_VALUE for



Also,
TopDocs topDocs = multiSearcher.search(query,  
BooleanQuery.getMaxClauseCount());


strikes me as really odd.  Why are you passing in the max clause  
count as the number of results you want returned?





Just pass in something like "10".


However I get the following exception:

java.lang.NegativeArraySizeException
    at org.apache.lucene.util.PriorityQueue.initialize(PriorityQueue.java:41)
    at org.apache.lucene.search.HitQueue.<init>(HitQueue.java:24)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:200)
    at org.apache.lucene.search.Searcher.search(Searcher.java:136)
    at org.apache.lucene.search.Searcher.search(Searcher.java:146)
    at com.amin.app.lucene.search.impl.SearchTest.testCanSearchRtfDocument(SearchTest.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:59)
    at org.junit.internal.runners.MethodRoadie.runTestMethod(MethodRoadie.java:98)
    at org.junit.internal.runners.MethodRoadie$2.run(MethodRoadie.java:79)
    at org.junit.internal.runners.MethodRoadie.runBeforesThenTestThenAfters(MethodRoadie.java:87)
    at org.junit.internal.runners.MethodRoadie.runTest(MethodRoadie.java:77)
    at org.junit.internal.runners.MethodRoadie.run(MethodRoadie.java:42)
    at org.junit.internal.runners.JUnit4ClassRunner.invokeTestMethod(JUnit4ClassRunner.java:88)
    at org.junit.internal.runners.JUnit4ClassRunner.runMethods(JUnit4ClassRunner.java:51)
    at org.junit.internal.runners.JUnit4ClassRunner$1.run(JUnit4ClassRunner.java:44)
    at org.junit.internal.runners.ClassRoadie.runUnprotected(ClassRoadie.java:27)
    at org.junit.internal.runners.ClassRoadie.runProtected(ClassRoadie.java:37)
    at org.junit.internal.runners.JUnit4ClassRunner.run(JUnit4ClassRunner.java:42)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:45)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)



I know that this is an issue (not being able to use  
Integer.MAX_VALUE).  I tried using 100 and my test still doesn't  
pass.



Cheers
Amin



MultiSearcher: close()

2009-01-18 Thread Amin Mohammed-Coleman

Hi

I have a class that uses the MultiSearcher inorder to perform search  
using different other searches.  Here is a snippet of the class:


MultiSearcher multiSearcher = null;
try {
    multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[] {}));
    QueryParser queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzer);
    Query query = queryParser.parse(searchRequest.getSearchTerm());

    //TODO: Sort and Filters

    TopDocs topDocs = multiSearcher.search(query, 100);
    ScoreDoc[] scoreDocs = topDocs.scoreDocs;
    LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);

    for (ScoreDoc scoreDoc : scoreDocs) {
        final Document doc = multiSearcher.doc(scoreDoc.doc);
        float score = scoreDoc.score;
        final BaseDocument baseDocument = new BaseDocument(doc, score);
        Summary documentSummary = new DocumentSummaryImpl(baseDocument);
        summaryList.add(documentSummary);
    }
} catch (Exception e) {
    throw new IllegalStateException(e);
} finally {
    if (multiSearcher != null) {
        try {
            multiSearcher.close();
        } catch (IOException e) {
            LOGGER.error("Could not close multisearcher. Need to investigate why.", e);
        }
    }
}


This class is injected with dependencies using Spring.  Do I need to
explicitly close the multisearcher?  The first call to the method is
fine, but any subsequent calls generate the following:

org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed

What is the best practice for this?  I had a look at the Lucene in
Action book and its example doesn't close the multisearcher.
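(One likely cause, sketched here rather than taken from the thread:
MultiSearcher.close() also closes the IndexSearchers it wraps, and with
them their IndexReaders, so if the searchers are long-lived Spring beans
the next call runs against closed readers.  A minimal alternative is to
build the MultiSearcher per call but leave the shared searchers open --
query built as in the snippet above:

MultiSearcher multiSearcher =
        new MultiSearcher(searchers.toArray(new IndexSearcher[] {}));
TopDocs topDocs = multiSearcher.search(query, 100);
// ... read the hits as above ...
// no multiSearcher.close() here: the wrapped searchers are shared beans
)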


Any help would be highly appreciated.

Cheers




Re: clustering with compass & terracotta

2009-01-18 Thread Amin Mohammed-Coleman
I've been working on integrating hibernate search and Gigaspaces XAP.  
It's been raised as an OpenSpaces project and is awaiting approval.


The aim is to place indexes on the space and use gigaspaces middleware  
support for clustering, replication and other services.


Sent from my iPhone

On 15 Jan 2009, at 20:05, Glen Newton  wrote:


There is a discussion here:
http://www.terracotta.org/web/display/orgsite/Lucene+Integration

Also of interest: "Katta - distribute lucene indexes in a grid"
http://katta.wiki.sourceforge.net/

-glen

http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
http://zzzoot.blogspot.com/2008/11/software-announcement-lusql-database-to.html
http://zzzoot.blogspot.com/2008/09/katta-released-lucene-on-grid.html
http://zzzoot.blogspot.com/2008/06/lucene-concurrent-search-performance.html
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html


2009/1/15 Angel, Eric :

I just ran into this
http://www.compass-project.org/docs/2.0.0/reference/html/needle-terracotta.html
and was wondering if any of you had tried anything like this and,
if so, what your experience was like.



Eric









Indexing and Searching Web Application

2009-01-19 Thread Amin Mohammed-Coleman

Hi

I have recently worked on developing an application which allows you  
to upload a file (which is indexed so you can search later).  I have  
numerous tests to show that you can index and search documents (in  
some instances within the same test), however when I perform the  
operation in the site:


1) Upload File and Index
2) Search

I don't get any hits.  When I restart the application then if I make  
another search I can find the results.  It seems as though indexes  
aren't being committed when I do the initial upload.  This is  
strange.  I explicitly call commit in my code when I upload the file.   
Has anyone experienced this before?


Any help would be appreciated.

Kind Regards

Amin




Re: Indexing and Searching Web Application

2009-01-19 Thread Amin Mohammed-Coleman

I make a call to my search class which looks like this:


public Summary[] search(SearchRequest searchRequest) {
    List<Summary> summaryList = new ArrayList<Summary>();
    StopWatch stopWatch = new StopWatch("searchStopWatch");
    stopWatch.start();
    MultiSearcher multiSearcher = null;
    try {
        multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[] {}));
        QueryParser queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzer);
        Query query = queryParser.parse(searchRequest.getSearchTerm());

        //TODO: Sort and Filters

        TopDocs topDocs = multiSearcher.search(query, 100);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);

        for (ScoreDoc scoreDoc : scoreDocs) {
            final Document doc = multiSearcher.doc(scoreDoc.doc);
            float score = scoreDoc.score;
            final BaseDocument baseDocument = new BaseDocument(doc, score);
            Summary documentSummary = new DocumentSummaryImpl(baseDocument);
            summaryList.add(documentSummary);
        }
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }
    stopWatch.stop();

    LOGGER.debug("total time taken for search: " + stopWatch.getTotalTimeMillis() + " ms");
    return summaryList.toArray(new Summary[] {});
}

Do I need to do this explicitly?


Cheers
Amin

On 19 Jan 2009, at 20:48, Greg Shackles wrote:

After you make the commit to the index, are you reloading the index in
the searchers?

- Greg








Re: Indexing and Searching Web Application

2009-01-19 Thread Amin Mohammed-Coleman



Sent from my iPhone

On 19 Jan 2009, at 23:23, Greg Shackles  wrote:

I just quickly skimmed the code since I don't have much time right now,
but it looks like you are keeping an array of IndexSearchers open that
you re-use in this search function, right?  If that's the case, you need
to tell those IndexSearchers to re-open the indexes because they have
changed since they were first opened.  That should solve your problem.

- Greg




Re: Indexing and Searching Web Application

2009-01-19 Thread Amin Mohammed-Coleman

Hi

Thanks for your reply. I originally explicitly closed the multisearcher,
but this caused a problem in that the first search would work and then
subsequent searches would cause an IndexReaderClosedException. I sent an
email to the mailing list on what the best practice is on whether to
close the multisearcher or leave it open.


I couldn't see in the docs how to reopen.  Would it be possible to get
some advice on how to do this?



Thanks again for your help.




Re: Indexing and Searching Web Application

2009-01-20 Thread Amin Mohammed-Coleman

Hi

After your email I had a look around and came up with the below  
solution (I'm not sure if this is the right approach or there is a  
performance implication to doing this)


public Summary[] search(SearchRequest searchRequest) {
    List<Summary> summaryList = new ArrayList<Summary>();
    StopWatch stopWatch = new StopWatch("searchStopWatch");
    stopWatch.start();
    MultiSearcher multiSearcher = null;
    List<IndexSearcher> newIndexSearchers = new ArrayList<IndexSearcher>();
    try {
        for (IndexSearcher indexSearcher : searchers) {
            IndexReader indexReader = indexSearcher.getIndexReader().reopen();
            IndexSearcher indexSearch = new IndexSearcher(indexReader);
            newIndexSearchers.add(indexSearch);
        }

        multiSearcher = new MultiSearcher(newIndexSearchers.toArray(new IndexSearcher[] {}));
        QueryParser queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(), analyzer);
        Query query = queryParser.parse(searchRequest.getSearchTerm());

        //TODO: Sort and Filters

        TopDocs topDocs = multiSearcher.search(query, 100);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);

        for (ScoreDoc scoreDoc : scoreDocs) {
            final Document doc = multiSearcher.doc(scoreDoc.doc);
            float score = scoreDoc.score;
            final BaseDocument baseDocument = new BaseDocument(doc, score);
            Summary documentSummary = new DocumentSummaryImpl(baseDocument);
            summaryList.add(documentSummary);
        }
    } catch (Exception e) {
        throw new IllegalStateException(e);
    }

    stopWatch.stop();

    LOGGER.debug("total time taken for search: " + stopWatch.getTotalTimeMillis() + " ms");
    return summaryList.toArray(new Summary[] {});
}


The searchers are configured in Spring using a bean definition which
looks like this (the tags were stripped by the list software; attributes
are as posted):

<bean class="org.apache.lucene.search.IndexSearcher" scope="prototype" lazy-init="true">
    <constructor-arg ref="rtfDirectory" />
</bean>

I set the dependencies on the DocumentSearcher class.


Cheers
Amin


On 19 Jan 2009, at 21:45, Amin Mohammed-Coleman wrote:


I make a call to my search class which looks like this:


public Summary[] search(SearchRequest searchRequest) {
List summaryList = new ArrayList();
StopWatch stopWatch = new StopWatch("searchStopWatch");
stopWatch.start();
MultiSearcher multiSearcher = null;
try {
			multiSearcher = new MultiSearcher(searchers.toArray(new  
IndexSearcher[] {}));
			QueryParser queryParser = new  
MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),  
analyzer);

Query query = 
queryParser.parse(searchRequest.getSearchTerm());

//TODO: Sort and Filters

TopDocs topDocs = multiSearcher.search(query, 100);
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
			LOGGER.debug("total number of hits for [" + query.toString() +  
" ] = " +topDocs.totalHits);


for (ScoreDoc scoreDoc : scoreDocs) {
final Document doc = 
multiSearcher.doc(scoreDoc.doc);
float score = scoreDoc.score;
final BaseDocument baseDocument = new 
BaseDocument(doc, score);
Summary documentSummary = new 
DocumentSummaryImpl(baseDocument);
summaryList.add(documentSummary);
}

} catch (Exception e) {
throw new IllegalStateException(e);
}
stopWatch.stop();

		LOGGER.debug("total time taken for seach: " +  
stopWatch.getTotalTimeMillis() + " ms");

return summaryList.toArray(new Summary[] {});
}

Do I need to do this explicitly?


Cheers
Amin

On 19 Jan 2009, at 20:48, Greg Shackles wrote:

After you make the commit to the index, are you reloading the index  
in the

searchers?

- Greg


On Mon, Jan 19, 2009 at 3:29 PM, Amin Mohammed-Coleman >wrote:



Hi

I have recently worked 

Re: Indexing and Searching Web Application

2009-01-20 Thread Amin Mohammed-Coleman
Am I supposed to close the oldIndexReader?  I just tried this and I get an
exception stating that the IndexReader is closed.

Cheers

On Tue, Jan 20, 2009 at 9:33 AM, Ganesh  wrote:

> Reopen the reader, only if it is modified.
>
> IndexReader oldIndexReader = indexSearcher.getIndexReader();
> if (!oldIndexReader.isCurrent()) {
>   IndexReader newIndexReader = oldIndexReader.reopen();
>   oldIndexReader.close();
>   indexSearcher.close();
>   indexSearcher = new IndexSearcher(newIndexReader); // keep the new searcher
> }
>
> Regards
> Ganesh
>

Re: Indexing and Searching Web Application

2009-01-20 Thread Amin Mohammed-Coleman

Hi

Yes I am using the reopen method on the IndexReader. I am not closing
the old reader, as per Ganesh's instruction. It seems to be working
correctly so I presume it's ok not to close.


Thanks


Amin

On 20 Jan 2009, at 19:27, "Angel, Eric"  wrote:


There's a reopen() method in the IndexReader class.  You can use that.


Re: Indexing and Searching Web Application

2009-01-21 Thread Amin Mohammed-Coleman

Hi
Will give that a go.

Thanks

Sent from my iPhone

On 21 Jan 2009, at 12:26, "Ganesh"  wrote:

I am closing the old reader and it is working fine for me. Refer to  
IndexReader.Reopen javadoc.


///Below is the code snippet from the IndexReader.reopen javadoc
///(variable renamed here: "new" is a reserved word in Java)

IndexReader reader = ...
...
IndexReader newReader = reader.reopen();
if (newReader != reader) {
  ... // reader was reopened
  reader.close();  // old reader is closed
}
reader = newReader;

Regards
Ganesh
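(A sketch of that javadoc pattern applied to a long-lived searcher field
-- variable names assumed, not code from the thread:

IndexReader oldReader = indexSearcher.getIndexReader();
IndexReader newReader = oldReader.reopen();
if (newReader != oldReader) {
    // reopen() returned a fresh reader: swap the searcher, then close
    // the old reader once nothing references it any more.
    indexSearcher = new IndexSearcher(newReader);
    oldReader.close();
}
)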


Re: Indexing and Searching Web Application

2009-01-21 Thread Amin Mohammed-Coleman
Hi
I did the following according to java docs:

for (IndexSearcher indexSearcher : searchers) {
    IndexReader reader = indexSearcher.getIndexReader();
    IndexReader newReader = reader.reopen();
    if (newReader != reader) {
        reader.close();
    }
    reader = newReader;
    IndexSearcher indexSearch = new IndexSearcher(reader);
    indexSearchers.add(indexSearch);
}


The first search works ok; subsequent searches result in:

org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed



Cheers




Re: Indexing and Searching Web Application

2009-01-21 Thread Amin Mohammed-Coleman

Hi,

That is what I am doing with the line:

indexSearchers.add(indexSearch);

indexSearchers is an ArrayList that is constructed before the for loop:

List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();


I then pass the indexSearchers to :

multiSearcher = new MultiSearcher(indexSearchers.toArray(new  
IndexSearcher[] {}));



Cheers

On 21 Jan 2009, at 20:19, Ian Lea wrote:


I haven't been following this thread, but shouldn't you be replacing
the old searcher in your list of searchers rather than just adding the
new one on the end?  Could be wrong - I find the names in your code
snippet rather confusing.


--
Ian.

On Wed, Jan 21, 2009 at 6:59 PM, Amin Mohammed-Coleman > wrote:

Hi
I did the following according to java docs:

for (IndexSearcher indexSearcher: searchers) {

IndexReader reader = indexSearcher.getIndexReader();

IndexReader newReader = reader.reopen();

if (newReader != reader) {

 reader.close();

}

reader = newReader;

IndexSearcher indexSearch = new IndexSearcher(reader);

indexSearchers.add(indexSearch);

}


First search works OK, but subsequent searches result in:


org.apache.lucene.store.AlreadyClosedException: this IndexReader is  
closed




Cheers



On Wed, Jan 21, 2009 at 1:47 PM, Amin Mohammed-Coleman >wrote:



Hi
Will give that a go.

Thanks

Sent from my iPhone

On 21 Jan 2009, at 12:26, "Ganesh"  wrote:

I am closing the old reader and it is working fine for me. Refer to

IndexReader.Reopen javadoc.

///Below is the code snippet from the IndexReader.reopen javadoc

IndexReader reader = ...
...
IndexReader newReader = reader.reopen();
if (newReader != reader) {
... // reader was reopened
reader.close();  // old reader is closed
}
reader = newReader;

Regards
Ganesh

- Original Message - From: "Amin Mohammed-Coleman" <
ami...@gmail.com>
To: 
Cc: 
Sent: Wednesday, January 21, 2009 1:07 AM

Subject: Re: Indexing and Searching Web Application


Hi


Yes I am using the reopen method on indexreader. I am not  
closing the
old indexer as per Ganesh's instruction. It seems to be working   
correctly

so I presume it's ok not to close.

Thanks


Amin

On 20 Jan 2009, at 19:27, "Angel, Eric"   
wrote:


There's a reopen() method in the IndexReader class.  You can use  
that.


-Original Message-
From: Amin Mohammed-Coleman [mailto:ami...@gmail.com]
Sent: Tuesday, January 20, 2009 5:02 AM
To: java-user@lucene.apache.org
Subject: Re: Indexing and Searching Web Application

Am I supposed to close the oldIndexReader?  I just tried this  
and I  get

an
exception stating that the IndexReader is closed.

Cheers

On Tue, Jan 20, 2009 at 9:33 AM, Ganesh   
wrote:


Reopen the reader, only if it is modified.


IndexReader oldIndexReader = indexSearcher.getIndexReader();
if (!oldIndexReader.isCurrent()) {
IndexReader newIndexReader = oldIndexReader.reopen();
oldIndexReader.close();
indexSearcher.close();
IndexSearcher indexSearch = new IndexSearcher(newIndexReader);
}

Regards
Ganesh

- Original Message - From: "Amin Mohammed-Coleman" <
ami...@gmail.com>
To: 
Sent: Tuesday, January 20, 2009 1:38 PM
Subject: Re: Indexing and Searching Web Application



Hi



After your email I had a look around and came up with the below
solution (I'm not sure if this is the right approach or there  
is a

performance implication to doing this)

public Summary[] search(SearchRequest searchRequest) {
List summaryList = new ArrayList();
StopWatch stopWatch = new StopWatch("searchStopWatch");
stopWatch.start();
MultiSearcher multiSearcher = null;
List newIndexSearchers = new
ArrayList();
try {
for (IndexSearcher indexSearcher: searchers) {
IndexReader indexReader =  
indexSearcher.getIndexReader().reopen();

IndexSearcher indexSearch = new IndexSearcher(indexReader);
newIndexSearchers.add(indexSearch);
}

multiSearcher = new MultiSearcher(newIndexSearchers.toArray(new
IndexSearcher[] {}));
QueryParser queryParser = new
MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),


analyzer);



Query query = queryParser.parse(searchRequest.getSearchTerm());


//TODO: Sort and Filters

TopDocs topDocs = multiSearcher.search(query, 100);
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
LOGGER.debug("total number of hits for [" + query.toString()  
+ " ]

= " +topDocs.totalHits);

for (ScoreDoc scoreDoc : scoreDocs) {
final Document doc = multiSearcher.doc(scoreDoc.doc);
float score = scoreDoc.score;
final BaseDocument baseDocument = new BaseDocument(doc, score);
Summary documentSummary = new  
DocumentSummaryImpl(baseDocument);

summaryList.add(documentSummary);
}

} catch (Exception e) {
throw new IllegalStateException(e);
}

stopWatch.stop();

LOGGER.debug("total time taken for seach: " +
stopWatch.getTotalTimeMillis() + " ms");
return summaryList.toArray(new Summary[] {});
}


The searchers are configured in spring using which looks like  
this:


class="

Re: Indexing and Searching Web Application

2009-01-21 Thread Amin Mohammed-Coleman

Hi

I am trying to get an understanding of what the best practice is.

I am not saying that I am right; it may well be that my code is wrong,
which is why I am posting this.  The original loop that I am iterating
over is a Spring-injected dependency.  I don't reuse that in the
multisearcher.  I create a new list (local variable) when I invoke the
search method.  So I'm not sure how I can be adding to an existing list.


I presume it's a bad idea not to close the indexreader in this case.

Cheers




On 21 Jan 2009, at 20:43, Ian Lea wrote:


Oh well, it's your code so I guess you know what it does.

But I still think you're wrong.

If your list contains 3 searchers at the top of the loop and all 3
need to be reopened then the list will contain 6 searchers at the end
of the loop, and the first 3 will be for readers that you've just
closed.  Hence the already closed exception when you try to use them.


--
Ian.


On Wed, Jan 21, 2009 at 8:24 PM, Amin Mohammed-Coleman > wrote:

Hi,

That is what I am doing with the line:

indexSearchers.add(indexSearch);

indexSearchers is an ArrayList that is constructed before the for  
loop:


List indexSearchers = new ArrayList();


I then pass the indexSearchers to :

multiSearcher = new MultiSearcher(indexSearchers.toArray(new  
IndexSearcher[]

{}));


Cheers

On 21 Jan 2009, at 20:19, Ian Lea wrote:


I haven't been following this thread, but shouldn't you be replacing
the old searcher in your list of searchers rather than just adding  
the

new one on the end?  Could be wrong - I find the names in your code
snippet rather confusing.


--
Ian.

On Wed, Jan 21, 2009 at 6:59 PM, Amin Mohammed-Coleman >

wrote:


Hi
I did the following according to java docs:

for (IndexSearcher indexSearcher: searchers) {

IndexReader reader = indexSearcher.getIndexReader();

IndexReader newReader = reader.reopen();

if (newReader != reader) {

reader.close();

}

reader = newReader;

IndexSearcher indexSearch = new IndexSearcher(reader);

indexSearchers.add(indexSearch);

}


First search works OK, but subsequent searches result in:


org.apache.lucene.store.AlreadyClosedException: this IndexReader is
closed



Cheers



On Wed, Jan 21, 2009 at 1:47 PM, Amin Mohammed-Coleman
wrote:


Hi
Will give that a go.

Thanks

Sent from my iPhone

On 21 Jan 2009, at 12:26, "Ganesh"  wrote:

I am closing the old reader and it is working fine for me. Refer  
to


IndexReader.Reopen javadoc.

///Below is the code snippet from the IndexReader.reopen javadoc

IndexReader reader = ...
...
IndexReader newReader = reader.reopen();
if (newReader != reader) {
... // reader was reopened
reader.close();  // old reader is closed
}
reader = newReader;

Regards
Ganesh

- Original Message - From: "Amin Mohammed-Coleman" <
ami...@gmail.com>
To: 
Cc: 
Sent: Wednesday, January 21, 2009 1:07 AM

Subject: Re: Indexing and Searching Web Application


Hi


Yes I am using the reopen method on indexreader. I am not  
closing the

old indexer as per Ganesh's instruction. It seems to be working
correctly
so I presume it's ok not to close.

Thanks


Amin

On 20 Jan 2009, at 19:27, "Angel, Eric"   
wrote:


There's a reopen() method in the IndexReader class.  You can  
use that.


-Original Message-
From: Amin Mohammed-Coleman [mailto:ami...@gmail.com]
Sent: Tuesday, January 20, 2009 5:02 AM
To: java-user@lucene.apache.org
Subject: Re: Indexing and Searching Web Application

Am I supposed to close the oldIndexReader?  I just tried this  
and I

get
an
exception stating that the IndexReader is closed.

Cheers

On Tue, Jan 20, 2009 at 9:33 AM, Ganesh 
wrote:

Reopen the reader, only if it is modified.


IndexReader oldIndexReader = indexSearcher.getIndexReader();
if (!oldIndexReader.isCurrent()) {
IndexReader newIndexReader = oldIndexReader.reopen();
oldIndexReader.close();
indexSearcher.close();
IndexSearcher indexSearch = new IndexSearcher(newIndexReader);
}

Regards
Ganesh

- Original Message - From: "Amin Mohammed-Coleman" <
ami...@gmail.com>
To: 
Sent: Tuesday, January 20, 2009 1:38 PM
Subject: Re: Indexing and Searching Web Application



Hi



After your email I had a look around and came up with the  
below
solution (I'm not sure if this is the right approach or  
there is a

performance implication to doing this)

public Summary[] search(SearchRequest searchRequest) {
List summaryList = new ArrayList();
StopWatch stopWatch = new StopWatch("searchStopWatch");
stopWatch.start();
MultiSearcher multiSearcher = null;
List newIndexSearchers = new
ArrayList();
try {
for (IndexSearcher indexSearcher: searchers) {
IndexReader indexReader =  
indexSearcher.getIndexReader().reopen();

IndexSearcher indexSearch = new IndexSearcher(indexReader);
newIndexSearchers.add(indexSearch);
}

multiSearcher = new  
MultiSearcher(newIndexSearchers.toArray(new

IndexSearcher[] {}));
QueryParser

Re: Indexing and Searching Web Application

2009-01-21 Thread Amin Mohammed-Coleman

Hi

Thanks for your reply. You're right: it looks as if the original list is the
problem.  The list I loop over is Spring-configured to return a list
of index searchers. Each index searcher looks at a different index.


I would like to inject the list of index searchers, as we may have a
requirement to add new searchers. That would just mean changing the Spring
config file.


Trying to remove the old index searcher gives me a
ConcurrentModificationException. Hmmm.


Is there another approach I can take?


Cheers

Sent from my iPhone

On 21 Jan 2009, at 22:32, Erick Erickson   
wrote:



NOTE: you're iterating over 'searchers' and
adding to indexSearchers. Is that a typo?

Assuming that it's not and your 'searchers'
is the copy you talk about (so you can freely
add?) you never delete from the underlying
indexSearchers. But you do close elements
because you're closing a reference to the
searcher that  points to the same underlying
object.

Assuming that's not the problem,
here's what I'd suggest. Log the count of
searchers just above your loop...

print searchers.size(); (or whatever).
for (IndexSearcher indexSearcher: searchers) {

Ian claims that the first time you'll see some number X
The second time you'll see X + Y.

You have to be getting this list of searchers from someplace.
Wherever it is, it's (probably) persisted across calls,
because if it isn't you wouldn't have any open readers
to close.

Are you sure your local variable isn't just a reference
to the underlying (permanent) list?

See inline comments

for (IndexSearcher indexSearcher: searchers) {

IndexReader reader = indexSearcher.getIndexReader();

IndexReader newReader = reader.reopen();

if (newReader != reader) {

 reader.close();

}
[EOE] you have not removed the instance of the searcher from  searchers (the
searchers (the

var in your for loop)
  but you have closed it. So next time your code tries to  
use it,

you've already closed it.

reader = newReader;

IndexSearcher indexSearch = new IndexSearcher(reader);

[EOE] This adds the newly opened searcher to the end of your array.  
The

original (closed) one is still there.

indexSearchers.add(indexSearch);

}

[EOE] So if you use searchers anywhere from here on, it's got closed
readers in it if you closed any of them.

Best
Erick

On Wed, Jan 21, 2009 at 4:19 PM, Amin Mohammed-Coleman >wrote:



Hi

I am trying to get an understanding of what the best practice is.

I am not saying that I am right, it may well be that my code is  
wrong, that
is why I am posting this.   The original loop that I am iterating  
over is a
spring injected dependency.  I don't reuse that in the  
multisearcher.  I
create a new list (local variable) when I invoke the search  
method.  So I'm

not sure how I can be adding to an existing list.

I presume it's a bad idea not to close the indexreader in this case.

Cheers





On 21 Jan 2009, at 20:43, Ian Lea wrote:

Oh well, it's your code so I guess you know what it does.


But I still think you're wrong.

If your list contains 3 searchers at the top of the loop and all 3
need to be reopened then the list will contain 6 searchers at the  
end

of the loop, and the first 3 will be for readers that you've just
closed.  Hence the already closed exception when you try to use  
them.



--
Ian.


On Wed, Jan 21, 2009 at 8:24 PM, Amin Mohammed-Coleman >

wrote:


Hi,

That is what I am doing with the line:

indexSearchers.add(indexSearch);

indexSearchers is an ArrayList that is constructed before the for  
loop:


List indexSearchers = new  
ArrayList();



I then pass the indexSearchers to :

multiSearcher = new MultiSearcher(indexSearchers.toArray(new
IndexSearcher[]
{}));


Cheers

On 21 Jan 2009, at 20:19, Ian Lea wrote:

I haven't been following this thread, but shouldn't you be  
replacing
the old searcher in your list of searchers rather than just  
adding the
new one on the end?  Could be wrong - I find the names in your  
code

snippet rather confusing.


--
Ian.

On Wed, Jan 21, 2009 at 6:59 PM, Amin Mohammed-Coleman <
ami...@gmail.com>
wrote:



Hi
I did the following according to java docs:

for (IndexSearcher indexSearcher: searchers) {

IndexReader reader = indexSearcher.getIndexReader();

IndexReader newReader = reader.reopen();

if (newReader != reader) {

reader.close();

}

reader = newReader;

IndexSearcher indexSearch = new IndexSearcher(reader);

indexSearchers.add(indexSearch);

}


First search works OK, but subsequent searches result in:


org.apache.lucene.store.AlreadyClosedException: this  
IndexReader is

closed



Cheers



On Wed, Jan 21, 2009 at 1:47 PM, Amin Mohammed-Coleman
wrote:

Hi

Will give that a go.

Thanks

Sent from my iPhone

On 21 Jan 2009, at 12:26, "Ganesh"   
wrote:


I am closing the old reader and it is working fine for me.  
Refer to




IndexReader.Reopen javadoc.

///Below is the code

Re: Indexing and Searching Web Application

2009-01-21 Thread Amin Mohammed-Coleman


Hi

Please ignore my last email. I have managed to work out how to fix the  
problem.


Sent reply without morning coffee!

Thanks

Amin

Sent from my iPhone

On 21 Jan 2009, at 22:32, Erick Erickson   
wrote:



NOTE: you're iterating over 'searchers' and
adding to indexSearchers. Is that a typo?

Assuming that it's not and your 'searchers'
is the copy you talk about (so you can freely
add?) you never delete from the underlying
indexSearchers. But you do close elements
because you're closing a reference to the
searcher that  points to the same underlying
object.

Assuming that's not the problem,
here's what I'd suggest. Log the count of
searchers just above your loop...

print searchers.size(); (or whatever).
for (IndexSearcher indexSearcher: searchers) {

Ian claims that the first time you'll see some number X
The second time you'll see X + Y.

You have to be getting this list of searchers from someplace.
Wherever it is, it's (probably) persisted across calls,
because if it isn't you wouldn't have any open readers
to close.

Are you sure your local variable isn't just a reference
to the underlying (permanent) list?

See inline comments

for (IndexSearcher indexSearcher: searchers) {

IndexReader reader = indexSearcher.getIndexReader();

IndexReader newReader = reader.reopen();

if (newReader != reader) {

 reader.close();

}
[EOE] you have not removed the instance of the searcher from  searchers (the
searchers (the

var in your for loop)
  but you have closed it. So next time your code tries to  
use it,

you've already closed it.

reader = newReader;

IndexSearcher indexSearch = new IndexSearcher(reader);

[EOE] This adds the newly opened searcher to the end of your array.  
The

original (closed) one is still there.

indexSearchers.add(indexSearch);

}

[EOE] So if you use searchers anywhere from here on, it's got closed
readers in it if you closed any of them.

Best
Erick

On Wed, Jan 21, 2009 at 4:19 PM, Amin Mohammed-Coleman >wrote:



Hi

I am trying to get an understanding of what the best practice is.

I am not saying that I am right, it may well be that my code is  
wrong, that
is why I am posting this.   The original loop that I am iterating  
over is a
spring injected dependency.  I don't reuse that in the  
multisearcher.  I
create a new list (local variable) when I invoke the search  
method.  So I'm

not sure how I can be adding to an existing list.

I presume it's a bad idea not to close the indexreader in this case.

Cheers





On 21 Jan 2009, at 20:43, Ian Lea wrote:

Oh well, it's your code so I guess you know what it does.


But I still think you're wrong.

If your list contains 3 searchers at the top of the loop and all 3
need to be reopened then the list will contain 6 searchers at the  
end

of the loop, and the first 3 will be for readers that you've just
closed.  Hence the already closed exception when you try to use  
them.



--
Ian.


On Wed, Jan 21, 2009 at 8:24 PM, Amin Mohammed-Coleman >

wrote:


Hi,

That is what I am doing with the line:

indexSearchers.add(indexSearch);

indexSearchers is an ArrayList that is constructed before the for  
loop:


List indexSearchers = new  
ArrayList();



I then pass the indexSearchers to :

multiSearcher = new MultiSearcher(indexSearchers.toArray(new
IndexSearcher[]
{}));


Cheers

On 21 Jan 2009, at 20:19, Ian Lea wrote:

I haven't been following this thread, but shouldn't you be  
replacing
the old searcher in your list of searchers rather than just  
adding the
new one on the end?  Could be wrong - I find the names in your  
code

snippet rather confusing.


--
Ian.

On Wed, Jan 21, 2009 at 6:59 PM, Amin Mohammed-Coleman <
ami...@gmail.com>
wrote:



Hi
I did the following according to java docs:

for (IndexSearcher indexSearcher: searchers) {

IndexReader reader = indexSearcher.getIndexReader();

IndexReader newReader = reader.reopen();

if (newReader != reader) {

reader.close();

}

reader = newReader;

IndexSearcher indexSearch = new IndexSearcher(reader);

indexSearchers.add(indexSearch);

}


First search works OK, but subsequent searches result in:


org.apache.lucene.store.AlreadyClosedException: this  
IndexReader is

closed



Cheers



On Wed, Jan 21, 2009 at 1:47 PM, Amin Mohammed-Coleman
wrote:

Hi

Will give that a go.

Thanks

Sent from my iPhone

On 21 Jan 2009, at 12:26, "Ganesh"   
wrote:


I am closing the old reader and it is working fine for me.  
Refer to




IndexReader.Reopen javadoc.

///Below is the code snippet from the IndexReader.reopen javadoc

IndexReader reader = ...
...
IndexReader newReader = reader.reopen();
if (newReader != reader) {
... // reader was reopened
reader.close();  // old reader is closed
}
reader = newReader;

Regards
Ganesh

- Original Message - From: "Amin Mohammed-Coleman" <
ami...@gmail.com>
To: 
Cc: 
Sent: Wednesday, January 21, 2009 1
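Amin never posts the final fix, but from Ian's and Erick's comments it presumably amounts to replacing each stale searcher in place instead of appending a new one. A sketch, assuming searchers is the persistent List<IndexSearcher> (an index-based loop also avoids the ConcurrentModificationException mentioned above):

import java.io.IOException;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

// Sketch: reopen stale readers and replace the matching searcher in place,
// so the list never accumulates searchers over closed readers.
void refreshSearchers(List<IndexSearcher> searchers) throws IOException {
    for (int i = 0; i < searchers.size(); i++) {
        IndexReader reader = searchers.get(i).getIndexReader();
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();                                 // old reader is done
            searchers.set(i, new IndexSearcher(newReader)); // replace, don't append
        }
    }
}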

Field.Store.YES Question

2009-02-05 Thread Amin Mohammed-Coleman
Hi

I'm probably going to get shot down for asking this simple question.
Although I think I understand the basic concept of Field I feel there is
something that I am missing and I was wondering if someone might help to
clarify.

You can store a field value in an index using Field.Store.YES, or, if the
content is too large, you can exclude it from being stored in the index using
Field.Store.NO.   How does Lucene know how to search for a term in an index
if the value hasn't been stored in the index?  I guess I can understand that
if you don't store the field then you can't get the field and its value
using the Document API.

Is there a separate part of the Lucene index in which the tokenised strings
are stored, so that Lucene knows where to look?

Again I do apologise for asking this question... I just feel like I'm missing
something (knew I shouldn't have had those tequila shots!).


Thanks
Amin


Re: Field.Store.YES Question

2009-02-05 Thread Amin Mohammed-Coleman
Thanks guys for your replies!  It's helped a lot!

Cheers
Amin

On Thu, Feb 5, 2009 at 9:28 AM, Ganesh  wrote:

> Field.Store.YES is to store the field data as it is, so that it can be
> retrieved to display results.
> Field.Index.ANALYZED tokenizes the field and indexes the tokenized content.
>
> Regards
> Ganesh
>
> - Original Message - From: "Amin Mohammed-Coleman" <
> ami...@gmail.com>
> To: 
> Sent: Thursday, February 05, 2009 2:00 PM
> Subject: Field.Store.YES Question
>
>
>
>  Hi
>>
>> I'm probably going to get shot down for asking this simple question.
>> Although I think I understand the basic concept of Field I feel there is
>> something that I am missing and I was wondering if someone might help to
>> clarify.
>>
>> You can store a field value in an index using Field.Store.YES, or, if the
>> content is too large, you can exclude it from being stored in the index using
>> Field.Store.NO.   How does Lucene know how to search for a term in an
>> index
>> if the value hasn't been stored in the index?  I guess I can understand
>> that
>> if you don't store the field then you can't get the field and its value
>> using the Document API.
>>
>> Is there a separate part of the Lucene index in which the tokenised strings
>> are stored, so that Lucene knows where to look?
>>
>> Again I do apologise for asking this question... I just feel like I'm
>> missing
>> something (knew I shouldn't have had those tequila shots!).
>>
>>
>> Thanks
>> Amin
>>
>>
> Send instant messages to your online friends http://in.messenger.yahoo.com
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
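To make Ganesh's distinction concrete, a minimal sketch (the field names and values are invented; 2.x-era Field API):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// "title" is analyzed AND stored: its terms are searchable, and
// doc.get("title") returns the original value for display.
// "body" is analyzed but NOT stored: its terms still go into the
// inverted index (so it is searchable), but doc.get("body") returns null.
Document doc = new Document();
doc.add(new Field("title", "Lucene in Action",
        Field.Store.YES, Field.Index.ANALYZED));
String veryLargeText = "...";   // imagine the full document text here
doc.add(new Field("body", veryLargeText,
        Field.Store.NO, Field.Index.ANALYZED));

The tokenised terms live in the inverted index, which is separate from the stored fields; that is why a field can be searchable without being retrievable.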


Faceted Search using Lucene

2009-02-16 Thread Amin Mohammed-Coleman
Hi
I am looking at building a faceted search using Lucene.  I know that Solr
comes with this built in, however I would like to try this by myself
(something to add to my CV!).  I have been looking around and I found that
you can use the IndexReader and use TermVectors.  This looks ok but I'm not
sure how to filter the results so that a particular user can only see a
subset of results.  The next option I was looking at was something like

Term term1 = new Term("brand", "ford");

Term term2 = new Term("brand", "vw");

Term[] termsArray = new Term[] { term1, term2 };

int[] docFreqs = indexSearcher.docFreqs(termsArray);


The only problem here is that I have to provide the brand type each time a
new brand is created.  Again I'm not sure how I can filter the results here.
It may be that I'm using the wrong api methods to do this.


I would be grateful if I could get some advice on this.



Cheers

Amin


P.S.  I am basically trying to do something that displays the following


Personal Contact (23) Business Contact (45) and so on..
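On the "provide the brand each time" point: the distinct values of a field can be enumerated from the index rather than hardcoded. A sketch using the 2.x TermEnum API (the "brand" field and the indexReader variable are placeholders):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

// Sketch: walk every distinct term of the "brand" field and print its
// document frequency, instead of hardcoding "ford", "vw", ...
TermEnum terms = indexReader.terms(new Term("brand", ""));
try {
    do {
        Term t = terms.term();
        if (t == null || !"brand".equals(t.field())) break;   // past the field
        System.out.println(t.text() + " (" + terms.docFreq() + ")");
    } while (terms.next());
} finally {
    terms.close();
}

These counts cover all documents, though; per-user filtering needs the query-intersection approach that comes up in the replies below.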


Re: Faceted Search using Lucene

2009-02-22 Thread Amin Mohammed-Coleman

Hi

Sorry to re send this email but I was wondering if I could get some  
advice on this.


Cheers

Amin

On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman   
wrote:



Hi

I am looking at building a faceted search using Lucene.  I know that  
Solr comes with this built in, however I would like to try this by  
myself (something to add to my CV!).  I have been looking around and  
I found that you can use the IndexReader and use TermVectors.  This  
looks ok but I'm not sure how to filter the results so that a  
particular user can only see a subset of results.  The next option I  
was looking at was something like


Term term1 = new Term("brand", "ford");
Term term2 = new Term("brand", "vw");
Term[] termsArray = new Term[] { term1, term2 };
int[] docFreqs = indexSearcher.docFreqs(termsArray);

The only problem here is that I have to provide the brand type each  
time a new brand is created.  Again I'm not sure how I can filter  
the results here. It may be that I'm using the wrong api methods to  
do this.


I would be grateful if I could get some advice on this.


Cheers
Amin

P.S.  I am basically trying to do something that displays the  
following


Personal Contact (23) Business Contact (45) and so on..








Re: Faceted Search using Lucene

2009-02-22 Thread Amin Mohammed-Coleman

Hi

Thanks just what I needed!

Cheers
Amin

On 22 Feb 2009, at 16:11, Marcelo Ochoa  wrote:


Hi Amin:
 Please take a look at this blog post:
http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
 Best regards, Marcelo.

On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman > wrote:

Hi

Sorry to re send this email but I was wondering if I could get some  
advice

on this.

Cheers

Amin

On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman   
wrote:



Hi

I am looking at building a faceted search using Lucene.  I know  
that Solr

comes with this built in, however I would like to try this by myself
(something to add to my CV!).  I have been looking around and I  
found that
you can use the IndexReader and use TermVectors.  This looks ok  
but I'm not
sure how to filter the results so that a particular user can only  
see a
subset of results.  The next option I was looking at was something  
like


Term term1 = new Term("brand", "ford");
Term term2 = new Term("brand", "vw");
Term[] termsArray = new Term[] { term1, term2 };
int[] docFreqs = indexSearcher.docFreqs(termsArray);

The only problem here is that I have to provide the brand type  
each time a
new brand is created.  Again I'm not sure how I can filter the  
results here.

It may be that I'm using the wrong api methods to do this.

I would be grateful if I could get some advice on this.


Cheers
Amin

P.S.  I am basically trying to do something that displays the  
following


Personal Contact (23) Business Contact (45) and so on..












--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
__
Want to integrate Lucene and Oracle?
http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
Is Oracle 11g REST ready?
http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
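The core of the approach in that post is intersecting the bit set of the base query with the bit set of each facet query, roughly like this (a sketch with the 2.4-era QueryFilter/BitSet API the post uses):

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryFilter;

// Sketch: hit count for one facet = |docs(baseQuery) AND docs(facetQuery)|.
int facetCount(IndexReader reader, Query baseQuery, Query facetQuery)
        throws IOException {
    BitSet baseBits  = new QueryFilter(baseQuery).bits(reader);
    BitSet facetBits = new QueryFilter(facetQuery).bits(reader);
    baseBits.and(facetBits);          // intersection of the two result sets
    return baseBits.cardinality();    // e.g. "Personal Contact (23)"
}

Per-user restrictions then just become part of the base query (or another AND-ed bit set).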



Re: Faceted Search using Lucene

2009-02-24 Thread Amin Mohammed-Coleman
Hi
I have been able to get the code working for my scenario; however, I have a
question and I was wondering if I could get some help.  I have a list of
IndexSearchers which are used in a MultiSearcher class.  I use the
IndexSearchers to get each IndexReader and put them into a MultiReader.

IndexReader[] readers = new IndexReader[searchables.length];

for (int i = 0; i < searchables.length; i++) {

IndexSearcher indexSearcher = (IndexSearcher)searchables[i];

readers[i] = indexSearcher.getIndexReader();

IndexReader newReader = readers[i].reopen();

if (newReader != readers[i]) {

readers[i].close();

}

readers[i] = newReader;



}

 multiReader = new MultiReader(readers);

OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();

IndexSearcher indexSearcher = new IndexSearcher(multiReader);


I then use the IndexSearcher to do the facet stuff.  I end the code by
closing the MultiReader.  This is causing problems in another method where I
do some other search, as the IndexReaders are closed.  Is it OK not to close
the MultiReader, or should I do some additional checks in the other
method to see if the IndexReader is closed?



Cheers


P.S. Hope that made sense...!


On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman wrote:

> Hi
>
> Thanks just what I needed!
>
> Cheers
> Amin
>
>
> On 22 Feb 2009, at 16:11, Marcelo Ochoa  wrote:
>
>  Hi Amin:
>>  Please take a look at this blog post:
>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>  Best regards, Marcelo.
>>
>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman 
>> wrote:
>>
>>> Hi
>>>
>>> Sorry to re send this email but I was wondering if I could get some
>>> advice
>>> on this.
>>>
>>> Cheers
>>>
>>> Amin
>>>
>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman 
>>> wrote:
>>>
>>>  Hi
>>>>
>>>> I am looking at building a faceted search using Lucene.  I know that
>>>> Solr
>>>> comes with this built in, however I would like to try this by myself
>>>> (something to add to my CV!).  I have been looking around and I found
>>>> that
>>>> you can use the IndexReader and use TermVectors.  This looks ok but I'm
>>>> not
>>>> sure how to filter the results so that a particular user can only see a
>>>> subset of results.  The next option I was looking at was something like
>>>>
>>>> Term term1 = new Term("brand", "ford");
>>>> Term term2 = new Term("brand", "vw");
>>>> Term[] termsArray = new Term[] { term1, term2 };
>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>
>>>> The only problem here is that I have to provide the brand type each time
>>>> a
>>>> new brand is created.  Again I'm not sure how I can filter the results
>>>> here.
>>>> It may be that I'm using the wrong api methods to do this.
>>>>
>>>> I would be grateful if I could get some advice on this.
>>>>
>>>>
>>>> Cheers
>>>> Amin
>>>>
>>>> P.S.  I am basically trying to do something that displays the following
>>>>
>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Marcelo F. Ochoa
>> http://marceloochoa.blogspot.com/
>> http://marcelo.ochoa.googlepages.com/home
>> __
>> Want to integrate Lucene and Oracle?
>>
>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>> Is Oracle 11g REST ready?
>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>


Re: Faceted Search using Lucene

2009-02-24 Thread Amin Mohammed-Coleman
The reason for the IndexReader.reopen is that I have a webapp which
enables users to upload files and then search for the documents.  If I don't
reopen, I'm concerned that the facet hit counts won't be updated.

On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman wrote:

> Hi
> I have been able to get the code working for my scenario, however I have a
> question and I was wondering if I could get some help.  I have a list of
> IndexSearchers which are used in a MultiSearcher class.  I use the
> indexsearchers to get each indexreader and put them into a MultiIndexReader.
>
> IndexReader[] readers = new IndexReader[searchables.length];
>
> for (int i = 0; i < searchables.length; i++) {
>
> IndexSearcher indexSearcher = (IndexSearcher)searchables[i];
>
> readers[i] = indexSearcher.getIndexReader();
>
> IndexReader newReader = readers[i].reopen();
>
> if (newReader != readers[i]) {
>
> readers[i].close();
>
> }
>
> readers[i] = newReader;
>
>
>
> }
>
>  multiReader = new MultiReader(readers);
>
> OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();
>
> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>
>
> I then use the indexseacher to do the facet stuff.  I end the code with
> closing the multireader.  This is causing problems in another method where I
> do some other search as the indexreaders are closed.  Is it ok to not close
> the multiindexreader or should I do some additional checks in the other
> method to see if the indexreader is closed?
>
>
>
> Cheers
>
>
> P.S. Hope that made sense...!
>
>
> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman 
> wrote:
>
>> Hi
>>
>> Thanks just what I needed!
>>
>> Cheers
>> Amin
>>
>>
>> On 22 Feb 2009, at 16:11, Marcelo Ochoa  wrote:
>>
>>  Hi Amin:
>>>  Please take a look at this blog post:
>>>
>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>  Best regards, Marcelo.
>>>
>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman 
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> Sorry to re send this email but I was wondering if I could get some
>>>> advice
>>>> on this.
>>>>
>>>> Cheers
>>>>
>>>> Amin
>>>>
>>>> On 16 Feb 2009, at 20:37, Amin Mohammed-Coleman 
>>>> wrote:
>>>>
>>>>  Hi
>>>>>
>>>>> I am looking at building a faceted search using Lucene.  I know that
>>>>> Solr
>>>>> comes with this built in, however I would like to try this by myself
>>>>> (something to add to my CV!).  I have been looking around and I found
>>>>> that
>>>>> you can use the IndexReader and use TermVectors.  This looks ok but I'm
>>>>> not
>>>>> sure how to filter the results so that a particular user can only see a
>>>>> subset of results.  The next option I was looking at was something like
>>>>>
>>>>> Term term1 = new Term("brand", "ford");
>>>>> Term term2 = new Term("brand", "vw");
>>>>> Term[] termsArray = new Term[] { term1, term2 };
>>>>> int[] docFreqs = indexSearcher.docFreqs(termsArray);
>>>>>
>>>>> The only problem here is that I have to provide the brand type each
>>>>> time a
>>>>> new brand is created.  Again I'm not sure how I can filter the results
>>>>> here.
>>>>> It may be that I'm using the wrong api methods to do this.
>>>>>
>>>>> I would be grateful if I could get some advice on this.
>>>>>
>>>>>
>>>>> Cheers
>>>>> Amin
>>>>>
>>>>> P.S.  I am basically trying to do something that displays the following
>>>>>
>>>>> Personal Contact (23) Business Contact (45) and so on..
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Marcelo F. Ochoa
>>> http://marceloochoa.blogspot.com/
>>> http://marcelo.ochoa.googlepages.com/home
>>> __
>>> Want to integrate Lucene and Oracle?
>>>
>>> http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
>>> Is Oracle 11g REST ready?
>>> http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
>>>
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>


Re: Faceted Search using Lucene

2009-02-26 Thread Amin Mohammed-Coleman
Hi

Thanks for your reply.  I have modified the code to the following:

public Map getFacetHitCount(String searchTerm) {

QueryParser queryParser =
new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
analyzer);

Query baseQuery = null;

try {

if (!StringUtils.isBlank(searchTerm)) {

baseQuery = queryParser.parse(searchTerm);

LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
baseQuery.toString() +"'");

} else {

LOGGER.debug("No base query. Using default, which is going to check for all
documents of every type.");

}

} catch (ParseException e1) {

throw new RuntimeException(e1);

}

Map subQueries = constructDocTypeSubQueriesMap();

 Map facetHitCount = new HashMap();

MultiReader multiReader = null;

try {

Searchable[] searchables = this.searchers.toArray(new Searchable[]
{}).clone();

IndexReader[] readers = new IndexReader[searchables.length];

for (int i = 0; i < searchables.length; i++) {

IndexSearcher indexSearcher = (IndexSearcher)searchables[i];

readers[i] = indexSearcher.getIndexReader();

Directory directory = readers[i].directory();

IndexReader indexReader = IndexReader.open(directory);

readers[i] = indexReader;

}

multiReader = new MultiReader(readers);

OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();

IndexSearcher indexSearcher = new IndexSearcher(multiReader);

if (baseQuery != null) {

facetHitCounter.setBaseQuery(baseQuery);

}

facetHitCounter.setSearcher(indexSearcher);

facetHitCounter.setSubQueries(subQueries);

facetHitCount= facetHitCounter.getFacetHitCounts();

LOGGER.debug("Document Type Facet Hit Count '" + facetHitCount + "'");

} catch (Exception e) {

throw new IllegalStateException(e);

} finally {

try {

multiReader.close();

LOGGER.debug("Closed multi reader.");

} catch (IOException e) {

throw new IllegalStateException(e);

}

}

return facetHitCount;

}


Does this make sense?  I am new to Lucene and working on a complete search
solution, so I would be grateful for any advice on best practice.


Cheers





On Thu, Feb 26, 2009 at 7:55 AM, Michael Stoppelman wrote:

> If another thread is executing a query with the handle to one of readers[i]
> you're going to kill it since the IndexReader is now closed.
> Just don't call the IndexReader#close() method. If nothing is pointing at
> the readers they should be garbage collected. Also, you might
> want to warm up your new IndexSearcher before you switch to it, meaning run
> a few queries on it before you swap the old one out.
>
> M
>
>
>
> On Tue, Feb 24, 2009 at 12:48 PM, Amin Mohammed-Coleman  >wrote:
>
> > The reason for the indexreader.reopen is because I have a webapp which
> > enables users to upload files and then search for the documents.  If I
> > don't
> > reopen i'm concerned that the facet hit counter won't be updated.
> >
> > On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman  > >wrote:
> >
> > > Hi
> > > I have been able to get the code working for my scenario, however I
> have
> > a
> > > question and I was wondering if I could get some help.  I have a list
> of
> > > IndexSearchers which are used in a MultiSearcher class.  I use the
> > > indexsearchers to get each indexreader and put them into a
> > MultiIndexReader.
> > >
> > > IndexReader[] readers = new IndexReader[searchables.length];
> > >
> > > for (int i =0 ; i < searchables.length;i++) {
> > >
> > > IndexSearcher indexSearcher = (IndexSearcher)searchables[i];
> > >
> > > readers[i] = indexSearcher.getIndexReader();
> > >
> > > IndexReader newReader = readers[i].reopen();
> > >
> > > if (newReader != readers[i]) {
> > >
> > > readers[i].close();
> > >
> > > }
> > >
> > > readers[i] = newReader;
> > >
> > >
> > >
> > > }
> > >
> > >  multiReader = new MultiReader(readers);
> > >
> > > OpenBitSetFacetHitCounter facetHitCounter =
> > new OpenBitSetFacetHitCounter();
> > >
> > > IndexSearcher indexSearcher = new IndexSearcher(multiReader);
> > >
> > >
> > > I then use the indexseacher to do the facet stuff.  I end the code with
> > > closing the multireader.  This is causing problems in another method
> > where I
> > > do some other search as the indexreaders are closed.  Is it ok to not
> > close
> > > the multiindexreader or should I do some additional checks in the other
> > > method to see if the indexreader is closed?
> > >
> > >
> 

Re: Faceted Search using Lucene

2009-02-26 Thread Amin Mohammed-Coleman
Hi

Thanks for your reply.  Without sounding completely silly... how do I go
about using the methods you mentioned?

Cheers
Amin

On Thu, Feb 26, 2009 at 10:24 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> Actually, it's best to use IndexReader.incRef/decRef to track the
> IndexReader.
>
> You should not rely on GC to close your IndexReader since this can easily
> tie up resources (eg open file descriptors) for too long.
>
> Mike
>
>
> Michael Stoppelman wrote:
>
>  If another thread is executing a query with the handle to one of
>> readers[i]
>> you're going to kill it since the IndexReader is now closed.
>> Just don't call the IndexReader#close() method. If nothing is pointing at
>> the readers they should be garbage collected. Also, you might
>> want to warm up your new IndexSearcher before you switch to it, meaning
>> run
>> a few queries on it before you swap the old one out.
>>
>> M
>>
>>
>>
>> On Tue, Feb 24, 2009 at 12:48 PM, Amin Mohammed-Coleman > >wrote:
>>
>>  The reason for the indexreader.reopen is because I have a webapp which
>>> enables users to upload files and then search for the documents.  If I
>>> don't
>>> reopen i'm concerned that the facet hit counter won't be updated.
>>>
>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman >>
>>>> wrote:
>>>>
>>>
>>>  Hi
>>>> I have been able to get the code working for my scenario, however I have
>>>>
>>> a
>>>
>>>> question and I was wondering if I could get some help.  I have a list of
>>>> IndexSearchers which are used in a MultiSearcher class.  I use the
>>>> indexsearchers to get each indexreader and put them into a
>>>>
>>> MultiIndexReader.
>>>
>>>>
>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>>
>>>> for (int i =0 ; i < searchables.length;i++) {
>>>>
>>>> IndexSearcher indexSearcher = (IndexSearcher)searchables[i];
>>>>
>>>> readers[i] = indexSearcher.getIndexReader();
>>>>
>>>>   IndexReader newReader = readers[i].reopen();
>>>>
>>>> if (newReader != readers[i]) {
>>>>
>>>> readers[i].close();
>>>>
>>>> }
>>>>
>>>> readers[i] = newReader;
>>>>
>>>>
>>>>
>>>> }
>>>>
>>>> multiReader = new MultiReader(readers);
>>>>
>>>> OpenBitSetFacetHitCounter facetHitCounter =
>>>>
>>> new OpenBitSetFacetHitCounter();
>>>
>>>>
>>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>>
>>>>
>>>> I then use the indexseacher to do the facet stuff.  I end the code with
>>>> closing the multireader.  This is causing problems in another method
>>>>
>>> where I
>>>
>>>> do some other search as the indexreaders are closed.  Is it ok to not
>>>>
>>> close
>>>
>>>> the multiindexreader or should I do some additional checks in the other
>>>> method to see if the indexreader is closed?
>>>>
>>>>
>>>>
>>>> Cheers
>>>>
>>>>
>>>> P.S. Hope that made sense...!
>>>>
>>>>
>>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed-Coleman <
>>>> ami...@gmail.com
>>>> wrote:
>>>>
>>>>  Hi
>>>>>
>>>>> Thanks just what I needed!
>>>>>
>>>>> Cheers
>>>>> Amin
>>>>>
>>>>>
>>>>> On 22 Feb 2009, at 16:11, Marcelo Ochoa 
>>>>>
>>>> wrote:
>>>
>>>>
>>>>> Hi Amin:
>>>>>
>>>>>> Please take a look a this blog post:
>>>>>>
>>>>>>
>>>>>>
>>> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
>>>
>>>> Best regards, Marcelo.
>>>>>>
>>>>>> On Sun, Feb 22, 2009 at 1:18 PM, Amin Mohammed-Coleman <
>>>>>>
>>>>> ami...@gmail.com>
>>>
>>>> wrote:
>>>>>>
>>>>>>  Hi
>>>>>>>
>>>>>>> Sorry to re send this email 
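In isolation, the incRef/decRef pattern Mike refers to looks roughly like this (getCurrentReader() is a placeholder for however the current reader is obtained):

// Sketch: pin the reader for the duration of one search, then release it.
// decRef() closes the reader once the last user has released it.
IndexReader reader = getCurrentReader();   // placeholder
reader.incRef();
try {
    IndexSearcher searcher = new IndexSearcher(reader);
    // ... run the search and render the results ...
} finally {
    reader.decRef();
}

The SearcherManager class in the next message wraps exactly this bookkeeping behind get() and release().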

Re: Faceted Search using Lucene

2009-02-26 Thread Amin Mohammed-Coleman
Hi

Thanks for your help.  I will modify my facet search and my other code to
use the recommendations.   Would it be ok to get a review of the completed
code?  I just want to make sure that I'm not doing anything that may cause
any problems (threading, memory).

Cheers

On Thu, Feb 26, 2009 at 1:10 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> See below -- this is an excerpt from the upcoming Lucene in Action
> revision (chapter 10).
>
> It's a simple class.  Use it like this for searching:
>
>  IndexSearcher searcher = manager.get();
>  try {
>searcher.search(...).
>...render results...
>  } finally {
>manager.release(searcher);
>searcher = null;
>  }
>
> When you want to reopen (application dependent), call maybeReopen.
> Subclass and define the warm() method if needed.
>
> NOTE: this hasn't yet been heavily tested (I just quickly revised it to use
> incRef/decRef).
>
> Mike
>
> import java.io.IOException;
> import java.util.HashMap;
>
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.store.Directory;
>
> /** Utility class to get/refresh searchers when you are
>  *  using multiple threads. */
>
> public class SearcherManager {
>
>  private IndexSearcher currentSearcher; //A
>  private Directory dir;
>
>  public SearcherManager(Directory dir) throws IOException {
>this.dir = dir;
>currentSearcher = new IndexSearcher(IndexReader.open(dir));  //B
>  }
>
>  public void warm(IndexSearcher searcher) {}//C
>
>  public void maybeReopen() throws IOException { //D
>long currentVersion = currentSearcher.getIndexReader().getVersion();
>if (IndexReader.getCurrentVersion(dir) != currentVersion) {
>  IndexReader newReader = currentSearcher.getIndexReader().reopen();
>  assert newReader != currentSearcher.getIndexReader();
>  IndexSearcher newSearcher = new IndexSearcher(newReader);
>  warm(newSearcher);
>  swapSearcher(newSearcher);
>}
>  }
>
>  public synchronized IndexSearcher get() {  //E
>currentSearcher.getIndexReader().incRef();
>return currentSearcher;
>  }
>
>  public synchronized void release(IndexSearcher searcher)   //F
>throws IOException {
>searcher.getIndexReader().decRef();
>  }
>
>  private synchronized void swapSearcher(IndexSearcher newSearcher) //G
>  throws IOException {
>release(currentSearcher);
>currentSearcher = newSearcher;
>  }
> }
>
> /*
> #A Current IndexSearcher
> #B Create initial searcher
> #C Implement in subclass to warm new searcher
> #D Call this to reopen searcher if index changed
> #E Returns current searcher
> #F Release searcher
> #G Swaps currentSearcher to new searcher
> */
>
> Mike
>
>
> Amin Mohammed-Coleman wrote:
>
>  Hi
>>
>> Thanks for your reply.  Without sound completely ...silly...how do i go
>> abouts using the methods you mentioned...
>>
>> Cheers
>> Amin
>>
>> On Thu, Feb 26, 2009 at 10:24 AM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>
>>> Actually, it's best to use IndexReader.incRef/decRef to track the
>>> IndexReader.
>>>
>>> You should not rely on GC to close your IndexReader since this can easily
>>> tie up resources (eg open file descriptors) for too long.
>>>
>>> Mike
>>>
>>>
>>> Michael Stoppelman wrote:
>>>
>>> If another thread is executing a query with the handle to one of
>>>
>>>> readers[i]
>>>> you're going to kill it since the IndexReader is now closed.
>>>> Just don't call the IndexReader#close() method. If nothing is pointing
>>>> at
>>>> the readers they should be garbage collected. Also, you might
>>>> want to warm up your new IndexSearcher before you switch to it, meaning
>>>> run
>>>> a few queries on it before you swap the old one out.
>>>>
>>>> M
>>>>
>>>>
>>>>
>>>> On Tue, Feb 24, 2009 at 12:48 PM, Amin Mohammed-Coleman <
>>>> ami...@gmail.com
>>>>
>>>>> wrote:
>>>>>
>>>>
>>>> The reason for the indexreader.reopen is because I have a webapp which
>>>>
>>>>> enables users to upload files and then search for the documents.  If I
>>>>> don't
>>>>> reopen i
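The "subclass and define the warm() method" part might look like this; a sketch (the warming query, field name, and value are invented):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;

// Sketch: warm a newly opened searcher with a cheap representative query
// before it is swapped in, so the first real user doesn't pay the cost.
public class WarmingSearcherManager extends SearcherManager {
    public WarmingSearcherManager(Directory dir) throws IOException {
        super(dir);
    }
    public void warm(IndexSearcher searcher) {
        try {
            searcher.search(new TermQuery(new Term("type", "contact")), null, 10);
        } catch (IOException e) {
            // warming is best-effort; ignore failures here
        }
    }
}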

Re: Faceted Search using Lucene

2009-02-26 Thread Amin Mohammed-Coleman
Hi
I have modified my search code.  Here is the following:
[code]

public Summary[] search(SearchRequest searchRequest) throws SearchExecutionException {

String searchTerm = searchRequest.getSearchTerm();

if (StringUtils.isBlank(searchTerm)) {

throw new SearchExecutionException("Search string cannot be empty. There will be too many results to process.");

}

List<Summary> summaryList = new ArrayList<Summary>();

StopWatch stopWatch = new StopWatch("searchStopWatch");

stopWatch.start();

MultiSearcher multiSearcher = null;

List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();

boolean refreshSearchers = false;

try {

LOGGER.debug("Ensuring all index readers are up to date...");

for (IndexSearcher indexSearcher: searchers) {

 IndexReader reader = indexSearcher.getIndexReader();

 reader.incRef();

 Directory directory = reader.directory();



 long currentVersion = reader.getVersion();

 if (IndexReader.getCurrentVersion(directory) != currentVersion) {

 IndexReader newReader = reader.reopen();

 if (newReader != reader) {

 reader.decRef();

 refreshSearchers = true;

 }

 reader = newReader;

 }

 IndexSearcher indexSearch = new IndexSearcher(reader);

 indexSearchers.add(indexSearch);

}

if (refreshSearchers) {

searchers.clear();

searchers = new ArrayList<IndexSearcher>(indexSearchers);

}

LOGGER.debug("All Index Searchers are up to date. No of index searchers '" +
indexSearchers.size() +"'");

 multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[]
{}));

PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
analyzer);

analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(),
new KeywordAnalyzer());

QueryParser queryParser =
new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
analyzerWrapper);

 Query query = queryParser.parse(searchTerm);

LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
query.toString() +"'");

Sort sort = applySortIfApplicable(searchRequest);

Filter[] filters = applyFiltersIfApplicable(searchRequest);

 ChainedFilter chainedFilter = null;

if (filters != null) {

chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);

}

TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);

ScoreDoc[] scoreDocs = topDocs.scoreDocs;

LOGGER.debug("total number of hits for [" + query.toString() + " ] = "+topDocs.
totalHits);

 for (ScoreDoc scoreDoc : scoreDocs) {

final Document doc = multiSearcher.doc(scoreDoc.doc);

float score = scoreDoc.score;

final BaseDocument baseDocument = new BaseDocument(doc, score);

Summary documentSummary = new DocumentSummaryImpl(baseDocument);

summaryList.add(documentSummary);

}

multiSearcher.close();

} catch (Exception e) {

throw new IllegalStateException(e);

}

stopWatch.stop();

 LOGGER.debug("total time taken for document seach: " +
stopWatch.getTotalTimeMillis() + " ms");

return summaryList.toArray(new Summary[] {});

}

 [/code]

Just some background:

There is a list of IndexSearchers that are injected via Spring.  These
searchers are also configured by Spring.  As you can see, the MultiSearcher
is a local variable.  I then have a flag that records whether any IndexReader
was out of date.  When it is set to true, the IndexSearchers are refreshed.

I would be grateful for your thoughts.


On Thu, Feb 26, 2009 at 1:35 PM, Amin Mohammed-Coleman wrote:

> Hi
>
> Thanks for your help.  I will modify my facet search and my other code to
> use the recommendations.   Would it be ok to get a review of the completed
> code?  I just want to make sure that I'm not doing anything that may cause
> any problems (threading, memory).
>
> Cheers
>
>
> On Thu, Feb 26, 2009 at 1:10 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>>
>> See below -- this is an excerpt from the upcoming Lucene in Action
>> revision (chapter 10).
>>
>> It's a simple class.  Use it like this for searching:
>>
>>  IndexSearcher searcher = manager.get();
>>  try {
>>searcher.search(...).
>>...render results...
>>  } finally {
>>manager.release(searcher);
>>searcher = null;
>>  }
>>
>> When you want to reopen (application dependent), call maybeReopen.
>> Subclass and define the warm() method if needed.
>>
>> NOTE: this hasn't yet been heavily tested (I just quickly revised it to
>> use
>> incRef/decRef).
>>
>> Mike
>>
>> import java.io.IOException;
>> import java.util.HashMap;
>>
>> import org.apache.lucene.search.IndexSearcher;
>> import org.apache.lucene.index.IndexReader;
>> import org.apache.lucene.store.Directory;
>>
>> /** Utility class to get/refresh searchers when you are
>&

Re: Faceted Search using Lucene

2009-02-26 Thread Amin Mohammed-Coleman
Forgot to mention that the previous code I sent was related to facet
search.  This is a general search method I have implemented (they can
probably be combined...).

On Thu, Feb 26, 2009 at 8:21 PM, Amin Mohammed-Coleman wrote:

> Hi
> I have modified my search code.  Here is the following:
> [code]
>
>  public Summary[] search(SearchRequest searchRequest) throws SearchExecutionException {
>
> String searchTerm = searchRequest.getSearchTerm();
>
> if (StringUtils.isBlank(searchTerm)) {
>
> throw new SearchExecutionException("Search string cannot be empty. There
> will be too many results to process.");
>
> }
>
> List summaryList = new ArrayList();
>
> StopWatch stopWatch = new StopWatch("searchStopWatch");
>
> stopWatch.start();
>
> MultiSearcher multiSearcher = null;
>
> List indexSearchers = new ArrayList();
>
> boolean refreshSearchers = false;
>
> try {
>
> LOGGER.debug("Ensuring all index readers are up to date...");
>
> for (IndexSearcher indexSearcher: searchers) {
>
>  IndexReader reader = indexSearcher.getIndexReader();
>
>  reader.incRef();
>
>  Directory directory = reader.directory();
>
>
>
>  long currentVersion = reader.getVersion();
>
>  if (IndexReader.getCurrentVersion(directory) != currentVersion) {
>
>  IndexReader newReader = reader.reopen();
>
>  if (newReader != reader) {
>
>  reader.decRef();
>
>  refreshSearchers = true;
>
>  }
>
>  reader = newReader;
>
>  }
>
>  IndexSearcher indexSearch = new IndexSearcher(reader);
>
>  indexSearchers.add(indexSearch);
>
> }
>
> if (refreshSearchers) {
>
> searchers.clear();
>
> searchers = new ArrayList(indexSearchers);
>
> }
>
> LOGGER.debug("All Index Searchers are up to date. No of index searchers '"+ 
> indexSearchers.size() +
> "'");
>
>  multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[]
> {}));
>
> PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
> analyzer);
>
> analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(),
> new KeywordAnalyzer());
>
> QueryParser queryParser =
> new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
> analyzerWrapper);
>
>  Query query = queryParser.parse(searchTerm);
>
> LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
> query.toString() +"'");
>
>  Sort sort = null;
>
> sort = applySortIfApplicable(searchRequest);
>
>  Filter[] filters =applyFiltersIfApplicable(searchRequest);
>
>  ChainedFilter chainedFilter = null;
>
> if (filters != null) {
>
> chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
>
> }
>
> TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);
>
> ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>
> LOGGER.debug("total number of hits for [" + query.toString() + " ] = 
> "+topDocs.
> totalHits);
>
>  for (ScoreDoc scoreDoc : scoreDocs) {
>
> final Document doc = multiSearcher.doc(scoreDoc.doc);
>
> float score = scoreDoc.score;
>
> final BaseDocument baseDocument = new BaseDocument(doc, score);
>
> Summary documentSummary = new DocumentSummaryImpl(baseDocument);
>
> summaryList.add(documentSummary);
>
> }
>
> multiSearcher.close();
>
> } catch (Exception e) {
>
> throw new IllegalStateException(e);
>
> }
>
> stopWatch.stop();
>
>  LOGGER.debug("total time taken for document seach: " +
> stopWatch.getTotalTimeMillis() + " ms");
>
> return summaryList.toArray(new Summary[] {});
>
> }
>
>  [/code]
>
> Just some background:
>
> There is a list of IndexSearchers that are injected via Spring.  These
> searchers are also configured by Spring.  As you can see, the MultiSearcher
> is a local variable.  I then have a flag that records whether any IndexReader
> was out of date.  When it is set to true, the IndexSearchers are refreshed.
>
> I would be grateful for your thoughts.
>
>
> On Thu, Feb 26, 2009 at 1:35 PM, Amin Mohammed-Coleman 
> wrote:
>
>> Hi
>>
>> Thanks for your help.  I will modify my facet search and my other code to
>> use the recommendations.   Would it be ok to get a review of the completed
>> code?  I just want to make sure that I'm not doing anything that may cause
>> any problems (threading, memory).
>>
>> Cheers
>>
>>
>> On Thu, Feb 26, 2009 at 1:10 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>&

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
Hi
Thanks for your input.  I would like to have a go at doing this myself
first; Solr may be an option later.

* You are creating a new Analyzer & QueryParser every time, also
   creating unnecessary garbage; instead, they should be created once
   & reused.

-- I can move the code out so that they are only created once and reused.


 * You always make a new IndexSearcher and a new MultiSearcher even
   when nothing has changed.  This just generates unnecessary garbage
   which GC then must sweep up.

-- This was something I thought about.  I could move it out so that it's
created once.  However, I presume that inside my code I need to check whether the
IndexReaders are up to date.  This needs to be synchronized as well, I
guess(?)

 * I don't see any synchronization -- it looks like two search
   requests are allowed into this method at the same time?  Which is
   dangerous... eg both (or, more) will wastefully reopen the
   readers.
--  So I need to extract the logic for reopening and provide a
synchronisation mechanism.


Ok.  So I have some work to do.  I'll refactor the code and see if I can get
in line with your recommendations.
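
A minimal sketch of the "create once, reuse" idea (assuming a Lucene
2.4-era API; the class name, field name and analyzer choice here are made
up for illustration):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class DocumentSearcherSketch {

  // Created once and reused by every search request; building these
  // per request only generates garbage for the GC to sweep up.
  private final Analyzer analyzer = new StandardAnalyzer();
  private final QueryParser queryParser =
      new QueryParser("contents", analyzer);

  public Query parse(String searchTerm) throws ParseException {
    // QueryParser itself is not thread-safe, so guard it if search
    // requests can arrive concurrently.
    synchronized (queryParser) {
      return queryParser.parse(searchTerm);
    }
  }
}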


On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> On a quick look, I think there are a few problems with the code:
>
>  * I don't see any synchronization -- it looks like two search
>requests are allowed into this method at the same time?  Which is
>dangerous... eg both (or, more) will wastefully reopen the
>readers.
>
>  * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>don't see a corresponding decRef.
>
>  * You reopen and warm your searchers "live" (vs with BG thread);
>meaning the unlucky search request that hits a reopen pays the
>cost.  This might be OK if the index is small enough that
>reopening & warming takes very little time.  But if index gets
>large, making a random search pay that warming cost is not nice to
>the end user.  It erodes their trust in you.
>
>  * You always make a new IndexSearcher and a new MultiSearcher even
>when nothing has changed.  This just generates unnecessary garbage
>which GC then must sweep up.
>
>  * You are creating a new Analyzer & QueryParser every time, also
>creating unnecessary garbage; instead, they should be created once
>& reused.
>
> You should consider simply using Solr -- it handles all this logic for
> you and has been well debugged with time...
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  The reason for the indexreader.reopen is because I have a webapp which
>> enables users to upload files and then search for the documents.  If I
>> don't
>> reopen i'm concerned that the facet hit counter won't be updated.
>>
>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman > >wrote:
>>
>>  Hi
>>> I have been able to get the code working for my scenario, however I have
>>> a
>>> question and I was wondering if I could get some help.  I have a list of
>>> IndexSearchers which are used in a MultiSearcher class.  I use the
>>> indexsearchers to get each indexreader and put them into a
>>> MultiIndexReader.
>>>
>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>
>>> for (int i =0 ; i < searchables.length;i++) {
>>>
>>> IndexSearcher indexSearcher = (IndexSearcher)searchables[i];
>>>
>>> readers[i] = indexSearcher.getIndexReader();
>>>
>>>   IndexReader newReader = readers[i].reopen();
>>>
>>> if (newReader != readers[i]) {
>>>
>>> readers[i].close();
>>>
>>> }
>>>
>>> readers[i] = newReader;
>>>
>>>
>>>
>>> }
>>>
>>> multiReader = new MultiReader(readers);
>>>
>>> OpenBitSetFacetHitCounter facetHitCounter = new OpenBitSetFacetHitCounter();
>>>
>>> IndexSearcher indexSearcher = new IndexSearcher(multiReader);
>>>
>>>
>>> I then use the indexseacher to do the facet stuff.  I end the code with
>>> closing the multireader.  This is causing problems in another method
>>> where I
>>> do some other search as the indexreaders are closed.  Is it ok to not
>>> close
>>> the multiindexreader or should I do some additional checks in the other
>>> method to see if the indexreader is closed?
>>>
>>>
>>>
>>> Cheers
>>>
>>>
>>> P.S. Hope that made sense...!
>>>
>>>
>>> On Mon, Feb 23, 2009 at 7:20 AM, Amin Mohammed

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
Thanks.  I will rewrite.. in between giving my baby her feed and playing with
the other child, and my wife who wants me to do several other things!


On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> Amin Mohammed-Coleman wrote:
>
>  Hi
>> Thanks for your input.  I would like to have a go at doing this myself
>> first, Solr may be an option.
>>
>> * You are creating a new Analyzer & QueryParser every time, also
>>  creating unnecessary garbage; instead, they should be created once
>>  & reused.
>>
>> -- I can move the code out so that it is only created once and reused.
>>
>>
>> * You always make a new IndexSearcher and a new MultiSearcher even
>>  when nothing has changed.  This just generates unnecessary garbage
>>  which GC then must sweep up.
>>
>> -- This was something I thought about.  I could move it out so that it's
>> created once.  However I presume inside my code I need to check whether
>> the
>> IndexReaders are up to date.  This needs to be synchronized as well I
>> guess(?)
>>
>
> Yes you should synchronize the check for whether the IndexReader is
> current.
>
>  * I don't see any synchronization -- it looks like two search
>>  requests are allowed into this method at the same time?  Which is
>>  dangerous... eg both (or, more) will wastefully reopen the
>>  readers.
>> --  So i need to extract the logic for reopening and provide a
>> synchronisation mechanism.
>>
>
> Yes.
>
>
>  Ok.  So I have some work to do.  I'll refactor the code and see if I can
>> get
>> inline to your recommendations.
>>
>>
>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>
>>> On a quick look, I think there are a few problems with the code:
>>>
>>> * I don't see any synchronization -- it looks like two search
>>>  requests are allowed into this method at the same time?  Which is
>>>  dangerous... eg both (or, more) will wastefully reopen the
>>>  readers.
>>>
>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>  don't see a corresponding decRef.
>>>
>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>  meaning the unlucky search request that hits a reopen pays the
>>>  cost.  This might be OK if the index is small enough that
>>>  reopening & warming takes very little time.  But if index gets
>>>  large, making a random search pay that warming cost is not nice to
>>>  the end user.  It erodes their trust in you.
>>>
>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>  when nothing has changed.  This just generates unnecessary garbage
>>>  which GC then must sweep up.
>>>
>>> * You are creating a new Analyzer & QueryParser every time, also
>>>  creating unnecessary garbage; instead, they should be created once
>>>  & reused.
>>>
>>> You should consider simply using Solr -- it handles all this logic for
>>> you and has been well debugged with time...
>>>
>>> Mike
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>> The reason for the indexreader.reopen is because I have a webapp which
>>>
>>>> enables users to upload files and then search for the documents.  If I
>>>> don't
>>>> reopen i'm concerned that the facet hit counter won't be updated.
>>>>
>>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <
>>>> ami...@gmail.com
>>>>
>>>>> wrote:
>>>>>
>>>>
>>>> Hi
>>>>
>>>>> I have been able to get the code working for my scenario, however I
>>>>> have
>>>>> a
>>>>> question and I was wondering if I could get some help.  I have a list
>>>>> of
>>>>> IndexSearchers which are used in a MultiSearcher class.  I use the
>>>>> indexsearchers to get each indexreader and put them into a
>>>>> MultiIndexReader.
>>>>>
>>>>> IndexReader[] readers = new IndexReader[searchables.length];
>>>>>
>>>>> for (int i =0 ; i < searchables.length;i++) {
>>>>>
>>>>> IndexSearcher indexSearcher = (IndexSearcher)searchables[i];
>>>>>
>>>>> readers[i] = indexSearcher.ge

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
just a quick point:
 public void maybeReopen() throws IOException { //D
   long currentVersion = currentSearcher.getIndexReader().getVersion();
   if (IndexReader.getCurrentVersion(dir) != currentVersion) {
 IndexReader newReader = currentSearcher.getIndexReader().reopen();
 assert newReader != currentSearcher.getIndexReader();
 IndexSearcher newSearcher = new IndexSearcher(newReader);
 warm(newSearcher);
 swapSearcher(newSearcher);
   }
 }

Should the above be synchronised?
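
(One way to guard it, as a sketch: synchronize the whole method.  This
assumes the currentSearcher/dir/warm/swapSearcher members from the snippet
above, and it blocks concurrent reopen checks, which is the cost of the
simple approach.)

  public synchronized void maybeReopen() throws IOException {
    long currentVersion = currentSearcher.getIndexReader().getVersion();
    if (IndexReader.getCurrentVersion(dir) != currentVersion) {
      IndexReader newReader = currentSearcher.getIndexReader().reopen();
      assert newReader != currentSearcher.getIndexReader();
      IndexSearcher newSearcher = new IndexSearcher(newReader);
      warm(newSearcher);
      swapSearcher(newSearcher);
    }
  }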

On Sun, Mar 1, 2009 at 1:25 PM, Amin Mohammed-Coleman wrote:

> thanks.  i will rewrite..in between giving my baby her feed and playing
> with the other child and my wife who wants me to do several other things!
>
>
>
> On Sun, Mar 1, 2009 at 1:20 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>>
>> Amin Mohammed-Coleman wrote:
>>
>>  Hi
>>> Thanks for your input.  I would like to have a go at doing this myself
>>> first, Solr may be an option.
>>>
>>> * You are creating a new Analyzer & QueryParser every time, also
>>>  creating unnecessary garbage; instead, they should be created once
>>>  & reused.
>>>
>>> -- I can move the code out so that it is only created once and reused.
>>>
>>>
>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>  when nothing has changed.  This just generates unnecessary garbage
>>>  which GC then must sweep up.
>>>
>>> -- This was something I thought about.  I could move it out so that it's
>>> created once.  However I presume inside my code I need to check whether
>>> the
>>> IndexReaders are up to date.  This needs to be synchronized as well I
>>> guess(?)
>>>
>>
>> Yes you should synchronize the check for whether the IndexReader is
>> current.
>>
>>  * I don't see any synchronization -- it looks like two search
>>>  requests are allowed into this method at the same time?  Which is
>>>  dangerous... eg both (or, more) will wastefully reopen the
>>>  readers.
>>> --  So i need to extract the logic for reopening and provide a
>>> synchronisation mechanism.
>>>
>>
>> Yes.
>>
>>
>>  Ok.  So I have some work to do.  I'll refactor the code and see if I can
>>> get
>>> inline to your recommendations.
>>>
>>>
>>> On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>
>>>> On a quick look, I think there are a few problems with the code:
>>>>
>>>> * I don't see any synchronization -- it looks like two search
>>>>  requests are allowed into this method at the same time?  Which is
>>>>  dangerous... eg both (or, more) will wastefully reopen the
>>>>  readers.
>>>>
>>>> * You are over-incRef'ing (the reader.incRef inside the loop) -- I
>>>>  don't see a corresponding decRef.
>>>>
>>>> * You reopen and warm your searchers "live" (vs with BG thread);
>>>>  meaning the unlucky search request that hits a reopen pays the
>>>>  cost.  This might be OK if the index is small enough that
>>>>  reopening & warming takes very little time.  But if index gets
>>>>  large, making a random search pay that warming cost is not nice to
>>>>  the end user.  It erodes their trust in you.
>>>>
>>>> * You always make a new IndexSearcher and a new MultiSearcher even
>>>>  when nothing has changed.  This just generates unnecessary garbage
>>>>  which GC then must sweep up.
>>>>
>>>> * You are creating a new Analyzer & QueryParser every time, also
>>>>  creating unnecessary garbage; instead, they should be created once
>>>>  & reused.
>>>>
>>>> You should consider simply using Solr -- it handles all this logic for
>>>> you and has been well debugged with time...
>>>>
>>>> Mike
>>>>
>>>> Amin Mohammed-Coleman wrote:
>>>>
>>>> The reason for the indexreader.reopen is because I have a webapp which
>>>>
>>>>> enables users to upload files and then search for the documents.  If I
>>>>> don't
>>>>> reopen i'm concerned that the facet hit counter won't be updated.
>>>>>
>>>>> On Tue, Feb 24, 2009 at 8:32 PM, Amin Mohammed-Coleman <
>>>>

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
Hi
I've now done the following:

public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {

final String searchTerm = searchRequest.getSearchTerm();

if (StringUtils.isBlank(searchTerm)) {

throw new SearchExecutionException("Search string cannot be empty. There
will be too many results to process.");

}

List summaryList = new ArrayList();

StopWatch stopWatch = new StopWatch("searchStopWatch");

stopWatch.start();

List indexSearchers = new ArrayList();

try {

LOGGER.debug("Ensuring all index readers are up to date...");

maybeReopen();

LOGGER.debug("All Index Searchers are up to date. No of index searchers '" +
indexSearchers.size() +"'");

 Query query = queryParser.parse(searchTerm);

LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
query.toString() +"'");

 Sort sort = null;

sort = applySortIfApplicable(searchRequest);

 Filter[] filters =applyFiltersIfApplicable(searchRequest);

 ChainedFilter chainedFilter = null;

if (filters != null) {

chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);

}

TopDocs topDocs = get().search(query,chainedFilter ,100,sort);

ScoreDoc[] scoreDocs = topDocs.scoreDocs;

LOGGER.debug("total number of hits for [" + query.toString() + " ] = "+topDocs.
totalHits);

 for (ScoreDoc scoreDoc : scoreDocs) {

final Document doc = multiSearcher.doc(scoreDoc.doc);

float score = scoreDoc.score;

final BaseDocument baseDocument = new BaseDocument(doc, score);

Summary documentSummary = new DocumentSummaryImpl(baseDocument);

summaryList.add(documentSummary);

}

multiSearcher.close();

} catch (Exception e) {

throw new IllegalStateException(e);

}

stopWatch.stop();

 LOGGER.debug("total time taken for document seach: " +
stopWatch.getTotalTimeMillis() + " ms");

return summaryList.toArray(new Summary[] {});

}


And have the following methods:

@PostConstruct

public void initialiseQueryParser() {

PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
analyzer);

analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(),
new KeywordAnalyzer());

queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
analyzerWrapper);

 try {

LOGGER.debug("Initialising multi searcher ");

this.multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[]
{}));

LOGGER.debug("multi searcher initialised");

} catch (IOException e) {

throw new IllegalStateException(e);

}

 }


Initialises the MultiSearcher when this class is created by Spring.


private synchronized void swapMultiSearcher(MultiSearcher newMultiSearcher)
{

try {

release(multiSearcher);

} catch (IOException e) {

throw new IllegalStateException(e);

}

multiSearcher = newMultiSearcher;

}

  public void maybeReopen() throws IOException {

 MultiSearcher newMultiSeacher = null;

 boolean refreshMultiSeacher = false;

 List indexSearchers = new ArrayList();

 synchronized (searchers) {

 for (IndexSearcher indexSearcher: searchers) {

 IndexReader reader = indexSearcher.getIndexReader();

 reader.incRef();

 Directory directory = reader.directory();

 long currentVersion = reader.getVersion();

 if (IndexReader.getCurrentVersion(directory) != currentVersion) {

 IndexReader newReader = indexSearcher.getIndexReader().reopen();

 if (newReader != reader) {

 reader.decRef();

 refreshMultiSeacher = true;

 }

 reader = newReader;

 IndexSearcher newSearcher = new IndexSearcher(newReader);

 indexSearchers.add(newSearcher);

 }

 }

 }



 if (refreshMultiSeacher) {

newMultiSeacher = new
MultiSearcher(indexSearchers.toArray(new IndexSearcher[] {}));

warm(newMultiSeacher);

swapMultiSearcher(newMultiSeacher);

 }



 }


  private void warm(MultiSearcher newMultiSeacher) {

 }



 private synchronized MultiSearcher get() {

for (IndexSearcher indexSearcher: searchers) {

indexSearcher.getIndexReader().incRef();

}

return multiSearcher;

}

 private synchronized void release(MultiSearcher multiSearcher) throws IOException {

for (IndexSearcher indexSearcher: searchers) {

indexSearcher.getIndexReader().decRef();

}

}


However I am now getting


java.lang.IllegalStateException:
org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed


on the call:


private synchronized MultiSearcher get() {

for (IndexSearcher indexSearcher: searchers) {

indexSearcher.getIndexReader().incRef();

}

return multiSearcher;

}


I'm doing something wrong, obviously.. not sure where, though..


Cheers


On Sun, Mar 1, 2009 at 1:36 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> I was wondering the same thing ;)
>
> It's best to call this method from a single BG "warming" thread, in which
> case it would not need its own synchronization.
>
> But, to be safe, I'll add intern
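
(A minimal sketch of the invariant this sub-thread converges on, using a
hypothetical SearcherHolder class that is not from the original mails:
get(), release() and swap() must synchronize on the same object, and every
incRef() must be matched by exactly one decRef().)

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

class SearcherHolder {
  private IndexSearcher current;

  SearcherHolder(IndexSearcher initial) {
    this.current = initial;
  }

  synchronized IndexSearcher get() {
    current.getIndexReader().incRef();  // caller now shares ownership
    return current;
  }

  synchronized void release(IndexSearcher s) throws IOException {
    s.getIndexReader().decRef();        // matches the incRef in get()
  }

  synchronized void swap(IndexSearcher newSearcher) throws IOException {
    release(current);                   // drop the holder's own reference
    current = newSearcher;
  }
}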

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
Sorry, I added

release(multiSearcher);


instead of multiSearcher.close();

On Sun, Mar 1, 2009 at 2:17 PM, Amin Mohammed-Coleman wrote:

> Hi
> I've now done the following:
>
> public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
>
> final String searchTerm = searchRequest.getSearchTerm();
>
> if (StringUtils.isBlank(searchTerm)) {
>
> throw new SearchExecutionException("Search string cannot be empty. There
> will be too many results to process.");
>
> }
>
> List summaryList = new ArrayList();
>
> StopWatch stopWatch = new StopWatch("searchStopWatch");
>
> stopWatch.start();
>
> List indexSearchers = new ArrayList();
>
> try {
>
> LOGGER.debug("Ensuring all index readers are up to date...");
>
> maybeReopen();
>
> LOGGER.debug("All Index Searchers are up to date. No of index searchers '"+ 
> indexSearchers.size() +
> "'");
>
>  Query query = queryParser.parse(searchTerm);
>
> LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
> query.toString() +"'");
>
>  Sort sort = null;
>
> sort = applySortIfApplicable(searchRequest);
>
>  Filter[] filters =applyFiltersIfApplicable(searchRequest);
>
>  ChainedFilter chainedFilter = null;
>
> if (filters != null) {
>
> chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
>
> }
>
> TopDocs topDocs = get().search(query,chainedFilter ,100,sort);
>
> ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>
> LOGGER.debug("total number of hits for [" + query.toString() + " ] = 
> "+topDocs.
> totalHits);
>
>  for (ScoreDoc scoreDoc : scoreDocs) {
>
> final Document doc = multiSearcher.doc(scoreDoc.doc);
>
> float score = scoreDoc.score;
>
> final BaseDocument baseDocument = new BaseDocument(doc, score);
>
> Summary documentSummary = new DocumentSummaryImpl(baseDocument);
>
> summaryList.add(documentSummary);
>
> }
>
> multiSearcher.close();
>
> } catch (Exception e) {
>
> throw new IllegalStateException(e);
>
> }
>
> stopWatch.stop();
>
>  LOGGER.debug("total time taken for document seach: " +
> stopWatch.getTotalTimeMillis() + " ms");
>
> return summaryList.toArray(new Summary[] {});
>
> }
>
>
> And have the following methods:
>
> @PostConstruct
>
> public void initialiseQueryParser() {
>
> PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
> analyzer);
>
> analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(), 
> new KeywordAnalyzer());
>
> queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
> analyzerWrapper);
>
>  try {
>
> LOGGER.debug("Initialising multi searcher ");
>
> this.multiSearcher = new MultiSearcher(searchers.toArray(new IndexSearcher[]
> {}));
>
> LOGGER.debug("multi searcher initialised");
>
> } catch (IOException e) {
>
> throw new IllegalStateException(e);
>
> }
>
>  }
>
>
> Initialises the MultiSearcher when this class is created by Spring.
>
>
>  private synchronized void swapMultiSearcher(MultiSearcher
> newMultiSearcher)  {
>
> try {
>
> release(multiSearcher);
>
> } catch (IOException e) {
>
> throw new IllegalStateException(e);
>
> }
>
> multiSearcher = newMultiSearcher;
>
> }
>
>   public void maybeReopen() throws IOException {
>
>  MultiSearcher newMultiSeacher = null;
>
>  boolean refreshMultiSeacher = false;
>
>  List indexSearchers = new ArrayList();
>
>  synchronized (searchers) {
>
>  for (IndexSearcher indexSearcher: searchers) {
>
>  IndexReader reader = indexSearcher.getIndexReader();
>
>  reader.incRef();
>
>  Directory directory = reader.directory();
>
>  long currentVersion = reader.getVersion();
>
>  if (IndexReader.getCurrentVersion(directory) != currentVersion) {
>
>  IndexReader newReader = indexSearcher.getIndexReader().reopen();
>
>  if (newReader != reader) {
>
>  reader.decRef();
>
>  refreshMultiSeacher = true;
>
>  }
>
>  reader = newReader;
>
>  IndexSearcher newSearcher = new IndexSearcher(newReader);
>
>  indexSearchers.add(newSearcher);
>
>  }
>
>  }
>
>  }
>
>
>
>  if (refreshMultiSeacher) {
>
> newMultiSeacher = new MultiSearcher(indexSearchers.toArray(new IndexSearcher[]
> {}));
>
> warm(newMultiSeacher);
>
> swapMultiSearcher(newMultiSeacher);
>
>  }
>
>
>
>  }
>
>
>   private void warm(MultiSearcher newMultiSeac

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
Hi again...
Thanks for your patience.  I modified the code to do the following:

private void maybeReopen() throws Exception {

 startReopen();

 try {

 MultiSearcher newMultiSeacher = get();

 boolean refreshMultiSeacher = false;

 List indexSearchers = new ArrayList();

 synchronized (searchers) {

 for (IndexSearcher indexSearcher: searchers) {

 IndexReader reader = indexSearcher.getIndexReader();

 reader.incRef();

 Directory directory = reader.directory();

 long currentVersion = reader.getVersion();

 if (IndexReader.getCurrentVersion(directory) != currentVersion) {

 IndexReader newReader = indexSearcher.getIndexReader().reopen();

 if (newReader != reader) {

 reader.decRef();

 refreshMultiSeacher = true;

 }

 reader = newReader;

 IndexSearcher newSearcher = new IndexSearcher(reader);

 indexSearchers.add(newSearcher);

 }

 }

 }



 if (refreshMultiSeacher) {

try {

newMultiSeacher = new
MultiSearcher(indexSearchers.toArray(new IndexSearcher[] {}));

warm(newMultiSeacher);

swapMultiSearcher(newMultiSeacher);

}finally {

release(multiSearcher);

}

 }

 } finally {

 doneReopen();

 }

 }


But I'm still getting an AlreadyClosedException; it occurs when I call the
get() method in the main search code.


Cheers



On Sun, Mar 1, 2009 at 2:24 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> OK new version of SearcherManager, that fixes maybeReopen() so that it can
> be called from multiple threads.
>
> NOTE: it's still untested!
>
> Mike
>
> package lia.admin;
>
> import java.io.IOException;
> import java.util.HashMap;
>
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.store.Directory;
>
> /** Utility class to get/refresh searchers when you are
>  *  using multiple threads. */
>
> public class SearcherManager {
>
>  private IndexSearcher currentSearcher; //A
>  private Directory dir;
>
>  public SearcherManager(Directory dir) throws IOException {
>this.dir = dir;
>currentSearcher = new IndexSearcher(IndexReader.open(dir));  //B
>  }
>
>  public void warm(IndexSearcher searcher) {}//C
>
>  private boolean reopening;
>
>  private synchronized void startReopen()//D
>throws InterruptedException {
>while (reopening) {
>  wait();
>}
>reopening = true;
>  }
>
>  private synchronized void doneReopen() {   //E
>reopening = false;
>notifyAll();
>  }
>
>  public void maybeReopen() throws InterruptedException, IOException { //F
>
>startReopen();
>
>try {
>  final IndexSearcher searcher = get();
>  try {
>long currentVersion = currentSearcher.getIndexReader().getVersion();
>  //G
>if (IndexReader.getCurrentVersion(dir) != currentVersion) {
>   //G
>  IndexReader newReader = currentSearcher.getIndexReader().reopen();
>  //G
>  assert newReader != currentSearcher.getIndexReader();
>   //G
>  IndexSearcher newSearcher = new IndexSearcher(newReader);
>   //G
>  warm(newSearcher);
>  //G
>  swapSearcher(newSearcher);
>  //G
>}
>  } finally {
>release(searcher);
>  }
>} finally {
>  doneReopen();
>}
>  }
>
>  public synchronized IndexSearcher get() {  //H
>currentSearcher.getIndexReader().incRef();
>return currentSearcher;
>  }
>
>  public synchronized void release(IndexSearcher searcher)   //I
>throws IOException {
>searcher.getIndexReader().decRef();
>  }
>
>  private synchronized void swapSearcher(IndexSearcher newSearcher) //J
>  throws IOException {
>release(currentSearcher);
>currentSearcher = newSearcher;
>  }
> }
>
> /*
> #A Current IndexSearcher
> #B Create initial searcher
> #C Implement in subclass to warm new searcher
> #D Pauses until no other thread is reopening
> #E Finish reopen and notify other threads
> #F Reopen searcher if there are changes
> #G Check index version and reopen, warm, swap if needed
> #H Returns current searcher
> #I Release searcher
> #J Swaps currentSearcher to new searcher
> */
>
> Mike
>
>
> On Mar 1, 2009, at 8:27 AM, Amin Mohammed-Coleman wrote:
>
>  just a quick point:
>> public void maybeReopen() throws IOException { //D
>>  long currentVersion = currentSearcher.getIndexReader().getVersion();
>>  if (IndexReader.getCurrentVersion(dir) != currentVersion) {
>>IndexReader newReader = currentSearcher.getIndexReader().reopen();
>>assert newReader != currentSearcher.getIndexReader();
>>I
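
(A minimal usage sketch of the SearcherManager class posted above,
assuming a Lucene 2.4-era API, that the class is on the classpath, and an
existing index at a made-up path:)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SearcherManagerDemo {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.getDirectory("/path/to/index"); // hypothetical path
    SearcherManager manager = new SearcherManager(dir);         // created once, at startup

    // Per search request:
    manager.maybeReopen();                   // pick up any committed changes
    IndexSearcher searcher = manager.get();  // incRef'd; must be released
    try {
      Query query = new QueryParser("contents", new StandardAnalyzer())
          .parse("lucene");
      TopDocs hits = searcher.search(query, null, 10);
      System.out.println("total hits: " + hits.totalHits);
    } finally {
      manager.release(searcher);             // always matches the get()
    }
  }
}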

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
Hi
Thanks again for helping on a Sunday!

I have now modified my maybeReopen() to do the following:

 private void maybeReopen() throws Exception {

 LOGGER.debug("Initiating reopening of index readers...");

 IndexSearcher[] indexSearchers = (IndexSearcher[]) multiSearcher
.getSearchables();

 for (IndexSearcher indexSearcher : indexSearchers) {

 IndexReader indexReader = indexSearcher.getIndexReader();

 SearcherManager documentSearcherManager = new
 SearcherManager(indexReader.directory());

 documentSearcherManager.maybeReopen();

}

 }


And get() to:


private synchronized MultiSearcher get() {

IndexSearcher[] indexSearchers = (IndexSearcher[]) multiSearcher
.getSearchables();

List  indexSearchersList = new ArrayList();

for (IndexSearcher indexSearcher : indexSearchers) {

IndexReader indexReader = indexSearcher.getIndexReader();

SearcherManager documentSearcherManager = null;

try {

documentSearcherManager = new SearcherManager(indexReader.directory());

} catch (IOException e) {

throw new IllegalStateException(e);

}

indexSearchersList.add(documentSearcherManager.get());

}

try {

multiSearcher = new
MultiSearcher(indexSearchersList.toArray(new IndexSearcher[] {}));

} catch (IOException e) {

throw new IllegalStateException(e);

}

return multiSearcher;

}



This makes all my tests pass.  I am using the SearcherManager that you
recommended.  Does this look ok?


On Sun, Mar 1, 2009 at 2:38 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Your maybeReopen has an excess incRef().
>
> I'm not sure how you open the searchers in the first place?  The list
> starts as empty, and nothing populates it?
>
> When you do the initial population, you need an incRef.
>
> I think you're hitting IllegalStateException because maybeReopen is
> closing a reader before get() can get it (since they synchronize on
> different objects).
>
> I'd recommend switching to the SearcherManager class.  Instantiate one
> for each of your searchers.  On each search request, go through them
> and call maybeReopen(), and then call get() and gather each
> IndexSearcher instance into a new array.  Then, make a new
> MultiSearcher (opposite of what I said before): while that creates a
> small amount of garbage, it'll keep your code simpler (good
> tradeoff).
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>> Sorry, I added
>>
>> release(multiSearcher);
>>
>>
>> instead of multiSearcher.close();
>>
>> On Sun, Mar 1, 2009 at 2:17 PM, Amin Mohammed-Coleman > >wrote:
>>
>>  Hi
>>> I've now done the following:
>>>
>>> public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
>>>
>>> final String searchTerm = searchRequest.getSearchTerm();
>>>
>>> if (StringUtils.isBlank(searchTerm)) {
>>>
>>> throw new SearchExecutionException("Search string cannot be empty. There
>>> will be too many results to process.");
>>>
>>> }
>>>
>>> List summaryList = new ArrayList();
>>>
>>> StopWatch stopWatch = new StopWatch("searchStopWatch");
>>>
>>> stopWatch.start();
>>>
>>> List indexSearchers = new ArrayList();
>>>
>>> try {
>>>
>>> LOGGER.debug("Ensuring all index readers are up to date...");
>>>
>>> maybeReopen();
>>>
>>> LOGGER.debug("All Index Searchers are up to date. No of index searchers
>>> '"+ indexSearchers.size() +
>>> "'");
>>>
>>> Query query = queryParser.parse(searchTerm);
>>>
>>> LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
>>> query.toString() +"'");
>>>
>>> Sort sort = null;
>>>
>>> sort = applySortIfApplicable(searchRequest);
>>>
>>> Filter[] filters =applyFiltersIfApplicable(searchRequest);
>>>
>>> ChainedFilter chainedFilter = null;
>>>
>>> if (filters != null) {
>>>
>>> chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
>>>
>>> }
>>>
>>> TopDocs topDocs = get().search(query,chainedFilter ,100,sort);
>>>
>>> ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>>>
>>> LOGGER.debug("total number of hits for [" + query.toString() + " ] =
>>> "+topDocs.
>>> totalHits);
>>>
>>> for (ScoreDoc scoreDoc : scoreDocs) {
>>>
>>> final Document doc = multiSearcher.doc(scoreDoc.doc);
>>>

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
Sorry... I'm getting slightly confused.
I have a @PostConstruct which is where I should create an array of
SearcherManagers (one per IndexSearcher).  From there I initialise the
MultiSearcher using get().  After which I need to call maybeReopen() for
each IndexSearcher.  So I'll do the following:

@PostConstruct

public void initialiseDocumentSearcher() {

PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
analyzer);

analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(),
new KeywordAnalyzer());

queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
analyzerWrapper);

 try {

LOGGER.debug("Initialising multi searcher ");

documentSearcherManagers = new DocumentSearcherManager[searchers.size()];

for (int i = 0; i < searchers.size() ;i++) {

IndexSearcher indexSearcher = searchers.get(i);

Directory directory = indexSearcher.getIndexReader().directory();

DocumentSearcherManager documentSearcherManager =
new DocumentSearcherManager(directory);

documentSearcherManagers[i]=documentSearcherManager;

}

LOGGER.debug("multi searcher initialised");

} catch (IOException e) {

throw new IllegalStateException(e);

}

 }


This initialises search managers.  I then have methods:


 private void maybeReopen() throws Exception {

LOGGER.debug("Initiating reopening of index readers...");

for (DocumentSearcherManager documentSearcherManager :
documentSearcherManagers) {

documentSearcherManager.maybeReopen();

}

 }



 private void release() throws Exception {

for (DocumentSearcherManager documentSearcherManager :
documentSearcherManagers) {

documentSearcherManager.release(documentSearcherManager.get());

}

 }


  private MultiSearcher get() {

List listOfIndexSeachers = new ArrayList();

for (DocumentSearcherManager documentSearcherManager :
documentSearcherManagers) {

listOfIndexSeachers.add(documentSearcherManager.get());

}

try {

multiSearcher = new
MultiSearcher(listOfIndexSeachers.toArray(new IndexSearcher[] {}));

} catch (IOException e) {

throw new IllegalStateException(e);

}

return multiSearcher;

}


These methods are used in the following manner in the search code:


public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {

final String searchTerm = searchRequest.getSearchTerm();

if (StringUtils.isBlank(searchTerm)) {

throw new SearchExecutionException("Search string cannot be empty. There
will be too many results to process.");

}

List summaryList = new ArrayList();

StopWatch stopWatch = new StopWatch("searchStopWatch");

stopWatch.start();

List indexSearchers = new ArrayList();

try {

LOGGER.debug("Ensuring all index readers are up to date...");

maybeReopen();

LOGGER.debug("All Index Searchers are up to date. No of index searchers '" +
indexSearchers.size() +"'");

 Query query = queryParser.parse(searchTerm);

LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
query.toString() +"'");

 Sort sort = null;

sort = applySortIfApplicable(searchRequest);

 Filter[] filters =applyFiltersIfApplicable(searchRequest);

 ChainedFilter chainedFilter = null;

if (filters != null) {

chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);

}

TopDocs topDocs = get().search(query,chainedFilter ,100,sort);

ScoreDoc[] scoreDocs = topDocs.scoreDocs;

LOGGER.debug("total number of hits for [" + query.toString() + " ] = "+topDocs.
totalHits);

 for (ScoreDoc scoreDoc : scoreDocs) {

final Document doc = get().doc(scoreDoc.doc);

float score = scoreDoc.score;

final BaseDocument baseDocument = new BaseDocument(doc, score);

Summary documentSummary = new DocumentSummaryImpl(baseDocument);

summaryList.add(documentSummary);

}

release();

} catch (Exception e) {

throw new IllegalStateException(e);

}

stopWatch.stop();

 LOGGER.debug("total time taken for document seach: " +
stopWatch.getTotalTimeMillis() + " ms");

return summaryList.toArray(new Summary[] {});

}


Does this look better?  Again..I really really appreciate your help!


On Sun, Mar 1, 2009 at 4:18 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> This is not quite right -- you should only create SearcherManager once
> (per Direcotry) at startup/app load, not with every search request.
>
> And I don't see release -- it must call SearcherManager.release of
> each of the IndexSearchers previously returned from get().
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  Hi
>> Thanks again for helping on a Sunday!
>>
>> I have now modified my maybeReopen() to do the following:
>>
>> private void maybeReopen() throws Exception {
>>
>> LOGGER.debug("Initiating reopening of index readers...");
>>
>

Re: Faceted Search using Lucene

2009-03-01 Thread Amin Mohammed-Coleman
Hi
The searchers are injected into the class via Spring.  So when a client
calls the class it is fully configured with a list of index searchers.
 However I have removed this list and am instead injecting a list of
directories, which are passed to the DocumentSearcherManager.
 DocumentSearcherManager is the SearcherManager (should've mentioned that earlier).
 So finally I have modified my release code to do the following:

 private void release(MultiSearcher multiSeacher) throws Exception {

 IndexSearcher[] indexSearchers = (IndexSearcher[])
multiSeacher.getSearchables();

 for(int i =0 ; i < indexSearchers.length;i++) {

 documentSearcherManagers[i].release(indexSearchers[i]);

 }

 }


and its use looks like this:


public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {

final String searchTerm = searchRequest.getSearchTerm();

if (StringUtils.isBlank(searchTerm)) {

throw new SearchExecutionException("Search string cannot be empty. There
will be too many results to process.");

}

List summaryList = new ArrayList();

StopWatch stopWatch = new StopWatch("searchStopWatch");

stopWatch.start();

List indexSearchers = new ArrayList();

try {

LOGGER.debug("Ensuring all index readers are up to date...");

maybeReopen();

LOGGER.debug("All Index Searchers are up to date. No of index searchers '" +
indexSearchers.size() +"'");

 Query query = queryParser.parse(searchTerm);

LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
query.toString() +"'");

 Sort sort = null;

sort = applySortIfApplicable(searchRequest);

 Filter[] filters =applyFiltersIfApplicable(searchRequest);

 ChainedFilter chainedFilter = null;

if (filters != null) {

chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);

}

TopDocs topDocs = get().search(query,chainedFilter ,100,sort);

ScoreDoc[] scoreDocs = topDocs.scoreDocs;

LOGGER.debug("total number of hits for [" + query.toString() + " ] = "+topDocs.
totalHits);

 for (ScoreDoc scoreDoc : scoreDocs) {

final Document doc = get().doc(scoreDoc.doc);

float score = scoreDoc.score;

final BaseDocument baseDocument = new BaseDocument(doc, score);

Summary documentSummary = new DocumentSummaryImpl(baseDocument);

summaryList.add(documentSummary);

}

} catch (Exception e) {

throw new IllegalStateException(e);

} finally {

release(get());

}

stopWatch.stop();

 LOGGER.debug("total time taken for document seach: " +
stopWatch.getTotalTimeMillis() + " ms");

return summaryList.toArray(new Summary[] {});

}


So the final @PostConstruct constructs the DocumentSearcherManagers with the
list of directories.. looking like this:


@PostConstruct

public void initialiseDocumentSearcher() {

PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
analyzer);

analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(),
new KeywordAnalyzer());

queryParser = new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
analyzerWrapper);

 try {

LOGGER.debug("Initialising multi searcher ");

documentSearcherManagers = new DocumentSearcherManager[directories.size()];

for (int i = 0; i < directories.size() ;i++) {

Directory directory = directories.get(i);

DocumentSearcherManager documentSearcherManager =
new DocumentSearcherManager(directory);

documentSearcherManagers[i]=documentSearcherManager;

}

LOGGER.debug("multi searcher initialised");

} catch (IOException e) {

throw new IllegalStateException(e);

}

 }



Cheers

Amin



On Sun, Mar 1, 2009 at 6:15 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> I don't understand where searchers comes from, prior to
> initializeDocumentSearcher?  You should, instead, simply create the
> SearcherManager (from your Directory instances).  You don't need any
> searchers during initialize.
>
> Is DocumentSearcherManager the same as SearcherManager (just renamed)?
>
> The release method is wrong -- you're calling .get() and then
> immediately release.  Instead, you should step through the searchers
> from your MultiSearcher and release them to each SearcherManager.
>
> You should call your release() in a finally clause.
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  Sorry...i'm getting slightly confused.
>> I have a PostConstruct which is where I should create an array of
>> SearchManagers (per indexSeacher).  From there I initialise the
>> multisearcher using the get().  After which I need to call maybeReopen for
>> each IndexSearcher.  So I'll do the following:
>>
>> @PostConstruct
>>
>> public void initialiseDocumentSearcher() {
>>
>> PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
>> analyzer);
>

Re: Faceted Search using Lucene

2009-03-02 Thread Amin Mohammed-Coleman
Hi there
Good morning!  Here is the final search code:

public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {

final String searchTerm = searchRequest.getSearchTerm();

if (StringUtils.isBlank(searchTerm)) {

throw new SearchExecutionException("Search string cannot be empty. There
will be too many results to process.");

}

List summaryList = new ArrayList();

StopWatch stopWatch = new StopWatch("searchStopWatch");

stopWatch.start();

MultiSearcher multiSearcher = null;

try {

LOGGER.debug("Ensuring all index readers are up to date...");

maybeReopen();

 Query query = queryParser.parse(searchTerm);

LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
query.toString() +"'");

 Sort sort = null;

sort = applySortIfApplicable(searchRequest);

 Filter[] filters =applyFiltersIfApplicable(searchRequest);

 ChainedFilter chainedFilter = null;

if (filters != null) {

chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);

}

multiSearcher = get();

TopDocs topDocs = multiSearcher.search(query,chainedFilter ,100,sort);

ScoreDoc[] scoreDocs = topDocs.scoreDocs;

LOGGER.debug("total number of hits for [" + query.toString() + " ] = "+topDocs.
totalHits);

 for (ScoreDoc scoreDoc : scoreDocs) {

final Document doc = multiSearcher.doc(scoreDoc.doc);

float score = scoreDoc.score;

final BaseDocument baseDocument = new BaseDocument(doc, score);

Summary documentSummary = new DocumentSummaryImpl(baseDocument);

summaryList.add(documentSummary);

}

} catch (Exception e) {

throw new IllegalStateException(e);

} finally {

if (multiSearcher != null) {

release(multiSearcher);

}

}

stopWatch.stop();

 LOGGER.debug("total time taken for document seach: " +
stopWatch.getTotalTimeMillis() + " ms");

return summaryList.toArray(new Summary[] {});

}



I hope this makes sense...thanks again!


Cheers

Amin



On Sun, Mar 1, 2009 at 8:09 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> You're calling get() too many times.  For every call to get() you must
> match with a call to release().
>
> So, once at the front of your search method you should:
>
>  MultiSearcher searcher = get();
>
> then use that searcher to do searching, retrieve docs, etc.
>
> Then in the finally clause, pass that searcher to release.
>
> So, only one call to get() and one matching call to release().
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  Hi
>> The searchers are injected into the class via Spring.  So when a client
>> calls the class it is fully configured with a list of index searchers.
>> However I have removed this list and am instead injecting a list of
>> directories, which are passed to the DocumentSearcherManager.
>> DocumentSearcherManager is the SearcherManager (should've mentioned that earlier).
>> So finally I have modified my release code to do the following:
>>
>> private void release(MultiSearcher multiSeacher) throws Exception {
>>
>> IndexSearcher[] indexSearchers = (IndexSearcher[])
>> multiSeacher.getSearchables();
>>
>> for(int i =0 ; i < indexSearchers.length;i++) {
>>
>> documentSearcherManagers[i].release(indexSearchers[i]);
>>
>> }
>>
>> }
>>
>>
>> and it's use looks like this:
>>
>>
>> public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
>>
>> final String searchTerm = searchRequest.getSearchTerm();
>>
>> if (StringUtils.isBlank(searchTerm)) {
>>
>> throw new SearchExecutionException("Search string cannot be empty. There
>> will be too many results to process.");
>>
>> }
>>
>> List summaryList = new ArrayList();
>>
>> StopWatch stopWatch = new StopWatch("searchStopWatch");
>>
>> stopWatch.start();
>>
>> List indexSearchers = new ArrayList();
>>
>> try {
>>
>> LOGGER.debug("Ensuring all index readers are up to date...");
>>
>> maybeReopen();
>>
>> LOGGER.debug("All Index Searchers are up to date. No of index searchers '"
>> +
>> indexSearchers.size() +"'");
>>
>> Query query = queryParser.parse(searchTerm);
>>
>> LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
>> query.toString() +"'");
>>
>> Sort sort = null;
>>
>> sort = applySortIfApplicable(searchRequest);
>>
>> Filter[] filters =applyFiltersIfApplicable(searchRequest);
>>
>> ChainedFilter chainedFilter = null;
>>
>> if 

Re: Faceted Search using Lucene

2009-03-02 Thread Amin Mohammed-Coleman
I noticed that if I do the get() before the maybeReopen() then I get no
results.  But otherwise I can change it further.

On Mon, Mar 2, 2009 at 11:46 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> There is no such thing as final code -- code is alive and is always
> changing ;)
>
> It looks good to me.
>
> Though one trivial thing is: I would move the code in the try clause up to
> and including the multiSearcher=get() out above the try.  I always attempt
> to "shrink wrap" what's inside a try clause to the minimum that needs to be
> there.  Ie, your code that creates a query, finds the right sort & filter to
> use, etc, can all happen outside the try, because you have not yet acquired
> the multiSearcher.
>
> If you do that, you also don't need the null check in the finally clause,
> because multiSearcher must be non-null on entering the try.
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  Hi there
>> Good morning!  Here is the final search code:
>>
>> public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
>>
>> final String searchTerm = searchRequest.getSearchTerm();
>>
>> if (StringUtils.isBlank(searchTerm)) {
>>
>> throw new SearchExecutionException("Search string cannot be empty. There
>> will be too many results to process.");
>>
>> }
>>
>> List summaryList = new ArrayList();
>>
>> StopWatch stopWatch = new StopWatch("searchStopWatch");
>>
>> stopWatch.start();
>>
>> MultiSearcher multiSearcher = null;
>>
>> try {
>>
>> LOGGER.debug("Ensuring all index readers are up to date...");
>>
>> maybeReopen();
>>
>> Query query = queryParser.parse(searchTerm);
>>
>> LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
>> query.toString() +"'");
>>
>> Sort sort = null;
>>
>> sort = applySortIfApplicable(searchRequest);
>>
>> Filter[] filters =applyFiltersIfApplicable(searchRequest);
>>
>> ChainedFilter chainedFilter = null;
>>
>> if (filters != null) {
>>
>> chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
>>
>> }
>>
>> multiSearcher = get();
>>
>> TopDocs topDocs = multiSearcher.search(query,chainedFilter ,100,sort);
>>
>> ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>>
>> LOGGER.debug("total number of hits for [" + query.toString() + " ] =
>> "+topDocs.
>> totalHits);
>>
>> for (ScoreDoc scoreDoc : scoreDocs) {
>>
>> final Document doc = multiSearcher.doc(scoreDoc.doc);
>>
>> float score = scoreDoc.score;
>>
>> final BaseDocument baseDocument = new BaseDocument(doc, score);
>>
>> Summary documentSummary = new DocumentSummaryImpl(baseDocument);
>>
>> summaryList.add(documentSummary);
>>
>> }
>>
>> } catch (Exception e) {
>>
>> throw new IllegalStateException(e);
>>
>> } finally {
>>
>> if (multiSearcher != null) {
>>
>> release(multiSearcher);
>>
>> }
>>
>> }
>>
>> stopWatch.stop();
>>
>> LOGGER.debug("total time taken for document seach: " +
>> stopWatch.getTotalTimeMillis() + " ms");
>>
>> return summaryList.toArray(new Summary[] {});
>>
>> }
>>
>>
>>
>> I hope this makes sense...thanks again!
>>
>>
>> Cheers
>>
>> Amin
>>
>>
>>
>> On Sun, Mar 1, 2009 at 8:09 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>
>>> You're calling get() too many times.  For every call to get() you must
>>> match with a call to release().
>>>
>>> So, once at the front of your search method you should:
>>>
>>> MultiSearcher searcher = get();
>>>
>>> then use that searcher to do searching, retrieve docs, etc.
>>>
>>> Then in the finally clause, pass that searcher to release.
>>>
>>> So, only one call to get() and one matching call to release().
>>>
>>> Mike
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>> Hi
>>>
>>>> The searchers are injected into the class via Spring.  So when a client
>>>> calls the class it is fully configured with a list of index searchers.
>>>> However I have removed this list and instead injecting a list of
>>>> directories which a
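
(A sketch of the "shrink wrapped" shape Mike describes above, using the
method names from the search code in this thread; exception handling is
elided:)

// Build the query, sort and filter first; none of this needs the searcher.
Query query = queryParser.parse(searchTerm);
Sort sort = applySortIfApplicable(searchRequest);
Filter[] filters = applyFiltersIfApplicable(searchRequest);
ChainedFilter chainedFilter =
    (filters != null) ? new ChainedFilter(filters, ChainedFilter.OR) : null;

maybeReopen();
MultiSearcher multiSearcher = get();  // acquired just before the try
try {
  TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);
  // ... collect summaries from topDocs.scoreDocs ...
} finally {
  release(multiSearcher);             // no null check needed any more
}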

Re: Faceted Search using Lucene

2009-03-02 Thread Amin Mohammed-Coleman
Nope. If I remove the maybeReopen() the search doesn't work.  It only works
when I call maybeReopen() followed by get().

Cheers
Amin

On Mon, Mar 2, 2009 at 12:56 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> That's not right; something must be wrong.
>
> get() before maybeReopen() should simply let you search based on the
> searcher before reopening.
>
> If you just do get() and don't call maybeReopen() does it work?
>
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  I noticed that if I do the get() before the maybeReopen() then I get no
>> results.  But otherwise I can change it further.
>>
>> On Mon, Mar 2, 2009 at 11:46 AM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>
>>> There is no such thing as final code -- code is alive and is always
>>> changing ;)
>>>
>>> It looks good to me.
>>>
>>> Though one trivial thing is: I would move the code in the try clause up
>>> to
>>> and including the multiSearcher=get() out above the try.  I always
>>> attempt
>>> to "shrink wrap" what's inside a try clause to the minimum that needs to
>>> be
>>> there.  Ie, your code that creates a query, finds the right sort & filter
>>> to
>>> use, etc, can all happen outside the try, because you have not yet
>>> acquired
>>> the multiSearcher.
>>>
>>> If you do that, you also don't need the null check in the finally clause,
>>> because multiSearcher must be non-null on entering the try.
>>>
>>> Mike
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>> Hi there
>>>
>>>> Good morning!  Here is the final search code:
>>>>
>>>> public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
>>>>
>>>> final String searchTerm = searchRequest.getSearchTerm();
>>>>
>>>> if (StringUtils.isBlank(searchTerm)) {
>>>>
>>>> throw new SearchExecutionException("Search string cannot be empty. There
>>>> will be too many results to process.");
>>>>
>>>> }
>>>>
>>>> List summaryList = new ArrayList();
>>>>
>>>> StopWatch stopWatch = new StopWatch("searchStopWatch");
>>>>
>>>> stopWatch.start();
>>>>
>>>> MultiSearcher multiSearcher = null;
>>>>
>>>> try {
>>>>
>>>> LOGGER.debug("Ensuring all index readers are up to date...");
>>>>
>>>> maybeReopen();
>>>>
>>>> Query query = queryParser.parse(searchTerm);
>>>>
>>>> LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
>>>> query.toString() +"'");
>>>>
>>>> Sort sort = null;
>>>>
>>>> sort = applySortIfApplicable(searchRequest);
>>>>
>>>> Filter[] filters =applyFiltersIfApplicable(searchRequest);
>>>>
>>>> ChainedFilter chainedFilter = null;
>>>>
>>>> if (filters != null) {
>>>>
>>>> chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
>>>>
>>>> }
>>>>
>>>> multiSearcher = get();
>>>>
>>>> TopDocs topDocs = multiSearcher.search(query,chainedFilter ,100,sort);
>>>>
>>>> ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>>>>
>>>> LOGGER.debug("total number of hits for [" + query.toString() + " ] =
>>>> "+topDocs.
>>>> totalHits);
>>>>
>>>> for (ScoreDoc scoreDoc : scoreDocs) {
>>>>
>>>> final Document doc = multiSearcher.doc(scoreDoc.doc);
>>>>
>>>> float score = scoreDoc.score;
>>>>
>>>> final BaseDocument baseDocument = new BaseDocument(doc, score);
>>>>
>>>> Summary documentSummary = new DocumentSummaryImpl(baseDocument);
>>>>
>>>> summaryList.add(documentSummary);
>>>>
>>>> }
>>>>
>>>> } catch (Exception e) {
>>>>
>>>> throw new IllegalStateException(e);
>>>>
>>>> } finally {
>>>>
>>>> if (multiSearcher != null) {
>>>>
>>>> release(multiSearcher);
>>>>
>>>> }
>>>>
>>>> }
>>

Re: Faceted Search using Lucene

2009-03-02 Thread Amin Mohammed-Coleman
In my test case I have a setUp method that should populate the indexes
before I start using the document searcher.  I will start adding some more
debug statements.  So basically I should be able to do: get() followed by
maybeReopen.

I will let you know what the outcome is.


Cheers
Amin
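
(Both orderings are legal, as a sketch; they just differ in which index
snapshot the current request sees -- assuming a SearcherManager-style
"manager" as discussed above:)

// (a) This request sees the latest committed changes:
manager.maybeReopen();
IndexSearcher searcher = manager.get();
// ... search, then manager.release(searcher) in a finally block ...

// (b) This request searches the current snapshot; the reopen only
//     benefits the NEXT request:
IndexSearcher searcher2 = manager.get();
manager.maybeReopen();
// ... search, then manager.release(searcher2) in a finally block ...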

On Mon, Mar 2, 2009 at 1:39 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> Is it possible that when you first create the SearcherManager, there is no
> index in each Directory?
>
> If not... you better start adding diagnostics.  EG inside your get(), print
> out the numDocs() of each IndexReader you get from the SearcherManager?
>
> Something is wrong and it's best to explain it...
>
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  Nope. If I remove the maybeReopen() the search doesn't work.  It only works
>> when I call maybeReopen() followed by get().
>>
>> Cheers
>> Amin
>>
>> On Mon, Mar 2, 2009 at 12:56 PM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>
>>> That's not right; something must be wrong.
>>>
>>> get() before maybeReopen() should simply let you search based on the
>>> searcher before reopening.
>>>
>>> If you just do get() and don't call maybeReopen() does it work?
>>>
>>>
>>> Mike
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>> I noticed that if I do the get() before the maybeReopen() then I get no
>>>
>>>> results.  But otherwise I can change it further.
>>>>
>>>> On Mon, Mar 2, 2009 at 11:46 AM, Michael McCandless <
>>>> luc...@mikemccandless.com> wrote:
>>>>
>>>>
>>>>  There is no such thing as final code -- code is alive and is always
>>>>> changing ;)
>>>>>
>>>>> It looks good to me.
>>>>>
>>>>> Though one trivial thing is: I would move the code in the try clause up
>>>>> to
>>>>> and including the multiSearcher=get() out above the try.  I always
>>>>> attempt
>>>>> to "shrink wrap" what's inside a try clause to the minimum that needs
>>>>> to
>>>>> be
>>>>> there.  Ie, your code that creates a query, finds the right sort &
>>>>> filter
>>>>> to
>>>>> use, etc, can all happen outside the try, because you have not yet
>>>>> acquired
>>>>> the multiSearcher.
>>>>>
>>>>> If you do that, you also don't need the null check in the finally
>>>>> clause,
>>>>> because multiSearcher must be non-null on entering the try.
>>>>>
>>>>> Mike
>>>>>
>>>>> Amin Mohammed-Coleman wrote:
>>>>>
>>>>> Hi there
>>>>>
>>>>>  Good morning!  Here is the final search code:
>>>>>>
>>>>>> public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {
>>>>>>
>>>>>> final String searchTerm = searchRequest.getSearchTerm();
>>>>>>
>>>>>> if (StringUtils.isBlank(searchTerm)) {
>>>>>>
>>>>>> throw new SearchExecutionException("Search string cannot be empty.
>>>>>> There
>>>>>> will be too many results to process.");
>>>>>>
>>>>>> }
>>>>>>
>>>>>> List summaryList = new ArrayList();
>>>>>>
>>>>>> StopWatch stopWatch = new StopWatch("searchStopWatch");
>>>>>>
>>>>>> stopWatch.start();
>>>>>>
>>>>>> MultiSearcher multiSearcher = null;
>>>>>>
>>>>>> try {
>>>>>>
>>>>>> LOGGER.debug("Ensuring all index readers are up to date...");
>>>>>>
>>>>>> maybeReopen();
>>>>>>
>>>>>> Query query = queryParser.parse(searchTerm);
>>>>>>
>>>>>> LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
>>>>>> query.toString() +"'");
>>>>>>
>>>>>> Sort sort = null;
>>>>>>
>>>>>> sort = applySortIfApplicable(searchRequest);
>>>>>>
>>>>>> Filter

Re: Faceted Search using Lucene

2009-03-02 Thread Amin Mohammed-Coleman

Hi

Just out of curiosity, does it not make sense to call maybeReopen() and
then call get()? If I call get() then I have a new MultiSearcher, so a
call to maybeReopen() won't reinitialise the MultiSearcher.  Unless I
pass the MultiSearcher into the maybeReopen() method. But somehow that
doesn't make sense. I may be missing something here.



Cheers

Amin

On 2 Mar 2009, at 15:48, Amin Mohammed-Coleman  wrote:

I'm seeing some interesting behaviour: when I do get() first followed
by maybeReopen() then there are no documents in the directory (the
directory that I am interested in).  When I do the maybeReopen() and
then get() then the doc count is correct.  I can post stats later.


Weird...

On Mon, Mar 2, 2009 at 2:17 PM, Amin Mohammed-Coleman > wrote:

Oh dear... I think I may cry... I'll debug.


On Mon, Mar 2, 2009 at 2:15 PM, Michael McCandless > wrote:


Or even just get() with no call to maybeReopen().  That should work  
fine as well.



Mike

Amin Mohammed-Coleman wrote:

In my test case I have a set up method that should populate the  
indexes
before I start using the document searcher.  I will start adding  
some more
debug statements.  So basically I should be able to do: get()  
followed by

maybeReopen.

I will let you know what the outcome is.


Cheers
Amin

On Mon, Mar 2, 2009 at 1:39 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:


Is it possible that when you first create the SearcherManager, there  
is no

index in each Directory?

If not... you better start adding diagnostics.  EG inside your  
get(), print
out the numDocs() of each IndexReader you get from the  
SearcherManager?


Something is wrong and it's best to explain it...


Mike

Amin Mohammed-Coleman wrote:

Nope. If I remove the maybeReopen() the search doesn't work.  It only
works

when I call maybeReopen() followed by get().

Cheers
Amin

On Mon, Mar 2, 2009 at 12:56 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:


That's not right; something must be wrong.

get() before maybeReopen() should simply let you search based on the
searcher before reopening.

If you just do get() and don't call maybeReopen() does it work?


Mike

Amin Mohammed-Coleman wrote:

I noticed that if I do the get() before the maybeReopen() then I get no

results.  But otherwise I can change it further.

On Mon, Mar 2, 2009 at 11:46 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:


There is no such thing as final code -- code is alive and is always
changing ;)

It looks good to me.

Though one trivial thing is: I would move the code in the try clause  
up

to
and including the multiSearcher=get() out above the try.  I always
attempt
to "shrink wrap" what's inside a try clause to the minimum that needs
to
be
there.  Ie, your code that creates a query, finds the right sort &
filter
to
use, etc, can all happen outside the try, because you have not yet
acquired
the multiSearcher.

If you do that, you also don't need the null check in the finally
clause,
because multiSearcher must be non-null on entering the try.

Mike

Amin Mohammed-Coleman wrote:

Hi there

Good morning!  Here is the final search code:

public Summary[] search(final SearchRequest searchRequest) throws SearchExecutionException {

    final String searchTerm = searchRequest.getSearchTerm();
    if (StringUtils.isBlank(searchTerm)) {
        throw new SearchExecutionException("Search string cannot be empty. There will be too many results to process.");
    }

    List<Summary> summaryList = new ArrayList<Summary>();
    StopWatch stopWatch = new StopWatch("searchStopWatch");
    stopWatch.start();

    MultiSearcher multiSearcher = null;
    try {
        LOGGER.debug("Ensuring all index readers are up to date...");
        maybeReopen();

        Query query = queryParser.parse(searchTerm);
        LOGGER.debug("Search Term '" + searchTerm + "' > Lucene Query '" + query.toString() + "'");

        Sort sort = applySortIfApplicable(searchRequest);
        Filter[] filters = applyFiltersIfApplicable(searchRequest);
        ChainedFilter chainedFilter = null;
        if (filters != null) {
            chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
        }

        multiSearcher = get();
        TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        LOGGER.debug("total number of hits for [" + query.toString() + "] = " + topDocs.totalHits);

        for (ScoreDoc scoreDoc : scoreDocs) {
            final Document doc = multiSearcher.doc(scoreDoc.doc);
            float score = scoreDoc.score;
            final BaseDocument baseDocument = new BaseDocument(doc, score);
            Summary documentSummary = new DocumentSummaryImpl(baseDocument);
            summaryList.add(documentSummary);
        }
    } catch (Exception e) {
        throw new IllegalStateException(e);
    } finally {
        if (multiSearcher != null) {
            release(multiSearcher);
        }
    }

    stopWatch.stop();
    LOGGER.debug("total time taken for document search: " + stopWatch.getTotalTimeMillis() + " ms");

    return summaryList.toArray(new Summary[] {});
}

Re: Faceted Search using Lucene

2009-03-02 Thread Amin Mohammed-Coleman
queries pay the reopen/warming cost).
>
> If you call maybeReopen() after get(), then that search will not see the
> newly opened readers, but the next search will.
>
> I'm just thinking that since you see no results with get() alone, debug
> that case first.  Then put back the maybeReopen().
>
> Can you post your full code at this point?
>
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  Hi
>>
>> Just out of curiosity does it not make sense to call maybeReopen and then
>> call get()? If I call get() then I have a new mulitsearcher, so a call to
>> maybeopen won't reinitialise the multi searcher.  Unless I pass the multi
>> searcher into the maybereopen method. But somehow that doesn't make sense. I
>> maybe missing something here.
>>
>>
>> Cheers
>>
>> Amin
>>
>> On 2 Mar 2009, at 15:48, Amin Mohammed-Coleman  wrote:
>>
>>  I'm seeing some interesting behviour when i do get() first followed by
>>> maybeReopen then there are no documents in the directory (directory that i
>>> am interested in.  When i do the maybeReopen and then get() then the doc
>>> count is correct.  I can post stats later.
>>>
>>> Weird...
>>>
>>> On Mon, Mar 2, 2009 at 2:17 PM, Amin Mohammed-Coleman 
>>> wrote:
>>> oh dear...i think i may cry...i'll debug.
>>>
>>>
>>> On Mon, Mar 2, 2009 at 2:15 PM, Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>> Or even just get() with no call to maybeReopen().  That should work fine
>>> as well.
>>>
>>>
>>> Mike
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>> In my test case I have a set up method that should populate the indexes
>>> before I start using the document searcher.  I will start adding some
>>> more
>>> debug statements.  So basically I should be able to do: get() followed by
>>> maybeReopen.
>>>
>>> I will let you know what the outcome is.
>>>
>>>
>>> Cheers
>>> Amin
>>>
>>> On Mon, Mar 2, 2009 at 1:39 PM, Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>
>>> Is it possible that when you first create the SearcherManager, there is
>>> no
>>> index in each Directory?
>>>
>>> If not... you better start adding diagnostics.  EG inside your get(),
>>> print
>>> out the numDocs() of each IndexReader you get from the SearcherManager?
>>>
>>> Something is wrong and it's best to explain it...
>>>
>>>
>>> Mike
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>> Nope. If i remove the maybeReopen the search doesn't work.  It only works
>>> when i cal maybeReopen followed by get().
>>>
>>> Cheers
>>> Amin
>>>
>>> On Mon, Mar 2, 2009 at 12:56 PM, Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>
>>> That's not right; something must be wrong.
>>>
>>> get() before maybeReopen() should simply let you search based on the
>>> searcher before reopening.
>>>
>>> If you just do get() and don't call maybeReopen() does it work?
>>>
>>>
>>> Mike
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>> I noticed that if i do the get() before the maybeReopen then I get no
>>>
>>> results.  But otherwise I can change it further.
>>>
>>> On Mon, Mar 2, 2009 at 11:46 AM, Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>
>>> There is no such thing as final code -- code is alive and is always
>>> changing ;)
>>>
>>> It looks good to me.
>>>
>>> Though one trivial thing is: I would move the code in the try clause up
>>> to
>>> and including the multiSearcher=get() out above the try.  I always
>>> attempt
>>> to "shrink wrap" what's inside a try clause to the minimum that needs
>>> to
>>> be
>>> there.  Ie, your code that creates a query, finds the right sort &
>>> filter
>>> to
>>> use, etc, can all happen outside the try, because you have not yet
>>> acquired
>>> the multiSearcher.
>>>
>>> If you do that, you also don't need the null check in the finally
>>> clause,
>>> because multiSearcher must be non-null on entering the t

Re: Faceted Search using Lucene

2009-03-02 Thread Amin Mohammed-Coleman
I think that is the case. When my SearcherManager is initialised the
directories are empty, so when I do a get() nothing is present. Subsequent
calls seem to work. Is there something I can do, or do I accept this and
just do a maybeReopen() before the get()? As you mentioned it depends on
timing, but I would be keen to know what the best practice would be in this
situation...

Cheers
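
One way to avoid the empty first get(), as a sketch: make sure each Directory holds at least an empty commit before the SearcherManager first opens it (Lucene 2.4-era constructor shown):

    // Run once at startup, before the SearcherManagers are created.
    for (Directory dir : directories) {
        if (!IndexReader.indexExists(dir)) {
            // 'true' means create a new, empty index; closing the writer commits it.
            IndexWriter writer = new IndexWriter(dir, analyzer, true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            writer.close();
        }
    }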

On Mon, Mar 2, 2009 at 8:43 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> Well the code looks fine.
>
> I can't explain why you see no search results if you don't call
> maybeReopen() in get, unless at the time you first create SearcherManager
> the Directories each have an empty index in them.
>
> Mike
>
> Amin Mohammed-Coleman wrote:
>
>  Hi
>> Here is the code that I am using, I've modified the get() method to
>> include
>> the maybeReopen() call.  Again I'm not sure if this is a good idea.
>>
>> public Summary[] search(final SearchRequest searchRequest)
>> throws SearchExecutionException {
>>
>> final String searchTerm = searchRequest.getSearchTerm();
>>
>> if (StringUtils.isBlank(searchTerm)) {
>>
>> throw new SearchExecutionException("Search string cannot be empty. There
>> will be too many results to process.");
>>
>> }
>>
>> List summaryList = new ArrayList();
>>
>> StopWatch stopWatch = new StopWatch("searchStopWatch");
>>
>> stopWatch.start();
>>
>> MultiSearcher multiSearcher = get();
>>
>> try {
>>
>> LOGGER.debug("Ensuring all index readers are up to date...");
>>
>> Query query = queryParser.parse(searchTerm);
>>
>> LOGGER.debug("Search Term '" + searchTerm +"' > Lucene Query '" +
>> query.toString() +"'");
>>
>> Sort sort = null;
>>
>> sort = applySortIfApplicable(searchRequest);
>>
>> Filter[] filters = applyFiltersIfApplicable(searchRequest);
>>
>> ChainedFilter chainedFilter = null;
>>
>> if (filters != null) {
>>
>> chainedFilter = new ChainedFilter(filters, ChainedFilter.OR);
>>
>> }
>>
>> TopDocs topDocs = multiSearcher.search(query, chainedFilter, 100, sort);
>>
>> ScoreDoc[] scoreDocs = topDocs.scoreDocs;
>>
>> LOGGER.debug("total number of hits for [" + query.toString() + " ] =
>> "+topDocs.
>> totalHits);
>>
>> for (ScoreDoc scoreDoc : scoreDocs) {
>>
>> final Document doc = multiSearcher.doc(scoreDoc.doc);
>>
>> float score = scoreDoc.score;
>>
>> final BaseDocument baseDocument = new BaseDocument(doc, score);
>>
>> Summary documentSummary = new DocumentSummaryImpl(baseDocument);
>>
>> summaryList.add(documentSummary);
>>
>> }
>>
>> } catch (Exception e) {
>>
>> throw new IllegalStateException(e);
>>
>> } finally {
>>
>> if (multiSearcher != null) {
>>
>> release(multiSearcher);
>>
>> }
>>
>> }
>>
>> stopWatch.stop();
>>
>> LOGGER.debug("total time taken for document search: " +
>> stopWatch.getTotalTimeMillis() + " ms");
>>
>> return summaryList.toArray(new Summary[] {});
>>
>> }
>>
>>
>> @Autowired
>>
>> public void setDirectories(@Qualifier("directories") ListFactoryBean
>> listFactoryBean) throws Exception {
>>
>> this.directories = (List<Directory>) listFactoryBean.getObject();
>>
>> }
>>
>>  @PostConstruct
>>
>> public void initialiseDocumentSearcher() {
>>
>> StopWatch stopWatch = new StopWatch("document-search-initialiser");
>>
>> stopWatch.start();
>>
>> PerFieldAnalyzerWrapper analyzerWrapper = new PerFieldAnalyzerWrapper(
>> analyzer);
>>
>> analyzerWrapper.addAnalyzer(FieldNameEnum.TYPE.getDescription(),
>> new KeywordAnalyzer());
>>
>> queryParser =
>> new MultiFieldQueryParser(FieldNameEnum.fieldNameDescriptions(),
>> analyzerWrapper);
>>
>> try {
>>
>> LOGGER.debug("Initialising document searcher ");
>>
>> documentSearcherManagers = new
>> DocumentSearcherManager[directories.size()];
>>
>> for (int i = 0; i < directories.size() ;i++) {
>>
>> Directory directory = directories.get(i);
>>
>> DocumentSearcherManager documentSearcherManager =
>> new DocumentSearcherManager(directory);
>>
>> documentSearcherManagers[i]=documentSearcherManager;
>>

Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Hi
I am currently indexing documents (PDF, MS Word, etc.) that are uploaded;
these documents can be searched, and what the search returns to the user are
summaries of the documents. Currently the summaries are extracted when
indexing the file (the summary is constructed by taking the first 10 lines of the
document and stored in the index as a field). This is not ideal (a static
summary), and I was wondering if it would be possible to create a dynamic
summary when a hit is found and highlight the terms found.  The content of
the document is not stored in the index.

So basically what I'm looking to do is:

1) PDF indexed
2) PDF body contains the word "search"
3) Do a search and return the hit
4) Construct a summary with the term "search" included.

I'm not sure how to go about doing this (I presume it is possible).  I would
be grateful for any advice.


Cheers
Amin
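
For reference, contrib/highlighter usage for a dynamic summary looks roughly like the sketch below (the field name "body" and the surrounding variables are illustrative; it assumes the raw text is available at search time, either stored or re-extracted):

    String body = doc.get("body"); // stored field; otherwise re-extract the text
    Query rewritten = searcher.rewrite(query); // expand wildcard/prefix queries before scoring
    Highlighter highlighter = new Highlighter(
            new SimpleHTMLFormatter("<b>", "</b>"),
            new QueryScorer(rewritten));
    TokenStream tokens = analyzer.tokenStream("body", new StringReader(body));
    String summary = highlighter.getBestFragments(tokens, body, 3, "..."); // the dynamic summary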


Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Hi
That's what I was thinking about. I would need to get the file, extract
the text again, and then pass it through the highlighter. The other option is
storing the content in the index, the downside being that the index is going to be
large. Which would be the recommended approach?

Cheers

Amin

On Sat, Mar 7, 2009 at 10:50 AM, Erik Hatcher wrote:

> With the caveat that if you're not storing the text you want highlighted,
> you'll have to retrieve it somehow and send it into the Highlighter
> yourself.
>
>Erik
>
>
> On Mar 7, 2009, at 5:40 AM, Michael McCandless wrote:
>
>
>> You should look at contrib/highlighter, which does exactly this.
>>
>> Mike
>>
>> Amin Mohammed-Coleman wrote:
>>
>>  Hi
>>> I am currently indexing documents (pdf, ms word, etc) that are uploaded,
>>> these documents can be searched and what the search returns to the user
>>> are
>>> summaries of the documents.  Currently the summaries are extracted when
>>> indexing the file (summary constructed by taking the first 10 lines of
>>> the
>>> document and stored in the index as field).  This is not ideal (static
>>> summary), and I was wondering if it would be possible to create a dynamic
>>> summary when a hit is found and highlight the terms found.  The content
>>> of
>>> the document is not stored in the index.
>>>
>>> So basically what I'm looking to do is:
>>>
>>> 1) PDF indexed
>>> 2) PDF body contains the word "search"
>>> 3) Do a search and return the hit
>>> 4) Construct a summary with the term "search" included.
>>>
>>> I'm not sure how to go about doing this (I presume it is possible).  I
>>> would
>>> be grateful for any advice.
>>>
>>>
>>> Cheers
>>> Amin
>>>
>>
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Cool. I will use compression and store the content in the index. Is there anything special
I need to do for decompressing the text? I presume I can just do
doc.get("content")?
thanks for your advice all!

On Sat, Mar 7, 2009 at 11:50 AM, Uwe Schindler  wrote:

> You could store the text contents compressed; I think extracting text from
> PDF files is much more time-intensive than decompressing a stored field.
> And
> text-only contents often compress very well. In my opinion, if the
> (uncompressed) contents of the docs are not very large (by large I mean several
> megabytes each), I would prefer storing them in the index.
>
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
> > -Original Message-
> > From: Erik Hatcher [mailto:e...@ehatchersolutions.com]
> > Sent: Saturday, March 07, 2009 12:46 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene Highlighting and Dynamic Summaries
> >
> > It depends :)
> >
> > It's a trade-off.  If storing is not prohibitive, I recommend that as
> > it makes life easier for highlighting.
> >
> >   Erik
> >
> > On Mar 7, 2009, at 6:37 AM, Amin Mohammed-Coleman wrote:
> >
> > > hi
> > > that's what i was thinking about.  i would need to get the file and
> > > extract
> > > the text again and then pass through the highlighter.  The other
> > > option is
> > > storing the content in the index the downside being index is going
> > > to be
> > > large.  Which would be the recommended approach?
> > >
> > > Cheers
> > >
> > > Amin
> > >
> > > On Sat, Mar 7, 2009 at 10:50 AM, Erik Hatcher
> >  > > >wrote:
> > >
> > >> With the caveat that if you're not storing the text you want
> > >> highlighted,
> > >> you'll have to retrieve it somehow and send it into the Highlighter
> > >> yourself.
> > >>
> > >>   Erik
> > >>
> > >>
> > >> On Mar 7, 2009, at 5:40 AM, Michael McCandless wrote:
> > >>
> > >>
> > >>> You should look at contrib/highlighter, which does exactly this.
> > >>>
> > >>> Mike
> > >>>
> > >>> Amin Mohammed-Coleman wrote:
> > >>>
> > >>> Hi
> > >>>> I am currently indexing documents (pdf, ms word, etc) that are
> > >>>> uploaded,
> > >>>> these documents can be searched and what the search returns to
> > >>>> the user
> > >>>> are
> > >>>> summaries of the documents.  Currently the summaries are
> > >>>> extracted when
> > >>>> indexing the file (summary constructed by taking the first 10
> > >>>> lines of
> > >>>> the
> > >>>> document and stored in the index as field).  This is not ideal
> > >>>> (static
> > >>>> summary), and I was wondering if it would be possible to create a
> > >>>> dynamic
> > >>>> summary when a hit is found and highlight the terms found.  The
> > >>>> content
> > >>>> of
> > >>>> the document is not stored in the index.
> > >>>>
> > >>>> So basically what I'm looking to do is:
> > >>>>
> > >>>> 1) PDF indexed
> > >>>> 2) PDF body contains the word "search"
> > >>>> 3) Do a search and return the hit
> > >>>> 4) Construct a summary with the term "search" included.
> > >>>>
> > >>>> I'm not sure how to go about doing this (I presume it is
> > >>>> possible).  I
> > >>>> would
> > >>>> be grateful for any advice.
> > >>>>
> > >>>>
> > >>>> Cheers
> > >>>> Amin
> > >>>>
> > >>>
> > >>>
> > >>> -
> > >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >>> For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >>>
> > >>
> > >>
> > >> -
> > >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >>
> > >>
> >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Thanks!  The final piece that I needed to do for the project!
Cheers

Amin

On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler  wrote:

> > cool.  i will use compression and store in index. is there anything
> > special
> > i need to for decompressing the text? i presume i can just do
> > doc.get("content")?
> > thanks for your advice all!
>
> No, just use Field.Store.COMPRESS when adding to the index and Document.get()
> when fetching. The decompression is done automatically.
>
> You may think: why not enable compression for all fields? The thing is that
> this is an overhead for very small and short fields. So you should only use
> it for large contents (it's the same as compressing very small files with
> ZIP/GZIP: these files mostly get larger than without compression).
>
> Uwe
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
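
As a concrete sketch of the store-compressed approach (Lucene 2.4-era API; Field.Store.COMPRESS was dropped in later versions, where you compress the bytes yourself instead):

    // Indexing: store the large body compressed, but analyze it normally.
    Document doc = new Document();
    doc.add(new Field("content", extractedText, Field.Store.COMPRESS, Field.Index.ANALYZED));
    writer.addDocument(doc);

    // Searching: decompression is transparent on retrieval.
    String content = searcher.doc(scoreDoc.doc).get("content");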


Re: Lucene Highlighting and Dynamic Summaries

2009-03-07 Thread Amin Mohammed-Coleman
Hi
Got it working!  Thanks again for your help!


Amin

On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman wrote:

> Thanks!  The final piece that I needed to do for the project!
> Cheers
>
> Amin
>
> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler  wrote:
>
>> > cool.  i will use compression and store in index. is there anything
>> > special
>> > i need to for decompressing the text? i presume i can just do
>> > doc.get("content")?
>> > thanks for your advice all!
>>
>> No just use Field.Store.COMPRESS when adding to index and Document.get()
>> when fetching. The decompression is automatically done.
>>
>> You may think, why not enable compression for all fields? The case is,
>> that
>> this is an overhead for very small and short fields. So you should only
>> use
>> it for large contents (it's the same like compressing very small files as
>> ZIP/GZIP: These files mostly get larger than without compression).
>>
>> Uwe
>>
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>


Re: Lucene Highlighting and Dynamic Summaries

2009-03-09 Thread Amin Mohammed-Coleman
Hi
I am seeing some strange behaviour with the highlighter and I'm wondering if
anyone else is experiencing this.  In certain instances I don't get a
summary being generated.  I perform the search and the search returns the
correct document.  I can see that the lucene document contains the text in
the field.  However after doing:

SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>");

//required for highlighting

Query query2 = multiSearcher.rewrite(query);

Highlighter highlighter = new Highlighter(simpleHTMLFormatter,
new QueryScorer(query2));

...

String text= doc.get(FieldNameEnum.BODY.getDescription());

TokenStream tokenStream = analyzer
.tokenStream(FieldNameEnum.BODY.getDescription(), new StringReader(text));

String result = highlighter.getBestFragments(tokenStream,
text, 3, "...");




the string result is empty. This is very strange: if I try a different term
that exists in the document then I get a summary. For example, I have a Word
document that contains the terms "document" and "aspectj". If I search for
"document" I get the correct document but no highlighted summary. However,
if I search using "aspectj" I get the same document with a highlighted
summary.


Just to mention, I do rewrite the original query before performing the
highlighting.


I'm not sure what I'm missing here. Any help would be appreciated.


Cheers

Amin

On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman wrote:

> Hi
> Got it working!  Thanks again for your help!
>
>
> Amin
>
>
> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman 
> wrote:
>
>> Thanks!  The final piece that I needed to do for the project!
>> Cheers
>>
>> Amin
>>
>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler  wrote:
>>
>>> > cool.  i will use compression and store in index. is there anything
>>> > special
>>> > i need to for decompressing the text? i presume i can just do
>>> > doc.get("content")?
>>> > thanks for your advice all!
>>>
>>> No just use Field.Store.COMPRESS when adding to index and Document.get()
>>> when fetching. The decompression is automatically done.
>>>
>>> You may think, why not enable compression for all fields? The case is,
>>> that
>>> this is an overhead for very small and short fields. So you should only
>>> use
>>> it for large contents (it's the same like compressing very small files as
>>> ZIP/GZIP: These files mostly get larger than without compression).
>>>
>>> Uwe
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>
>


Re: Lucene Highlighting and Dynamic Summaries

2009-03-11 Thread Amin Mohammed-Coleman

Hi

Apologies for re-sending this mail. Just wondering if anyone has
experienced the below. I'm not sure if this could happen due to the nature of
the document. It does seem strange that one term search returns a summary
while another does not, even though the same document is being returned.


I'm asking so I can code around this if it is normal behaviour.


Apologies again for re-sending this mail.

Cheers

Amin

Sent from my iPhone

On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman  wrote:


Hi

I am seeing some strange behaviour with the highlighter and I'm  
wondering if anyone else is experiencing this.  In certain instances  
I don't get a summary being generated.  I perform the search and the  
search returns the correct document.  I can see that the lucene  
document contains the text in the field.  However after doing:


	SimpleHTMLFormatter simpleHTMLFormatter = new  
SimpleHTMLFormatter("", "");

//required for highlighting
Query query2 = multiSearcher.rewrite(query);
			Highlighter highlighter = new Highlighter(simpleHTMLFormatter,  
new QueryScorer(query2));

...

String text= doc.get(FieldNameEnum.BODY.getDescription());
TokenStream tokenStream =  
analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new  
StringReader(text));
String result =  
highlighter.getBestFragments(tokenStream, text, 3, "...");



the string result is empty.  This is very strange, if i try a  
different term that exists in the document then I get a summary.   
For example I have a word document that contains the term "document"  
and "aspectj".  If I search for "document" I get the correct  
document but no highlighted summary.  However if I search using  
"aspectj" I get the same doucment with highlighted summary.


Just to mentioned I do rewrite the original query before performing  
the highlighting.


I'm not sure what i'm missing here.  Any help would be appreciated.

Cheers
Amin

On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman > wrote:

Hi

Got it working!  Thanks again for your help!


Amin


On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman > wrote:

Thanks!  The final piece that I needed to do for the project!

Cheers

Amin

On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler   
wrote:

> cool.  i will use compression and store in index. is there anything
> special
> i need to for decompressing the text? i presume i can just do
> doc.get("content")?
> thanks for your advice all!

No just use Field.Store.COMPRESS when adding to index and  
Document.get()

when fetching. The decompression is automatically done.

You may think, why not enable compression for all fields? The case  
is, that
this is an overhead for very small and short fields. So you should  
only use
it for large contents (it's the same like compressing very small  
files as

ZIP/GZIP: These files mostly get larger than without compression).

Uwe


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org






Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi
Please find attached a test case plus a document. Just to mention, this
sometimes occurs for other files too.


Cheers
Amin

On Wed, Mar 11, 2009 at 6:11 PM, markharw00d wrote:

> If you can supply a Junit test that recreates the problem I think we can
> start to make progress on this.
>
>
>
> Amin Mohammed-Coleman wrote:
>
>> Hi
>>
>> Apologies for re sending this mail. Just wondering if anyone has
>> experienced the below. I'm not sure if this could happen due nature of
>> document. It does seem strange one term search returns summary while another
>> does not even though same document is being returned.
>>
>> I'm asking this so I can code around this if is normal.
>>
>>
>> Apologies again for re sending this mail
>>
>> Cheers
>>
>> Amin
>>
>> Sent from my iPhone
>>
>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman  wrote:
>>
>>  Hi
>>>
>>> I am seeing some strange behaviour with the highlighter and I'm wondering
>>> if anyone else is experiencing this.  In certain instances I don't get a
>>> summary being generated.  I perform the search and the search returns the
>>> correct document.  I can see that the lucene document contains the text in
>>> the field.  However after doing:
>>>
>>>SimpleHTMLFormatter simpleHTMLFormatter = new
>>> SimpleHTMLFormatter("", "");
>>>//required for highlighting
>>>Query query2 = multiSearcher.rewrite(query);
>>>Highlighter highlighter = new Highlighter(simpleHTMLFormatter,
>>> new QueryScorer(query2));
>>> ...
>>>
>>> String text= doc.get(FieldNameEnum.BODY.getDescription());
>>>TokenStream tokenStream =
>>> analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new
>>> StringReader(text));
>>>String result = highlighter.getBestFragments(tokenStream,
>>> text, 3, "...");
>>>
>>>
>>> the string result is empty.  This is very strange, if i try a different
>>> term that exists in the document then I get a summary.  For example I have a
>>> word document that contains the term "document" and "aspectj".  If I search
>>> for "document" I get the correct document but no highlighted summary.
>>>  However if I search using "aspectj" I get the same doucment with
>>> highlighted summary.
>>>
>>> Just to mentioned I do rewrite the original query before performing the
>>> highlighting.
>>>
>>> I'm not sure what i'm missing here.  Any help would be appreciated.
>>>
>>> Cheers
>>> Amin
>>>
>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman 
>>> wrote:
>>> Hi
>>>
>>> Got it working!  Thanks again for your help!
>>>
>>>
>>> Amin
>>>
>>>
>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman 
>>> wrote:
>>> Thanks!  The final piece that I needed to do for the project!
>>>
>>> Cheers
>>>
>>> Amin
>>>
>>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler  wrote:
>>> > cool.  i will use compression and store in index. is there anything
>>> > special
>>> > i need to for decompressing the text? i presume i can just do
>>> > doc.get("content")?
>>> > thanks for your advice all!
>>>
>>> No just use Field.Store.COMPRESS when adding to index and Document.get()
>>> when fetching. The decompression is automatically done.
>>>
>>> You may think, why not enable compression for all fields? The case is,
>>> that
>>> this is an overhead for very small and short fields. So you should only
>>> use
>>> it for large contents (it's the same like compressing very small files as
>>> ZIP/GZIP: These files mostly get larger than without compression).
>>>
>>> Uwe
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>>
>>>
>>>
>> 
>>
>>
>>
>>
>>
>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
Hi

Did both attachments not come through?

Cheers
Amin

On Thu, Mar 12, 2009 at 9:52 AM, mark harwood wrote:

> The attachment didn't make it through here. Can you add it as an attachment
> to a new JIRA issue?
>
> Thanks,
> Mark
>
>
>
>
>
> ________
> From: Amin Mohammed-Coleman 
> To: java-user@lucene.apache.org
> Sent: Thursday, 12 March, 2009 7:47:20
> Subject: Re: Lucene Highlighting and Dynamic Summaries
>
> Hi
>
> Please find attadched a test case plus a document.  Just to mention this
> occurs sometimes for other files.
>
>
> Cheers
> Amin
>
>
> On Wed, Mar 11, 2009 at 6:11 PM, markharw00d 
> wrote:
>
> If you can supply a Junit test that recreates the problem I think we can
> start to make progress on this.
>
>
>
> Amin Mohammed-Coleman wrote:
>
> Hi
>
> Apologies for re sending this mail. Just wondering if anyone has
> experienced the below.. I'm not sure if this could happen due nature of
> document. It does seem strange one term search returns summary while another
> does not even though same document is being returned.
>
> I'm asking this so I can code around this if is normal.
>
>
> Apologies again for re sending this mail
>
> Cheers
>
> Amin
>
> Sent from my iPhone
>
> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman  wrote:
>
>
> Hi
>
> I am seeing some strange behaviour with the highlighter and I'm wondering
> if anyone else is experiencing this.  In certain instances I don't get a
> summary being generated.  I perform the search and the search returns the
> correct document.  I can see that the lucene document contains the text in
> the field.  However after doing:
>
>   SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter(" class=\"highlight\">", "");
>   //required for highlighting
>   Query query2 = multiSearcher.rewrite(query);
>   Highlighter highlighter = new Highlighter(simpleHTMLFormatter,
> new QueryScorer(query2));
> ...
>
> String text= doc.get(FieldNameEnum.BODY.getDescription());
>   TokenStream tokenStream =
> analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new
> StringReader(text));
>   String result = highlighter.getBestFragments(tokenStream,
> text, 3, "...");
>
>
> the string result is empty.  This is very strange, if i try a different
> term that exists in the document then I get a summary.  For example I have a
> word document that contains the term "document" and "aspectj".  If I search
> for "document" I get the correct document but no highlighted summary.
>  However if I search using "aspectj" I get the same doucment with
> highlighted summary.
>
> Just to mentioned I do rewrite the original query before performing the
> highlighting.
>
> I'm not sure what i'm missing here.  Any help would be appreciated.
>
> Cheers
> Amin
>
> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman 
> wrote:
> Hi
>
> Got it working!  Thanks again for your help!
>
>
> Amin
>
>
> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman 
> wrote:
> Thanks!  The final piece that I needed to do for the project!
>
> Cheers
>
> Amin
>
> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler  wrote:
> > cool.  i will use compression and store in index. is there anything
> > special
> > i need to for decompressing the text? i presume i can just do
> > doc.get("content")?
> > thanks for your advice all!
>
> No just use Field.Store.COMPRESS when adding to index and Document.get()
> when fetching. The decompression is automatically done.
>
> You may think, why not enable compression for all fields? The case is, that
> this is an overhead for very small and short fields. So you should only use
> it for large contents (it's the same like compressing very small files as
> ZIP/GZIP: These files mostly get larger than without compression).
>
> Uwe
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
>
>
>
>
> 
>
>
>
>
>
>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
>
>


Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
JIRA raised:

https://issues.apache.org/jira/browse/LUCENE-1559

Thanks

On Thu, Mar 12, 2009 at 11:29 AM, Amin Mohammed-Coleman wrote:

> Hi
>
> Did both attachments not come through?
>
> Cheers
> Amin
>
>
> On Thu, Mar 12, 2009 at 9:52 AM, mark harwood wrote:
>
>> The attachment didn't make it through here. Can you add it as an
>> attachment to a new JIRA issue?
>>
>> Thanks,
>> Mark
>>
>>
>>
>>
>>
>> 
>> From: Amin Mohammed-Coleman 
>> To: java-user@lucene.apache.org
>> Sent: Thursday, 12 March, 2009 7:47:20
>> Subject: Re: Lucene Highlighting and Dynamic Summaries
>>
>> Hi
>>
>> Please find attadched a test case plus a document.  Just to mention this
>> occurs sometimes for other files.
>>
>>
>> Cheers
>> Amin
>>
>>
>> On Wed, Mar 11, 2009 at 6:11 PM, markharw00d 
>> wrote:
>>
>> If you can supply a Junit test that recreates the problem I think we can
>> start to make progress on this.
>>
>>
>>
>> Amin Mohammed-Coleman wrote:
>>
>> Hi
>>
>> Apologies for re sending this mail. Just wondering if anyone has
>> experienced the below.. I'm not sure if this could happen due nature of
>> document. It does seem strange one term search returns summary while another
>> does not even though same document is being returned.
>>
>> I'm asking this so I can code around this if is normal.
>>
>>
>> Apologies again for re sending this mail
>>
>> Cheers
>>
>> Amin
>>
>> Sent from my iPhone
>>
>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman  wrote:
>>
>>
>> Hi
>>
>> I am seeing some strange behaviour with the highlighter and I'm wondering
>> if anyone else is experiencing this.  In certain instances I don't get a
>> summary being generated.  I perform the search and the search returns the
>> correct document.  I can see that the lucene document contains the text in
>> the field.  However after doing:
>>
>>   SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("> class=\"highlight\">", "");
>>   //required for highlighting
>>   Query query2 = multiSearcher.rewrite(query);
>>   Highlighter highlighter = new Highlighter(simpleHTMLFormatter,
>> new QueryScorer(query2));
>> ...
>>
>> String text= doc.get(FieldNameEnum.BODY.getDescription());
>>   TokenStream tokenStream =
>> analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new
>> StringReader(text));
>>   String result = highlighter.getBestFragments(tokenStream,
>> text, 3, "...");
>>
>>
>> the string result is empty.  This is very strange, if i try a different
>> term that exists in the document then I get a summary.  For example I have a
>> word document that contains the term "document" and "aspectj".  If I search
>> for "document" I get the correct document but no highlighted summary.
>>  However if I search using "aspectj" I get the same doucment with
>> highlighted summary.
>>
>> Just to mentioned I do rewrite the original query before performing the
>> highlighting.
>>
>> I'm not sure what i'm missing here.  Any help would be appreciated.
>>
>> Cheers
>> Amin
>>
>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman 
>> wrote:
>> Hi
>>
>> Got it working!  Thanks again for your help!
>>
>>
>> Amin
>>
>>
>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman 
>> wrote:
>> Thanks!  The final piece that I needed to do for the project!
>>
>> Cheers
>>
>> Amin
>>
>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler  wrote:
>> > cool.  i will use compression and store in index. is there anything
>> > special
>> > i need to for decompressing the text? i presume i can just do
>> > doc.get("content")?
>> > thanks for your advice all!
>>
>> No just use Field.Store.COMPRESS when adding to index and Document.get()
>> when fetching. The decompression is automatically done.
>>
>> You may think, why not enable compression for all fields? The case is,
>> that
>> this is an overhead for very small and short fields. So you should only
>> use
>> it for large contents (it's the same like compressing very small files as
>> ZIP/GZIP: These files m

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman

Hi

I have found that it is not an issue with POI. I extracted the text using POI
but differently, and the term is extracted properly. When I store the
text and retrieve it, the term exists. However, running the text through the
highlighter doesn't work.


I will post a test case with a plain text file on JIRA. Currently on a
cramped train!


Cheers


On 11 Mar 2009, at 18:11, markharw00d  wrote:

If you can supply a Junit test that recreates the problem I think we  
can start to make progress on this.




Amin Mohammed-Coleman wrote:

Hi

Apologies for re sending this mail. Just wondering if anyone has  
experienced the below. I'm not sure if this could happen due nature  
of document. It does seem strange one term search returns summary  
while another does not even though same document is being returned.


I'm asking this so I can code around this if is normal.


Apologies again for re sending this mail

Cheers

Amin

Sent from my iPhone

On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman   
wrote:



Hi

I am seeing some strange behaviour with the highlighter and I'm  
wondering if anyone else is experiencing this.  In certain  
instances I don't get a summary being generated.  I perform the  
search and the search returns the correct document.  I can see  
that the lucene document contains the text in the field.  However  
after doing:


   SimpleHTMLFormatter simpleHTMLFormatter = new  
SimpleHTMLFormatter("", "");

   //required for highlighting
   Query query2 = multiSearcher.rewrite(query);
   Highlighter highlighter = new  
Highlighter(simpleHTMLFormatter, new QueryScorer(query2));

...

String text= doc.get(FieldNameEnum.BODY.getDescription());
   TokenStream tokenStream =  
analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new  
StringReader(text));
   String result =  
highlighter.getBestFragments(tokenStream, text, 3, "...");



the string result is empty.  This is very strange, if i try a  
different term that exists in the document then I get a summary.   
For example I have a word document that contains the term  
"document" and "aspectj".  If I search for "document" I get the  
correct document but no highlighted summary.  However if I search  
using "aspectj" I get the same doucment with highlighted summary.


Just to mentioned I do rewrite the original query before  
performing the highlighting.


I'm not sure what i'm missing here.  Any help would be appreciated.

Cheers
Amin

On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman > wrote:

Hi

Got it working!  Thanks again for your help!


Amin


On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman > wrote:

Thanks!  The final piece that I needed to do for the project!

Cheers

Amin

On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler   
wrote:
> cool.  i will use compression and store in index. is there  
anything

> special
> i need to for decompressing the text? i presume i can just do
> doc.get("content")?
> thanks for your advice all!

No just use Field.Store.COMPRESS when adding to index and  
Document.get()

when fetching. The decompression is automatically done.

You may think, why not enable compression for all fields? The case  
is, that
this is an overhead for very small and short fields. So you should  
only use
it for large contents (it's the same like compressing very small  
files as

ZIP/GZIP: These files mostly get larger than without compression).

Uwe


---------------------------------------------------------------------

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org













-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
JIRA updated. It includes a new test case which shows the highlighter not working as
expected.

On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman wrote:

> Hi
>
> I have found that it is not issue with POI. I extracted text using PoI but
> differenlty and the term is extracted properly.  When I store the text and
> retrieve it the term exists. However running the text through highlighter
> doesn't work
>
> I will post test case with plain text file on JIRA. Currently on a cramped
> train!
>
> Cheers
>
>
>
> On 11 Mar 2009, at 18:11, markharw00d  wrote:
>
>  If you can supply a Junit test that recreates the problem I think we can
>> start to make progress on this.
>>
>>
>>
>> Amin Mohammed-Coleman wrote:
>>
>>> Hi
>>>
>>> Apologies for re sending this mail. Just wondering if anyone has
>>> experienced the below. I'm not sure if this could happen due nature of
>>> document. It does seem strange one term search returns summary while another
>>> does not even though same document is being returned.
>>>
>>> I'm asking this so I can code around this if is normal.
>>>
>>>
>>> Apologies again for re sending this mail
>>>
>>> Cheers
>>>
>>> Amin
>>>
>>> Sent from my iPhone
>>>
>>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman  wrote:
>>>
>>>  Hi
>>>>
>>>> I am seeing some strange behaviour with the highlighter and I'm
>>>> wondering if anyone else is experiencing this.  In certain instances I 
>>>> don't
>>>> get a summary being generated.  I perform the search and the search returns
>>>> the correct document.  I can see that the lucene document contains the text
>>>> in the field.  However after doing:
>>>>
>>>>   SimpleHTMLFormatter simpleHTMLFormatter = new
>>>> SimpleHTMLFormatter("", "");
>>>>   //required for highlighting
>>>>   Query query2 = multiSearcher.rewrite(query);
>>>>   Highlighter highlighter = new Highlighter(simpleHTMLFormatter,
>>>> new QueryScorer(query2));
>>>> ...
>>>>
>>>> String text= doc.get(FieldNameEnum.BODY.getDescription());
>>>>   TokenStream tokenStream =
>>>> analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new
>>>> StringReader(text));
>>>>   String result = highlighter.getBestFragments(tokenStream,
>>>> text, 3, "...");
>>>>
>>>>
>>>> the string result is empty.  This is very strange, if i try a different
>>>> term that exists in the document then I get a summary.  For example I have 
>>>> a
>>>> word document that contains the term "document" and "aspectj".  If I search
>>>> for "document" I get the correct document but no highlighted summary.
>>>>  However if I search using "aspectj" I get the same doucment with
>>>> highlighted summary.
>>>>
>>>> Just to mentioned I do rewrite the original query before performing the
>>>> highlighting.
>>>>
>>>> I'm not sure what i'm missing here.  Any help would be appreciated.
>>>>
>>>> Cheers
>>>> Amin
>>>>
>>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman 
>>>> wrote:
>>>> Hi
>>>>
>>>> Got it working!  Thanks again for your help!
>>>>
>>>>
>>>> Amin
>>>>
>>>>
>>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <
>>>> ami...@gmail.com> wrote:
>>>> Thanks!  The final piece that I needed to do for the project!
>>>>
>>>> Cheers
>>>>
>>>> Amin
>>>>
>>>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler  wrote:
>>>> > cool.  i will use compression and store in index. is there anything
>>>> > special
>>>> > i need to for decompressing the text? i presume i can just do
>>>> > doc.get("content")?
>>>> > thanks for your advice all!
>>>>
>>>> No just use Field.Store.COMPRESS when adding to index and Document.get()
>>>> when fetching. The decompression is automatically done.
>>>>
>>>> You may think, why not enable compression for all fields? The case is,
>>>> that
>>>> this is an overhead for very small and short fields. So you should only
>>>> use
>>>> it for large contents (it's the same like compressing very small files
>>>> as
>>>> ZIP/GZIP: These files mostly get larger than without compression).
>>>>
>>>> Uwe
>>>>
>>>>
>>>> -
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>>>
>>> 
>>>
>>>
>>>
>>>
>>>
>>
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>


Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman
I did the following:

highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);


which works.
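
In context, the fix looks like this (the default limit in this era's Highlighter is roughly the first 50K characters of text, so a match that only occurs beyond that point yields no fragments):

    Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
    // Without this, only the first ~50K chars are tokenized for fragment scoring.
    highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE); // analyze everything; costs CPU on huge docs
    String result = highlighter.getBestFragments(tokenStream, text, 3, "...");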

On Thu, Mar 12, 2009 at 6:41 PM, Amin Mohammed-Coleman wrote:

> JIRA updated.  Includes new testcase which shows highlighter not working as
> expected.
>
>
> On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman 
> wrote:
>
>> Hi
>>
>> I have found that it is not issue with POI. I extracted text using PoI but
>> differenlty and the term is extracted properly.  When I store the text and
>> retrieve it the term exists. However running the text through highlighter
>> doesn't work
>>
>> I will post test case with plain text file on JIRA. Currently on a cramped
>> train!
>>
>> Cheers
>>
>>
>>
>> On 11 Mar 2009, at 18:11, markharw00d  wrote:
>>
>>  If you can supply a Junit test that recreates the problem I think we can
>>> start to make progress on this.
>>>
>>>
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>>> Hi
>>>>
>>>> Apologies for re sending this mail. Just wondering if anyone has
>>>> experienced the below. I'm not sure if this could happen due nature of
>>>> document. It does seem strange one term search returns summary while 
>>>> another
>>>> does not even though same document is being returned.
>>>>
>>>> I'm asking this so I can code around this if is normal.
>>>>
>>>>
>>>> Apologies again for re sending this mail
>>>>
>>>> Cheers
>>>>
>>>> Amin
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman 
>>>> wrote:
>>>>
>>>>  Hi
>>>>>
>>>>> I am seeing some strange behaviour with the highlighter and I'm
>>>>> wondering if anyone else is experiencing this.  In certain instances I 
>>>>> don't
>>>>> get a summary being generated.  I perform the search and the search 
>>>>> returns
>>>>> the correct document.  I can see that the lucene document contains the 
>>>>> text
>>>>> in the field.  However after doing:
>>>>>
>>>>>   SimpleHTMLFormatter simpleHTMLFormatter = new
>>>>> SimpleHTMLFormatter("", "");
>>>>>   //required for highlighting
>>>>>   Query query2 = multiSearcher.rewrite(query);
>>>>>   Highlighter highlighter = new
>>>>> Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
>>>>> ...
>>>>>
>>>>> String text= doc.get(FieldNameEnum.BODY.getDescription());
>>>>>   TokenStream tokenStream =
>>>>> analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new
>>>>> StringReader(text));
>>>>>   String result = highlighter.getBestFragments(tokenStream,
>>>>> text, 3, "...");
>>>>>
>>>>>
>>>>> the string result is empty.  This is very strange, if i try a different
>>>>> term that exists in the document then I get a summary.  For example I 
>>>>> have a
>>>>> word document that contains the term "document" and "aspectj".  If I 
>>>>> search
>>>>> for "document" I get the correct document but no highlighted summary.
>>>>>  However if I search using "aspectj" I get the same doucment with
>>>>> highlighted summary.
>>>>>
>>>>> Just to mentioned I do rewrite the original query before performing the
>>>>> highlighting.
>>>>>
>>>>> I'm not sure what i'm missing here.  Any help would be appreciated.
>>>>>
>>>>> Cheers
>>>>> Amin
>>>>>
>>>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <
>>>>> ami...@gmail.com> wrote:
>>>>> Hi
>>>>>
>>>>> Got it working!  Thanks again for your help!
>>>>>
>>>>>
>>>>> Amin
>>>>>
>>>>>
>>>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <
>>>>> ami...@gmail.com> wrote:
>>>>> Thanks!  The final piece that I needed to do for the project!
>>>>>
>>>>> Cheers

Re: Lucene Highlighting and Dynamic Summaries

2009-03-12 Thread Amin Mohammed-Coleman

Hi

I think that would be good. Probably a silly thing to ask, but I guess
there is a performance implication in setting it to the max value.


Is there a general setting that other developers use?

Cheers

Amin



On 12 Mar 2009, at 22:03, Michael McCandless  
 wrote:




IndexWriter has such behavior too, and because it was such a common trap
(developers could not understand why their content was being truncated), we
made that setting explicit, up front, so you were aware of it.

I think this in general is a reasonable approach for settings that "lose"
stuff (content, highlighted terms, etc.).

Maybe we should do the same for highlighter?

Mike
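
The IndexWriter precedent mentioned above, for comparison (2.4-era constructor; the truncation limit has to be chosen explicitly up front):

    // You must say at construction time whether long documents get truncated.
    IndexWriter writer = new IndexWriter(directory, analyzer,
            IndexWriter.MaxFieldLength.UNLIMITED); // or MaxFieldLength.LIMITED (10,000 terms)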

Amin Mohammed-Coleman wrote:


I did the following:

highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);


which works.

On Thu, Mar 12, 2009 at 6:41 PM, Amin Mohammed-Coleman >wrote:


JIRA updated.  Includes new testcase which shows highlighter not  
working as

expected.


On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman >wrote:



Hi

I have found that it is not issue with POI. I extracted text  
using PoI but
differenlty and the term is extracted properly.  When I store the  
text and
retrieve it the term exists. However running the text through  
highlighter

doesn't work

I will post test case with plain text file on JIRA. Currently on  
a cramped

train!

Cheers



On 11 Mar 2009, at 18:11, markharw00d   
wrote:


If you can supply a Junit test that recreates the problem I think  
we can

start to make progress on this.



Amin Mohammed-Coleman wrote:


Hi

Apologies for re sending this mail. Just wondering if anyone has
experienced the below. I'm not sure if this could happen due  
nature of
document. It does seem strange one term search returns summary  
while another

does not even though same document is being returned.

I'm asking this so I can code around this if is normal.


Apologies again for re sending this mail

Cheers

Amin

Sent from my iPhone

On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman 
wrote:

Hi


I am seeing some strange behaviour with the highlighter and I'm
wondering if anyone else is experiencing this.  In certain  
instances I don't
get a summary being generated.  I perform the search and the  
search returns
the correct document.  I can see that the lucene document  
contains the text

in the field.  However after doing:

SimpleHTMLFormatter simpleHTMLFormatter = new
SimpleHTMLFormatter("", "span>");

//required for highlighting
Query query2 = multiSearcher.rewrite(query);
Highlighter highlighter = new
Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
...

String text= doc.get(FieldNameEnum.BODY.getDescription());
TokenStream tokenStream =
analyzer.tokenStream(FieldNameEnum.BODY.getDescription(), new
StringReader(text));
String result =  
highlighter.getBestFragments(tokenStream,

text, 3, "...");


the string result is empty.  This is very strange, if i try a  
different
term that exists in the document then I get a summary.  For  
example I have a
word document that contains the term "document" and  
"aspectj".  If I search
for "document" I get the correct document but no highlighted  
summary.

However if I search using "aspectj" I get the same doucment with
highlighted summary.

Just to mentioned I do rewrite the original query before  
performing the

highlighting.

I'm not sure what i'm missing here.  Any help would be  
appreciated.


Cheers
Amin

On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman <
ami...@gmail.com> wrote:
Hi

Got it working!  Thanks again for your help!


Amin


On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman <
ami...@gmail.com> wrote:
Thanks!  The final piece that I needed to do for the project!

Cheers

Amin

On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler 
wrote:
cool.  i will use compression and store in index. is there  
anything

special
i need to for decompressing the text? i presume i can just do
doc.get("content")?
thanks for your advice all!


No just use Field.Store.COMPRESS when adding to index and
Document.get()
when fetching. The decompression is automatically done.

You may think, why not enable compression for all fields? The  
case is,

that
this is an overhead for very small and short fields. So you  
should only

use
it for large contents (it's the same like compressing very  
small files

as
ZIP/GZIP: These files mostly get larger than without  
compression).


Uwe


---------------------------------------------------------------------

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user- 
h...@lucene.apache.org







Re: Lucene Highlighting and Dynamic Summaries

2009-03-13 Thread Amin Mohammed-Coleman
Sweet!  When will this highlighter be available?  Can I use this now?

Cheers!


On Fri, Mar 13, 2009 at 10:10 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> Amin Mohammed-Coleman wrote:
>
>  I think that would be good.
>>
>
> I'll open an issue.
>
>  Probably a silly thing to ask but I guess there is a performance
>> implication by setting it to max value.
>>
>
> Right.  And it's tough choosing a default in situations like this --
> performance vs losing stuff.
>
> However, there's a new highlighter:
>
>https://issues.apache.org/jira/browse/LUCENE-1522
>
> which looks like it may have promising performance and no default "loses
> highlighted terms" limit, I think.
>
> Mike
>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Lucene Highlighting and Dynamic Summaries

2009-03-13 Thread Amin Mohammed-Coleman
Absolutely! I have received considerable help from the community, and there
is so much more I want to ask!

Cheers!

Amin

On Fri, Mar 13, 2009 at 10:41 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> Well, it's not yet committed.
>
> You can use it now by pulling the patch attached to the issue & testing it
> yourself.  If you do so, please report back!  This is how Lucene improves.
>
> I'm hoping we can include it in 2.9...
>
> Mike
>
>
> On Mar 13, 2009, at 6:35 AM, Amin Mohammed-Coleman wrote:
>
>  Sweet!  When will this highlighter be available?  Can I use this now?
>>
>> Cheers!
>>
>>
>> On Fri, Mar 13, 2009 at 10:10 AM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>> I think that would be good.
>>>
>>>>
>>>>
>>> I'll open an issue.
>>>
>>> Probably a silly thing to ask but I guess there is a performance
>>>
>>>> implication by setting it to max value.
>>>>
>>>>
>>> Right.  And it's tough choosing a default in situations like this --
>>> performance vs losing stuff.
>>>
>>> However, there's a new highlighter:
>>>
>>>  https://issues.apache.org/jira/browse/LUCENE-1522
>>>
>>> which looks like it may have promising performance and no default "loses
>>> highlighted terms" limit, I think.
>>>
>>> Mike
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>>>
>>>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Lucene Highlighting and Dynamic Summaries

2009-03-13 Thread Amin Mohammed-Coleman
OK. I tried to apply the patch(es) and completely messed it up (user
error). Is there a full example of the highlighter available that I
can apply and test?

Cheers
Amin


On Fri, Mar 13, 2009 at 12:09 PM, Amin Mohammed-Coleman wrote:

> Absolutely!  I have received considerable help from the community and there
> is so much more I want to ask!
>
> Cheers!
>
> Amin
>
>
> On Fri, Mar 13, 2009 at 10:41 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>>
>> Well, it's not yet committed.
>>
>> You can use it now by pulling the patch attached to the issue & testing it
>> yourself.  If you do so, please report back!  This is how Lucene improves.
>>
>> I'm hoping we can include it in 2.9...
>>
>> Mike
>>
>>
>> On Mar 13, 2009, at 6:35 AM, Amin Mohammed-Coleman wrote:
>>
>>  Sweet!  When will this highlighter be available?  Can I use this now?
>>>
>>> Cheers!
>>>
>>>
>>> On Fri, Mar 13, 2009 at 10:10 AM, Michael McCandless <
>>> luc...@mikemccandless.com> wrote:
>>>
>>>
>>>> Amin Mohammed-Coleman wrote:
>>>>
>>>> I think that would be good.
>>>>
>>>>>
>>>>>
>>>> I'll open an issue.
>>>>
>>>> Probably a silly thing to ask but I guess there is a performance
>>>>
>>>>> implication by setting it to max value.
>>>>>
>>>>>
>>>> Right.  And it's tough choosing a default in situations like this --
>>>> performance vs losing stuff.
>>>>
>>>> However, there's a new highlighter:
>>>>
>>>>  https://issues.apache.org/jira/browse/LUCENE-1522
>>>>
>>>> which looks like it may have promising performance and no default "loses
>>>> highlighted terms" limit, I think.
>>>>
>>>> Mike
>>>>
>>>>
>>>> -
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>
>>>>
>>>>
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>


Pagination with MultiSearcher

2009-03-14 Thread Amin Mohammed-Coleman

Hi

I'm looking at trying to implement pagination for my search project.  
I've been google-ing for a solution, so far without luck. I've seen  
implementations of HitCollector which look promising; however, my  
search method would have to change completely.


For example I'm currently using the following:

search ( query, filter,int, sort)

If I use a HitCollector there isn't a search method to apply a  
query, hit collector, sort and filter, unless I'm supposed to apply the  
sort and filter in the hit collector.


I would be grateful if anyone could advise me what approach to take.

On a side note, I just want to thank you all for helping me with many  
of my issues. I'm hoping this is my last question!  Thanks for your  
patience!



Cheers

Amin


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: how to index keyword and value

2009-03-15 Thread Amin Mohammed-Coleman
Why don't you create a Lucene document that represents a Person and then
index the fields name, age, phone number, etc.?  Search on the name and then
get the corresponding phone number from the matching document.
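
For example, a minimal sketch of the indexing side (assuming Lucene
2.4-era APIs; the field names and the writer variable are illustrative):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    Document person = new Document();
    // Analyzed so the name can be searched on.
    person.add(new Field("name", "Abebe",
                         Field.Store.YES, Field.Index.ANALYZED));
    // Stored only - retrieved with the hit, never searched on.
    person.add(new Field("phone", "+2519112332",
                         Field.Store.YES, Field.Index.NO));
    writer.addDocument(person);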
Cheers
Amin

On Sun, Mar 15, 2009 at 10:56 AM, Seid Mohammed  wrote:

> I want to Index Person_Name and associated phone number.
> Example: Abebe ===>+2519112332
> later, When I search for Abebe, it should display +2519112332
> any hint
>
> seid M
>
> --
> "RABI ZIDNI ILMA"
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Pagination with MultiSearcher

2009-03-15 Thread Amin Mohammed-Coleman
HI Erick
Thanks for your reply, glad to see I'm not the only person
working/developing on a Sunday!  I'm not sure how the FieldSortedHitQueue
works and how it can be applied to the search method exposed by
MultiSearcher.  Would it be possible to clarify a bit more or even point to
some reference documentation?

Cheers
Amin

On Sun, Mar 15, 2009 at 1:08 PM, Erick Erickson wrote:

> You could do something with FieldSortedHitQueue as a post-search
> sort, but I wonder if this would work for you...
>
> public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
>     throws IOException
>
> (javadoc: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/TopFieldDocs.html)
>
>
> Best
> Erick
>
>
> On Sun, Mar 15, 2009 at 2:12 AM, Amin Mohammed-Coleman  >wrote:
>
> > Hi
> >
> > I'm looking at trying to implement pagination for my search project. I've
> > been google-ing for a solution. So far no luck. I've seen implementations
> of
> > HitCollector which looks promising, however my search method has to
> > completely change.
> >
> > For example I'm currently using the following:
> >
> > search ( query, filter,int, sort)
> >
> > If I use a HitCollector there isn't a search to apply
> > query,hitcollector,sort and filter, unless I'm supposed to apply sort and
> > filter in the hit collector.
> >
> > I would be grateful if anyone could advise me what approach to take.
> >
> > On a side note I just want to thank you all for helping me with many of
> my
> > issues. I'm hoping this is my last question!  Thanks for your patience!
> >
> >
> > Cheers
> >
> > Amin
> >
> >
> > -
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>


Re: how to index keyword and value

2009-03-15 Thread Amin Mohammed-Coleman
When you create a query to the searcher you can specify which field to
search on for example:

Query query = queryParser.parse(searchTerm);


QueryParser is constructed like this:


QueryParser queryParser = new AnalyzingQueryParser("name", new StandardAnalyzer());


Pass the query to the IndexSearcher and you get hits.  From the hits you can
get the documents, and from each matching document you can get the phone
number field (if you store the number in the index).
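
A minimal end-to-end sketch of the search side (assuming Lucene 2.4-era
APIs and the illustrative "name"/"phone" fields from earlier in this
thread; the searcher variable is assumed, and parse throws ParseException):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TopDocs;

    QueryParser parser = new AnalyzingQueryParser("name", new StandardAnalyzer());
    Query query = parser.parse("Abebe");
    TopDocs top = searcher.search(query, null, 10);
    for (int i = 0; i < top.scoreDocs.length; i++) {
        // The stored "phone" field comes back with each matching document.
        System.out.println(searcher.doc(top.scoreDocs[i].doc).get("phone"));
    }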


HTH


On Sun, Mar 15, 2009 at 1:32 PM, Seid Mohammed  wrote:

> dear Erick, that one I have tried the very begining on playing lucene.
> I know how to create documents, but my question is I want to create
> documents with fields such as  person-name and phone-number and so on.
> while searching, i will submit a person name so that it will return me
> the phone number of that person.
>
> hope you get my problem
>
> Thanks a lot
>
> Seid M
>
> On 3/15/09, Erick Erickson  wrote:
> > Have you tried working through the getting started guide at
> > http://lucene.apache.org/java/2_4_1/gettingstarted.html? That
> > should give you a good idea of how to create a document in Lucene.
> >
> >
> > Best
> > Erick
> >
> > On Sun, Mar 15, 2009 at 8:49 AM, Seid Mohammed 
> wrote:
> >
> >> that is exactly my question
> >> how can I do that?
> >>
> >> thanks a lot
> >> Seid M
> >>
> >> On 3/15/09, Amin Mohammed-Coleman  wrote:
> >> > Why don't you create a Lucene document that represents a Person and
> then
> >> > index the fields name, age, phone number, etc.  Search on the name and
> >> then
> >> > get the corresponding phone number from the search.
> >> > Cheers
> >> > Amin
> >> >
> >> > On Sun, Mar 15, 2009 at 10:56 AM, Seid Mohammed 
> >> wrote:
> >> >
> >> >> I want to Index Person_Name and associated phone number.
> >> >> Example: Abebe ===>+2519112332
> >> >> later, When I search for Abebe, it should display +2519112332
> >> >> any hint
> >> >>
> >> >> seid M
> >> >>
> >> >> --
> >> >> "RABI ZIDNI ILMA"
> >> >>
> >> >> -
> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >>
> >> >>
> >> >
> >>
> >>
> >> --
> >> "RABI ZIDNI ILMA"
> >>
> >> -
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >>
> >
>
>
> --
> "RABI ZIDNI ILMA"
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Pagination with MultiSearcher

2009-03-16 Thread Amin Mohammed-Coleman
Hi Erick

I've seen the following:

TopDocCollector collector = new TopDocCollector(hitsPerPage) and then pass
the collector to the searcher.  But I'm not sure how I increment the
hitsPerPage.  Also how do I get the total results returned?

In relation to sorting I could basically use Collections.sort(..) or
something similar.  My search returns a collection of summary objects which
I could sort at that stage rather than passing it to the search code.  This
would mean I could use a collector to do this.

Cheers
Amin



On Mon, Mar 16, 2009 at 1:42 AM, Erick Erickson wrote:

> Basically, the FieldSortedHitQueue is just a sorting mechanism you
> implement yourself. But I can't help but think that there's an easier
> way, although I'll have to admit I haven't used MultiSearcher enough
> to offer much guidance. That'll teach me to send something off
> on Sunday that I don't really understand well enough
>
> Sorry 'bout that
> Erick
>
> On Sun, Mar 15, 2009 at 9:15 AM, Amin Mohammed-Coleman  >wrote:
>
> > HI Erick
> > Thanks for your reply, glad to see I'm not the only person
> > working/developing on a Sunday!  I'm not sure how the FieldSortedHitQueue
> > works and how it can be applied to the search method exposed by
> > MultiSearcher.  Would it be possible to clarify a bit more or even point
> to
> > some reference documentation?
> >
> > Cheers
> > Amin
> >
> > On Sun, Mar 15, 2009 at 1:08 PM, Erick Erickson  > >wrote:
> >
> > > You could do something with FieldSortedHitQueue as a post-search
> > > sort, but I wonder if this would work for you...
> > >
> > > public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
> > >     throws IOException
> > >
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Sun, Mar 15, 2009 at 2:12 AM, Amin Mohammed-Coleman <
> ami...@gmail.com
> > > >wrote:
> > >
> > > > Hi
> > > >
> > > > I'm looking at trying to implement pagination for my search project.
> > I've
> > > > been google-ing for a solution. So far no luck. I've seen
> > implementations
> > > of
> > > > HitCollector which looks promising, however my search method has to
> > > > completely change.
> > > >
> > > > For example I'm currently using the following:
> > > >
> > > > search ( query, filter,int, sort)
> > > >
> > > > If I use a HitCollector there isn't a search to apply
> > > > query,hitcollector,sort and filter, unless I'm supposed to apply sort
> > and
> > > > filter in the hit collector.
> > > >
> > > > I would be grateful if anyone could advise me what approach to take.
> > > >
> > > > On a side note I just want to thank you all for helping me with many
> > of
> > > my
> > > > issues. I'm hoping this is my last question!  Thanks for your
> patience!
> > > >
> > > >
> > > > Cheers
> > > >
> > > > Amin
> > > >
> > > >
> > > > -
> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > > >
> > > >
> > >
> >
>


Re: Pagination with MultiSearcher

2009-03-16 Thread Amin Mohammed-Coleman
Hi
I've come across the PageHitCollector class from the:

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200707.mbox/%3c070320071521.6119.468a6964000b3e7517e72205884484070a9c0701030...@comcast.net%3e

I'm looking at using this with the MultiSearcher class, calling:

search(query,filter,pageHitCollector)

I intend to use comparators to do the sorting, via Collections.sort().
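
For reference, a minimal sketch of that style of paging collector
(assuming the Lucene 2.4-era HitCollector API; this is illustrative
rather than the exact class from the linked post):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.search.HitCollector;

    // Gathers every hit; hits arrive in index order, not score order,
    // so sort them (e.g. with a comparator) before cutting out a page.
    class PageHitCollector extends HitCollector {
        static class Hit {
            final int doc; final float score;
            Hit(int doc, float score) { this.doc = doc; this.score = score; }
        }

        private final List<Hit> hits = new ArrayList<Hit>();

        public void collect(int doc, float score) {
            hits.add(new Hit(doc, score));
        }

        public int getTotalHits() { return hits.size(); }

        // One page; offset would typically be pageNumber * pageSize.
        public List<Hit> getPage(int offset, int pageSize) {
            int end = Math.min(offset + pageSize, hits.size());
            return offset >= end ? new ArrayList<Hit>()
                                 : hits.subList(offset, end);
        }
    }

It would then be passed in via search(query, filter, collector).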

I would be grateful for any feedback on whether this is a good approach.

Cheers
Amin

On Mon, Mar 16, 2009 at 8:03 AM, Amin Mohammed-Coleman wrote:

> Hi Erick
>
> I've seen the following:
>
> TopDocCollector collector = new TopDocCollector(hitsPerPage) and then pass
> the collector to the searcher.  But I'm not sure how I increment the
> hitsPerPage.  Also how do I get the total results returned?
>
> In relation to sorting I could basically use Collections.sort(..) or
> something similar.  My search returns a collection of summary objects which
> I could sort at that stage rather than passing it to the search code.  This
> would mean I could use a collector to do this.
>
> Cheers
> Amin
>
>
>
>
> On Mon, Mar 16, 2009 at 1:42 AM, Erick Erickson 
> wrote:
>
>> Basically, the FieldSortedHitQueue is just a sorting mechanism you
>> implement yourself. But I can't help but think that there's an easier
>> way, although I'll have to admit I haven't used MultiSearcher enough
>> to offer much guidance. That'll teach me to send something off
>> on Sunday that I don't really understand well enough
>>
>> Sorry 'bout that
>> Erick
>>
>> On Sun, Mar 15, 2009 at 9:15 AM, Amin Mohammed-Coleman > >wrote:
>>
>> > HI Erick
>> > Thanks for your reply, glad to see I'm not the only person
>> > working/developing on a Sunday!  I'm not sure how the
>> FieldSortedHitQueue
>> > works and how it can be applied to the search method exposed by
>> > MultiSearcher.  Would it be possible to clarify a bit more or even point
>> to
>> > some reference documentation?
>> >
>> > Cheers
>> > Amin
>> >
>> > On Sun, Mar 15, 2009 at 1:08 PM, Erick Erickson <
>> erickerick...@gmail.com
>> > >wrote:
>> >
>> > > You could do something with FieldSortedHitQueue as a post-search
>> > > sort, but I wonder if this would work for you...
>> > >
>> > > public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
>> > >     throws IOException
>> > >
>> > >
>> > > Best
>> > > Erick
>> > >
>> > >
>> > > On Sun, Mar 15, 2009 at 2:12 AM, Amin Mohammed-Coleman <
>> ami...@gmail.com
>> > > >wrote:
>> > >
>> > > > Hi
>> > > >
>> > > > I'm looking at trying to implement pagination for my search project.
>> > I've
>> > > > been google-ing for a solution. So far no luck. I've seen
>> > implementations
>> > > of
>> > > > HitCollector which looks promising, however my search method has to
>> > > > completely change.
>> > > >
>> > > > For example I'm currently using the following:
>> > > >
>> > > > search ( query, filter,int, sort)
>> > > >
>> > > > If I use a HitCollector there isn't a search to apply
>> > > > query,hitcollector,sort and filter, unless I'm supposed to apply
>> sort
>> > and
>> > > > filter in the hit collector.
>> > > >
>> > > > I would be grateful if anyone could advise me what approach to take.
>> > > >
>> > > > On a side note I just want to thank you all for helping me with
>> many
>> > of
>> > > my
>> > > > issues. I'm hoping this is my last question!  Thanks for your
>> patience!
>> > > >
>> > > >
>> > > > Cheers
>> > > >
>> > > > Amin
>> > > >
>> > > >
>> > > >
>> -
>> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>> > > >
>> > > >
>> > >
>> >
>>
>
>


Re: Pagination with MultiSearcher

2009-03-19 Thread Amin Mohammed-Coleman
Hi

I've implemented the solution using the PageHitCollector from the link and I
have noticed that in certain instances I get a 0 score for queries like
"document OR aspectj".

has anyone else experienced this?

Cheers
Amin

On Mon, Mar 16, 2009 at 8:07 PM, Amin Mohammed-Coleman wrote:

> Hi
> I've come across the PageHitCollector class from the:
>
>
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200707.mbox/%3c070320071521.6119.468a6964000b3e7517e72205884484070a9c0701030...@comcast.net%3e
>
> I'm looking at using this in the multisearcher class, and do:
>
> search(query,filter,pageHitCollector)
>
> I intend to use comparators to do the sorting and use collections.sort().
>
> I would be grateful for any feedback on whether this is a good approach.
>
> Cheers
> Amin
>
> On Mon, Mar 16, 2009 at 8:03 AM, Amin Mohammed-Coleman 
> wrote:
>
>> Hi Erick
>>
>> I've seen the following:
>>
>> TopDocCollector collector = new TopDocCollector(hitsPerPage) and then pass
>> the collector to the searcher.  But I'm not sure how I increment the
>> hitsPerPage.  Also how do I get the total results returned?
>>
>> In relation to sorting I could basically use Collections.sort(..) or
>> something similar.  My search returns a collection of summary objects which
>> I could sort at that stage rather than passing it to the search code.  This
>> would mean I could use a collector to do this.
>>
>> Cheers
>> Amin
>>
>>
>>
>>
>> On Mon, Mar 16, 2009 at 1:42 AM, Erick Erickson 
>> wrote:
>>
>>> Basically, the FieldSortedHitQueue is just a sorting mechanism you
>>> implement yourself. But I can't help but think that there's an easier
>>> way, although I'll have to admit I haven't used MultiSearcher enough
>>> to offer much guidance. That'll teach me to send something off
>>> on Sunday that I don't really understand well enough
>>>
>>> Sorry 'bout that
>>> Erick
>>>
>>> On Sun, Mar 15, 2009 at 9:15 AM, Amin Mohammed-Coleman >> >wrote:
>>>
>>> > HI Erick
>>> > Thanks for your reply, glad to see I'm not the only person
>>> > working/developing on a Sunday!  I'm not sure how the
>>> FieldSortedHitQueue
>>> > works and how it can be applied to the search method exposed by
>>> > MultiSearcher.  Would it be possible to clarify a bit more or even point
>>> to
>>> > some reference documentation?
>>> >
>>> > Cheers
>>> > Amin
>>> >
>>> > On Sun, Mar 15, 2009 at 1:08 PM, Erick Erickson <
>>> erickerick...@gmail.com
>>> > >wrote:
>>> >
>>> > > You could do something with FieldSortedHitQueue as a post-search
>>> > > sort, but I wonder if this would work for you...
>>> > >
>>> > > public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
>>> > >     throws IOException
>>> > >
>>> > >
>>> > > Best
>>> > > Erick
>>> > >
>>> > >
>>> > > On Sun, Mar 15, 2009 at 2:12 AM, Amin Mohammed-Coleman <
>>> ami...@gmail.com
>>> > > >wrote:
>>> > >
>>> > > > Hi
>>> > > >
>>> > > > I'm looking at trying to implement pagination for my search
>>> project.
>>> > I've
>>> > > > been google-ing for a solution. So far no luck. I've seen
>>> > implementations
>>> > > of
>>> > > > HitCollector which looks promising, however my search method has to
>>> > > > completely change.
>>> > > >
>>> > > > For example I'm currently using the following:
>>> > > >
>>> > > > search ( query, filter,int, sort)
>>> > > >
>>> > > > If I use a HitCollector there isn't a search to apply
>>> > > > query,hitcollector,sort and filter, unless I'm supposed to apply
>>> sort
>>> > and
>>> > > > filter in the hit collector.
>>> > > >
>>> > > > I would be grateful if anyone could advise me what approach to take.
>>> > > >
>>> > > > On a side note I just want to thank you all for helping me with
>>> many
>>> > of
>>> > > my
>>> > > > issues. I'm hoping this is my last question!  Thanks for your
>>> patience!
>>> > > >
>>> > > >
>>> > > > Cheers
>>> > > >
>>> > > > Amin
>>> > > >
>>> > > >
>>> > > >
>>> -
>>> > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
>>> > > >
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>


Re: Pagination with MultiSearcher

2009-03-19 Thread Amin Mohammed-Coleman

Hi

Please ignore the problem I raised. User error!

Sorry

Amin

On 19 Mar 2009, at 09:41, Amin Mohammed-Coleman   
wrote:



Hi

I've implemented the solution using the PageHitCollector from the link  
and I have noticed that in certain instances I get a 0 score for  
queries like "document OR aspectj".


has anyone else experienced this?

Cheers
Amin

On Mon, Mar 16, 2009 at 8:07 PM, Amin Mohammed-Coleman > wrote:

Hi

I've come across the PageHitCollector class from the:

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200707.mbox/%3c070320071521.6119.468a6964000b3e7517e72205884484070a9c0701030...@comcast.net%3e

I'm looking at using this in the multisearcher class, and do:

search(query,filter,pageHitCollector)

I intend to use comparators to do the sorting and use  
collections.sort().


I would be grateful for any feedback on whether this is a good  
approach.


Cheers
Amin

On Mon, Mar 16, 2009 at 8:03 AM, Amin Mohammed-Coleman > wrote:

Hi Erick

I've seen the following:

TopDocCollector collector = new TopDocCollector(hitsPerPage) and  
then pass the collector to the searcher.  But I'm not sure how I  
increment the hitsPerPage.  Also how do I get the total results  
returned?


In relation to sorting I could basically use Collections.sort(..) or  
something similar.  My search returns a collection of summary  
objects which I could sort at that stage rather than passing it to  
the search code.  This would mean I could use a collector to do this.


Cheers
Amin




On Mon, Mar 16, 2009 at 1:42 AM, Erick Erickson > wrote:

Basically, the FieldSortedHitQueue is just a sorting mechanism you
implement yourself. But I can't help but think that there's an easier
way, although I'll have to admit I haven't used MultiSearcher enough
to offer much guidance. That'll teach me to send something off
on Sunday that I don't really understand well enough....

Sorry 'bout that
Erick

On Sun, Mar 15, 2009 at 9:15 AM, Amin Mohammed-Coleman >wrote:


> HI Erick
> Thanks for your reply, glad to see I'm not the only person
> working/developing on a Sunday!  I'm not sure how the  
FieldSortedHitQueue

> works and how it can be applied to the search method exposed by
> MultiSearcher.  Would it be possible to clarify a bit more or even  
point to

> some reference documentation?
>
> Cheers
> Amin
>
> On Sun, Mar 15, 2009 at 1:08 PM, Erick Erickson  >wrote:
>
> > You could do something with FieldSortedHitQueue as a post-search
> > sort, but I wonder if this would work for you...
> >
> > public TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
> >     throws IOException
> >
> >
> > Best
> > Erick
> >
> >
> > On Sun, Mar 15, 2009 at 2:12 AM, Amin Mohammed-Coleman  > >wrote:
> >
> > > Hi
> > >
> > > I'm looking at trying to implement pagination for my search  
project.

> I've
> > > been google-ing for a solution. So far no luck. I've seen
> implementations
> > of
> > > HitCollector which looks promising, however my search method  
has to

> > > completely change.
> > >
> > > For example I'm currently using the following:
> > >
> > > search ( query, filter,int, sort)
> > >
> > > If I use a HitCollector there isn't a search to apply
> > > query,hitcollector,sort and filter, unless I'm supposed to  
apply sort

> and
> > > filter in the hit collector.
> > >
> > > I would be grateful if anyone could advise me what approach to  
take.

> > >
> > > On a side note I just want to thank you all for helping me  
with many

> of
> > my
> > > issues. I'm hoping this is my last question!  Thanks for your  
patience!

> > >
> > >
> > > Cheers
> > >
> > > Amin
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org

> > >
> > >
> >
>





Similarity and Lucene

2009-03-20 Thread Amin Mohammed-Coleman
Hi

If I choose to subclass the default Similarity, do I need to apply the same
subclassed Similarity to the IndexReader, IndexWriter and IndexSearcher?

I am interested in doing the below:

Similarity sim = new DefaultSimilarity() {
  public float lengthNorm(String field, int numTerms) {
    if (field.equals("body")) return (float) (0.1 * Math.log(numTerms));
    else return super.lengthNorm(field, numTerms);
  }
};

[taken from http://www.lucenetutorial.com/advanced-topics/scoring.html]

Is this approach advisable?
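
For context, a minimal registration sketch (assuming Lucene 2.4-era APIs;
the directory and analyzer variables are illustrative - IndexReader itself
has no Similarity setter, but the writer and searcher each do):

    IndexWriter writer = new IndexWriter(directory, analyzer,
        IndexWriter.MaxFieldLength.UNLIMITED);
    writer.setSimilarity(sim);   // used when field norms are written

    IndexSearcher searcher = new IndexSearcher(directory);
    searcher.setSimilarity(sim); // used when hits are scored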


Cheers
Amin


Re: Similarity and Lucene

2009-03-20 Thread Amin Mohammed-Coleman
Although (I could be wrong) I'm wondering if lengthNorm is the correct one
I should be overriding.  I'm interested in the number of times a term occurs
in a document (the more occurrences, the higher the score), which I believe
is coord.  I may well be barking up the wrong tree.

Cheers
Amin

On Fri, Mar 20, 2009 at 4:20 PM, Amin Mohammed-Coleman wrote:

> Hi
>
> If I choose to subclass  the default similarity, do I need to apply the
> same subclassed Similarity to IndexReader, IndexWriter and IndexSearcher?
>
> I am interested in doing the below:
>
> Similarity sim = new DefaultSimilarity() {
>   public float lengthNorm(String field, int numTerms) {
> if(field.equals("body")) return (float) (0.1 * Math.log(numTerms));
> else return super.lengthNorm(field, numTerms);
>   }
> };
>
> [taken from http://www.lucenetutorial.com/advanced-topics/scoring.html]
>
> Is this approach advisable?
>
>
> Cheers
> Amin
>


Re: Performance tips on searching

2009-03-20 Thread Amin Mohammed-Coleman

Hi

How do you expose pagination without a customized hit collector?  The  
MultiSearcher does not expose a method that takes a hit collector and a  
sort.  Maybe this is not an issue for people ...


Cheers

Amin

On 20 Mar 2009, at 17:25, "Uwe Schindler"  wrote:

Why not use a MultiSearcher on all single searchers? Or a Searcher on a
MultiReader consisting of all IndexReaders? With that you do not need to
merge the results.

By the way: instead of creating a TopDocCollector, you could also call
directly,

Searcher.search(Query query, Filter filter, int n, Sort sort)
Searcher.search(Query query, Filter filter, int n)

Filter can be null.

It's shorter and, if sorting is also involved, simpler to handle (you do
not need to switch between TopDocCollector and TopFieldDocCollector).

Important: With Lucene 2.9, the searches will be faster using this API
(because each index segment then uses its own collector).
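
A short sketch of the direct call (the query/searcher variables are
assumed, Filter may be null, and the sort field "date" and the page size
10 are illustrative):

    TopDocs top = searcher.search(query, null, 10, new Sort("date"));
    int totalHits = top.totalHits; // total matches, independent of n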

Uwe


-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-Original Message-
From: Paul Taylor [mailto:paul_t...@fastmail.fm]
Sent: Friday, March 20, 2009 6:02 PM
To: java-user@lucene.apache.org
Subject: Performance tips on searching


Hi, my code receives a search query from the web; there are 5 different
searches that can be run - each index is searched with a single
IndexSearcher referenced in a map. It parses and then performs the search
and returns the best 10 results, with scores readjusted over the results
so that the best score returns 1.0. Am I performing the optimal search
methods to do what I want?

thanks Paul

   IndexSearcher searcher = searchers.get(indexName);
   QueryParser parser = new QueryParser(indexName, analyzer);
   TopDocCollector collector = new TopDocCollector(10);
   try {
   searcher.search(parser.parse(query), collector);
   }
   catch (ParseException e) {
   }
   Results results = new Results();
   results.totalHits = collector.getTotalHits();
   TopDocs topDocs = collector.topDocs();
   ScoreDoc docs[] = topDocs.scoreDocs;
   float maxScore = topDocs.getMaxScore();
   for (int i = 0; i < docs.length; i++) {
   Result result = new Result();
   result.score = docs[i].score / maxScore;
   result.doc = searcher.doc(docs[i].doc);
   results.results.add(result);
   }
   return results;

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Performance tips on searching

2009-03-20 Thread Amin Mohammed-Coleman

Hi

I wrote last week about the best way to paginate. I will reply back  
with that email if that's ok. This isn't my thread and I don't want to  
deviate from the original topic.



Cheers

Amin

On 20 Mar 2009, at 17:50, "Uwe Schindler"  wrote:


No, the MultiSearcher also exposes all methods IndexSearcher/Searcher
exposes (it inherits them from the superclass Searcher). And a call with
a collector is never sortable, because the sorting is done *inside* the
hit collector.

Where is your problem with pagination? Normally you choose n to be
paginationOffset+count and then display the ScoreDocs between offset ..
n-1. There is no TopDocCollector that can only collect results 100 to
109. To display results 100 to 109, you need to collect all results up
to 109, so call with n=110 and then display scoreDoc[100]..scoreDoc[109].

This is exactly how the old Hits worked.
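
A minimal paging sketch along those lines (the query/searcher variables
are assumed; page and pageSize are illustrative):

    int offset = page * pageSize;          // e.g. page 10, pageSize 10 -> 100
    TopDocs top = searcher.search(query, null, offset + pageSize);
    int end = Math.min(offset + pageSize, top.scoreDocs.length);
    for (int i = offset; i < end; i++) {
        Document d = searcher.doc(top.scoreDocs[i].doc);
        // render document d ...
    }
    // top.totalHits gives the overall match count for "results X-Y of N"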

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



-Original Message-
From: Amin Mohammed-Coleman [mailto:ami...@gmail.com]
Sent: Friday, March 20, 2009 6:43 PM
To: java-user@lucene.apache.org
Cc: ; 
Subject: Re: Performance tips on searching

Hi

How do you expose pagination without a customized hit collector?  The
MultiSearcher does not expose a method that takes a hit collector and a
sort.  Maybe this is not an issue for people ...

Cheers

Amin

On 20 Mar 2009, at 17:25, "Uwe Schindler"  wrote:


Why not use a MultiSearcher an all single searchers? Or a Searcher
on a
MultiReader consisting of all IndexReaders? With that you do not
need to
merge the results.

By the way: instead of creating a TopDocCollector, you could also  
call

directly,

Searcher.search(Query query, Filter filter, int n, Sort sort)
Searcher.search(Query query, Filter filter, int n)

Filter can be null.

It's shorter and if sorting is also involved, simplier to handle
(you do not
need to switch between ToDocCollector and TopFieldDocCollector).

Important: With Lucene 2.9, the searches will be faster using this  
API

(because then each index segment uses an own collector).

Uwe


-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-Original Message-
From: Paul Taylor [mailto:paul_t...@fastmail.fm]
Sent: Friday, March 20, 2009 6:02 PM
To: java-user@lucene.apache.org
Subject: Performance tips on searching


Hi, my code receives a search query from the web; there are 5 different
searches that can be run - each index is searched with a single
IndexSearcher referenced in a map. It parses and then performs the search
and returns the best 10 results, with scores readjusted over the results
so that the best score returns 1.0. Am I performing the optimal search
methods to do what I want?

thanks Paul

  IndexSearcher searcher = searchers.get(indexName);
  QueryParser parser = new QueryParser(indexName, analyzer);
  TopDocCollector collector = new TopDocCollector(10);
  try {
  searcher.search(parser.parse(query), collector);
  }
  catch (ParseException e) {
  }
  Results results = new Results();
  results.totalHits = collector.getTotalHits();
  TopDocs topDocs = collector.topDocs();
  ScoreDoc docs[] = topDocs.scoreDocs;
  float maxScore = topDocs.getMaxScore();
  for (int i = 0; i < docs.length; i++) {
  Result result = new Result();
  result.score = docs[i].score / maxScore;
  result.doc = searcher.doc(docs[i].doc);
  results.results.add(result);
  }
  return results;

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: question about grouping text

2009-03-26 Thread Amin Mohammed-Coleman
Hi

I was wondering if something like LingPipe or Gate (for text extraction)
might be an idea?  I've started looking at them and I'm just thinking they
may be applicable (I may be wrong).

Cheers
Amin

On Wed, Mar 25, 2009 at 4:18 PM, Grant Ingersoll wrote:

> Hi MFM,
>
> This comes down to a preprocessing step that you would have to do before
> putting into Lucene, although I suppose you might be able to identify it
> during analysis and use the TeeTokenFilter and the SinkTokenizer.  Once you
> do this, then you can add them as fields on a Document.  I know that's not a
> great help, but not much Lucene can do b/c it is application specific.
>
> Document/field wise, I would probably have:
> Document
>   question
>   answer
>
> Then, when you search in the question field, you can also retrieve the
> answer.
>
> -Grant
>
>
> On Mar 24, 2009, at 4:04 PM, MFM wrote:
>
>
>> I have been able to successfully index and search text from structured
>> documents like PDF and MS Word. I am having a real hard time trying to
>> figure out how to group the index strings together e.g. if my document had
>> a
>> question and answer in a table, the search will produce the text with the
>> question based on the keyword. How would I group or associate the question
>> and answer as part of the indexing ? I have tried using POI to read thru
>> the
>> MS Word file and try and group them, but then it gets really intense into
>> pattern matching.
>>
>> Thanks
>> MFM
>> --
>> View this message in context:
>> http://www.nabble.com/question-about-grouping-text-tp22682433p22682433.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Syncing lucene index with a database

2009-03-27 Thread Amin Mohammed-Coleman

Hi

I was going to suggest looking at Hibernate Search. It comes with  
event listeners that update your indexes when a persistent entity  
changes. It uses Lucene under the hood, so if you need to access Lucene  
then you can.
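
For example, a minimal sketch of an indexed entity (assuming Hibernate
Search 3.x-era annotations; the class and field names are illustrative):

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.search.annotations.DocumentId;
    import org.hibernate.search.annotations.Field;
    import org.hibernate.search.annotations.Indexed;

    @Entity
    @Indexed  // event listeners keep the Lucene index in sync on insert/update/delete
    public class Product {
        @Id @DocumentId
        private Long id;

        @Field  // analyzed and indexed into the entity's Lucene index
        private String description;

        // getters/setters omitted
    }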


Indexing can be done synchronously or asynchronously, and the  
documentation shows how to set up JMS.


There are other benefits of Hibernate Search which you can find on the  
site and in the documentation.


HTH
Amin


On 27 Mar 2009, at 00:03, Tim Williams  wrote:

On Thu, Mar 26, 2009 at 6:28 PM, Matt Schraeder  
 wrote:
I'm new to Lucene and just beginning my project of adding it to our  
web

app.  We are indexing data from a MS SQL 2000 database and building
full-text search from it.

Everything I have read says that building the index is a resource  
heavy
operation so we should use it sparingly.  For the most part the  
database
table we are working from is updated once a day so as soon as the  
table
itself is updated we can rebuild our Lucene indexes.  However,  
there are
a few feilds that get updated with a cronjob every 15 minutes.  In  
terms
of speed and efficiency, what would be a better system for keeping  
our

data synced between the database and Lucene?

Of course one option would be to rebuild the Lucene index each time  
the

cronjob runs to keep the database and Lucene index synced.  We could
either return the entire database table, loop through the rows, get a
row's document in lucene remove/readd it, and do that for each row.
Alternatively after we update the main table we return just the rows
that were changed, loop through those and remove/readd them in  
lucene,

and do that for just the rows that have changed.

Alternatively I have thought of using Lucene purely for search to
return just the primary key of items from our database table, then  
query
the database for those items and get the most up to date data from  
the
database to actually display our search results.  This would let us  
use
Lucene's superior searching capabilities and searching speed, but  
would

still require us to pull the data to be displayed from the database.

Another option is that we could do the same, but only return the  
fields
that could change frequently.  This would use Lucene to store and  
index
the majority of what is displayed on a search results page, only  
using
the database to return the 2 or 3 fields that might change in a  
search

for each row that lucene returns.

I'm honestly not sure what the "proper" choice should be, or if it
really depends on our own test cases.  Is it perfectly okay to run an
index update every 15 minutes? How much difference would it make in
terms of search time to search with lucene AND pull from the  
database?

My main issue with searching with lucene but getting the actual data
from the database is that it seems like that would make our current
search system that is entirely database driven to run slower.



Not sure what ORM framework, if any, you might be using, but some
colleagues have had some success using Hibernate Search[1] for this
sorta thing.  I've not used it, just a pointer in case you haven't
come across it...  seems that it would keep you above some low-level
details if it fits...

--tim

[1] - http://www.hibernate.org/410.html

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


