Hi,

This question is going to be a little complicated to explain, but let me try.

I have implemented an indexer app based on the demo IndexFiles app, and a web 
app based on the luceneweb web app for the searching.

In my case, the "Documents" that I'm indexing are a proprietary file type, and 
each document has kind of "sub-documents".  So, in my indexer, I parse each of 
the sub-documents, and, for a given "Document", I build a long string 
containing terms that I extracted from each of the sub-documents, then I do:

doc.add(new Field("contents", longstring, Field.Store.YES, 
Field.Index.ANALYZED));

I also add the longstring to another non-indexed field, summary:

doc.add(new Field("summary", longstring, Field.Store.YES, Field.Index.NO));

The modified luceneweb web app that I use is pretty vanilla, and originally, 
what I was asked to do was to be able to search just for a Document, i.e., 
given a query like "X and Y" (document containing both term=X and term=Y), 
return the file path+name for the document.  I also was displaying the terms 
associated with each sub-document by parsing the 'summary' string.

So, for example, if "Document1" contained 3 sub-documents (which contained 
(term1, term2), (term1a, term2a), and (term1b, term2b), respectively), and if I 
queried for "term1a AND term2a", the web app would display something like:

Document1                 subdoc1 term1 term2
                                      subdoc2 term1a term2a
                                      subdoc3 term1b term2b

However, I've now been asked to implement the ability to query the 
sub-documents. 

In other words, rather than the web app displaying what I showed above, they 
want it to return something like just:

Document1                 subdoc2 term1a term2a

Right now, the web app gets the 'summary' (again, in a long string), then just 
breaks it into subdoc1, subdoc2, and subdoc3 lines, just for display purposes, 
so to do what I've been asked, I need to query the 3 sub-strings from the 
'summary', i.e., run the "term1a AND term2a" query against the following 
strings:

subdoc1 term1 term2
subdoc2 term1a term2a
subdoc3 term1b term2b

I guess that I can write a method to do that, but I want to make sure that the 
sub-document/string query "duplicates" the behavior of the Lucene query.

It seems like there should be a way to duplicate the Lucene query logic by 
using something (methods) in Lucene itself??

I've been reviewing the Javadocs, but I'm still fairly new to Lucene, so I was 
hoping that someone could point me in the right direction?

My apologies for the longish post, but I hope that I've been able to explain 
clearly :)!!

Thanks,
Jim

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to