Ah, I see. More complicated than I realized. How about using two sorts of documents?

Type 1, one lucene doc for your example

textid: 1234
text: some text about something

Type 2, 3 lucene docs for your example

First
textid: 1234
company: IBM
score: 0.6

Second
textid: 1234
company: Google
score: 0.1

Third
textid: 1234
company: Apple
score: 0.4

You could then use the BooleanQuery approach to get textids, with an
additional lookup to get the actual text. Not brilliant and won't
work if you want text:aaaa company:google minconf:0.1

There is BlockJoinQuery in recent versions that gives some sort of
parent/child relationship. Might be worth a look. Or wait for a better
idea from someone else.

--
Ian.


On Wed, Mar 21, 2012 at 3:48 PM, Deb Lucene <deb.luc...@gmail.com> wrote:
> Hi Ian,
>
> Thanks for the reply. I am not sure if the bq solution will be able to solve
> the problem. Let me explain with an example -
>
> document 1 - (some text)
> IBM - 0.6
> Google - 0.1
> Apple - 0.4
>
> Now suppose I index the document based on the "company name" and
> "confidence scores" separately and search using the bq where the Numeric
> Field search is based on "anything below 0.5" and text = "IBM". Here, by
> mistake the document 1 will be chosen (as it has been stored with 0.6, 0.1
> and 0.4). But actually it should not be - as the "IBM" score is 0.6. So in
> gist - this problem needs some sort of linking between the company name and
> the scores.
>
> --d
>
>
>
> On Wed, Mar 21, 2012 at 10:41 AM, Ian Lea <ian....@gmail.com> wrote:
>
>> Why do you want to link name and confidence in one field? Store
>> confidence as a NumericField and search something like
>>
>> BooleanQuery bq = new BooleanQuery();
>> Query nameq = parser.parse(...) or whatever
>> Query confq = NumericRangeQuery.newXxx(...);
>> bq.add(nameq, ...);
>> bq.add(confq, ...);
>>
>> and search using bq.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Mar 21, 2012 at 2:20 PM, Deb Lucene <deb.luc...@gmail.com> wrote:
>> > Hi Group,
>> >
>> > Sorry for cross posting!
>> >
>> > We need to index a document corpus (news articles) with some meta data
>> > features. The meta data are actually company names with some scoring (a
>> > double, between 0 to 1). For example, two documents can be -
>> >
>> > document 1
>> > (some text - say a technical article from NY times). It comes with the
>> > metadata like -
>> > IBM - 0.5
>> > Google - 0.9
>> > Apple - 0.3
>> >
>> > where 0.5, 0.9, 0.3 are some confidence scores for the company names.
>> >
>> > Similarly, the document 2 is about some IT article and then the meta data
>> > are like -
>> > IBM - 0.6
>> > Google - 0.1
>> > Apple - 0.4
>> >
>> > now we can index the documents based on the contents or the company names
>> > easily. But here the problem is we need to create a "field" where the
>> > company names and the scores are linked. So that we can search something
>> > like -
>> >
>> > query = where the "company name" (a field) is "IBM" and the scores of IBM
>> > is > 0.5.
>> > So in that case the document 2 will be retrieved.
>> >
>> > I am wondering if anyone has ideas about using the company names and scores
>> > (linked) together as a field.
>> >
>> > Thanks in advance,
>> >
>> > --d
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
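
For illustration, here is a rough sketch of why the two-document layout suggested at the top of this message avoids the false match Deb describes. It is plain Java with no Lucene calls at all: PairDoc, flattenedMatch and textIdsFor are hypothetical names made up for this example, and the in-memory lists only stand in for an index. A real version would presumably build the same conditions with BooleanQuery and NumericRangeQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PairDocsSketch {

    // Hypothetical stand-in for a "type 2" document:
    // one document per (textid, company, score) triple.
    static final class PairDoc {
        final String textId;
        final String company;
        final double score;

        PairDoc(String textId, String company, double score) {
            this.textId = textId;
            this.company = company;
            this.score = score;
        }
    }

    // Flattened layout: all companies and all scores of one text live in a
    // single document, so the two conditions are checked independently.
    static boolean flattenedMatch(List<String> companies, List<Double> scores,
                                  String company, double maxScore) {
        boolean hasCompany = companies.contains(company);
        boolean hasLowScore = false;
        for (double s : scores) {
            if (s < maxScore) {
                hasLowScore = true;
            }
        }
        return hasCompany && hasLowScore;
    }

    // Per-pair layout: both conditions must hold on the same document,
    // which keeps each company tied to its own score.
    static Set<String> textIdsFor(List<PairDoc> docs, String company, double maxScore) {
        Set<String> ids = new LinkedHashSet<String>();
        for (PairDoc d : docs) {
            if (d.company.equals(company) && d.score < maxScore) {
                ids.add(d.textId);
            }
        }
        return ids;
    }

    // The three type 2 documents from the example above.
    static List<PairDoc> exampleDocs() {
        List<PairDoc> docs = new ArrayList<PairDoc>();
        docs.add(new PairDoc("1234", "IBM", 0.6));
        docs.add(new PairDoc("1234", "Google", 0.1));
        docs.add(new PairDoc("1234", "Apple", 0.4));
        return docs;
    }

    public static void main(String[] args) {
        List<String> companies = Arrays.asList("IBM", "Google", "Apple");
        List<Double> scores = Arrays.asList(0.6, 0.1, 0.4);

        // Flattened: matches although IBM's own score is 0.6.
        System.out.println(flattenedMatch(companies, scores, "IBM", 0.5)); // true

        // Per-pair: IBM below 0.5 finds nothing, Google below 0.5 finds the text.
        System.out.println(textIdsFor(exampleDocs(), "IBM", 0.5));    // []
        System.out.println(textIdsFor(exampleDocs(), "Google", 0.5)); // [1234]
    }
}
```

The returned textids would then be used for the additional lookup of the type 1 document that holds the actual text. As noted above, this still does not cover a query that combines text terms with a company and score condition in one pass.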