Hi all,

Thanks for the quick reply.
It seems my index is indeed not what I think it is, so maybe I'm using the
wrong analyzer. Here is the code I use to index the multiple values of
alt_id:

indexWriter = new IndexWriter(FSDirectory.open(new File(path)),
        new StandardAnalyzer(Version.LUCENE_30), true,
        IndexWriter.MaxFieldLength.UNLIMITED);

for (String geneId : genes.keySet()) {
    // Update Gene
    Document gene = new Document();
    gene.add(new Field("type", "gene", Field.Store.YES, Field.Index.ANALYZED));
    gene.add(new Field("id", geneId, Field.Store.YES, Field.Index.ANALYZED));
    for (String altId : genes.get(geneId)) {
        gene.add(new Field("alt_id", altId, Field.Store.YES, Field.Index.ANALYZED));
    }
    indexWriter.updateDocument(new Term("id", geneId), gene);
}

I understand that the best approach is to have the values of alt_id as
single tokens. Isn't that what the current analyzer does? Which one should
I use instead?

Cheers,
José M. Villaveces

On 25 June 2012 15:58, Erick Erickson <erickerick...@gmail.com> wrote:

> TermQuerys are assumed to be parsed already. So you're
> looking for a _single_ term "ncbi-geneid:379474 or XI.24622".
>
> You'd construct something like
>
> Query query1 = new TermQuery(new Term("type", "gene"));
> Query query2 = new TermQuery(new Term("alt_Id", "ncbi-geneid:379474"));
> Query query3 = new TermQuery(new Term("alt_Id", "unigene:XI.24622"));
>
> BooleanQuery query = new BooleanQuery();
> query.add(query1, BooleanClause.Occur.MUST);
>
> BooleanQuery queryB = new BooleanQuery();
> queryB.add(query2, ...SHOULD);
> queryB.add(query3, ...SHOULD);
>
> query.add(queryB, BooleanClause.Occur.MUST);
>
> But this _assumes_ that you have _single tokens_ of the
> form ncbi-geneid:379474 but given that you say that just the
> bare 379474 works, I'm guessing as Ian says that you don't
> have what you think you do in your index, you probably have
> individual tokens like "ncbi-geneid" (or "ncbi" and "geneid" even),
> BC054227, xia, etc. You need to look into your index with Luke
> and see what's actually in there.
>
> You might think about installing Solr, _not_ to power your app, but just
> to play with the admin/analysis page to understand how
> Analysis works with various combinations of tokenizers and filters....
>
> Best
> Erick
>
> On Mon, Jun 25, 2012 at 8:50 AM, <seceval...@gmail.com> wrote:
> > I'm quite new to Lucene and recently, I ran into a problem. I have a
> > lucene document that looks like this:
> >
> > --- type ---
> > gene
> >
> > --- id ---
> > xla:379474
> >
> > --- alt_id ---
> > emb:BC054227
> > gb:BC054227
> > ncbi-geneid:379474
> > ncbi-gi:148230166
> > rs:NM_001086315
> > rs:NP_001079784
> > unigene:Xl.24622
> > xla:379474
> >
> > I created the query below in order to retrieve that document. It works
> > fine for altId = 379474 but not for altId = ncbi-geneid:379474 or
> > Xl.24622. I guessed altId must be escaped and tried String altId =
> > QueryParser.escape(altId) with no luck. What am I missing?
> >
> > Query query1 = new TermQuery(new Term("type", "gene"));
> > Query query2 = new TermQuery(new Term("alt_Id", altId));
> >
> > BooleanQuery query = new BooleanQuery();
> > query.add(query1, BooleanClause.Occur.MUST);
> > query.add(query2, BooleanClause.Occur.MUST);
> >
> > By the way I'm running lucene v3.0.
> >
> > Cheers,
> > José M. Villaveces
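
To make the single-token point from the thread concrete, here is a minimal plain-Java sketch (no Lucene involved). Everything in it is an assumption for illustration: toyAnalyze is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, and it is not StandardAnalyzer, whose real output is best checked with Luke. A TermQuery-style lookup is modeled as an exact match against the set of indexed terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TermMatchSketch {

    // Hypothetical stand-in for an analyzer (NOT StandardAnalyzer):
    // lowercases the value and splits it on any non-alphanumeric character.
    static List<String> toyAnalyze(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Terms that end up in the index when each value is run through the toy analyzer.
    static Set<String> indexAnalyzed(List<String> values) {
        Set<String> terms = new HashSet<String>();
        for (String value : values) {
            terms.addAll(toyAnalyze(value));
        }
        return terms;
    }

    // Terms that end up in the index when each value is stored as one untouched token.
    static Set<String> indexVerbatim(List<String> values) {
        return new HashSet<String>(values);
    }

    public static void main(String[] args) {
        List<String> altIds = Arrays.asList("ncbi-geneid:379474", "unigene:Xl.24622");

        Set<String> analyzed = indexAnalyzed(altIds);
        Set<String> verbatim = indexVerbatim(altIds);

        // An exact-term lookup only succeeds if the whole query string is an indexed term.
        System.out.println("analyzed, bare id:    " + analyzed.contains("379474"));
        System.out.println("analyzed, full value: " + analyzed.contains("ncbi-geneid:379474"));
        System.out.println("verbatim, bare id:    " + verbatim.contains("379474"));
        System.out.println("verbatim, full value: " + verbatim.contains("ncbi-geneid:379474"));
    }
}
```

With the analyzed set, only fragments such as 379474 match; with the verbatim set, only the full value ncbi-geneid:379474 matches. In Lucene 3.0 the corresponding choice for alt_id would be Field.Index.NOT_ANALYZED instead of Field.Index.ANALYZED, assuming exact whole-value lookups are what is wanted.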