Thanks for the advice, Ian. As you suggested, I indexed alt_id as Index.NOT_ANALYZED and stuck with TermQuery. It works now.
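For anyone finding this thread later, the difference can be seen in a small stand-alone model. This is only a sketch and not Lucene code: the class ExactTermLookupSketch, its analyze() splitter and its termLookup() method are made up for illustration, and the splitter only roughly imitates what an analyzer might do to a value like ncbi-geneid:379474. The one point it makes is that a term lookup compares the query text verbatim against whatever terms were actually indexed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model built on plain Java collections; none of this is Lucene API.
final class ExactTermLookupSketch {

    // Crude stand-in for an analyzer (assumption): lowercase the value and
    // split on anything that is not a letter or digit. Real analyzers such
    // as StandardAnalyzer follow different, more detailed rules.
    public static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String piece : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Maps each indexed term to the ids of the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final boolean analyzed;

    public ExactTermLookupSketch(boolean analyzed) {
        this.analyzed = analyzed;
    }

    // Index one field value, either split into tokens or kept as one term.
    public void add(String docId, String value) {
        List<String> terms = analyzed ? analyze(value) : Collections.singletonList(value);
        for (String term : terms) {
            Set<String> docs = postings.get(term);
            if (docs == null) {
                docs = new HashSet<>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
    }

    // Exact lookup: the query text is not analyzed, only compared as-is.
    public Set<String> termLookup(String term) {
        Set<String> docs = postings.get(term);
        return docs == null ? Collections.<String>emptySet() : docs;
    }

    public static void main(String[] args) {
        String[] altIds = {"emb:BC054227", "ncbi-geneid:379474", "unigene:Xl.24622"};
        ExactTermLookupSketch analyzed = new ExactTermLookupSketch(true);
        ExactTermLookupSketch verbatim = new ExactTermLookupSketch(false);
        for (String altId : altIds) {
            analyzed.add("xla:379474", altId);
            verbatim.add("xla:379474", altId);
        }
        // The bare number became its own term, so it is found.
        System.out.println(analyzed.termLookup("379474"));
        // The full value was never indexed as a single term, so nothing matches.
        System.out.println(analyzed.termLookup("ncbi-geneid:379474"));
        // Indexed verbatim, the full value is itself the term.
        System.out.println(verbatim.termLookup("ncbi-geneid:379474"));
    }
}
```

In Lucene 3.0 terms, the second index in the sketch plays the role of adding alt_id with Field.Index.NOT_ANALYZED, so that a TermQuery for the full value has a matching term to find.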
Thanks again,
José M. Villaveces

On 25 June 2012 17:27, Ian Lea <ian....@gmail.com> wrote:

> The key thing is to be consistent. You can either replace your
> TermQuery code with the output from QueryParser.parse, with QP created
> with StandardAnalyzer, or index alt_id as Index.NOT_ANALYZED and stick
> with TermQuery. I think the latter will work even with multiple
> terms/tokens stored for alt_id. I'd try that first.
>
> --
> Ian.
>
> On Mon, Jun 25, 2012 at 3:51 PM, <seceval...@gmail.com> wrote:
> > Hi All,
> >
> > Thanks for the quick reply.
> >
> > It seems like indeed my index is not what I think it is so maybe
> > I'm using the wrong analyzer. Here is the code I use to index the
> > multiple values of alt_id:
> >
> > indexWriter = new IndexWriter(FSDirectory.open(new File(path)),
> >     new StandardAnalyzer(Version.LUCENE_30), true,
> >     IndexWriter.MaxFieldLength.UNLIMITED);
> >
> > for(String geneId : genes.keySet()){
> >     //Update Gene
> >     Document gene = new Document();
> >     gene.add(new Field("type", "gene", Field.Store.YES, Field.Index.ANALYZED));
> >     gene.add(new Field("id", geneId, Field.Store.YES, Field.Index.ANALYZED));
> >     for(String altId : genes.get(geneId)){
> >         gene.add(new Field("alt_id", altId, Field.Store.YES, Field.Index.ANALYZED));
> >     }
> >     indexWriter.updateDocument(new Term("id", geneId), gene);
> > }
> >
> > I understand that the best approach is to have the values for alt_id as
> > single tokens, isn't that what the current analyzer does? Which one should
> > I use instead?
> >
> > Cheers,
> >
> > José M. Villaveces
> >
> > On 25 June 2012 15:58, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> TermQuerys are assumed to be parsed already. So you're
> >> looking for a _single_ term "ncbi-geneid:379474 or Xl.24622".
> >>
> >> You'd construct something like
> >> Query query1 = new TermQuery(new Term("type", "gene"));
> >> Query query2 = new TermQuery(new Term("alt_id", "ncbi-geneid:379474"));
> >> Query query3 = new TermQuery(new Term("alt_id", "unigene:Xl.24622"));
> >>
> >> BooleanQuery query = new BooleanQuery();
> >> query.add(query1, BooleanClause.Occur.MUST);
> >>
> >> BooleanQuery queryB = new BooleanQuery();
> >> queryB.add(query2, ...SHOULD);
> >> queryB.add(query3, ...SHOULD);
> >>
> >> query.add(queryB, BooleanClause.Occur.MUST);
> >>
> >> But this _assumes_ that you have _single tokens_ of the
> >> form ncbi-geneid:379474 but given that you say that just the
> >> bare 379474 works, I'm guessing as Ian says that you don't
> >> have what you think you do in your index, you probably have
> >> individual tokens like "ncbi-geneid" (or "ncbi" and "geneid" even),
> >> BC054227, xla, etc. You need to look into your index with Luke
> >> and see what's actually in there.
> >>
> >> You might think about installing Solr, _not_ to power your app, but just
> >> to play with the admin/analysis page to understand how
> >> Analysis works with various combinations of tokenizers and filters....
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Jun 25, 2012 at 8:50 AM, <seceval...@gmail.com> wrote:
> >> > I'm quite new to Lucene and recently, I ran into a problem. I have a
> >> > lucene document that looks like this:
> >> >
> >> > --- type ---
> >> > gene
> >> >
> >> > --- id ---
> >> > xla:379474
> >> >
> >> > --- alt_id ---
> >> > emb:BC054227
> >> > gb:BC054227
> >> > ncbi-geneid:379474
> >> > ncbi-gi:148230166
> >> > rs:NM_001086315
> >> > rs:NP_001079784
> >> > unigene:Xl.24622
> >> > xla:379474
> >> >
> >> > I created the query below in order to retrieve that document. It works
> >> > fine for altId = 379474 but not for altId = ncbi-geneid:379474 or
> >> > Xl.24622. I guessed altId must be escaped and tried String altId =
> >> > QueryParser.escape(altId) with no luck. What am I missing?
> >> >
> >> > Query query1 = new TermQuery(new Term("type", "gene"));
> >> > Query query2 = new TermQuery(new Term("alt_id", altId));
> >> >
> >> > BooleanQuery query = new BooleanQuery();
> >> > query.add(query1, BooleanClause.Occur.MUST);
> >> > query.add(query2, BooleanClause.Occur.MUST);
> >> >
> >> > By the way I'm running lucene v3.0.
> >> >
> >> > Cheers,
> >> > José M. Villaveces
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org