Re: [RegexQuery] how to check what words were found in particular documents?

2007-07-20 Thread Erik Hatcher
Erick - you're not missing anything, except that the original poster is after RegexQuery, not WildcardQuery. Both work basically the same way, except in their pattern-matching capabilities. Erik On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote: Erik: Well, you wrote the book. But

Re: [RegexQuery] how to check what words were found in particular documents?

2007-07-20 Thread Erick Erickson
Erik: Well, you wrote the book. But I thought something like this would work: TermDocs td = reader.termDocs(); WildcardTermEnum we = new WildcardTermEnum(reader, new Term("field", "c*t")); do { Term t = we.term(); if (t == null) break; td.seek(we); while (td.next()) { /* document td.doc() contains term t */ } } while (we.next()); Although I

Re: [RegexQuery] how to check what words were found in particular documents?

2007-07-20 Thread Erik Hatcher
Erick - I think you're mixing things up with WildcardQuery. RegexQuery does support all regex capabilities (depending on the underlying regex matcher used). A couple of techniques you could use to achieve the goal: * Use RegexTermEnum, though that'll give you the terms across the entire
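Both replies describe the same shape: enumerate the terms that match the pattern, then walk each term's postings to learn which documents contain it. Below is a stdlib-only model of that idea, not Lucene code: a `TreeMap` stands in for the term dictionary (term to doc numbers), and `MatchedTerms` and its method are hypothetical names. It answers the original question by collecting, per document, which matching terms that document contains.

```java
import java.util.*;
import java.util.regex.Pattern;

// Stdlib model of "term enum + postings": NOT Lucene's API.
// The sorted map plays the role of the index's term dictionary.
class MatchedTerms {

    // For every document, collect the dictionary terms matching the regex.
    static SortedMap<Integer, SortedSet<String>> matchedTermsByDoc(
            SortedMap<String, int[]> termDictionary, Pattern pattern) {
        SortedMap<Integer, SortedSet<String>> result = new TreeMap<>();
        // Walk the dictionary in term order, as a term enumeration would.
        for (Map.Entry<String, int[]> entry : termDictionary.entrySet()) {
            String term = entry.getKey();
            if (!pattern.matcher(term).matches()) {
                continue; // the filtering a regex/wildcard term enum performs
            }
            // Walk this term's postings, as seeking TermDocs to the term would.
            for (int doc : entry.getValue()) {
                result.computeIfAbsent(doc, d -> new TreeSet<>()).add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> dict = new TreeMap<>();
        dict.put("cat", new int[] {0, 2});
        dict.put("cot", new int[] {1});
        dict.put("cut", new int[] {2});
        dict.put("dog", new int[] {0, 1});
        // Prints {0=[cat], 1=[cot], 2=[cat, cut]}
        System.out.println(matchedTermsByDoc(dict, Pattern.compile("c.t")));
    }
}
```

Against a real index the same loop would be driven by a regex or wildcard term enumeration plus TermDocs. Note that the pattern is matched against indexed terms as-is, so if the field was lowercased at index time, a pattern like "C.T" would likely need to be lowercased too.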

Re: [RegexQuery] how to check what words were found in particular documents?

2007-07-20 Thread Erick Erickson
First, the period (.) isn't part of the syntax, so make sure you look more carefully at the Lucene syntax... Then, you might be able to use WildcardTermEnum to find the terms that match and TermDocs to find the documents that contain those terms. There's nothing built into Lucene to do this out
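The disagreement in this thread comes down to syntax: in a wildcard pattern the period is an ordinary character, while RegexQuery treats it as "any character". A minimal sketch, assuming the usual wildcard rules (`*` matches any run of characters, `?` exactly one, everything else literal), that translates a wildcard pattern into an equivalent `java.util.regex` pattern. `WildcardToRegex` is a hypothetical helper, not part of Lucene.

```java
import java.util.regex.Pattern;

// Hypothetical helper: translate a wildcard pattern into a java.util.regex Pattern.
// Assumed rules: '*' = any run of characters (including none),
// '?' = exactly one character, every other character is literal.
class WildcardToRegex {

    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote everything else, so a '.' in the wildcard stays a literal dot.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL lets the translated '.' also match line terminators.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("c?t").matcher("cat").matches());   // true
        System.out.println(toRegex("c.t").matcher("cat").matches());   // false: '.' is literal
        System.out.println(toRegex("c*t").matcher("court").matches()); // true
    }
}
```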

[RegexQuery] how to check what words were found in particular documents?

2007-07-20 Thread mhzmark
Hello. Let's assume that I have this code in my application: (...) Query query = new RegexQuery(new Term("field", "C.T")); // searching... (...) And now, I would like to know whether my application found "cat" or "cot" or something else. How can I check what was found by my applicati

RE: Lucene shows parts of search query as a HIT

2007-07-20 Thread Ard Schrijvers
Hello, hits.score(i) See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/Hits.html Ard > Hey guys, > > Thanks for all the response. One more thing, I am accessing > my documents > like this: > > for (int i = 0; i < hitCount; i++) { >

Re: multi-field and wildcard query highlighter questions

2007-07-20 Thread Mark Miller
1) Perhaps the query you tried does not match anything in your index? What release are you using? [prefix*] works fine for me. 2) The Highlighter should not care if you have more than one field with the same name in a document. The Highlighter does not deal with documents. It takes a Token

Re: Lucene shows parts of search query as a HIT

2007-07-20 Thread Askar Zaidi
Hey guys, Thanks for all the responses. One more thing, I am accessing my documents like this: for (int i = 0; i < hitCount; i++) { Document doc = hits.doc(i); System.out.println(" " + (i + 1) + ". " + doc.get("item")); } This shows me the item value.

multi-field and wildcard query highlighter questions

2007-07-20 Thread Lukas Vlcek
Hi, I have two questions: 1) Is it possible to get some highlighted text when using a wildcard query? (I am using query rewrite.) I found that it works for queries like [prefix*suffix] or [prefix?suffix], but I was not able to get results for queries like [prefix*]. 2) What kind of problems should I
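Lukas mentions rewriting the query, and Mark notes that the Highlighter works on tokens rather than documents. Together those suggest why rewriting matters: a `prefix*` query has to be expanded into concrete terms before anything can be matched against the tokens of a text. The sketch below models that with the standard library only; the sorted set stands in for the term dictionary, and `PrefixHighlight`, the `<B>` markup, and whitespace tokenization are illustrative assumptions, not the Highlighter's actual implementation.

```java
import java.util.*;

// Stdlib model: expand "prefix*" against a sorted term dictionary,
// then mark tokens in a text that equal one of the expanded terms.
class PrefixHighlight {

    // Every term with the prefix sorts at or after the prefix itself;
    // the first term that no longer starts with it ends the range.
    static List<String> expandPrefix(NavigableSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            out.add(t);
        }
        return out;
    }

    // Wrap each space-separated token that equals an expanded term.
    static String highlight(String text, Collection<String> terms) {
        Set<String> wanted = new HashSet<>(terms);
        StringBuilder sb = new StringBuilder();
        for (String token : text.split(" ")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(wanted.contains(token) ? "<B>" + token + "</B>" : token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        NavigableSet<String> dict =
            new TreeSet<>(Arrays.asList("lucene", "lucid", "luke", "solr"));
        List<String> expanded = expandPrefix(dict, "luc"); // [lucene, lucid]
        System.out.println(highlight("lucene and lucid are not luke", expanded));
    }
}
```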

Re: TermEnum - previous() method ?

2007-07-20 Thread Erick Erickson
You know, every time I see a solution like this, I have to smack myself upside the head. As an old C programmer, I have a real problem "wasting", say, a few tens of megabytes just to make my life easier. Of disk space. That costs virtually nothing. And (as you say, assuming some constraints) does

RE: TermEnum - previous() method ?

2007-07-20 Thread Will Johnson
One other possibility I've used in the past: store 2 fields, one with the normal characters, one with each character inverted, A->Z, Y->B, and so on. As long as you have the function to go from normal->reversed and reversed->normal you can iterate over both fields in a forward only mode and j
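A minimal sketch of Will's inverted-field idea, assuming terms are lowercase `a`-`z` only. Walking the encoded terms forward visits the original terms in reverse order. The trailing `{` (the character right after `z`) is an assumption added here, not part of Will's description: without an end marker, a term that is a prefix of another ("ab" vs "abc") would still sort first after inversion. A `TreeSet` stands in for the second field's term enumeration; all names are hypothetical.

```java
import java.util.*;

// Hypothetical encoding for lowercase a-z terms: a->z, b->y, ..., z->a,
// plus an end marker '{' (sorts after 'z') so a term lands after every
// longer term it is a prefix of.
class ReversedTermOrder {

    static String encode(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 1);
        for (char c : term.toCharArray()) {
            sb.append((char) ('a' + 'z' - c));
        }
        return sb.append('{').toString();
    }

    static String decode(String encoded) {
        StringBuilder sb = new StringBuilder(encoded.length() - 1);
        // Skip the trailing end marker and undo the inversion.
        for (int i = 0; i < encoded.length() - 1; i++) {
            sb.append((char) ('a' + 'z' - encoded.charAt(i)));
        }
        return sb.toString();
    }

    // A forward walk over the encoded terms yields the originals in descending order.
    static List<String> descending(Collection<String> terms) {
        TreeSet<String> encoded = new TreeSet<>();
        for (String t : terms) {
            encoded.add(encode(t));
        }
        List<String> out = new ArrayList<>();
        for (String e : encoded) {
            out.add(decode(e));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [b, abc, ab, a]
        System.out.println(descending(Arrays.asList("ab", "abc", "b", "a")));
    }
}
```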

Re: TermEnum - previous() method ?

2007-07-20 Thread Erick Erickson
There's nothing that I know of built into Lucene that allows you to do anything like termenum.previous(); I'm pretty sure you have to roll your own. Some possibilities: 1> Depending upon how many terms are in your index, you could just read all the authors into a Java collection of your choice at s
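For the author-browse feature muraalee describes, Erick's first option (load every author into a sorted Java collection at startup) could look like the sketch below. It assumes the author list fits comfortably in memory; `AuthorBrowser` is a hypothetical name, and `java.util.TreeSet` provides both directions through `headSet(..., false).descendingIterator()` and `tailSet(..., true)`.

```java
import java.util.*;

// Hypothetical in-memory browser over all author names, loaded once at startup.
class AuthorBrowser {
    private final NavigableSet<String> authors;

    AuthorBrowser(Collection<String> allAuthors) {
        this.authors = new TreeSet<>(allAuthors);
    }

    // Up to `count` authors sorting strictly before `typed`, nearest first.
    List<String> previous(String typed, int count) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = authors.headSet(typed, false).descendingIterator();
        while (it.hasNext() && out.size() < count) {
            out.add(it.next());
        }
        return out;
    }

    // Up to `count` authors sorting at or after `typed`.
    List<String> next(String typed, int count) {
        List<String> out = new ArrayList<>();
        for (String a : authors.tailSet(typed, true)) {
            if (out.size() == count) {
                break;
            }
            out.add(a);
        }
        return out;
    }

    public static void main(String[] args) {
        AuthorBrowser b = new AuthorBrowser(
            Arrays.asList("adams", "baker", "clark", "davis", "evans"));
        System.out.println(b.previous("c", 2)); // [baker, adams]
        System.out.println(b.next("c", 2));     // [clark, davis]
    }
}
```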

Re: deleting/updating/identifying a document

2007-07-20 Thread Otis Gospodnetic
Hi Samuel, Indeed, you can have a PK-like identifier field in each Lucene Document and use deleteDocuments(new Term("your PK field", "your ID")). While having an ID field that uniquely identifies a document is not a must, it is a Lucene best practice in my book and experience so far. Otis -- Luce

RE: deleting/updating/identifying a document

2007-07-20 Thread Chhabra, Kapil
Isn't the same true of any RDBMS table that has no primary key? If this is a problem you are facing, it can be solved by introducing a unique identifier as a field in your index, which would act as a primary key for your index. Using an untokenized field might not be a good

deleting/updating/identifying a document

2007-07-20 Thread Samuel LEMOINE
Hi everybody! I'm wondering about the way Lucene deals with deleting documents. As far as I know, a document is identified by a document number, but this number is not reliable in the long term, as it may change when segments are merged. The way Lucene deletes documents' data from th
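The model below, in plain Java rather than Lucene, shows why Samuel is right to distrust document numbers and why the replies recommend a primary-key field: when deleted documents are squeezed out (as happens on segment merging), the survivors are renumbered, but a lookup by the stored ID still finds the same document. The list position stands in for the document number and the string for the ID field; `StableIds` and its methods are hypothetical.

```java
import java.util.*;

// Stdlib model: list position = document number, list value = ID field.
class StableIds {

    // Simulate a merge: deleted slots are dropped and survivors renumbered 0..n-1.
    static List<String> compact(List<String> idsByDocNumber, Set<Integer> deletedDocNumbers) {
        List<String> out = new ArrayList<>();
        for (int doc = 0; doc < idsByDocNumber.size(); doc++) {
            if (!deletedDocNumbers.contains(doc)) {
                out.add(idsByDocNumber.get(doc));
            }
        }
        return out;
    }

    // Delete-by-term style lookup: every doc number whose ID field equals `id`.
    static Set<Integer> docsWithId(List<String> idsByDocNumber, String id) {
        Set<Integer> hits = new TreeSet<>();
        for (int doc = 0; doc < idsByDocNumber.size(); doc++) {
            if (idsByDocNumber.get(doc).equals(id)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("A", "B", "C", "D");
        System.out.println(docsWithId(before, "C")); // [2]
        List<String> after = compact(before, Collections.singleton(1));
        System.out.println(docsWithId(after, "C"));  // [1]: new number, same ID
    }
}
```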

Re: TermEnum - previous() method ?

2007-07-20 Thread muraalee
Hi Mark, Thanks for your input. >> I do wonder why you want a previous though? It sounds like you might be >> better off heading down a different path... In our content, we have indexed Author as a separate field. We want to expose a feature to 'browse' the Author list. They can type any author

Re: TermEnum - previous() method ?

2007-07-20 Thread markrmiller
I am not very familiar with the Lucene file formats, but I think there is a lot of "this number tells you how far ahead to read" when enumerating terms. As you might guess, I think this lends itself to reading the terms file forward. Not that an index couldn't point you into the terms index some
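Mark's intuition can be illustrated with front coding, which, as far as I know, is how Lucene's term dictionary stores each term: as the length of the prefix shared with the previous term plus the remaining suffix. The sketch is a simplified stdlib model, not the actual file format (which also stores per-term info and uses variable-length integers); `FrontCoding` is a hypothetical name. It shows that decoding a term needs the previously decoded term, so the cheap direction is forward.

```java
import java.util.*;

// Simplified model of front coding for a sorted term list.
class FrontCoding {

    static final class Entry {
        final int shared;     // chars shared with the previous term
        final String suffix;  // the rest of this term
        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    static List<Entry> encode(List<String> sortedTerms) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String term : sortedTerms) {
            int n = 0;
            int max = Math.min(prev.length(), term.length());
            while (n < max && prev.charAt(n) == term.charAt(n)) {
                n++;
            }
            out.add(new Entry(n, term.substring(n)));
            prev = term;
        }
        return out;
    }

    // Entry k can only be rebuilt from the fully decoded term k-1,
    // so decoding naturally proceeds front to back.
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String term = prev.substring(0, e.shared) + e.suffix;
            out.add(term);
            prev = term;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "applet", "apply", "banana");
        for (Entry e : encode(terms)) {
            System.out.println(e.shared + " " + e.suffix); // 0 apple, 5 t, 4 y, 0 banana
        }
    }
}
```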

Re: Lucene and XML Architecture

2007-07-20 Thread Thomas
Thanx a lot Patrick! That's exactly what I was hoping for. I'll give it a shot. -Thomas Patrick Turcotte wrote: Hi, There is a Lucene-eXist trigger that allows you to do just that. Take a look at patch http://sourceforge.net/tracker/index.php?func=detail&aid=1654205&group_id=17691&atid=317691

RE: Question regarding ignore case?

2007-07-20 Thread Chhabra, Kapil
I don't think there is any other way out apart from re-indexing in all-lowercase or all-uppercase (through an Analyzer or externally), and then searching in the same case you used while indexing. Even if you find a way to run case-insensitive searches, I am sure it'll add to the co

RE: TermFreqVector

2007-07-20 Thread Chhabra, Kapil
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Hits.html#id(int) public final int id(int n) throws IOException Returns the id for the nth document in this set. Note that ids may change when the index changes, so you cannot rely on the id to be stable. kapilCh

RE: How to open the term vector storage?

2007-07-20 Thread Jun.Chen
Thank you. The 1.9 documentation is very useful. -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: July 20, 2007 10:18 "Good morning, Daniel" To: Lucene Users Subject: RE: How to open the term vector storage? : I also have this problem... : Field.Text : Field.Keyword : ... : I cannot fin

RE: Question regarding ignore case?

2007-07-20 Thread Liu_Andy2
When indexing, you can add LowerCaseFilter to your analyzer, or just use an analyzer that already lowercases, such as StandardAnalyzer or SimpleAnalyzer. Andy -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of li hao cho Sent: Friday, July 20, 2007 2:52 PM To: jav
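Both suggestions rely on the same rule: apply one lowercasing analysis at index time and at query time. A minimal stdlib sketch of that rule follows; the whitespace tokenizer and the `CaseInsensitiveIndex` class are assumptions for illustration, standing in for an Analyzer with a LowerCaseFilter.

```java
import java.util.*;

// Stdlib model: one shared "analyzer" used for both indexing and querying.
class CaseInsensitiveIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split on whitespace and lowercase each token (LowerCaseFilter-like).
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    void add(int doc, String text) {
        for (String t : analyze(text)) {
            postings.computeIfAbsent(t, k -> new TreeSet<>()).add(doc);
        }
    }

    // The query word goes through the same analysis, so case no longer matters.
    Set<Integer> search(String word) {
        List<String> q = analyze(word);
        if (q.isEmpty()) {
            return Collections.emptySet();
        }
        return postings.getOrDefault(q.get(0), Collections.emptySet());
    }

    public static void main(String[] args) {
        CaseInsensitiveIndex idx = new CaseInsensitiveIndex();
        idx.add(0, "Apache Lucene");
        idx.add(1, "lucene rocks");
        System.out.println(idx.search("LUCENE")); // [0, 1]
    }
}
```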

Re: Upgrade of Lucene from 1.9.1 to 2.2.0 in JIRA

2007-07-20 Thread Dmitry
Andreas, Thanks, I get it now. - Original Message - From: "Andreas Knecht" <[EMAIL PROTECTED]> To: Sent: Friday, July 20, 2007 2:23 AM Subject: Re: Upgrade of Lucene from 1.9.1 to 2.2.0 in JIRA Hi Dmitry, The full re-index is necessary, because DateTools stores dates i

Re: Upgrade of Lucene from 1.9.1 to 2.2.0 in JIRA

2007-07-20 Thread Andreas Knecht
Hi Dmitry, The full re-index is necessary because DateTools stores dates in a different (human-readable) format in the index. This means that JIRA/Lucene won't be able to read dates from the current indexes, which store dates in the old format. Cheers, Andreas Folks, why do we need fu
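To see why Andreas says old and new date strings cannot be mixed, the sketch below produces both styles for the same instant. The readable form follows the `yyyyMMddHHmmssSSS` (UTC) layout that DateTools uses at millisecond resolution; the compact form, base-36 epoch milliseconds zero-padded to a fixed width, is modeled on the older DateField and assumes that is what the pre-upgrade JIRA index contained. Both are plain `java.time` code, not calls into Lucene, and `DateEncodings` is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Two date-to-string encodings for the same instant (models, not Lucene code).
class DateEncodings {

    // Human-readable, UTC, millisecond resolution (DateTools-style layout).
    private static final DateTimeFormatter READABLE =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS").withZone(ZoneOffset.UTC);

    static String readable(Instant t) {
        return READABLE.format(t);
    }

    // Compact style: epoch millis in base 36, left-padded with '0' to a fixed
    // width so that string order matches numeric order (DateField-style).
    static String base36(Instant t, int width) {
        String s = Long.toString(t.toEpochMilli(), Character.MAX_RADIX);
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2007-07-20T02:23:00Z");
        System.out.println(readable(t));  // 20070720022300000
        System.out.println(base36(t, 9)); // a 9-character base-36 string
    }
}
```

Each encoding sorts correctly among its own kind, but code that parses or range-queries one format cannot interpret the other, so an index holding the old strings has to be rebuilt.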