Re: group field selection of the form field:(a b c)

2006-09-12 Thread Erick Erickson
As long as the field is added to the *same* document, I don't see a problem with option B, although I'll admit that I haven't used MultiFieldQueryParser. But there was a discussion a while ago about adding tokens with the same field name to a document via document.add being exactly the same as add

Re: UTF8 accents & umlauts filter?

2006-09-12 Thread Ken Krugler
Thanks for the links Michael... this one does look interesting: http://dev.alt.textdrive.com/browser/lu/LUStringBasicLatin.txt The challenge would be to make it fast... perhaps a custom hash table, or look into the cost of a perfect hash function. Just to clear up some unicode/terminology issues:

RE: group field selection of the form field:(a b c)

2006-09-12 Thread Doron Cohen
I think option B cannot work: because of the MUST operator, it requires both "databasemanagement" and "accountmanagement" to be in the subtype field. Option A, however, should work once the padding blank spaces are removed from the field name - notice that while the standard analyzer would trim
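
The point about the MUST operator can be illustrated without Lucene. The sketch below is a plain-Java model, not Lucene's BooleanQuery API; the class and method names are hypothetical, and it assumes the subtype field holds a single untokenized value per document.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of required vs. optional clauses; this is NOT Lucene's API.
final class ClauseModel {
    private ClauseModel() {}

    // MUST semantics: every term has to match the single field value.
    static boolean matchesAllRequired(String fieldValue, List<String> terms) {
        for (String term : terms) {
            if (!fieldValue.equals(term)) {
                return false;
            }
        }
        return true;
    }

    // SHOULD semantics: one matching term is enough.
    static boolean matchesAnyOptional(String fieldValue, List<String> terms) {
        for (String term : terms) {
            if (fieldValue.equals(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("databasemanagement", "accountmanagement");
        // A document whose subtype is "accountmanagement" cannot satisfy both required terms.
        System.out.println(matchesAllRequired("accountmanagement", terms)); // false
        System.out.println(matchesAnyOptional("accountmanagement", terms)); // true
    }
}
```

Under this model, a document with one subtype value can never satisfy two different required terms, which is why the optional form is the one that returns results.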

Re: UTF8 accents & umlauts filter?

2006-09-12 Thread Yonik Seeley
Thanks for the links Michael... this one does look interesting: http://dev.alt.textdrive.com/browser/lu/LUStringBasicLatin.txt The challenge would be to make it fast... perhaps a custom hash table, or look into the cost of a perfect hash function. Just to clear up some unicode/terminology issues:

UTF8 accents & umlauts filter?

2006-09-12 Thread Michael Imbeault
Right now Lucene has an accent filter (ISOLatin1AccentFilter) that removes accents from ISO-8859-1 text. What about a UTF8AccentFilter? Is it planned to add such a filter (which would be very useful, as ISOLatin1AccentFilter isn't able to remove some complex accents in some languages encoded in UT
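
As a rough illustration of what Unicode-wide accent removal could look like, here is a minimal sketch that uses the JDK's java.text.Normalizer rather than any Lucene class. AccentFolder is a hypothetical helper, not the UTF8AccentFilter asked about and not a Lucene TokenFilter: it decomposes each character (NFD) and then drops the combining marks.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: folds accented letters to their base letters.
final class AccentFolder {
    private AccentFolder() {}

    static String fold(String input) {
        // NFD splits a precomposed letter such as U+00E9 into 'e' + U+0301 (combining acute).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches combining marks; removing them leaves the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(fold("M\u00fcller caf\u00e9 na\u00efve")); // Muller cafe naive
    }
}
```

Letters that have no canonical decomposition, such as U+00F8 (o with stroke) or U+00DF (sharp s), pass through unchanged, so a complete filter would still need an explicit mapping table for them.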

RE: group field selection of the form field:(a b c)

2006-09-12 Thread Pramodh Shenoy
The spaces just crept in, I guess, when I copied the code to Outlook :-); actually there aren't any. Let me take a look at Luke, especially testing to see what should be returned when I run the parsed query. Sounds very interesting. Thanks a lot, Pramodh From: Eric

Re: getCurrentVersion question

2006-09-12 Thread Erick Erickson
Just add another document (I do something similar). The key is to remember that documents in the same index do NOT have to have the same fields. So, say for your "regular" documents, you have fields (f1, f2, f3, f4). For your meta-data document, you index fields (md1, md2, md3...). The value for o
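
The idea that documents in one index need not share fields can be sketched with a plain-Java model. This is not Lucene code: MixedFieldIndex and its methods are hypothetical, documents are modeled as simple maps, and the field values are made up for illustration. Only the field names f1, f2 and md1 come from the message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of an index whose documents carry different field sets; NOT Lucene's API.
final class MixedFieldIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    // Returns the value of the field from the first document that has it, or null if none does.
    String firstValue(String field) {
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Builds one "regular" document and one meta-data document (values are illustrative).
    static MixedFieldIndex sample() {
        MixedFieldIndex index = new MixedFieldIndex();

        Map<String, String> regular = new HashMap<>();
        regular.put("f1", "some content");
        regular.put("f2", "more content");
        index.add(regular);

        Map<String, String> metadata = new HashMap<>();
        metadata.put("md1", "2006-09-12");
        index.add(metadata);

        return index;
    }

    public static void main(String[] args) {
        // Only the meta-data document has md1, so a lookup on that field finds it directly.
        System.out.println(sample().firstValue("md1")); // 2006-09-12
    }
}
```

Because regular documents never carry the md fields, a query on one of those fields reaches only the meta-data document.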

Re: group field selection of the form field:(a b c)

2006-09-12 Thread Erick Erickson
Interestingly, you have extra spaces when you construct your queries, e.g. queries[2]= " accountmanagement" has an extra space at the beginning, but when you index the document, there are no spaces. I believe that, since you're indexing the fields UN_TOKENIZED, the spaces are preserved in the q
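
The effect of a stray leading space on an exact, untokenized match can be shown with a small plain-Java model. This is not Lucene code; ExactTermLookup is a hypothetical class that treats each indexed value as one term, compared character for character, which is the assumption made in the message above.

```java
import java.util.HashSet;
import java.util.Set;

// Plain-Java model of exact (untokenized) term matching; NOT Lucene's API.
final class ExactTermLookup {
    private final Set<String> terms = new HashSet<>();

    // The whole value is stored as a single term, with no analysis or trimming.
    void index(String value) {
        terms.add(value);
    }

    // A query term matches only if it is identical to an indexed term.
    boolean matches(String queryTerm) {
        return terms.contains(queryTerm);
    }

    public static void main(String[] args) {
        ExactTermLookup subtype = new ExactTermLookup();
        subtype.index("accountmanagement");

        String padded = " accountmanagement";
        System.out.println(subtype.matches(padded));        // false: the leading space is part of the term
        System.out.println(subtype.matches(padded.trim())); // true
    }
}
```

Trimming the query strings before building the query removes the mismatch in this model.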

Re: Using Hibernate to store Lucene Indexes in a Database

2006-09-12 Thread Beady Geraghty
I don't know if the use of a DATALINK data type would be relevant in your case. Here are some references. http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/start/c0005450.htm http://www.oracle.com/technology/sample_code/tech/java/codesnippet/jdbc/datalink/read

Re: getCurrentVersion question

2006-09-12 Thread Mag Gam
Tom: great! Now how do you add metadata? I am new to Lucene API + Java, but willing to learn. Got an example? TIA On 9/12/06, Tom Emerson <[EMAIL PROTECTED]> wrote: As far as I know there isn't a way to do this. What we do is add a "metadata" document to each index that includes the creat

group field selection of the form field:(a b c)

2006-09-12 Thread Pramodh Shenoy
Hi Eric/Usergroup, I am working on a help content index-search project based on Lucene. One of my requirements is to search for a particular text in the content of files from specific directories. When I index the content Eg. guides/accountmanagement/index.htm and guides/databasemanagement

Re: SV: Changing the Scoring api

2006-09-12 Thread Chris Hostetter
: However the BooleanQuery's disableCoord seems to make effect. : But I still have the problem when I'm constructing queries with wildcards. really? ... that's strange, WildcardQuery uses the disableCoord feature of BooleanQuery. Do you have an example of what you mean? : already had implemente

Storing fields without term positions

2006-09-12 Thread Timo Nentwig
Hi everybody, is it possible to store fields without term position (the .prx file) data? We store a kind of custom data in the field and use it as a filter for queries, so we don't need any term position data, and it bloats the index size by nearly a factor of 3. Thanks Timo ---

Re: getCurrentVersion question

2006-09-12 Thread Tom Emerson
As far as I know there isn't a way to do this. What we do is add a "metadata" document to each index that includes the creation date, the user name of the creating user, and various other tidbits. This gets updated on incremental updates to the index as well. Easily done and makes it easy to query

Re: Highligher Example

2006-09-12 Thread Tom Emerson
Autonomy's KeyView is an alternative to Stellent. It does not cover all of the file formats that Stellent does, though many of them are probably not interesting for most applications. When I last looked at it it did not handle mail archives, though there was a plan to add it. I found it more stabl

SV: Changing the Scoring api

2006-09-12 Thread Marcus Falck
However, the BooleanQuery's disableCoord seems to take effect. But I still have the problem when I'm constructing queries with wildcards. / Marcus -Original Message- From: Marcus Falck [mailto:[EMAIL PROTECTED] Sent: 12 September 2006 09:34 To: java-user@lucene.apache.or

SV: Changing the Scoring api

2006-09-12 Thread Marcus Falck
Hi Hoss, No, there wasn't anything wrong with your suggestions, except that they had landed in my junk mail for some reason (stupid Outlook). However, I haven't had a chance to test all of your suggestions, but I had already implemented my own similarity class that has the coord fixed to 1. And it