Re: Complete field search

2007-03-13 Thread Doron Cohen
This came up in the list with several solutions - look for: Asserting that a value must match the entire content of a field Doron "Kainth, Sachin" <[EMAIL PROTECTED]> wrote on 13/03/2007 03:18:50: > Hi all, > > Is it possible to search whether a term is equal to the entire contents > of a fiel

Re: adding a field to every document

2007-03-13 Thread Daniel Noll
I'd like to add a field to every document in an index... that I'd rather not rebuild from scratch (yet). This is behind Solr (so a ParallelReader won't work without core modifications, right?). Is there a way I could create an index with the same number of documents and only the new fie

the format of tii file

2007-03-13 Thread xh sun
Hi all, I'm trying to analyze a sample tii file (Lucene 2.0.0). IndexTermCount is 2 in the file, but I don't know the meaning of these bytes "00 00 FF FF FF FF 0F 00 00 00 14" after the SkipInterval field. It shall be a according to the file format. Can anyone help me with this? Thanks a lot. The att

adding a field to every document

2007-03-13 Thread Erik Hatcher
I'd like to add a field to every document in an index... that I'd rather not rebuild from scratch (yet). This is behind Solr (so a ParallelReader won't work without core modifications, right?). Is there a way I could create an index with the same number of documents and only the new field

lengthNorm accessible?

2007-03-13 Thread maureen tanuwidjaja
Hmmm... now I wonder whether it is possible to access this lengthNorm value so that it can be used as before, but without creating any nrm file --> setOmitNorms = true. Any other suggestion on how I could get the same rank as before by making use of this lengthNorm but without creating the nrm fil

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread Xiaocheng Luan
You can store the fields in the index itself if you want, without indexing them (just flag it as stored/unindexed). I believe storing fields should not incur the "norms" size problem, please correct me if I'm wrong. Thanks, Xiaocheng maureen tanuwidjaja <[EMAIL PROTECTED]> wrote: Ya...I think i
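A rough sketch of Xiaocheng's suggestion against the Lucene 2.x API (the field names here are made up for illustration): a stored-but-unindexed field is retrievable yet contributes nothing to the norms data.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class StoredOnlyFieldExample {
    public static Document build() {
        Document doc = new Document();
        // Searchable text: indexed (tokenized) but not stored.
        doc.add(new Field("contents", "world cup",
                          Field.Store.NO, Field.Index.TOKENIZED));
        // Retrieval-only payload: stored, never indexed, so it adds
        // nothing to the norms (.nrm) file.
        doc.add(new Field("rawDoc", "<doc>world cup</doc>",
                          Field.Store.YES, Field.Index.NO));
        return doc;
    }
}
```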

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Ya...I think i will store it in the database so that later it could be used in scoring/ranking for retrieval...:) Another thing i would like to see is whether the precision or recall will be much affaected by this... Regards, Maureen Xiaocheng Luan <[EMAIL PROTECTED]> wrote:One side

Re: Complete field search

2007-03-13 Thread Xiaocheng Luan
Or, you may index the fields that you want "exact matches" as non-tokenized. Thanks, Xiaocheng Bhavin Pandya <[EMAIL PROTECTED]> wrote: Hi kainth, >So for example if I have a field with this text: "world cup" and I do a >search for "cup" I want it to return false but for another field that >conta
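A sketch of the non-tokenized approach with the Lucene 2.x API (the "title" field name is an assumption): index the field UN_TOKENIZED so the whole value becomes a single term, then match it exactly with a TermQuery.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

public class ExactMatchExample {
    public static Document buildDoc() {
        Document doc = new Document();
        // UN_TOKENIZED: the entire value is indexed as one term.
        doc.add(new Field("title", "world cup",
                          Field.Store.YES, Field.Index.UN_TOKENIZED));
        return doc;
    }

    public static TermQuery exactQuery() {
        // Matches only documents whose title is exactly "world cup";
        // a search for just "cup" would find nothing.
        return new TermQuery(new Term("title", "world cup"));
    }
}
```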

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread Xiaocheng Luan
One side-effect of turning off the norms may be that the scoring/ranking will be different? Do you need to search by each of these many fields? If not, you probably don't have to index these fields (but store them for retrieval?). Just a thought. Xiaocheng Michael McCandless <[EMAIL PROTECTED]>

Re: FieldCache: flush cache explicitly

2007-03-13 Thread Otis Gospodnetic
John - a bug with code is best. No gods here. Otis - Original Message From: John Wang <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, March 9, 2007 2:22:35 AM Subject: FieldCache: flush cache explicitly I think the API should allow explicitly flushing the FieldCache.

Re: [Urgent] deleteDocuments fails after merging ...

2007-03-13 Thread Antony Bowesman
Erick Erickson wrote: The javadocs point out that this line * int* nb = mIndexReaderClone.deleteDocuments(urlTerm) removes*all* documents for a given term. So of course you'll fail to delete any documents the second time you call deleteDocuments with the same term. Isn't the code snippet belo
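Erick's point can be sketched against the Lucene 2.x API (the "url" field name and value are hypothetical): the first call removes every matching document, so a second call with the same term has nothing left to delete.

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

public class DeleteByTermExample {
    public static void deleteTwice(IndexReader reader) throws Exception {
        Term urlTerm = new Term("url", "http://example.com/page");
        // Removes ALL documents containing this term, returns the count.
        int first = reader.deleteDocuments(urlTerm);
        // Nothing matches any more, so this returns 0.
        int second = reader.deleteDocuments(urlTerm);
    }
}
```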

Re: Wildcard searches with * or ? as the first character

2007-03-13 Thread Antony Bowesman
I have read that with Lucene it is not possible to do wildcard searches with * or ? as the first character. Lucene supports it. If you are using QueryParser to parse your queries, see http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryPars

Re: IndexReader.GetTermFreqVectors

2007-03-13 Thread Grant Ingersoll
It means it returns the term vectors for all the fields on that document where you have enabled TermVector when creating the Document, i.e. new Field(, TermVector.YES) (see http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.TermVector.html for the full array of optio
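In Lucene 2.x terms, a minimal sketch (the "contents" field name is an assumption): only fields created with a TermVector option come back from getTermFreqVectors.

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class TermVectorExample {
    // A "vectorized" field: term vectors are recorded at index time.
    public static Field vectorized(String text) {
        return new Field("contents", text, Field.Store.NO,
                         Field.Index.TOKENIZED, Field.TermVector.YES);
    }

    public static TermFreqVector[] vectorsFor(IndexReader reader, int docNum)
            throws Exception {
        // Returns one TermFreqVector per vectorized field of this document;
        // fields indexed with TermVector.NO are simply absent.
        return reader.getTermFreqVectors(docNum);
    }
}
```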

RE: Wildcard searches with * or ? as the first character

2007-03-13 Thread Steven Parkes
It's possible to do leading wildcard searches in Lucene as of 2.1. See http://wiki.apache.org/lucene-java/LuceneFAQ#head-4d62118417eaef0dcb87f4370583f809848ea695 (http://tinyurl.com/366suf) -Original Message- From: Oystein Reigem [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 13, 2007 11
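With QueryParser in Lucene 2.1 this is a one-line switch; a sketch (the "contents" field name is an assumption):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class LeadingWildcardExample {
    public static Query parse() throws Exception {
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        // Off by default: a leading wildcard forces enumeration of every
        // term in the field, which can be very slow on large indexes.
        parser.setAllowLeadingWildcard(true);
        // Useful for compound-word languages, e.g. German words ending in -heit.
        return parser.parse("*heit");
    }
}
```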

Wildcard searches with * or ? as the first character

2007-03-13 Thread Oystein Reigem
Hi, I have read that with Lucene it is not possible to do wildcard searches with * or ? as the first character. Wildcard searches with * as the first character (or both first and last character) are useful for text in languages that have a lot of compound words, like German and the Scandinavi

IndexReader.GetTermFreqVectors

2007-03-13 Thread Kainth, Sachin
Hi all, The documentation for the above method mentions something called a vectorized field. Does anyone know what a vectorized field is?

RE: [Urgent] deleteDocuments fails after merging ...

2007-03-13 Thread DECAFFMEYER MATHIEU
Thank you Erick, I'll look more into the docs to check why I get a search result and no deletion ... you could have been less rude to me though ... I feel like a very mean person now :-( Anyway, thank you for your time __ Matt -Original Message- From: Erick Eri

Re: [Urgent] deleteDocuments fails after merging ...

2007-03-13 Thread Erick Erickson
Well, don't label things urgent. Since this forum is free, you have no right to demand a quick response. You'd get better responses if there were some evidence that you had actually tried to find answers to your questions before posting them. We all have other duties, and taking time out to answer

[Urgent] deleteDocuments fails after merging ...

2007-03-13 Thread DECAFFMEYER MATHIEU
Hi, I have marked this question as "urgent" because I notice I don't often get answers. If I'm asking the wrong way, please tell me... Before I delete a document I search for it in the index to be sure there is a hit (via a Term object). When I find a hit I delete the document (with the same Term

Re: Highlighting of original documents

2007-03-13 Thread Oystein Reigem
Mark Miller wrote: Depends on the work you want to do. If you want to highlight a simple XML doc the approach would be to extract all of the text elements and run them through the highlighter and then correctly update them. That would be mostly simple DOM manipulation. OK. I guess there wil

Finding matched terms

2007-03-13 Thread Walt Stoneburner
When performing a query and getting a result set back, if one wants to know which terms from the query actually matched, is Highlighter still the best way to go with the latest Lucene, or should I start looking at query term frequency vectors? Just trying to find a non-expensive way of doing this
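For just recovering which terms matched, the contrib Highlighter route looks roughly like this in Lucene 2.x (the "contents" field name is an assumption): QueryScorer marks the query terms it finds, and the returned fragment wraps them.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class MatchedTermsExample {
    public static String bestFragment(Query query, String text) throws Exception {
        // QueryScorer scores tokens by whether they occur in the query,
        // so the best fragment surrounds the matched terms.
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        return highlighter.getBestFragment(new StandardAnalyzer(),
                                           "contents", text);
    }
}
```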

Re: How to disable lucene norm factor?

2007-03-13 Thread maureen tanuwidjaja
OK Mike, I'll try it and see whether it works :) Then I will proceed to optimize the index. Well then I guess it's fine to use the default value for maxMergeDocs, which is Integer.MAX_VALUE? Thanks a lot. Regards, Maureen Michael McCandless <[EMAIL PROTECTED]> wrote: "maureen tanuwidj

Re: How to disable lucene norm factor?

2007-03-13 Thread Michael McCandless
"maureen tanuwidjaja" <[EMAIL PROTECTED]> wrote: > How to disable lucene norm factor? Once you've created a Field and before adding to your Document index, just call field.setOmitNorms(true). Note, however, that you must do this for all Field instances by that same field name because whenever

Re: Highlighting of original documents

2007-03-13 Thread Mark Miller
Depends on the work you want to do. If you want to highlight a simple XML doc the approach would be to extract all of the text elements and run them through the highlighter and then correctly update them. That would be mostly simple DOM manipulation. The same approach should work with any forma

How to disable lucene norm factor?

2007-03-13 Thread maureen tanuwidjaja
Hi all, How to disable lucene norm factor? Thanks, Maureen

Highlighting of original documents

2007-03-13 Thread Oystein Reigem
Hi, I want to implement fulltext search on a collection of documents. I try to figure out which system is the better choice - eXist, or Lucene, or some combination of the two. I have some knowledge of eXist, but don't know too much about Lucene. I'd like to display the result of a search as

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Hi Mike, How do I disable/turn off the norms? Is it done while indexing? Thanks, Maureen

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread Michael McCandless
"maureen tanuwidjaja" <[EMAIL PROTECTED]> wrote: > "The only simple workaround I can think of is to set maxMergeDocs to > keep all segments "small". But then you may have too many segments > with time. Either that or find a way to reduce the number of unique > fields that you actually need to

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread Michael McCandless
"Michael McCandless" <[EMAIL PROTECTED]> wrote: > The only simple workaround I can think of is to set maxMergeDocs to > keep all segments "small". But then you may have too many segments > with time. Either that or find a way to reduce the number of unique > fields that you actually need to sto

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Oops sorry, mistyping... I get search results in 30 SECONDS to 3 minutes, which is actually quite unacceptable for the "search engine" I'm building... Is there any recommendation on how searching could be made faster? maureen tanuwidjaja <[EMAIL PROTECTED]> wrote: Hi mike "The on

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Hi mike "The only simple workaround I can think of is to set maxMergeDocs to keep all segments "small". But then you may have too many segments with time. Either that or find a way to reduce the number of unique fields that you actually need to store." It is not possible for me to reduce

Open / Close when Merging

2007-03-13 Thread DECAFFMEYER MATHIEU
Hi, I need to merge several indexes (I call them incremental indexes) with my main index. Each incremental index can contain the same URLs as the main index; that's why I have a list of URLs to update, which I will delete from the main index before merging in an incremental index. I have also

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread Michael McCandless
"maureen tanuwidjaja" <[EMAIL PROTECTED]> wrote: > "One thing that stands out in your listing is: your norms file > (_1ke1.nrm) is enormous compared to all other files. Are you indexing > many tiny docs where each docs has highly variable fields or > something?" > > Ya I also confuse

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Hi Mike.. "One thing that stands out in your listing is: your norms file (_1ke1.nrm) is enormous compared to all other files. Are you indexing many tiny docs where each docs has highly variable fields or something?" Ya, I am also confused about why this nrm file is tremendous in size. I am ind

Re: Complete field search

2007-03-13 Thread Bhavin Pandya
Hi kainth, So for example if I have a field with this text: "world cup" and I do a search for "cup" I want it to return false but for another field that contains exactly the text "cup" I want the result to be true. You can fire a phrase query on the first field where you want only "world cup"

Complete field search

2007-03-13 Thread Kainth, Sachin
Hi all, Is it possible to search whether a term is equal to the entire contents of a field rather than that the field contains a term? So for example if I have a field with this text: "world cup" and I do a search for "cup" I want it to return false but for another field that contains exactly the

Re: Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread Michael McCandless
"maureen tanuwidjaja" <[EMAIL PROTECTED]> wrote: > How much actually the disk space needed to optimize the index?The > explanation given in documentation seems to be very different with the > practical situation > > I have an index file of size 18.6 G and I am going to optimize it.I

Urgent : How much actually the disk space needed to optimize the index?

2007-03-13 Thread maureen tanuwidjaja
Dear All, How much disk space is actually needed to optimize the index? The explanation given in the documentation seems to be very different from the practical situation. I have an index of size 18.6 GB and I am going to optimize it. I keep this index on a mobile hard disk with capacit