Re: Common Words ignoring problem

2007-03-19 Thread thomas arni
You can adapt the source code of StopAnalyzer.java in the analysis package, or I suppose you can use the default constructor with an empty stop word list (but please check this). If you don't know "Luke", use this small tool to display your index and verify your indexing process. http://www.getopt

Re: Common Words ignoring problem

2007-03-19 Thread aslam bari
OK, that's fine. Thanks. Now, what if I don't want to stop any word, meaning I want Lucene not to ignore any word? How do I do this? And will doing this affect performance or not? Thanks... - Original Message From: Grant Ingersoll <[EMAIL PROTECTED]> To: java-user@lucene.apache.org

Re: Issue while parsing XML files due to control characters, help appreciated.

2007-03-19 Thread Lokeya
Thanks all for the valuable suggestions. The lock issue also got resolved, and all 7 lakh files were indexed in around 85 minutes, which is like wow! To get around the lock issue I followed the suggestion given in this: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200601.mbox/[EMAIL

Re: Trouble w/ Query Creation

2007-03-19 Thread Chris Hostetter
: I currently have a boolean query which contains a MultiFieldQuery for all MultiFieldQuery is not a query class that comes with Lucene ... did you write it yourself? it sounds like what you want is a boolean query (of a DisjunctionMaxQuery) containing a separate phrase query for each field ...

Re: how to index XML elements with the same name using Lucene

2007-03-19 Thread Cheolgoo Kang
Keywords.setKeyword(String) should be able to stack all the keywords set by the digester. So the setKeyword(String) method should be written like below, using java.util.List: public static class KeyWords { private String lineNum; private List kw = new LinkedList(); pub
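
The snippet above is cut off, so here is a minimal sketch of the idea it describes: a setter that appends each keyword to a list instead of overwriting a single field, so repeated XML elements with the same name all survive. Only KeyWords, lineNum, kw and setKeyword(String) come from the message; the accessors, the generic List<String> type, and the KeyWordsDemo class are assumptions added for illustration, and the digester wiring is not shown.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Sketch only: a holder that accumulates every keyword it is given.
class KeyWords {
    private String lineNum;
    private final List<String> kw = new LinkedList<>();

    // Assumed accessor pair for the line number; not shown in the original snippet.
    public void setLineNum(String lineNum) { this.lineNum = lineNum; }
    public String getLineNum() { return lineNum; }

    // Called once per matching element: append rather than replace.
    public void setKeyword(String keyword) { kw.add(keyword); }

    // Assumed read accessor; returns a read-only view of the collected keywords.
    public List<String> getKeywords() { return Collections.unmodifiableList(kw); }
}

// Hypothetical demo class showing two calls to the same setter.
class KeyWordsDemo {
    public static void main(String[] args) {
        KeyWords entry = new KeyWords();
        entry.setLineNum("12");
        entry.setKeyword("lucene");
        entry.setKeyword("index");
        System.out.println(entry.getLineNum() + " " + entry.getKeywords());
    }
}
```

Each call to setKeyword adds one entry, so a parser that invokes the setter once per element ends up with the full list.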

Re: contrib/benchmark questions

2007-03-19 Thread Doron Cohen
Grant Ingersoll <[EMAIL PROTECTED]> wrote on 19/03/2007 13:10:16: > So, if I am understanding correctly: > > >> "SearchSameRdr" Search > : 5000 > > means don't collect indiv. stats for SearchSameRdr, but do whatever > that task does 5000 times, right? Almost... It should be btw { "SearchSameR

Re: contrib/benchmark questions

2007-03-19 Thread Grant Ingersoll
Thanks for the reply, Doron. I knew this email was targeted for you, but thought it would be good to add to the user record. On Mar 19, 2007, at 2:30 PM, Doron Cohen wrote: Grant Ingersoll <[EMAIL PROTECTED]> wrote on 18/03/2007 10:16:14: I'm using contrib/benchmark to do some tests for my

Re: contrib/benchmark questions

2007-03-19 Thread Doron Cohen
Grant Ingersoll <[EMAIL PROTECTED]> wrote on 18/03/2007 10:16:14: > I'm using contrib/benchmark to do some tests for my ApacheCon talk > and have some questions. > > 1. In looking at micro-standard.alg, it seems like not all braces are > closed. Is a line ending a separator too? '>' can replace

RE: Fast index traversal and update for stored field?

2007-03-19 Thread Steven Parkes
You'll have a difficult time updating Lucene indexes in place. A lot of coordination exists within Lucene specifically not to do this: it's the fact that Lucene does not do this that enables a lot of the lockless parallelism in Lucene. This applies equally to the data store and the inverted index p

Trouble w/ Query Creation

2007-03-19 Thread Rajiv Roopan
Hello, I'm having some issues making the correct query. This is my current situation. I'm searching for :"foo bar" in 3 fields: In the index I have: document 1. field1 contains (boost is 2.0): "bar stuff" field2 contains: "bar max" field3 contains: "" document 2. field1 contains (boost is 2.0)

Re: question about getting all terms in a section of the documents

2007-03-19 Thread Otis Gospodnetic
Donna, You are correct, "enum" should be "terms". Could you please modify the FAQ? You just have to log into Wiki and edit that page (the edit link is at the bottom). Thanks, Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search -

question about getting all terms in a section of the documents

2007-03-19 Thread Donna L Gresh
I am a very new user of Lucene, and thus far am amazed at its speed and ease of use. I have a question about something in the FAQ though. I have a need to get all terms in a specific section of the document; I want to create a database of term vs an identifier of the document containing the term

Re: improving RAM usage by IndexWriter

2007-03-19 Thread Marvin Humphrey
On Mar 19, 2007, at 5:09 AM, Michael McCandless wrote: I think some of these changes are similar to how KinoSearch builds a segment. Yup... sounds familiar. ;) I'm still working through some lingering issues before I can make a clean patch, Well, where is it? Don't keep it a secret! M

Re: Common Words ignoring problem

2007-03-19 Thread Grant Ingersoll
One of the constructors for StandardAnalyzer allows you to set your stop words. If you use the default constructor, you get the default set of stop words, which is in StopAnalyzer.ENGLISH_STOP_WORDS. -Grant On Mar 19, 2007, at 6:14 AM, aslam bari wrote: Hello All, I am using StandarAnalyz
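
To make the effect discussed in this thread concrete, here is a plain-Java illustration of stop-word filtering. It is not Lucene code and does not use the StandardAnalyzer API; the class StopWordSketch, the method tokenize, and the four-word sample list are all assumptions made up for this sketch. It only shows why a stop-word list removes "this" and "is" from "this is garden", and why an empty list keeps every word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: a toy tokenizer with a configurable stop-word set.
class StopWordSketch {
    // Lowercases the text, splits on whitespace, and drops tokens found in stopWords.
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Small sample list chosen for this example; not Lucene's default list.
        Set<String> sample = new HashSet<>(Arrays.asList("this", "is", "the", "a"));
        System.out.println(tokenize("this is garden", sample));          // only "garden" survives
        System.out.println(tokenize("this is garden", new HashSet<>())); // all three words survive
    }
}
```

With the sample list, only one of the three query words is kept, which is the behaviour the original question describes; with an empty set nothing is dropped.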

improving RAM usage by IndexWriter

2007-03-19 Thread Michael McCandless
Hi, I've been looking into improving performance of IndexWriter, specifically how it makes use of RAM to buffer added documents. I've created a new class (MultiDocumentWriter) that can build a single segment from many documents at once, more efficiently than the current single document segment ap

Common Words ignoring problem

2007-03-19 Thread aslam bari
Hello All, I am using StandardAnalyzer for indexing documents. Then I make a query to search for some words with an AND query. For example, I need to search for a document which contains all of the following words: "this is garden". I think when Lucene indexes the document, it ignores some common words like "

Re: Storing whole documents in the index

2007-03-19 Thread Karel Tejnora
Storing documents (especially large ones) outside the index is better than storing them in the index. Every merge of segments or optimize will copy that data. Storing them in the index is possible, but it requires 1-4x more space and, depending on the read/write speed of the filesystem, merge and optimize take longer. Karel On Sun, 200