Re: search with RangeFilter.Less

2006-06-28 Thread Chris Hostetter
: I'm trying to do a numerical search for a property in Lucene using : RangeFilter.Less : Field("id","property",Field.Store.YES,Field.Index.TOKENIZED) ); : doc.add( new : Field("num",NumberTools.longToString(5L),Field.Store.YES,Field.Index.TOKENIZED) ); : Since five is less than ten,

Re: MemoryUsage of sorting

2006-06-28 Thread Chris Hostetter
: some OutOfMemory errors. If I understand it correctly, each unique term : in a field is read into a cache, when I use Searcher.search(Query query, : Sort sort) with one SortField. So even if my query only finds 5 Minor clarification: if the sort type is one of the numeric types, then an array o

Re: Searching is taking a lot...

2006-06-28 Thread Chris Hostetter
Just to clarify: If you are doing paginated results, then the Hits API is probably fast enough for you ... it's designed to work well in the first 100 results, and most people don't go that deep when looking at search results. If you look back earlier in this thread, the "I hope you're not using t

Re: Searching is taking a lot...

2006-06-28 Thread James Pine
A HitCollector object invokes its collect method on every document which matches the query/filter submitted to the Searcher.search method. I think all you would need to do is pass in the page number and results per page to your HitCollector constructor and then in the collect method do the bookeepi
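The page bookkeeping described above can be sketched in plain Java. This is only an illustration under stated assumptions: `PageCollector` is a hypothetical stand-in that does not extend Lucene's HitCollector, the page number is taken to be zero-based, and the `collect(int doc, float score)` shape simply mirrors the callback the message describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a HitCollector subclass; not a Lucene class.
class PageCollector {
    private final int start;   // index of the first match on the requested page
    private final int end;     // index one past the last match on the page
    private int seen = 0;      // number of matches seen so far
    private final List<Integer> pageDocs = new ArrayList<>();

    // page is zero-based (an assumption of this sketch)
    PageCollector(int page, int perPage) {
        this.start = page * perPage;
        this.end = start + perPage;
    }

    // Called once per matching document; keeps only ids that fall on the page.
    void collect(int doc, float score) {
        if (seen >= start && seen < end) {
            pageDocs.add(doc);
        }
        seen++;
    }

    List<Integer> pageDocs() { return pageDocs; }

    int totalMatches() { return seen; }

    public static void main(String[] args) {
        PageCollector c = new PageCollector(1, 3);
        for (int i = 0; i < 10; i++) {
            c.collect(i * 10, 1.0f);   // pretend ten documents matched
        }
        System.out.println(c.pageDocs());      // [30, 40, 50]
        System.out.println(c.totalMatches());  // 10
    }
}
```

Note that this pages through matches in the order collect is called, which is not necessarily relevance order; sorting by score would need extra bookkeeping that this sketch leaves out.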

Re: Searching is taking a lot...

2006-06-28 Thread heritrix.lucene
I am using the Hits object to collect all documents. Let me tell you my problem. I am creating a web application. Every time a user looks for something, it searches the index and returns the results. Results may be in the millions. So for displaying results, I am doing pagination. Here the probl

Re: search with RangeFilter.Less

2006-06-28 Thread Peter W.
Jason, Thanks, but changing it to '05' or '05L' in the code didn't seem to work, hits.length() still returns 0 when the document should be found. If you make just one change in the example: Hits hits = searcher.search(query); //Hits hits = searcher.search(fq); IndexSearcher finds

Re: search with RangeFilter.Less

2006-06-28 Thread Jason Pump
It's a string comparison. Making the "5" a "05" would be a simple workaround. Jason Peter W. wrote: Hello, I'm trying to do a numerical search for a property in Lucene using RangeFilter.Less without using both RangeQuery and test cases. Here's the code that I expect would return one hit : (ad
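The string-comparison point can be seen in plain Java, without Lucene. This is a minimal sketch: `PaddedCompare`, `pad`, and `lessThan` are hypothetical names chosen for illustration, and it only demonstrates lexicographic ordering, not how RangeFilter or NumberTools encode values.

```java
import java.util.Locale;

class PaddedCompare {
    // Zero-pad a non-negative number to a fixed width so that
    // lexicographic order agrees with numeric order.
    static String pad(long value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // True when a sorts strictly before b as a plain string.
    static boolean lessThan(String a, String b) {
        return a.compareTo(b) < 0;
    }

    public static void main(String[] args) {
        System.out.println(lessThan("5", "10"));              // false: '5' sorts after '1'
        System.out.println(lessThan(pad(5, 2), pad(10, 2)));  // true: "05" sorts before "10"
    }
}
```

Padding only helps if every value in the field, and the bound passed to the filter, use the same width.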

Re: Flushing RAMDir into FSDir

2006-06-28 Thread Doron Cohen
Just a thought: using IndexModifier, you could call flush() at intervals, say every N seconds or every M documents. If not using IndexModifier, closing and re-opening IndexWriter should have similar effect. Pros: (1) simple managing code, (2) content of previous docs can be removed from disk once fl

search with RangeFilter.Less

2006-06-28 Thread Peter W.
Hello, I'm trying to do a numerical search for a property in Lucene using RangeFilter.Less without using both RangeQuery and test cases. Here's the code that I expect would return one hit : (adapted from Youngho) import org.apache.lucene.analysis.SimpleAnalyzer; import org.apache.lucene.docu

Re: What is a good book on Lucene?

2006-06-28 Thread Michael McCandless
Lucene in Action was very helpful for this beginner! ... and I would second that! Mike

Re: batch indexing using RAMDirectory

2006-06-28 Thread James Pine
Hey Eric, I think you want: fsWriter.addIndexes(Directory[] {ramDir}); to be: fsWriter.addIndexes(new Directory[]{ramDir}); JAMES --- zheng <[EMAIL PROTECTED]> wrote: > I am a novice in lucene. I write some code to do > batch indexing using > RAMDirectory according to the code provided in >
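The fix above comes down to Java's array-creation syntax: an array initializer passed as an argument needs the `new` keyword. A small sketch follows, using `String[]` in place of `Directory[]` so that it runs without Lucene; `ArrayArg` and `count` are hypothetical names.

```java
class ArrayArg {
    // Stand-in for a method that, like addIndexes, takes an array argument.
    static int count(String[] items) {
        return items.length;
    }

    public static void main(String[] args) {
        // count(String[] {"ramDir"});   // does not compile: 'new' is missing
        System.out.println(count(new String[] {"ramDir"}));  // prints 1
    }
}
```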

Re: Flushing RAMDir into FSDir

2006-06-28 Thread Erick Erickson
OK, now I understand what you're trying to accomplish. Unfortunately, I haven't a clue about any better solution than you're already using. I've also seen the optimize step take a really long time. Does it make any sense at all to write a bunch of separate indexes (a new one each time you wou

Re: Adding stem AND original term

2006-06-28 Thread Jason Pump
I would think what you want to do is index on the stem, and rank on the stem and the original form. After all, if you match exactly, then you better match for the stem. Robert Haycock wrote: Hi, I started using the EnglishStemmer and noticed that only the stem gets added to the index. I woul

batch indexing using RAMDirectory

2006-06-28 Thread zheng
I am a novice in Lucene. I wrote some code to do batch indexing using RAMDirectory according to the code provided in Lucene in Action, which is something like FSDirectory fsDir = FSDirectory.getDirectory("/tmp/index", true); RAMDirectory ramDir = new RAMDirectory(); IndexWriter fsWriter = IndexW

Re: Exact match on a single term / word

2006-06-28 Thread Chris Hostetter
: I am trying to get Lucene to perform an exact match on a single term or word : using the default query parser. It works fine whenever I have more than one : word / term in the search string (it parses the string into a PhraseQuery : with a slop of 0 which is correct). However when the search str

Re: What is a good book on Lucene?

2006-06-28 Thread Ben Knear
Grant Ingersoll wrote: Lucene In Action is well done. It doesn't cover all the latest features per se (as in 2.0 version), but hits on most of them. Haven't read the others. There are also a lot of free resources available that you could use to piecemeal together. Check the wiki for these

Re: What is a good book on Lucene?

2006-06-28 Thread Grant Ingersoll
Lucene In Action is well done. It doesn't cover all the latest features per se (as in 2.0 version), but hits on most of them. Haven't read the others. There are also a lot of free resources available that you could use to piecemeal together. Check the wiki for these Vladimir Olenin wrote:

What is a good book on Lucene?

2006-06-28 Thread Vladimir Olenin
I wonder what the best book is that can be recommended as an introduction as well as 'in-depth' coverage of the latest version of Lucene? There are a few on the Internet, but I was wondering which has the most comprehensive coverage of all features, etc. Thanks! Vlad

Re: Adding stem AND original term

2006-06-28 Thread Erik Hatcher
Ah yes, sorry my bad. I only quickly glanced at the code. Erik On Jun 28, 2006, at 10:04 AM, Robert Haycock wrote: Hi Erik, Isn't buffering what I'm doing? The first time next() is called it reads the next token from the stream into 'unStemmedToken'. The next call uses the same t

Re: Flushing RAMDir into FSDir

2006-06-28 Thread Ben Knear
Erick Erickson wrote: Kind of a tangential response, but there was a discussion a while back about RAMdir vs. FSDir that you probably want to search for and look over. As I remember (and I only glanced at it) the statement was made that the FSDir *is* a RAMdir, at least for a while. This impli

RE: Adding stem AND original term

2006-06-28 Thread Robert Haycock
Stupid me. It was working fine. I hadn't called super(in) so the call to stream.close() in DocumentWriter was obviously failing!!! Rob. -Original Message- From: Robert Haycock [mailto:[EMAIL PROTECTED] Sent: 28 June 2006 15:04 To: java-user@lucene.apache.org Subject: RE: Adding stem AN

SV: Flushing RAMDir into FSDir

2006-06-28 Thread Marcus Falck
I'm aware that the FSDirectory actually stores documents in a RAMDir until merge time. But the thing is that I also want to store the documents in the RAMDir as snapshots on the hard drive until they have been flushed down to the FSDir. So I won't lose any documents in a crash. Does anybody

Re: Flushing RAMDir into FSDir

2006-06-28 Thread Erick Erickson
Kind of a tangential response, but there was a discussion a while back about RAMdir vs. FSDir that you probably want to search for and look over. As I remember (and I only glanced at it) the statement was made that the FSDir *is* a RAMdir, at least for a while. This implies that there is little t

RE: Adding stem AND original term

2006-06-28 Thread Robert Haycock
Hi Erik, Isn't buffering what I'm doing? The first time next() is called it reads the next token from the stream into 'unStemmedToken'. The next call uses the same token, and nulls it after use. Third call will get the next one from the stream and so on. I effectively have a 'one token buffer'
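The 'one token buffer' idea can be sketched in plain Java over an iterator of strings. This is an illustration only, not the poster's filter: `StemAndOriginal` is a hypothetical class, `stem` is a toy stand-in that strips one trailing "s" rather than the EnglishStemmer, and this version emits the original term first and the stem on the following call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; a real Lucene filter would extend TokenFilter instead.
class StemAndOriginal implements Iterator<String> {
    private final Iterator<String> input;
    private String pending;   // one-token buffer: stem waiting to be emitted

    StemAndOriginal(Iterator<String> input) {
        this.input = input;
    }

    // Toy stand-in for a stemmer (assumption): strips a single trailing "s".
    static String stem(String term) {
        if (term.length() > 1 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    @Override
    public boolean hasNext() {
        return pending != null || input.hasNext();
    }

    @Override
    public String next() {
        if (pending != null) {          // second call: emit the buffered stem
            String buffered = pending;
            pending = null;             // null the buffer after use
            return buffered;
        }
        String original = input.next(); // first call: read from the stream
        String stemmed = stem(original);
        if (!stemmed.equals(original)) {
            pending = stemmed;          // remember the stem for the next call
        }
        return original;
    }

    // Convenience helper: run the filter over a whole token list.
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = new StemAndOriginal(tokens.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("cats", "dog")));  // [cats, cat, dog]
    }
}
```

When the stem equals the original term, only one token is emitted, which avoids indexing the same term twice.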

Re: Removing document from index

2006-06-28 Thread Aleksander M. Stensby
My bet is that after updating / appending to an index, the searcher object used also needs to be updated, so that it will work against the new snapshot of the index. See http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex 1. keep a single open IndexReader used by all searches 2. Every few mi

Removing document from index

2006-06-28 Thread Leandro Saad
Hi all. I can remove a document from the index using IndexReader.delete(Term) but the search still returns this document. What am I doing wrong? -- Leandro Rodrigo Saad Cruz CTO - InterBusiness Technologies db.apache.org/ojb guara-framework.sf.net xingu.sf.net

Re: Adding stem AND original term

2006-06-28 Thread Erick Erickson
I'll leave it to others to analyze the code, and ask something completely different ... In the Lucene in Action book, there is an example of indexing synonyms. The idea is that they get indexed in the exact same position. So, would it be easier if you indexed the stemmed and unstemmed terms in di

Re: Searching is taking a lot...

2006-06-28 Thread Erick Erickson
I hope you're not using the Hits object to assemble all 14M results. A recurring theme is that a Hits object should NOT be used for collecting more than a few (100 I think) objects since it re-executes the query every 100 or so terms it returns. Its intent is to efficiently return the first few h

Re: Adding stem AND original term

2006-06-28 Thread Erik Hatcher
Returning null is reserved for the end of the tokens. You'll need to implement some kind of buffering mechanism - check out the custom analyzers (like the SynonymAnalyzer) in the Lucene in Action code - http://www.lucenebook.com - for examples. Erik On Jun 28, 2006, at 8:52 AM,

Re: Lucene indexing RDF

2006-06-28 Thread Christiaan Fluit
adasal wrote: As far as I have researched this, I know that the gnowsis project uses both rdf and lucene, but I have not had time to determine their relationship. www.gnowsis.org/ I can tell you a bit about Gnowsis, as we (Aduna) are cooperating with the Gnowsis people on RDF creation, storage

Adding stem AND original term

2006-06-28 Thread Robert Haycock
Hi, I started using the EnglishStemmer and noticed that only the stem gets added to the index. I would like to be able to add both to give me a stem search and an exact search capability. My first attempt has been to write my own stemming filter. The idea being that the first pass would get the

Re: Searching is taking a lot...

2006-06-28 Thread heritrix.lucene
Hi, I think I have posted this question in some other thread... When the result set is very big, searching takes a lot of time. To return the response for a query that finds approx 14 M results, the first time it takes approx 17 seconds. But the next time, for the same query, it takes almost 2 seconds.

Re: IndexSearcher in Servlet

2006-06-28 Thread Aleksander M. Stensby
You have to re-init the searcher / reader object. You can re-init the reader object that the searcher uses, without re-initing the searcher object itself, as stated earlier here On Wed, 28 Jun 2006 14:23:21 +0200, heritrix.lucene <[EMAIL PROTECTED]> wrote: o o no I mean the searching wou

MemoryUsage of sorting

2006-06-28 Thread Kroehling, Thomas
Hi, I have a question about using sorting in Lucene, because we experienced some OutOfMemory errors. If I understand it correctly, each unique term in a field is read into a cache, when I use Searcher.search(Query query, Sort sort) with one SortField. So even if my query only finds 5 documents, Luc

Re: IndexSearcher in Servlet

2006-06-28 Thread heritrix.lucene
o o no I mean the searching would be fast or not... But now I have tested. The result that I found reveals that there would be no difference in terms of searching speed. But there is another thing that I want to ask. What if the index is changed in between. Will the indexReader give the results w

Re: Exact match on a single term / word

2006-06-28 Thread Erik Hatcher
What is the problem with using a TermQuery in this case? Please provide some more details on the analyzer you're using (both for indexing and with QueryParser) and a sample of text you indexed. Erik On Apr 28, 2006, at 7:36 AM, Hugh Ross wrote: I am trying to get Lucene to perform

Re: IndexSearcher in Servlet

2006-06-28 Thread Erik Hatcher
On Jun 28, 2006, at 6:53 AM, heritrix.lucene wrote: Is there any difference in terms of speed between IndexReader and IndexSearcher?? I'm assuming you mean is there any difference in speed in how you construct an IndexSearcher no. Erik On 6/27/06, Erik Hatcher <[EMAIL PRO

SV: Flushing RAMDir into FSDir

2006-06-28 Thread Marcus Falck
Did a clone of the AddIndexes method. See code below. Anybody seeing any problems with using the AddIndexesWithoutOptimize method ? // Original public virtual void AddIndexes(Directory[] dirs) { lock (this) {

Exact match on a single term / word

2006-06-28 Thread Hugh Ross
I am trying to get Lucene to perform an exact match on a single term or word using the default query parser. It works fine whenever I have more than one word / term in the search string (it parses the string into a PhraseQuery with a slop of 0 which is correct). However when the search string just

Re: IndexSearcher in Servlet

2006-06-28 Thread Aleksander M. Stensby
As far as I know, an IndexSearcher uses an IndexReader. Hence you can do searcher.getIndexReader(), even though you instantiated the searcher with a string path or a directory. So, I would guess that by creating a searcher with an IndexReader as parameter, the constructor will be faster. But,

Re: IndexSearcher in Servlet

2006-06-28 Thread heritrix.lucene
Is there any difference in terms of speed between IndexReader and IndexSearcher?? On 6/27/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: On Jun 27, 2006, at 10:32 AM, Fabrice Robini wrote: > That's also my case... > I create a new IndexSearcher at each query, but with a static and > instanciate

Flushing RAMDir into FSDir

2006-06-28 Thread Marcus Falck
Hi, I have a Lucene-based host application that retrieves content for indexing from fetcher applications. Since I get fresh content all the time I wanted to have full control over the disk write process. So I ended up using a RAMDirectory and a FSDirectory. When the content arrives

Re: Lucene indexing RDF

2006-06-28 Thread adasal
What are the issues in indexing RDF? I would be interested to see a discussion of this. Off the top of my head it would be one thing to index the data, regardless of enclosing tags, but something else to employ the tags as adjunct to the index. Has this been approached anywhere? A third part would