: I'm trying to do a numerical search for a property in Lucene using
: RangeFilter.Less
: doc.add( new Field("id","property",Field.Store.YES,Field.Index.TOKENIZED) );
: doc.add( new Field("num",NumberTools.longToString(5L),Field.Store.YES,Field.Index.TOKENIZED) );
: Since five is less than ten,
Minor clarification: if the sort type is one of the numeric types, then an array o
Just to clarify: If you are doing paginated results, then the Hits API is
probably fast enough for you ... it's designed to work well in the first
100 results, and most people don't go that deep when looking at search
results.
if you look back earlier in this thread, the "I hope you're not using t
A HitCollector object invokes its collect method on
every document which matches the query/filter
submitted to the Searcher.search method. I think all
you would need to do is pass in the page number and
results per page to your HitCollector constructor and
then in the collect method do the bookkeeping.
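The bookkeeping suggested above can be sketched with a hypothetical stand-in for Lucene's HitCollector (the class name PageCollector and its fields are my invention, not a library API): collect() is called once per matching document, and only the docs on the requested page are kept.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene HitCollector: the searcher would call
// collect() once per matching document, in docId order.
class PageCollector {
    private final int start;   // index of the first hit on the page (0-based)
    private final int end;     // one past the last hit index on the page
    private int seen = 0;      // total matches seen so far
    private final List<Integer> pageDocs = new ArrayList<Integer>();

    PageCollector(int pageNumber, int resultsPerPage) {
        this.start = pageNumber * resultsPerPage;
        this.end = start + resultsPerPage;
    }

    public void collect(int doc, float score) {
        if (seen >= start && seen < end) {
            pageDocs.add(doc);   // keep only one page's worth of docs in memory
        }
        seen++;                  // still count everything for a "N results" display
    }

    public int totalHits() { return seen; }
    public List<Integer> page() { return pageDocs; }
}

public class PagingDemo {
    public static void main(String[] args) {
        PageCollector c = new PageCollector(2, 10); // page 2, 10 results per page
        for (int doc = 0; doc < 95; doc++) {
            c.collect(doc, 1.0f);                   // simulate 95 matching docs
        }
        System.out.println(c.totalHits());
        System.out.println(c.page().get(0));
        System.out.println(c.page().size());
    }
}
```

Memory stays bounded by the page size no matter how many documents match, which is the point of using a collector instead of Hits for deep result sets.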
I am using Hits object to collect all documents.
Let me tell you my problem. I am creating a web application. Every time a
user looks for something, it searches the index and returns the
results. Results may number in the millions, so for displaying results I am
doing pagination.
Here the probl
Jason,
Thanks, but changing it to '05' or '05L' in the code didn't seem to work;
hits.length() still returns 0 when the document should be found.
If you make just one change in the example:
Hits hits = searcher.search(query);
//Hits hits = searcher.search(fq);
IndexSearcher finds
It's a string comparison. Making the "5" a "05" would be a simple workaround.
Jason
Peter W. wrote:
Hello,
I'm trying to do a numerical search for a property in Lucene using
RangeFilter.Less
without using both RangeQuery and test cases.
Here's the code that I expect would return one hit :
(ad
Just a thought: using IndexModifier, you could call flush() at intervals,
say every few seconds or every few documents. If not using IndexModifier,
closing and re-opening the IndexWriter should have a similar effect.
Pros: (1) simple managing code, (2) content of previous docs
can be removed from disk once fl
Hello,
I'm trying to do a numerical search for a property in Lucene using
RangeFilter.Less
without using both RangeQuery and test cases.
Here's the code that I expect would return one hit :
(adapted from Youngho)
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.docu
Lucene in Action was very helpful for this beginner!
... and I would second that!
Mike
Hey Eric,
I think you want:
fsWriter.addIndexes(Directory[] {ramDir});
to be:
fsWriter.addIndexes(new Directory[]{ramDir});
JAMES
--- zheng <[EMAIL PROTECTED]> wrote:
> I am a novice in lucene. I write some code to do
> batch indexing using
> RAMDirectory according to the code provided in
>
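James's fix is a pure Java syntax issue, shown here with a hypothetical sum helper instead of Lucene's addIndexes: a bare `{...}` initializer is only legal in a variable declaration, so passing an array to a method requires the `new Type[]` prefix.

```java
public class ArrayArgDemo {
    // Stand-in for any method taking an array argument,
    // like IndexWriter.addIndexes(Directory[]).
    static int sum(int[] xs) {
        int total = 0;
        for (int x : xs) total += x;
        return total;
    }

    public static void main(String[] args) {
        // int[] ok = {1, 2, 3};   // allowed: initializer in a declaration
        // sum({1, 2, 3});         // won't compile: bare {...} is not an expression
        System.out.println(sum(new int[] {1, 2, 3}));
    }
}
```

Hence `fsWriter.addIndexes(new Directory[] {ramDir})` rather than `fsWriter.addIndexes(Directory[] {ramDir})`.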
OK, now I understand what you're trying to accomplish. Unfortunately, I
haven't a clue about any better solution than you're already using. I've
also seen the optimize step take a really long time.
Does it make any sense at all to write a bunch of separate indexes (a new
one each time you wou
I would think what you want to do is index on the stem, and rank on the
stem and the original form. After all, if you match exactly, then you had
better also match on the stem.
Robert Haycock wrote:
Hi,
I started using the EnglishStemmer and noticed that only the stem gets
added to the index. I woul
I am a novice in Lucene. I wrote some code to do batch indexing using
RAMDirectory according to the code provided in Lucene in Action, which is
something like:
FSDirectory fsDir = FSDirectory.getDirectory("/tmp/index", true);
RAMDirectory ramDir = new RAMDirectory();
IndexWriter fsWriter = IndexW
: I am trying to get Lucene to perform an exact match on a single term or word
: using the default query parser. It works fine whenever I have more than one
: word / term in the search string (it parses the string into a PhraseQuery
: with a slop of 0 which is correct). However when the search str
Grant Ingersoll wrote:
Lucene In Action is well done. It doesn't cover all the latest
features per se (as in 2.0 version), but hits on most of them.
Haven't read the others. There are also a lot of free resources
available that you could piece together. Check the
wiki for these.
Lucene In Action is well done. It doesn't cover all the latest features
per se (as in 2.0 version), but hits on most of them. Haven't read the
others. There are also a lot of free resources available that you could
piece together. Check the wiki for these.
Vladimir Olenin wrote:
I wonder what the best book is that can be recommended as an
introduction as well as 'in-depth' coverage of the latest version of
Lucene? There are a few on the Internet, but I was wondering which has
the most comprehensive coverage of all the features, etc.
Thanks!
Vlad
Ah yes, sorry my bad. I only quickly glanced at the code.
Erik
On Jun 28, 2006, at 10:04 AM, Robert Haycock wrote:
Hi Erik,
Isn't buffering what I'm doing? The first time next() is called it
reads the next token from the stream into 'unStemmedToken'. The next
call uses the same t
Erick Erickson wrote:
Kind of a tangential response, but there was a discussion a while back
about
RAMdir .vs. FSDir that you probably want to search for and look over.
As I
remember (and I only glanced at it) the statement was made that the FSDir
*is* a RAMdir, at least for a while. This impli
Stupid me. It was working fine. I hadn't called super(in) so the call
to stream.close() in DocumentWriter was obviously failing!!!
Rob.
-Original Message-
From: Robert Haycock [mailto:[EMAIL PROTECTED]
Sent: 28 June 2006 15:04
To: java-user@lucene.apache.org
Subject: RE: Adding stem AN
I'm aware that the FSDirectory actually stores documents in a RAMDir until
merge time. But the thing is that I also want to store the documents in the
RAMDir as snapshots on the hard drive until they have been flushed down to the
FSDir, so I won't lose any documents in a crash.
Does anybody
Kind of a tangential response, but there was a discussion a while back about
RAMdir .vs. FSDir that you probably want to search for and look over. As I
remember (and I only glanced at it) the statement was made that the FSDir
*is* a RAMdir, at least for a while. This implies that there is little t
Hi Erik,
Isn't buffering what I'm doing? The first time next() is called it
reads the next token from the stream into 'unStemmedToken'. The next
call uses the same token, and nulls it after use. Third call will get
the next one from the stream and so on. I effectively have a 'one token
buffer'
My bet is that after updating/appending to an index, the searcher object
used also needs to be updated, so that it will work against the new snapshot
of the index.
See http://wiki.apache.org/jakarta-lucene/UpdatingAnIndex
1. keep a single open IndexReader used by all searches
2. Every few mi
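The "keep a single open reader, refresh it when the index changes" pattern from the wiki page can be sketched with stand-in classes (MockIndex and its version counter are my invention; the real Lucene calls for this differ): the cached searcher is reused until the index version moves on, then reopened.

```java
// Hypothetical stand-in for an index whose version number advances
// on every write, the way a real Lucene index's version does.
class MockIndex {
    long version = 1;
    void addDocument() { version++; }   // a write bumps the version
}

// Keeps one shared searcher and swaps it only when the index changed.
class SearcherManager {
    private final MockIndex index;
    private long openVersion;
    private String searcher;            // stand-in for a real IndexSearcher

    SearcherManager(MockIndex index) {
        this.index = index;
        reopen();
    }

    private void reopen() {
        openVersion = index.version;
        searcher = "searcher@v" + openVersion;
    }

    // Called before each search: reopen only if the index moved on.
    String getSearcher() {
        if (index.version != openVersion) {
            reopen();
        }
        return searcher;
    }
}

public class ReopenDemo {
    public static void main(String[] args) {
        MockIndex idx = new MockIndex();
        SearcherManager mgr = new SearcherManager(idx);
        System.out.println(mgr.getSearcher()); // still the cached searcher
        idx.addDocument();                     // index changed on disk
        System.out.println(mgr.getSearcher()); // refreshed searcher
    }
}
```

Searches between writes keep hitting the same warm searcher; only a detected version change pays the reopen cost.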
Hi all. I can remove a document from the index using IndexReader.delete
(Term) but the search still returns this document.
What am I doing wrong?
--
Leandro Rodrigo Saad Cruz
CTO - InterBusiness Technologies
db.apache.org/ojb
guara-framework.sf.net
xingu.sf.net
I'll leave it to others to analyze the code, and ask something completely
different ...
In the Lucene in Action book, there is an example of indexing synonyms. The
idea is that they get indexed in the exact same position. So, would it be
easier if you indexed the stemmed and unstemmed terms in di
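The "exact same position" trick above boils down to a position increment of 0 on the stacked token. A self-contained sketch (the Tok class and stemAndStack helper are hypothetical stand-ins, not Lucene classes): the stem advances the position, and the original form, when it differs, is stacked at the same position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene Token: just text plus a
// position increment (0 = same position as the previous token).
class Tok {
    final String text;
    final int posIncr;
    Tok(String text, int posIncr) { this.text = text; this.posIncr = posIncr; }
}

public class StackingDemo {
    // Emit the stem at increment 1, then stack the original form
    // at increment 0 when it differs from the stem.
    static List<Tok> stemAndStack(String[] words, String[] stems) {
        List<Tok> out = new ArrayList<Tok>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Tok(stems[i], 1));       // stem advances the position
            if (!stems[i].equals(words[i])) {
                out.add(new Tok(words[i], 0));   // original stacked on top
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> toks = stemAndStack(
            new String[] {"running", "fast"},
            new String[] {"run", "fast"});
        int pos = 0;
        for (Tok t : toks) {
            pos += t.posIncr;                    // accumulate positions
            System.out.println(pos + ":" + t.text);
        }
    }
}
```

With both forms at one position, a phrase query matches whether the user types the stemmed or the original word, and exact matching still works.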
I hope you're not using the Hits object to assemble all 14M results. A
recurring theme is that a Hits object should NOT be used for collecting more
than a few (100, I think) objects, since it re-executes the query every 100 or
so hits it returns. Its intent is to efficiently return the first few
h
Returning null is reserved for the end of the tokens. You'll need
to implement some kind of buffering mechanism - check out the custom
analyzers (like the SynonymAnalyzer) in the Lucene in Action code -
http://www.lucenebook.com - for examples.
Erik
On Jun 28, 2006, at 8:52 AM,
adasal wrote:
As far as I have researched this, I know that the Gnowsis project uses both
RDF and Lucene, but I have not had time to determine their relationship.
www.gnowsis.org/
I can tell you a bit about Gnowsis, as we (Aduna) are cooperating with
the Gnowsis people on RDF creation, storage
Hi,
I started using the EnglishStemmer and noticed that only the stem gets
added to the index. I would like to be able to add both to give me a
stem search and an exact search capability.
My first attempt has been to write my own stemming filter. The idea
being that the first pass would get the
Hi,
I think I have posted this question in some other thread...
When the result set is very big, searching takes a lot of time.
For returning the response of a query that finds approx. 14M results, the
first time it takes approx. 17 seconds.
But the next time the same query takes almost 2 seconds.
You have to re-init the searcher / reader object. You can re-init the
reader object that the searcher uses, without re-initing the searcher
object itself, as stated earlier here
On Wed, 28 Jun 2006 14:23:21 +0200, heritrix.lucene
<[EMAIL PROTECTED]> wrote:
o o no
I mean the searching wou
Hi,
I have a question about using sorting in Lucene, because we experienced
some OutOfMemory errors. If I understand it correctly, each unique term
in a field is read into a cache, when I use Searcher.search(Query query,
Sort sort) with one SortField. So even if my query only finds 5
documents, Luc
o o no
I mean, would the searching be fast or not... But now I have tested it. The
result that I found reveals that there is no difference in terms of
searching speed.
But there is another thing that I want to ask: what if the index is changed
in between?
Will the indexReader give the results w
What is the problem with using a TermQuery in this case? Please
provide some more details on the analyzer you're using (both for
indexing and with QueryParser) and a sample of text you indexed.
Erik
On Apr 28, 2006, at 7:36 AM, Hugh Ross wrote:
I am trying to get Lucene to perform
On Jun 28, 2006, at 6:53 AM, heritrix.lucene wrote:
Is there any difference in terms of speed between IndexReader and
IndexSearcher??
I'm assuming you mean: is there any difference in speed in how you
construct an IndexSearcher? No.
Erik
On 6/27/06, Erik Hatcher <[EMAIL PRO
Did a clone of the AddIndexes method.
See code below. Anybody seeing any problems with using the
AddIndexesWithoutOptimize method ?
// Original
public virtual void AddIndexes(Directory[] dirs)
{
lock (this)
{
I am trying to get Lucene to perform an exact match on a single term or word
using the default query parser. It works fine whenever I have more than one
word / term in the search string (it parses the string into a PhraseQuery
with a slop of 0 which is correct). However when the search string just
As far as I know, an IndexSearcher uses an IndexReader, hence you can do
searcher.getIndexReader() even though you instantiated the searcher with a
string path or a directory. So I would guess that by creating a searcher
with an IndexReader as a parameter, the constructor will be faster.
But,
Is there any difference in terms of speed between IndexReader and
IndexSearcher??
On 6/27/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
On Jun 27, 2006, at 10:32 AM, Fabrice Robini wrote:
> That's also my case...
> I create a new IndexSearcher at each query, but with a static and
> instanciate
Hi,
I got a Lucene-based host application that retrieves content for
indexing from fetcher applications.
Since I get fresh content all the time, I wanted to have full control
over the disk write process.
So I ended up using a RAMDirectory and a FSDirectory.
When the content arrives
What are the issues in indexing RDF? I would be interested to see a
discussion of this.
Off the top of my head, it would be one thing to index the data, regardless
of enclosing tags, but something else to employ the tags as an adjunct to the
index. Has this been approached anywhere?
A third part would