Indeed!
I found a very good article on this as well at:
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
It really sums up what you are saying.
Thanks for the help!
Daniel Shane
----- Original Message -----
From: "Michael McCandless"
…solution is either to remove stopwords from the index, or to shard it and use ParallelMultiSearcher.
What do you think?
Daniel Shane
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
I question the performance of such an approach. For Lucene to be fast,
disk access needs to be fast, and Google's transactional storage is
not that good for this.
I'll have to test it to see, but I anticipate a huge performance hit
compared to Lucene running with real HDD access.
D
Wow, that's exactly what I was looking for! In the meantime I'll use the
time-based collector.
Thanks Uwe and Mark for your help!
Daniel Shane
mark harwood wrote:
Or https://issues.apache.org/jira/browse/LUCENE-1720 offers lightweight timeout
testing at all index access stages prior to
I don't think it's possible, but is there something in Lucene to cap a
search at a predefined time length, or is there a way to stop a search
when it's been running for too long?
Daniel Shane
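One way to do this (a minimal sketch, assuming the Lucene 2.9-era TimeLimitingCollector API and an already-open searcher; the class and method names here are illustrative, not from the thread) is to wrap whatever Collector you use so that collection aborts once a time budget is spent. Hits gathered before the timeout remain available:

```java
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TimeLimitingCollector;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;

public class TimeBoxedSearch {
  /** Run the query but give up after the given number of milliseconds. */
  public static TopDocs search(IndexSearcher searcher, Query query, long millis)
      throws IOException {
    TopScoreDocCollector top = TopScoreDocCollector.create(10, true);
    try {
      // Wrap the real collector; it throws once the budget is exceeded.
      searcher.search(query, new TimeLimitingCollector(top, millis));
    } catch (TimeLimitingCollector.TimeExceededException e) {
      // Timed out: fall through and return the partial results.
    }
    return top.topDocs();
  }
}
```

The partial results are the key difference from simply killing the search thread: whatever was collected before the timeout is still usable.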
…or
does this mean that the first token has to have an empty Type attribute
as well?
I'm just not sure,
Daniel Shane
Ok, I got it: from checking other filters, I should call
input.incrementToken() instead of super.incrementToken().
Don't you feel this kind of breaks the object model
(super.incrementToken() should also work)?
Maybe when the old API is gone, we can stop checking whether someone has
overridden next().
Uwe Schindler wrote:
There may be a problem: you may not want to restore the peeked token into
the TokenFilter's attributes itself. It looks like you want to have a Token
instance returned from peek, but the current stream should not reset to this
Token (you only want to "look" at the next T
After thinking about it, the only conclusion I reached was, instead of
saving the token, to save an iterator of Attributes and use that instead.
It may work.
Daniel Shane
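A sketch of what this could look like with the new API (assumptions: the class name PeekingTokenFilter and the peekToken() method are illustrative, not from the original post; this uses AttributeSource.captureState()/restoreState() rather than the attribute-iterator idea, which amounts to the same thing):

```java
import java.io.IOException;
import java.util.LinkedList;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

public class PeekingTokenFilter extends TokenFilter {
  // Snapshots of all attributes for tokens we have read ahead.
  private final LinkedList<AttributeSource.State> peeked =
      new LinkedList<AttributeSource.State>();

  public PeekingTokenFilter(TokenStream input) {
    super(input);
  }

  /** Advance the underlying stream and remember its state for later replay. */
  public boolean peekToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    peeked.add(captureState()); // snapshot every attribute at once
    return true;
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!peeked.isEmpty()) {
      restoreState(peeked.removeFirst()); // replay a peeked token
      return true;
    }
    // Call input.incrementToken(), not super.incrementToken():
    // there is no default implementation to inherit from TokenFilter.
    return input.incrementToken();
  }
}
```

captureState() copies the values of all attributes into one State object, so the caller never holds a Token instance and the stream's own attributes are only overwritten when the peeked token is actually consumed.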
Daniel Shane wrote:
Hi all!
I'm trying to port my Lucene code to the new TokenStream API and I
have a filter that I c
    if (this.peekedTokens.size() > 0) {
      return this.peekedTokens.removeFirst();
    }
    return this.input.next(token);
  }
}
Let me know if anyone has an idea,
Daniel Shane
I think you should do this instead (it will print the exception message
*and* the stack trace instead of only the message):
throw new IndexerException("CorruptIndexException on doc: " + doc.toString(), ex);
Daniel Shane
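A self-contained demonstration of why the cause argument matters (IndexerException here is a stand-in for the application class mentioned above, with an assumed (String, Throwable) constructor): passing the original exception as the cause preserves its stack trace instead of flattening it into a message string.

```java
public class ChainingDemo {
  // Hypothetical wrapper exception; chains the cause via the super call.
  static class IndexerException extends RuntimeException {
    IndexerException(String message, Throwable cause) {
      super(message, cause); // keeps the full cause chain and stack trace
    }
  }

  public static void main(String[] args) {
    try {
      try {
        throw new java.io.IOException("simulated CorruptIndexException");
      } catch (java.io.IOException ex) {
        // Wrap with the cause attached, as suggested in the thread.
        throw new IndexerException("CorruptIndexException on doc: doc42", ex);
      }
    } catch (IndexerException e) {
      // The wrapped cause (and its own stack trace) is still reachable,
      // and printStackTrace() will show "Caused by: ..." automatically.
      System.out.println(e.getMessage());
      System.out.println(e.getCause().getMessage());
    }
  }
}
```

Concatenating ex into the message, by contrast, throws away the cause's stack frames, which is usually what you need most when debugging a corrupt index.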
Chris Bamford wrote:
Hi Grant,
I think your code ther
the deletions as well?
Daniel Shane
Yonik Seeley wrote:
On Fri, Aug 21, 2009 at 12:49 AM, Chris
Hostetter wrote:
: But in that case, I assume Solr does a commit per document added.
not at all ... it computes a signature and then uses that as a unique key.
IndexWriter.updateDocument does all
But in that case, I assume Solr does a commit per document added.
Let's say I wanted to index a collection of 1 million pages. Would it
take much longer if I committed at each insertion rather than committing
at the end?
Daniel Shane
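For the batch case, the usual pattern is a single commit at the end. A minimal sketch (assuming the Lucene 2.9-era IndexWriter API; BatchIndexing and the field name "body" are illustrative): each commit forces a flush and an fsync, so committing per document over a large collection is dramatically slower than one commit for the whole batch.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

public class BatchIndexing {
  /** Add every page, then make the whole batch durable with one commit. */
  public static void index(Directory dir, Iterable<String> pages) throws Exception {
    IndexWriter writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_29),
        IndexWriter.MaxFieldLength.UNLIMITED);
    try {
      for (String page : pages) {
        Document doc = new Document();
        doc.add(new Field("body", page, Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc); // no commit inside the loop
      }
      writer.commit(); // one durable commit for the entire batch
    } finally {
      writer.close();
    }
  }
}
```

Note this is separate from Solr's duplicate handling: as Yonik points out above, deduplication goes through IndexWriter.updateDocument with a signature as the unique key, which does not require a commit per document.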
Grant Ingersoll wrote:
On Aug 13, 2009, at 10:33 AM, Daniel
n the index (before it
has been written to).
What I'd like is to have access to the stuff the index writer has
written but not yet committed. Is there something that can access that data?
Daniel Shane
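If I recall correctly this is what Lucene 2.9's near-real-time search (LUCENE-1516) provides: IndexWriter.getReader() returns a reader that also sees documents added but not yet committed. A minimal sketch (NrtExample is an illustrative name, and the claim about 2.9 is my assumption from the era of this thread):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

public class NrtExample {
  /** Open a reader that also sees the writer's uncommitted documents. */
  public static IndexReader openNearRealTime(IndexWriter writer) throws IOException {
    // Flushes in-memory segments and returns a reader over them
    // without performing a full, durable commit.
    return writer.getReader();
  }
}
```

Reopening this reader after further adds is much cheaper than closing the writer, committing, and opening a fresh IndexReader on the Directory.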
Shai Erera wrote:
How many documents do you index between reader refreshes? If it
…given field *at the time I index a document*?
Daniel Shane
…would be a good addition to the Lucene code base (I think this query
should be used as the default in the QueryParser, if it works OK, instead
of a simple BooleanQuery).
Thanks in advance for your help,
Daniel Shane