My problem isn't so much with the
join utility, but more with my query parser plugin class. Is there
something missing in the example at the link above that I need to also add
to mine to ensure that queries are applied pre-join? Thanks.
-Shane
On Fri, Aug 2, 2013 at 10:46 AM, Martijn v G
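A minimal sketch of one way to get pre-join semantics with the join module
(Lucene 4.x): fold the filter into the fromQuery handed to
JoinUtil.createJoinQuery, so it restricts the "from" side before the join
instead of filtering the joined result afterwards. The field names here
("type", "parentId", "id") are hypothetical placeholders.

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;

Query buildPreJoinQuery(Query userFilter, IndexSearcher fromSearcher)
        throws IOException {
    // Combine the base "from" query with the user's filter up front,
    // so the join only ever sees documents that already match the filter.
    BooleanQuery fromQuery = new BooleanQuery();
    fromQuery.add(new TermQuery(new Term("type", "child")), BooleanClause.Occur.MUST);
    fromQuery.add(userFilter, BooleanClause.Occur.MUST);
    return JoinUtil.createJoinQuery("parentId", false, "id",
            fromQuery, fromSearcher, ScoreMode.None);
}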
From the documentation it almost sounds
like the filters will be processed pre-join. However, I'm observing that
the filters are getting applied post-join. Is this supposed to be the
case? If so, what would be the best way to modify the source so that
queries are applied pre-join and not post-join? Thanks.
-Shane
Indeed!
I found a very good article on this as well at:
http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
It really sums up what you are saying.
Thanks for the help!
Daniel Shane
- Original Message -
From: "Michael McCandless"
To:
The solution is either to remove
stopwords from the index, or to shard it and search with ParallelMultiSearcher.
What do you think?
Daniel Shane
Daniel Shane
Allahbaksh Mohammedali Asadullah wrote:
Hi,
This is great news and good work. I think I will try this today evening. I
think we should include this as a component in lucene-contrib. What do you
say? Committers and owner, please comment.
Regards,
Allahbaksh
-Original Message-
Wow, that's exactly what I was looking for! In the meantime I'll use the
time-based collector.
Thanks Uwe and Mark for your help!
Daniel Shane
mark harwood wrote:
Or https://issues.apache.org/jira/browse/LUCENE-1720 offers lightweight timeout
testing at all index access stages prior to
I don't think it's possible, but is there something in Lucene to cap a
search to a predefined time length, or is there a way to stop a search
when it's running for too long?
Daniel Shane
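For reference, a minimal sketch of the time-based collector route
(TimeLimitingCollector, available as of Lucene 2.9): collection is aborted
once the allowed time elapses, and the hits gathered so far are kept.

import java.io.IOException;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TimeLimitingCollector;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;

TopDocs searchWithTimeout(IndexSearcher searcher, Query query) throws IOException {
    TopScoreDocCollector topDocs = TopScoreDocCollector.create(10, true);
    Collector limited = new TimeLimitingCollector(topDocs, 1000);  // 1000 ms budget
    try {
        searcher.search(query, limited);
    } catch (TimeLimitingCollector.TimeExceededException e) {
        // Timed out: topDocs still holds whatever was collected so far.
    }
    return topDocs.topDocs();
}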
e or
does this mean that the first token has to have an empty Type attribute
as well?
I'm just not sure.
Daniel Shane
OK, I got it: from checking other filters, I should call
input.incrementToken() instead of super.incrementToken().
Do you feel this kind of breaks the object model (super.incrementToken()
should also work)?
Maybe when the old API is gone, we can stop checking whether someone has
overridden next().
Uwe Schindler wrote:
There may be a problem in that you may not want to restore the peeked token
into the TokenFilter's attributes itself. It looks like you want to have a
Token instance returned from peek, but the current stream should not reset
to this Token (you only want to "look" into the next Token).
After thinking about it, the only conclusion I came to was, instead of
saving the token, to save an iterator of Attributes and use that instead.
It may work.
Daniel Shane
Daniel Shane wrote:
Hi all!
I'm trying to port my Lucene code to the new TokenStream API and I
have a filter that I c
public Token next(Token token) throws IOException {
  if (this.peekedTokens.size() > 0) {
    return this.peekedTokens.removeFirst();
  }
  return this.input.next(token);
}
}
Let me know if anyone has an idea,
Daniel Shane
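For what it's worth, a sketch of one way to do the peek with the new API:
capture the attribute state instead of holding Token instances, which is
roughly the attribute-iterator idea above. The filter below is a
hypothetical example, not the original code.

import java.io.IOException;
import java.util.LinkedList;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

public final class PeekingFilter extends TokenFilter {
  private final LinkedList<AttributeSource.State> peeked =
      new LinkedList<AttributeSource.State>();

  public PeekingFilter(TokenStream input) {
    super(input);
  }

  // Advance the underlying stream and remember its attribute state;
  // returns false once the stream is exhausted.
  public boolean peek() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    peeked.addLast(captureState());
    return true;
  }

  public boolean incrementToken() throws IOException {
    if (!peeked.isEmpty()) {
      restoreState(peeked.removeFirst());  // replay a peeked token
      return true;
    }
    return input.incrementToken();  // note: input, not super
  }
}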
I think you should do this instead (it will print the exception message
*and* the stack trace instead of only the message):
throw new IndexerException ("CorruptIndexException on doc: " + doc.toString(),
ex);
Daniel Shane
Chris Bamford wrote:
Hi Grant,
I think you code ther
the deletions as well?
Daniel Shane
Yonik Seeley wrote:
On Fri, Aug 21, 2009 at 12:49 AM, Chris Hostetter wrote:
: But in that case, I assume Solr does a commit per document added.
not at all ... it computes a signature and then uses that as a unique key.
IndexWriter.updateDocument does all
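A minimal sketch of the pattern Yonik describes; the "signature" field
name and how the signature string is computed are placeholders here.

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

void addOrReplace(IndexWriter writer, Document doc, String signature)
        throws IOException {
    doc.add(new Field("signature", signature, Field.Store.NO,
            Field.Index.NOT_ANALYZED));
    // Atomically deletes any document with the same signature, then adds
    // this one -- no commit per document needed.
    writer.updateDocument(new Term("signature", signature), doc);
}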
But in that case, I assume Solr does a commit per document added.
Let's say I wanted to index a collection of 1 million pages; would it
take much longer if I committed at each insertion rather than committing
at the end?
Daniel Shane
Grant Ingersoll wrote:
On Aug 13, 2009, at 10:33 AM, Daniel
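To make the trade-off concrete, a sketch of the usual bulk-load pattern:
add everything with one writer and commit once at the end, since every
commit flushes and syncs index files.

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

void bulkIndex(IndexWriter writer, Iterable<Document> docs) throws IOException {
    for (Document doc : docs) {
        writer.addDocument(doc);  // no per-document commit
    }
    writer.commit();  // one commit at the end
}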
in the index (before it
has been written to).
What I'd like is to have access to the data the index writer has
written but not yet committed. Is there something that can access that
data?
Daniel Shane
Shai Erera wrote:
How many documents do you index between refreshes of the reader? If it
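If it helps, a sketch of the near-real-time route added in Lucene 2.9:
IndexWriter.getReader() returns a reader that also sees what the writer
has written but not yet committed.

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;

void searchUncommitted(IndexWriter writer) throws IOException {
    IndexReader nrtReader = writer.getReader();  // sees uncommitted adds/deletes
    try {
        // search with new IndexSearcher(nrtReader) ...
    } finally {
        nrtReader.close();
    }
}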
given field *at the time I index a document*?
Daniel Shane
system.
Shane
ariel goldberg wrote:
Greetings,
I'm creating an application that
requires the indexing of millions of documents on behalf of a large group of
users, and was hoping to get an opinion on whether I should use one index per
user or one index per day.
I can just check
to see if either of the files INDEX_PATH/segments or
INDEX_PATH/segments.gen exists, but that doesn't seem like the best route.
Is there a function call to determine whether or not an index already
exists?
Thanks,
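There is: IndexReader.indexExists(...) performs exactly this check
(depending on the Lucene version it takes a path, a File, or a Directory).
A sketch for Lucene 2.9/3.x, with the directory as a placeholder:

import java.io.File;
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

boolean indexAlreadyExists(File indexDir) throws IOException {
    return IndexReader.indexExists(FSDirectory.open(indexDir));
}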
Not sure if this is what you are after, but there is a project called
File2XLIFF4j which converts a number of file formats to XLIFF (an XML
structure) using OpenOffice.org. And if I am not mistaken, Lucene has
code available for indexing XML. The project is located at
http://file2xliff4j.sourceforge.net/.
I have made the code available (along with a
patch file) at http://my-family.us/highlighter. To set the minimum
sequence size, just call setMinTokenSequence(int) after creating the
Highlighter object.
Shane
Harini Raghavan wrote:
I have a requirement to highlight phrases. I came across a refe
Are you doing all 7 million docs with the same writer? The call to
optimize will take longer as your index size increases. So if you are
actually indexing your docs in smaller chunks, the speed will decrease
due to the call to optimize.
Mekin Maheshwari wrote:
I am creating an index of abou
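A sketch of the pattern implied above: keep a single writer for the whole
run and optimize only once at the end, because each optimize() rewrites
the (ever-growing) index.

import java.io.IOException;
import org.apache.lucene.index.IndexWriter;

void finishIndex(IndexWriter writer) throws IOException {
    // All addDocument() calls are done; merge segments down once.
    writer.optimize();
    writer.close();
}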
Is there a way to change the Similarity on an index without having
to query out each document and re-index into a new index?
Thanks,
Shane
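One thing that may help: most Similarity factors (tf, idf, coord,
queryNorm) are computed at search time, so you can often just set a
Similarity on the searcher; only the length norm is baked into the index.
A sketch, where the idf override is just an arbitrary example:

import org.apache.lucene.search.DefaultSimilarity;
import org.apache.lucene.search.IndexSearcher;

void useCustomScoring(IndexSearcher searcher) {
    searcher.setSimilarity(new DefaultSimilarity() {
        public float idf(int docFreq, int numDocs) {
            return 1.0f;  // e.g. flatten idf without re-indexing
        }
    });
}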
As I am always looking for ways to enhance a search's response time: if
I were to use the MultiReader as suggested, would it still be possible
to determine which index a hit came from? Currently I use the
MultiSearcher.subSearcher() method to determine this information. After
taking a, albei
, but am not sure that is the route to go.
Any help would be greatly appreciated. (As a side note, my hits may be
in the thousands, so performance is also an issue).
Shane
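For the record, the source index can be recovered with a MultiReader too:
its doc ids are just the sub-readers' ranges concatenated, so the hit's
doc id plus each sub-reader's maxDoc() is enough. A sketch mirroring what
MultiSearcher.subSearcher() does:

import org.apache.lucene.index.IndexReader;

static int subIndex(int docId, IndexReader[] subReaders) {
    int base = 0;
    for (int i = 0; i < subReaders.length; i++) {
        int next = base + subReaders[i].maxDoc();
        if (docId < next) {
            return i;  // local doc id within that index is docId - base
        }
        base = next;
    }
    throw new IllegalArgumentException("doc id out of range: " + docId);
}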
Not sure if this is something of interest, but there is an open source
project called File2XLIFF4j on SourceForge.net
(http://file2xliff4j.sourceforge.net/). The project converts many
common file formats to XLIFF. It may be useful for getting a common
format, highlighting, and then recreating
for each returned Document.
Does anybody know if there is currently some built-in functionality to do this?
Shane Perry
would be a good addition to the Lucene code base (I think this query
should be used as the default in the QueryParser, if it works OK, instead
of a simple BooleanQuery).
Thanks in advance for your help,
Daniel Shane