[EMAIL PROTECTED]
I am searching for a solution to make the Highlighter work properly in
combination with phrase queries.
I want to highlight text with a phrase query like "windows printserver",
the following highlighted:
"windows printservers" are good blah blah "windows" manages
"print
Check out the SpanScorer.
- Mark
On Nov 10, 2008, at 8:25 AM, "Sertic Mirko, Bedag" <[EMAIL PROTECTED]> wrote:
[EMAIL PROTECTED]
I am searching for a solution to make the Highlighter work properly in
combination with phrase queries.
I want to highlight text with a phrase query like "w
Michael McCandless wrote:
But: it's slow to load a field for the first time. LUCENE-1231
(column-stride fields) aims to greatly speed up the load time.
Test it out though. In some recent testing I was doing it was *way*
faster than I thought it would be based on what I had been reading. Of
c
Hi
Thank you for your response.
Are there examples available?
Regards
Mirko
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Monday, 10 November 2008 14:45
To: java-user@lucene.apache.org
Subject: Re: Highlighter and Phrase Queries
Check out the SpanScore
Check out the unit tests for the highlighter and there are a bunch of
examples.
It's pretty much the same as using the standard scorer, except that it
requires a cached token filter so that the tokenstream can be read more
than once.
Once you pass in the SpanScorer to the Highlighter though,
OK, I will do that.
I guess it will also work with BooleanQueries and combined Term/Wildcard/Phrase
Queries?
-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED]
Sent: Monday, 10 November 2008 15:38
To: java-user@lucene.apache.org
Subject: Re: AW: Highlighter and Phrase
: Did you come across :
: scoreNorm = 1.0f / topDocs.getMaxScore();
: or something of this sort in Hits?
: As per my knowledge, the initial score is more than 1 but finally the scores
: get divided by the maxScore of the matched doc set. i.e. Setting an upper
: limit of 1 (for the max scorer
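A minimal sketch of the normalization described in the quoted text, in plain Java with no Lucene dependency: each raw score is multiplied by 1/maxScore so that the top hit ends up at 1.0. The class and method names are hypothetical, and scaling only when the maximum exceeds 1 is an assumption of this sketch.

```java
// Hypothetical helper for illustration; not part of Lucene.
final class ScoreNormalizer {
    // Returns a copy of rawScores scaled so that no score exceeds 1.0.
    static float[] normalize(float[] rawScores) {
        float maxScore = 0f;
        for (float s : rawScores) {
            maxScore = Math.max(maxScore, s);
        }
        // Assumption: scores are only scaled down, never scaled up.
        float scoreNorm = (maxScore > 1.0f) ? 1.0f / maxScore : 1.0f;
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] * scoreNorm;
        }
        return normalized;
    }
}
```

Because every score is multiplied by the same factor, the ranking is unchanged; only the scale differs, so normalized scores from different queries are not comparable.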
Right, it will work the same as the standard Highlighter except that it
highlights spans and phrase queries based on position.
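
To illustrate what "based on position" means, here is a small plain-Java sketch (no Lucene classes; all names are hypothetical and the bracket markers are arbitrary). A token is marked only when it is part of a complete, consecutive occurrence of the phrase, so a stand-alone "windows" is left alone. This is not how SpanScorer is implemented, just the idea behind it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of position-aware phrase highlighting.
final class PhraseHighlightSketch {
    // Wraps tokens in [ ] only where the whole phrase occurs at consecutive positions.
    static String highlight(List<String> tokens, List<String> phrase) {
        boolean[] hit = new boolean[tokens.size()];
        int m = phrase.size();
        for (int start = 0; m > 0 && start + m <= tokens.size(); start++) {
            if (tokens.subList(start, start + m).equals(phrase)) {
                Arrays.fill(hit, start, start + m, true);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(hit[i] ? "[" + tokens.get(i) + "]" : tokens.get(i));
        }
        return sb.toString();
    }
}
```

A term-only highlighter would mark every "windows"; the position check is what keeps the isolated occurrence unmarked.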
Sertic Mirko, Bedag wrote:
Ok, i will do.
I guess it will also work with BooleanQueries and combined Term/Wildcard/Phrase
Queries?
-Original Message-
If you have only 30 seconds to read this;
Join us in celebrating the ASF's 10th Anniversary at ApacheCon!
The Call for Papers is now open for ApacheCon US 2009, taking place 2-6
November in Oakland, California. Proposals are being accepted at
http://us.apacheco
Well .. the FieldCache API is documented here (for 2.4.0):
http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/search/FieldCache.html
EG you can load ints (for example) like this:
FieldCache.DEFAULT.getInts(reader, "myfield");
This returns an array mapping docID --> int va
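
A rough sketch of how such a docID-to-value array can be used, in plain Java. The array passed in stands in for whatever the field cache returns; the class and method names are hypothetical and nothing here calls Lucene.

```java
import java.util.Arrays;

// Hypothetical illustration: ordering hits by a per-document int value.
final class DocValueSortSketch {
    // valuesByDoc[docID] holds the field value for that document.
    static int[] sortDocsByValue(int[] docIds, int[] valuesByDoc) {
        Integer[] boxed = new Integer[docIds.length];
        for (int i = 0; i < docIds.length; i++) {
            boxed[i] = docIds[i];
        }
        // Each comparison is a plain array lookup; no per-document index access.
        Arrays.sort(boxed, (a, b) -> Integer.compare(valuesByDoc[a], valuesByDoc[b]));
        int[] sorted = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            sorted[i] = boxed[i];
        }
        return sorted;
    }
}
```

The array needs one entry per document in the index, which is why the memory cost grows with index size rather than with the number of hits.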
On Friday 07 November 2008 18:46:17 Michael McCandless wrote:
>
> Sorting populates the field cache (internal to Lucene) for that
> field, meaning it loads all values for all docs and holds them in
> memory. This makes the first query slow, and, consumes RAM, in
> proportion to how large your ind
I was able to do that using a range query:
String end = "25337325126"; // i.e. 11/30/, assume that this is the max end date
Term endTerm = new Term("timestamp",end);
RangeQuery rangeQuery = new RangeQuery(null,endTerm,true);
Sort sort = new Sort("timestamp",true);
Filter dupFilte
Yes, that is a significant issue. What I'm coming to realize is that either
I will end up with something like
class MultiFilter {
String field;
private int[] termInDoc;
Map termToInt;
...
}
which can be entirely built on the current lucene APIs but has significantly
more overhead (the
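
One possible reading of the class outlined above, as a self-contained plain-Java sketch. The field names termInDoc and termToInt come from the outline; the constructor, the bits method, and the use of java.util.BitSet are assumptions made for illustration, and the elided part of the original is not reconstructed here.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a multi-filter over a field with one term per document.
final class MultiFilterSketch {
    private final int[] termInDoc; // termInDoc[docID] = ordinal of that document's term
    private final Map<String, Integer> termToInt = new HashMap<>();

    // termPerDoc[docID] is the single term the document has in the field.
    MultiFilterSketch(String[] termPerDoc) {
        termInDoc = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            Integer ord = termToInt.get(termPerDoc[doc]);
            if (ord == null) {
                ord = termToInt.size();
                termToInt.put(termPerDoc[doc], ord);
            }
            termInDoc[doc] = ord;
        }
    }

    // Returns the set of docIDs whose term equals the given term.
    BitSet bits(String term) {
        BitSet result = new BitSet(termInDoc.length);
        Integer ord = termToInt.get(term);
        if (ord == null) {
            return result; // unknown term matches nothing
        }
        int target = ord;
        for (int doc = 0; doc < termInDoc.length; doc++) {
            if (termInDoc[doc] == target) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

One int per document replaces one cached bit set per term, at the cost of a linear scan whenever a filter is requested.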
Tim,
I didn't follow all the details, so this may be somewhat off,
but did you consider using TermVectors?
Regards,
Paul Elschot
On Monday 10 November 2008 19:18:38 Tim Sturge wrote:
> Yes, that is a significant issue. What I'm coming to realize is that
> either I will end up with something l
The FAQ says that you have to do a manual incremental update:
How do I update a document or a set of documents that are already indexed?
>
> There is no direct update procedure in Lucene. To update an index
> incrementally you must first *delete* the documents that were updated, and
> *the
On Monday 10 November 2008 13:55:31 Michael McCandless wrote:
>
> Finally, you might want to instead look at Solr, which provides facet
> counting out of the box, rather than roll your own...
Doooh - new API, but its facet counting sounds good.
Any starting points for moving from plain lucene to
You have to have indexed something that uniquely identifies the
document in order to know what the old one is. Really, this is
the same question as updating, isn't it? If you could update
a document in place, you'd have to know what document
that was. If you know that information, you know which
do
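
The delete-then-add pattern can be sketched with a toy in-memory index in plain Java. Everything here (ToyIndex, its methods, the id/body layout) is hypothetical and only mimics the idea; it is not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy index: an "update" is a delete by unique id followed by an add.
final class ToyIndex {
    private final List<String[]> docs = new ArrayList<>(); // each entry: {id, body}

    void add(String id, String body) {
        docs.add(new String[] {id, body});
    }

    // Removes every document carrying the given id; returns how many were removed.
    int deleteById(String id) {
        int before = docs.size();
        docs.removeIf(d -> d[0].equals(id));
        return before - docs.size();
    }

    // There is no in-place edit: the old version is dropped and the new one appended.
    void update(String id, String newBody) {
        deleteById(id);
        add(id, newBody);
    }

    List<String> bodies() {
        List<String> out = new ArrayList<>();
        for (String[] d : docs) {
            out.add(d[1]);
        }
        return out;
    }
}
```

Without the indexed id there would be nothing to pass to deleteById, which is the point made above.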
Hi
We have about 1 million documents and growing within a hierarchical order (3
to 20 deep) and about 3000 people accessing these nodes, whereas some
people have access to certain branches and other people to other
branches and some branches are shared. The access control of these nodes
is changi
ChadDavis <[EMAIL PROTECTED]> wrote on 11/10/2008 02:22:45 PM:
> In the FAQ's it says that you have to do a manual incremental update:
>
> How do I update a document or a set of documents that are already
indexed?
> >
> > There is no direct update procedure in Lucene. To update an index
> > incr
This has been discussed more than a few times; I suggest you take
a look at the searchable archive for things like privileges, access
privileges, etc. You'll find lots of information faster that way...
Best
Erick
On Mon, Nov 10, 2008 at 2:52 PM, Michael Wechner <[EMAIL PROTECTED]> wrote:
> Hi
>
>
The FAQ has this index performance tip:
Use autoCommit=false when you open your IndexWriter
>
> In Lucene 2.3 there are substantial optimizations for Documents that use
> stored fields and term vectors, to save merging of these very large index
> files. You should see the best gains by using au
That's what I thought.
So, that leads me to . . . is it necessarily all that much faster to index
in an incremental update fashion, rather than just clobbering the old index?
On Mon, Nov 10, 2008 at 12:52 PM, Erick Erickson <[EMAIL PROTECTED]> wrote:
> You have to have indexed something that un
Actually, all non-deprecated ctors of IndexWriter set autoCommit to
false. Ie, in 3.0 autoCommit false will become the only option.
Mike
ChadDavis wrote:
The FAQ's have this index performance tip:
Use autoCommit=false when you open your IndexWriter
In Lucene 2.3 there are substantial o
That's easy. Thanks.
On Mon, Nov 10, 2008 at 1:12 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:
>
> Actually, all non-deprecated ctors of IndexWriter set autoCommit to false.
> Ie, in 3.0 autoCommit false will become the only option.
>
> Mike
>
>
> ChadDavis wrote:
>
> The FAQ's have this
Hmmm -- I hadn't thought about that so I took a quick look at the term
vector support.
What I'm really looking for is a compact but performant representation of
a set of filters on the same (one term field). Using term vectors would
mean an algorithm similar to:
String myfield;
String myterm;
Te
It all depends on how many updates you're doing, which
you haven't told us.
If a large majority of your index is being updated, there's
no particular reason to update; I'd build a new one.
Best
Erick
On Mon, Nov 10, 2008 at 3:09 PM, ChadDavis <[EMAIL PROTECTED]> wrote:
> That's what I thought.
On Monday 10 November 2008 22:21:20 Tim Sturge wrote:
> Hmmm -- I hadn't thought about that so I took a quick look at the
> term vector support.
>
> What I'm really looking for is a compact but performant
> representation of a set of filters on the same (one term field).
> Using term vectors woul
I think we've gone around in a loop here. It's exactly due to the inadequacy
of cached filters that I'm considering what I'm doing.
Here's the section from my first email that is most illuminating:
"
The reason I have this question is that I am writing a multi-filter for
single term fields. My ind
Hello,
We are using a MultiFieldQueryParser and we have problems making
Lucene find parts of words. For example, searching for "a" should
find all the results that contain "a", not only as a separate
token but also inside tokens (like the word "make").
We tried putting wi
Take a look at the ngram classes (probably in contrib, don't remember
for sure right now).
Patrick
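
As a rough picture of what n-gram tokenization produces, here is a plain-Java sketch that splits one token into character n-grams. The class and method names are hypothetical and this is not the Lucene n-gram code; it only shows why indexing the grams lets a query for "a" match inside "make".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of character n-grams for a single token.
final class NGramSketch {
    // Returns all substrings of token with length between minGram and maxGram.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = Math.max(1, minGram); n <= maxGram; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                grams.add(token.substring(i, i + n));
            }
        }
        return grams;
    }
}
```

Indexing every gram makes the index noticeably larger, so the gram sizes are worth keeping as narrow as the search requirements allow.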
Has anyone deployed Lucene to index log files? I have seen some articles
about how RackSpace used Lucene and Hadoop for log processing, but I have
not seen any details on the implementation.
To get my required analytics, I think I would need to treat each line of
the Apache log files as a do
On Nov 10, 2008, at 2:42 PM, Stefan Trcek wrote:
On Monday 10 November 2008 13:55:31 Michael McCandless wrote:
Finally, you might want to instead look at Solr, which provides facet
counting out of the box, rather than roll your own...
Doooh - new api, but it's facet counting sounds good.
An
Reading this I realize how unclear it is, so let me give a concrete example:
I want to do a search restricting users by age range. So someone can ask for
the users 18-35, 40-60 etc.
Here are the options I considered:
1) construct a RangeQuery. This is a 20-40 clause boolean subquery in an
otherw
Hi :)
First, I'm sorry for my bad English.
I have a question.
In Lucene 2.4.0, the Token class constructor public Token(String text, int
start, int end, int flags) is deprecated.
I want to know why, and
which constructor replaces this deprecated one.
May I use it like this?
T
Erick Erickson wrote:
This has been discussed more than a few times, I suggest you take
a look at the searchable archive for things like privileges, access
privileges, etc. You'll find lots of information faster that way...
You mean Erik Hatcher's answer re SecurityFilter
http://archives.de