: public String highlightTerm(String originalText, TokenGroup group)
: {
:     if (group.getTotalScore() <= 0)
:     {
:         return originalText;
:     }
:     return "<em>" + originalText + "</em>";
: }
:
I'm getting '<em> some text '
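Stripped of the tag-escaping damage, the formatter above just wraps a scored token group in `<em>` tags. A standalone sketch of that behavior follows; `TokenGroupStub` and the tag choice are stand-ins for illustration (in Lucene this logic lives in an `org.apache.lucene.search.highlight.Formatter` implementation, which this is not):

```java
// Standalone sketch of the highlighter formatter logic quoted above.
// TokenGroupStub is a hypothetical stand-in for Lucene's TokenGroup:
// only the score matters for this sketch.
public class EmFormatterSketch {
    static class TokenGroupStub {
        private final float totalScore;
        TokenGroupStub(float totalScore) { this.totalScore = totalScore; }
        float getTotalScore() { return totalScore; }
    }

    // Wrap the text in <em>...</em> only when the group actually matched.
    public static String highlightTerm(String originalText, TokenGroupStub group) {
        if (group.getTotalScore() <= 0) {
            return originalText;
        }
        return "<em>" + originalText + "</em>";
    }

    public static void main(String[] args) {
        System.out.println(highlightTerm("lucene", new TokenGroupStub(1.0f)));
        System.out.println(highlightTerm("filler", new TokenGroupStub(0.0f)));
    }
}
```

Running this prints the scored text wrapped in tags and the unscored text untouched.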
Thanks Grant,
I did make an initial posting on this list but got zero responses so
I'm guessing nobody else has seen the problem.
basically from this:
public String highlightTerm(String originalText, TokenGroup group)
{
    if (group.getTotalScore() <= 0)
    {
From the resources section of the website, the Issue Tracking link
is: http://issues.apache.org/jira/browse/LUCENE
Also, it is helpful if you have done a preliminary search on the
topic and some reasonable investigation to confirm that it is in fact
a bug. If you're not sure, please ask on t
can someone please tell me where the most appropriate place to report
bugs might be - in this case for the hit-highlighter contribution
Thanks
Jason.
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail:
Incidentally, is field1(foo bar) a shortcut for field1(foo) field1(bar) like
in the regular QueryParser?
I believe I just OR the queries together:
example = "field1,field2((search & old) ~3 horse)";
expected = "(+spanNear([field1:search, field1:horse], 3, false)
+spanNear([fie
Would it be simpler just to modify the input with a regex rather than risk
messing with StandardAnalyzer? Or wouldn't that do what you need?
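The regex route suggested here can be sketched in plain Java: rewrite ZIP+4 codes before they reach the tokenizer so the hyphenated pair survives as one token. The pattern and the collapse-the-hyphen rewrite are assumptions for illustration, not anything Lucene ships:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZipPrefilter {
    // Matches US ZIP+4 codes like 92626-2646 (assumed format).
    private static final Pattern ZIP4 = Pattern.compile("\\b(\\d{5})-(\\d{4})\\b");

    // Rewrite "92626-2646" to "926262646" so a StandardAnalyzer-style
    // tokenizer keeps it as a single token. Apply the same rewrite to
    // both documents and queries so they agree.
    public static String protectZips(String text) {
        Matcher m = ZIP4.matcher(text);
        return m.replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(protectZips("ships to 92626-2646 next week"));
    }
}
```

Non-ZIP hyphenated numbers (e.g. phone fragments like 555-1212) don't match the 5+4 digit shape and pass through unchanged.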
On 1/11/07, Van Nguyen <[EMAIL PROTECTED]> wrote:
Hi,
I need to modify the StandardAnalyzer so that it will tokenize zip codes
that look like this:
Thanks Erick,
this is what I ended up doing more or less but I'm not happy with it
really,
hparser.setDefaultOperator(QueryParser.Operator.AND);
Query hideQuery = hparser.parse("properties@" + hideterm + ":" + hidevalue);
cquery.add(hideQuery, Boole
[uuencoded attachment luke_diffs.dat omitted]
Yes. The other option is to download the jar and run it with a command
line argument pointing to the lucene2 jar. I would love to avoid that step.
Hook me up. (responding to list to point out second option - I prefer
this one)
- Mark
Benson Margulies wrote:
My experience tonight is that the s
Chris Hostetter wrote:
: I wasn't clear on this answer. The problem was not grammar ambiguity but
: from a user standpoint...I wanted to differentiate the proximity binary
: operator from the phrase distance operator...even though they are
: similar. Perhaps the differentiation is more confusin
: This sounds troubling to me now :) I may need to clear up my
: understanding of this and rework the parser:
: "A | B | C ! D ! E" would get parsed as allFields:a allFields:b
: (+allFields:c -allFields:d -allFields:e)
: This is because ! binds tighter than |...
: Sounds like I need to bone up on h
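The precedence claim in that quote is easy to check with a toy parser: `|` at low precedence, binary `!` at high precedence. The grammar and the rendered output below (field prefixes dropped) are illustrative assumptions, not the poster's actual parser:

```java
import java.util.ArrayList;
import java.util.List;

// Toy recursive-descent parser showing why "!" binding tighter than "|"
// turns "A | B | C ! D ! E" into  a b (+c -d -e).
public class PrecedenceSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    PrecedenceSketch(String input) {
        for (String t : input.trim().split("\\s+")) tokens.add(t);
    }

    // or := not ('|' not)* ; lower precedence, so '|' splits last
    String parseOr() {
        List<String> clauses = new ArrayList<>();
        clauses.add(parseNot());
        while (pos < tokens.size() && tokens.get(pos).equals("|")) {
            pos++;
            clauses.add(parseNot());
        }
        return String.join(" ", clauses);
    }

    // not := atom ('!' atom)* ; higher precedence, groups into one clause
    String parseNot() {
        String first = atom();
        List<String> negated = new ArrayList<>();
        while (pos < tokens.size() && tokens.get(pos).equals("!")) {
            pos++;
            negated.add(atom());
        }
        if (negated.isEmpty()) return first;
        StringBuilder sb = new StringBuilder("(+").append(first);
        for (String n : negated) sb.append(" -").append(n);
        return sb.append(")").toString();
    }

    String atom() { return tokens.get(pos++).toLowerCase(); }

    public static void main(String[] args) {
        System.out.println(new PrecedenceSketch("A | B | C ! D ! E").parseOr());
    }
}
```

Because `parseNot` consumes the whole `C ! D ! E` chain before `parseOr` sees the next `|`, the negations attach only to C's clause, matching the quoted parse.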
I don't know how to build up that magic, signed, auto-launching device.
I'll post diffs, however, momentarily.
-Original Message-
From: Joe Shaw [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 11, 2007 8:55 PM
To: java-user@lucene.apache.org
Subject: Re: Use the Luke, Force
Hi,
Benson
Hi,
Benson Margulies wrote:
My experience tonight is that the stock 1.9-based Luke won't open my 2.0
indices. So I fixed up a version of the source.
I've been seeing this too.
Anyone else want it?
That would be great, if you don't mind. A jar would be nice too. :)
Joe
--
My experience tonight is that the stock 1.9-based Luke won't open my 2.0
indices. So I fixed up a version of the source.
Anyone else want it?
: I wasn't clear on this answer. The problem was not grammar ambiguity but
: from a user standpoint...I wanted to differentiate the proximity binary
: operator from the phrase distance operator...even though they are
: similar. Perhaps the differentiation is more confusing than helpful.
it's not
I would try adding this (or your regex)
| (("-" <NUM>)|())
between the EMAIL and HOST line or something,
And change this:
org.apache.lucene.analysis.Token next() throws IOException :
{
    Token token = null;
}
{
    ( token = <ALPHANUM> |
      token = <APOSTROPHE> |
      token = <ACRONYM> |
      token = <COMPANY> |
      token = <EMAIL> |
      token = <HOST> |
Hi,
I need to modify the StandardAnalyzer so that it will tokenize zip codes
that look like this:
92626-2646
I think the part I need to modify is in here - specifically:
// floating point, serial, model numbers, ip addresses, etc.
// every other segment must have at least
What analyzers are you using for your queries and your indexing?
StandardAnalyzer (I believe) will break "A.B" into two tokens, so your
index could contain both tags. So what you really have in the index would be
story1: A C E
story2: A B P Q (note no '.').
searching for B.A would really searc
you kind of lost me there ... i get that ~ is a binary operator, but in
both cases the intent is to say "these words must appear near each other"
... so i'm wondering why you chose to use "hard knocks dude":3 instead of
"hard knocks dude"~3 ... oh wait, i think i get it ... was it to
eliminate amb
: > so do you convert A ! B ! C into a three clause boolean query, or a two
: > clause BooleanQuery that contains another two clause BooleanQuery?
: >
: It becomes a three clause boolean query...would there be a difference in
: scoring? I assumed not and it used to make a boolean that contained
Hi,
We are having some trouble with the results that we get from certain
queries.
Basically .. we have documents that we index, each document has a bunch of
tags, the tags could be of the sort
tags: A, B, C, D.P, E.A etc ..
Each story will contain only a subset of the tags ..
For example
Sto
In general, if you are having performance issues with highlighting, the
first thing to do is double check what the bottleneck is: is it accessing
the text to be highlighted, or is it running the highlighter?
you suggested earlier in the thread that the problem was with accessing
the text...
Say my query is "apple banana orange". The word "apple" is near the start of
the document, "banana" and "orange" at the end. Wouldn't your optimization
stop at the word "apple" and just return this word highlighted?
Yes
Or do you know of a way to quantify the match?
I guess you could count how
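The "count how" idea can be sketched directly: score each candidate fragment by how many distinct query terms it covers, then keep the highest-scoring one. The whitespace tokenization and the scoring rule below are assumptions, not the Highlighter's real fragment scorer:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch: rate a fragment by how many distinct query terms it contains,
// so a fragment covering "banana" and "orange" beats one with only "apple".
public class FragmentScoreSketch {
    // Crude tokenizer: lowercase and split on non-word characters (assumption).
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }

    // Score = number of distinct query terms present in the fragment.
    static int score(String fragment, String query) {
        Set<String> frag = terms(fragment);
        int hits = 0;
        for (String q : terms(query)) {
            if (frag.contains(q)) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String query = "apple banana orange";
        System.out.println(score("an apple fell from the tree", query));
        System.out.println(score("banana and orange smoothies", query));
    }
}
```

With this rule, a highlighter comparing fragments would prefer the end-of-document fragment (two terms covered) over the one containing only "apple".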
So, to reply to myself with info learned (since I like reading this forum for
what people have done with lucene. kudos to cnet by the way)
In tracking down some of the speed, I was able to manage some speed
improvements
Again, the movie examples are just metaphors for what I am really working on
Jason:
Interesting idea, thanks. But how do you know whether the highlighting is
any good? I thought highlighter implemented some kind of strategy to find
the best fragment.
Say my query is "apple banana orange". The word "apple" is near the start of
the document, "banana" and "orange" at the en
Nah, see that "Don't worry about RAMDir, those guys who wrote Lucene in Action
were smoking something when they wrote that section. ;)" line from my earlier
email.
Aha, another Basis Tech person! Good, good :)
Otis
- Original Message
From: Benson Margulies <[EMAIL PROTECTED]>
To: jav
Given the ram directory inside of the IndexWriter in 2.0, is there still
any reason to stage this manually?
Increase maxBufferedDocs as much as you can (the more RAM the better).
Increase max heap for the JVM (-Xmx).
If you are on UNIX, increase the max open file descriptors limits and then
increase mergeFactor somewhat, say 100.
Use -server mode for the JVM.
Don't worry about RAMDir, those guys who wr
You're right. The computer doesn't crash but slows to death.
I'm developing in my workstation and the application will run in a server, a
better machine but not that great.
The indexing will run at night regardless of performance, but it should not
take many hours.
-Original Messa
I never got to index all the data because it is too slow.
I got 3 million in 2.5 hours.
As suggested in Lucene in Action, I use ramDir and after I write 5000
documents I merge them to the fsDir.
The merge factor is now 100 I tried other variations but didn't make much
difference.
-Original Mess
Alice,
If you have a computer that crashes once you put a lot of load on it,
I'd say you have bigger problems than the speed of the indexing. A
computer should not crash, no matter how much load you put on it. If
you have such a huge database, I can't believe that you don't have
access to o
Hi Alice,
Can you define slow (hours, days, months and on what hardware)? Have
you done any profiling, etc. to see where the bottlenecks are? What
size documents are you talking about here? What are your merge
factors, etc.?
Thanks,
Grant
On Jan 11, 2007, at 10:47 AM, Alice wrote:
H
I used the following settings for speeding up indexing on a similarly sized
db table
If you have enough ram it might help you.
IndexWriter writer = new IndexWriter(fdDir, new StandardAnalyzer(), true);
writer.setMergeFactor(100);
writer.setMaxMergeDocs(99);
writer.setMaxBufferedDocs(
One option: https://cool-apps-distributedindex.dev.java.net/
caveat: you would have to setup an account (you get 10CPUhr & 10GB account
upon signup)
On 1/11/07, Alice <[EMAIL PROTECTED]> wrote:
Unfortunately I can't use multiple machines.
And I cannot start lots of threads because the server
Unfortunately I can't use multiple machines.
And I cannot start lots of threads because the server crashes.
-Original Message-
From: Russ [mailto:[EMAIL PROTECTED]
Sent: quinta-feira, 11 de janeiro de 2007 14:33
To: java-user@lucene.apache.org
Subject: Re: Huge Index
Can you use multipl
Can you use multiple threads/machines to index the data into separate indexes,
and then combine them?
Russ
Sent wirelessly via BlackBerry from T-Mobile.
-Original Message-
From: "Alice" <[EMAIL PROTECTED]>
Date: Thu, 11 Jan 2007 13:47:36
To:
Subject: Huge Index
Hello!
I have to i
: It works like this: "A -B -C" would be expressed as "A ! B ! C"
: By binary, I mean that each operator must connect two clauses...in that
: case A is connected to B and C is connected to A ! B.
: I avoid the single prohibit clause issue, -query, by not really allowing
so do you convert A ! B
Hi all,
I would like to draw your attention to an open and rather devious
long-standing index corruption issue that we've only now finally
gotten to the bottom of:
https://issues.apache.org/jira/browse/LUCENE-140
If you hit this, you will typically see a "docs out of order"
IllegalStateExc
The value of the word - the word itself, should be your unique identifier.
Otis
- Original Message
From: Josh Joy <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, January 11, 2007 5:39:24 AM
Subject: Re: SpellChecker Index - remove words?
Thanks for the reply. I gue
Thanks for the reply. I guess my concern is that I
want to ensure that
I don't accidentally delete other words than the
intended word. Example,
if I build a custom index, I can delete a word not
based on the term
itself, though in the term I can include a "unique"
identifier as well.
Can the
Josh,
The spellchecker index is just another Lucene index, so you can delete
documents/words from it the same way you delete documents from any Lucene index
- using IndexReader's delete(...) methods. You can pass that delete method a
Term where the field name is "word" and the value is the miz