On Monday 05 September 2005 04:38, Chris Hostetter wrote:
>
> : >>[Query]
> : >>"Napol* Dynamite" near "film|movie"
>
> : >This can be done using nested SpanNearQuery's and SpanOrQuery's.
> : >A PhrasePrefixQuery can not be used as a SpanQuery.
>
> I've never really looked at SpanQueries very ha
Hi,
I'd like to go into detail about the issues that occur when you want to
index and search content in multiple languages.
I have read the Lucene in Action book and many threads on this mailing list,
the most interesting so far being this one:
http://mail-archives.apache.org/mod_mbox/lucene-ja
Hi all,
The code snippet below does NOT result in an optimized index in one of
my test cases. As I understand it, an optimized index means that there is
only ONE segment file in the index folder. After this code has run, I
sometimes have 100 segment files in the directory.
When I call optimiz
You should call .optimize() instead of merging.
Erik
On Sep 5, 2005, at 5:22 AM, Martin Rode wrote:
Hi all,
The code snippet below does NOT result in an optimized index in one
of my test cases. As I understand it, an optimized index means that
there is only ONE segment file in the ind
: For example, given this data:
:
: author: a b c
: author: d e f
:
: a search for "a SAME c" would match the first row, but "a SAME d" would
: match nothing, which is what I want.
If I understand you correctly, then you are describing a use case in which
the index has two documents, each co
Hello all,
has anybody here implemented and run the BM25 algorithm with Lucene?
Preferably on Lucene 1.2, but any information or even code for any
Lucene version would be very helpful.
Kind Regards,
Karl
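
For readers who do not have the formula at hand, here is a minimal sketch of the per-term score as Okapi BM25 is commonly stated. It is plain Java and does not use or extend any Lucene class, so it says nothing about how BM25 would be wired into Lucene 1.2. The class name Bm25Sketch, the parameter values k1 = 1.2 and b = 0.75, and the non-negative IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) are all assumptions chosen for illustration.

```java
// Hypothetical helper, not part of Lucene: scores one query term against
// one document using the commonly quoted Okapi BM25 formula.
final class Bm25Sketch {
    static final double K1 = 1.2;  // assumed term-frequency saturation parameter
    static final double B = 0.75;  // assumed length-normalization parameter

    // Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // tf: occurrences of the term in the document
    // docLen / avgDocLen: document length relative to the collection average
    static double termScore(double tf, double docLen, double avgDocLen,
                            long docCount, long docFreq) {
        double lengthNorm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * tf * (K1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // Same term, same collection statistics, different term frequencies.
        System.out.println(termScore(1, 100, 100, 1000, 10));
        System.out.println(termScore(2, 100, 100, 1000, 10));
        // A longer-than-average document scores lower for the same tf.
        System.out.println(termScore(1, 200, 100, 1000, 10));
    }
}
```

A full query score would be the sum of termScore over the query terms.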
Hi,
I wanted to try the highlighter in contrib, but when I compiled it
I got a compile error because there is no TermVectorOffsetInfo
class, which is imported by TokenSources.java:
import org.apache.lucene.index.TermVectorOffsetInfo;
I tried to find the issue in Bugzilla, but couldn't find it.
Where can
It's in the latest version of Lucene in SVN.
If you don't want to work with the latest version of
Lucene simply remove TokenSources.java - it's an
optional class for use with the highlighter and
provides a way of retrieving already-parsed document
tokens from the index. Instead, you can simply run
Hi Mark,
Thank you for your advice.
I want to work with the current version, 1.4.3,
so I simply deleted the class and was then able to compile the highlighter.
Thank you,
Koji
> -----Original Message-----
> From: mark harwood [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 06, 2005 12:44 AM
> To: java-user
Would this not delete all records from the index that have a salesDate field?
reader.delete(new Term("salesDate", ""));
Thanks,
Luke
No. The delete method deletes all Documents with _matching_ terms.
Otis
--- Luke <[EMAIL PROTECTED]> wrote:
> Would this not delete all records from the index that have a salesDate
> field?
>
> reader.delete(new Term("salesDate", ""));
>
> Thanks,
>
> Luke
>
>
>
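
To make the point about _matching_ terms concrete, here is a toy in-memory model. It is not Lucene code: the class TermDeleteSketch and its methods are hypothetical, and each document is simplified to one term per field. It only illustrates that a delete keyed on a field/text pair removes documents holding exactly that text, so an empty text does not act as a wildcard for "has this field".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, not Lucene): each document maps a field name
// to the single term indexed for that field.
final class TermDeleteSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    // Removes only documents whose field holds exactly the given text
    // and returns how many were removed.
    int delete(String field, String text) {
        int deleted = 0;
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            if (text.equals(it.next().get(field))) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        TermDeleteSketch index = new TermDeleteSketch();
        index.add(Map.of("salesDate", "20050901")); // sample values are made up
        index.add(Map.of("salesDate", "20050905"));
        index.add(Map.of("title", "no date here"));

        System.out.println(index.delete("salesDate", ""));         // prints 0
        System.out.println(index.delete("salesDate", "20050901")); // prints 1
        System.out.println(index.size());                          // prints 2
    }
}
```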
Hi
Assuming that in the indexing process I setup 3 different documents
doc1, doc2, doc3.
with something like:
doc1.add(Field.Keyword("variable", "var_no1"));
doc1.add(Field.Keyword("variable", "var_test1"));
doc2.add(Field.Keyword("variable", "var_no2"));
doc2.add(Field.Keyword("variable", "var_
That looks correct. That's what PrefixQuery is for. If you use
QueryParser and give it "var*", QP will convert that to a PrefixQuery for
you.
Otis
--- Axel <[EMAIL PROTECTED]> wrote:
> Hi
>
> Assuming that in the indexing process I setup 3 different documents
> doc1, doc2, doc3.
>
> with somet
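
The prefix matching discussed above can be sketched without Lucene. The example assumes the indexed terms are available in sorted order, which is my understanding of how Lucene enumerates terms, and walks them from the prefix onward until a term no longer starts with it. The class PrefixTermsSketch is hypothetical, and the term "other" is added only to show a non-matching value; the three var_ values are the ones visible in the quoted message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not a Lucene class.
final class PrefixTermsSketch {
    // Starts at the first term >= prefix and stops at the first term
    // that no longer starts with the prefix. This works because all
    // terms sharing a prefix are contiguous in sorted order.
    static List<String> termsWithPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
                List.of("var_no1", "var_test1", "var_no2", "other"));
        System.out.println(termsWithPrefix(terms, "var_no")); // [var_no1, var_no2]
        System.out.println(termsWithPrefix(terms, "var"));    // [var_no1, var_no2, var_test1]
    }
}
```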
: > : For example, given this data:
: > :
: > : author: a b c
: > : author: d e f
: > : a search for "a SAME c" would match the first row, but "a SAME d" would
: > : match nothing, which is what I want.
: No, both fields are in the same document. Which is also why proximity
: does not work.
: How can I get all values across the documents with a given prefix?
: For prefix = "var" for example I would like to have a list of all 5 values.
:
: For prefix = "var_no" for example I would like to have a list of the values
: {"var_no1", "var_no2", "var_no3"}.
if you just want the values, you
Hi again,
I'm using the highlighter to highlight terms in Japanese text,
but I cannot get the output I want.
If I use StandardAnalyzer or SnowballAnalyzer with English text,
getBestFragment() returns the expected output:
Sample: (SnowballAnalyzer)
Text: A meeting will be held in the City Hall
TokenStream:
[
Hi,
I have a similar problem to deal with. In fact, a lot of the time the
documents do not have any language information, or they may contain text in
multiple languages. Further, the user would not want to always supply this
information. Also the user may very well be interested in documents in
m
Hi,
I have just one small question:
I'm using the HighFreqTerms class from the Luke project to
find the high-frequency terms in my index (made by Nutch).
Now I want to filter the terms with a
stopword list (junkwords).
The method getHighFreqTerms gives me the ability
to define a Hashtable junkwords,
I believe I have heard that Span queries provide some way to access
document offset information for their hits somehow. Does anyone know if
this is true, and if so, how I would go about it?
Alternatively (preferably actually) does the surround code from the SVN
development area have a way of r
I don't know the behaviour of the Japanese Analyzer you are using.
Can you add to your example diagnosis the Token.getPositionIncrement,
Token.startOffset and Token.endOffset for each of the tokens?
The highlighter groups tokens with overlapping start and end offsets
into a single TokenGroup f
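
The grouping behaviour described above can be sketched as follows. This is not the highlighter's source: the classes TokenGroupSketch and Tok are hypothetical, the sample tokens and offsets are made up, and the rule "a token joins the current group if it starts before the group's end offset" is my assumption about what "overlapping" means here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the highlighter's own TokenGroup class.
final class TokenGroupSketch {
    // Minimal stand-in for a token: text plus character offsets [start, end).
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Expects tokens ordered by start offset. A token that starts before
    // the current group's end offset is added to that group; otherwise it
    // opens a new group.
    static List<List<String>> group(List<Tok> tokens) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = null;
        int groupEnd = -1;
        for (Tok t : tokens) {
            if (current == null || t.start >= groupEnd) {
                current = new ArrayList<>();
                groups.add(current);
                groupEnd = t.end;
            } else {
                groupEnd = Math.max(groupEnd, t.end);
            }
            current.add(t.text);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Tok> overlapping = List.of(
                new Tok("AB", 0, 2), new Tok("BC", 1, 3),
                new Tok("CD", 2, 4), new Tok("EF", 5, 7));
        System.out.println(group(overlapping)); // [[AB, BC, CD], [EF]]
    }
}
```

Under this rule, word tokens that never overlap each get their own group, while a run of overlapping tokens collapses into one group.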
>>I believe I have heard that Span queries provide some way to access
document offset information for their hits somehow.
See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2
Faithfully selecting extracts based *exactly* on query criteria will be
hard given complex queries eg
Hi, Koji,
I had the same problem as you. This is because CJK n-gram analysis
is different from single-character analysis.
My workaround is to use CJKHighlighter and CJKHighlightAnalyzer from the sandbox.
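
To show why n-gram analysis behaves differently, here is a small sketch of overlapping bigram tokenization. It is an illustration only, assuming a scheme in which each bigram starts one character after the previous one; it is not the code of the CJK analyzer or of CJKHighlighter, and the class BigramSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: emits overlapping two-character tokens with offsets.
final class BigramSketch {
    // Each result is formatted as text[start,end) so the overlap is visible.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            out.add(text.substring(i, i + 2) + "[" + i + "," + (i + 2) + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("東京都")); // [東京[0,2), 京都[1,3)]
    }
}
```

Adjacent bigrams share a character, so their offsets overlap, whereas single-character or word tokens do not.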
--
Chris Lu
Lucene Search RAD on Any Database
http://www.dbsight.net
On 9/5/05, Koji Se