Re: Can Span Queries contain boolean, prefix and other component queries?

2005-09-05 Thread Paul Elschot
On Monday 05 September 2005 04:38, Chris Hostetter wrote: > > : >>[Query] > : >>"Napol* Dynamite" near "film|movie" > > : >This can be done using nested SpanNearQuery's and SpanOrQuery's. > : >A PhrasePrefixQuery can not be used as a SpanQuery. > > I've never really looked at SpanQueries very ha

Multiple Language Indexing and Searching

2005-09-05 Thread Olivier Jaquemet
Hi, I'd like to go into detail about the issues that occur when you want to index and search content in multiple languages. I have read the Lucene in Action book and many threads on this mailing list, the most interesting so far being this one: http://mail-archives.apache.org/mod_mbox/lucene-ja

Optimize, OutOfMemory + Merge

2005-09-05 Thread Martin Rode
Hi all, The code snippet below does NOT result in an optimized index in one of my test cases. As I understand it, an optimized index means that there is only ONE segment file in the index folder. After this code has run, I sometimes have 100 segment files in the directory. When I call optimiz

Re: Optimize, OutOfMemory + Merge

2005-09-05 Thread Erik Hatcher
You should call .optimize() instead of merging. Erik On Sep 5, 2005, at 5:22 AM, Martin Rode wrote: Hi all, The code snippet below does NOT result in an optimized index in one of my test cases. As I understand it, an optimized index means that there is only ONE segment file in the ind

Re: SAME-opattor (possible newbie question)

2005-09-05 Thread Martin Malmsten
: For example, given this data: : : author: a b c : author: d e f : : a search for "a SAME c" would match the first row, but "a SAME d" would : match nothing, which is what I want. if i understand you correctly, then you are describing a use case in which the index has two documents, each co

BM25 with Lucene

2005-09-05 Thread Karl Koch
Hello all, has anybody here implemented and run the BM25 algorithm with Lucene (preferably Lucene 1.2, but any information or even code for any Lucene version would be very helpful)? Kind Regards, Karl

TermVectorOffsetInfo class?

2005-09-05 Thread Koji Sekiguchi
Hi, I wanted to try the highlighter in contrib, but when I compiled it I got a compile error because there is no TermVectorOffsetInfo class, which is imported by TokenSources.java: import org.apache.lucene.index.TermVectorOffsetInfo; I tried to find the issue on Bugzilla, but couldn't find it. Where can

Re: TermVectorOffsetInfo class?

2005-09-05 Thread mark harwood
It's in the latest version of Lucene in SVN. If you don't want to work with the latest version of Lucene simply remove TokenSources.java - it's an optional class for use with the highlighter and provides a way of retrieving already-parsed document tokens from the index. Instead, you can simply run

RE: TermVectorOffsetInfo class?

2005-09-05 Thread Koji Sekiguchi
Hi Mark, Thank you for your advice. I want to work with the current version, 1.4.3, so I simply deleted the class and was then able to compile the highlighter. Thank you, Koji > -Original Message- > From: mark harwood [mailto:[EMAIL PROTECTED] > Sent: Tuesday, September 06, 2005 12:44 AM > To: java-user

Deleting All Documents With Certain Field Name

2005-09-05 Thread Luke
Would this not delete all records from the index that have a saleDate field? reader.delete(new Term("salesDate", "")); Thanks, Luke

Re: Deleting All Documents With Certain Field Name

2005-09-05 Thread Otis Gospodnetic
No. The delete method deletes all Documents with _matching_ terms. Otis --- Luke <[EMAIL PROTECTED]> wrote: > Would this not delete all records from the index that have a saleDate > field? > > reader.delete(new Term("salesDate", "")); > > Thanks, > > Luke > > >
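To illustrate the point about _matching_ terms, here is a minimal sketch in plain Java. It is a toy in-memory model, not Lucene's API: the class name, the helper `deleteMatching` and the date values are all hypothetical. It only shows that deleting by a field name plus an empty text matches documents whose field value is exactly the empty string, not every document that has the field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Toy in-memory model, NOT Lucene's API: each document maps a field name to one value.
class DeleteByTermSketch {

    // Removes every document whose `field` holds exactly `text`; returns how many were removed.
    static int deleteMatching(List<Map<String, String>> docs, String field, String text) {
        int before = docs.size();
        // A document without the field yields null from get(), which never equals `text`.
        docs.removeIf(doc -> text.equals(doc.get(field)));
        return before - docs.size();
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(Collections.singletonMap("salesDate", "20050901")); // hypothetical value
        docs.add(Collections.singletonMap("salesDate", "20050902")); // hypothetical value
        docs.add(Collections.singletonMap("title", "no date here")); // hypothetical value

        // Empty text matches no stored value, so nothing is deleted.
        System.out.println(deleteMatching(docs, "salesDate", ""));         // prints 0
        // An exact value matches exactly one document.
        System.out.println(deleteMatching(docs, "salesDate", "20050901")); // prints 1
    }
}
```

Under this model, removing every document that has the field would mean visiting each distinct value of that field and deleting by each one in turn.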

List of values from refix query

2005-09-05 Thread Axel
Hi, Assuming that in the indexing process I set up 3 different documents, doc1, doc2 and doc3, with something like: doc1.add(Field.Keyword("variable", "var_no1")); doc1.add(Field.Keyword("variable", "var_test1")); doc2.add(Field.Keyword("variable", "var_no2")); doc2.add(Field.Keyword("variable", "var_

Re: List of values from refix query

2005-09-05 Thread Otis Gospodnetic
That looks correct. That's what PrefixQuery is for. If you use QueryParser and give it "var*", QP will convert that to PrefixQuery for you. Otis --- Axel <[EMAIL PROTECTED]> wrote: > Hi > > Assuming that in the indexing process I setup 3 different documents > doc1, doc2, doc3. > > with somet

Re: SAME-opattor (possible newbie question)

2005-09-05 Thread Chris Hostetter
: > : For example, given this data: : > : : > : author: a b c : > : author: d e f : > : a search for "a SAME c" would match the first row, but "a SAME d" : > would : > : match nothing, which is what I want. : No, both fields are in the same document. Which is also why proximity : does not work.

Re: List of values from refix query

2005-09-05 Thread Chris Hostetter
: How can I get all values across the documents with a given prefix? : For prefix = "var" for example I would like to have a list of all 5 values. : : For prefix = "var_no" for example I would like to have a list of the values : {"var_no1", "var_no2", "var_no3"}. if you just want the values, you
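One way to picture listing the values for a prefix is to walk a sorted term list starting at the prefix and stop at the first term that no longer starts with it. The sketch below is an assumption-laden illustration in plain Java: a `TreeSet` stands in for the index's sorted terms, the class and method names are hypothetical, and it is not Lucene's API. It uses only the four values visible in this thread, since the original message is cut off before the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// A TreeSet stands in for a sorted term list; this is NOT Lucene's API.
class PrefixValuesSketch {

    // Collects every term that starts with `prefix`, in sorted order.
    static List<String> valuesWithPrefix(SortedSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix) begins at the first term >= prefix, so all matches are contiguous from there.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the last term sharing the prefix
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms =
                new TreeSet<>(Arrays.asList("var_no1", "var_test1", "var_no2", "var_no3"));
        System.out.println(valuesWithPrefix(terms, "var_no")); // prints [var_no1, var_no2, var_no3]
        System.out.println(valuesWithPrefix(terms, "var"));    // prints [var_no1, var_no2, var_no3, var_test1]
    }
}
```

Because the terms are sorted, the loop touches only the matching terms plus one extra, rather than scanning every value in the field.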

Highlighter apply to Japanese

2005-09-05 Thread Koji Sekiguchi
Hi again, I'm using the highlighter to highlight terms in Japanese text, but I cannot get the output I want. If I use StandardAnalyzer or SnowballAnalyzer with English text, getBestFragment() returns the expected output: Sample: (SnowballAnalyzer) Text: A meeting will be held in the City Hall TokenStream: [

Multi-lang analyzer? Re: Multiple Language Indexing and Searching

2005-09-05 Thread Hacking Bear
Hi, I have a similar problem to deal with. In fact, a lot of the time the documents do not have any language information, or they may contain text in multiple languages. Further, the user would not like to always supply this information. Also the user may very well be interested in documents in m

use of Luke's getHighFreqTerms

2005-09-05 Thread Nils Hoeller
Hi, I've got only one little question: I'm using the class HighFreqTerms of the Luke project to find those terms in my index (made by Nutch). Now I want to filter the terms with a stopword list (junk words). The method getHighFreqTerms gives me the ability to define a Hashtable junkwords ,

Hits document offset information? Span query or Surround?

2005-09-05 Thread Sean O'Connor
I believe I have heard that Span queries provide some way to access document offset information for their hits somehow. Does anyone know if this is true, and if so, how I would go about it? Alternatively (preferably actually) does the surround code from the SVN development area have a way of r

Re: Highlighter apply to Japanese

2005-09-05 Thread markharw00d
I don't know the behaviour of the Japanese Analyzer you are using. Can you add to your example diagnosis the Token.getPositionIncrement, Token.startOffset and Token.endOffset for each of the tokens? The highlighter groups tokens with overlapping start and end offsets into a single TokenGroup f
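The grouping described above can be pictured with a small sketch. This is a simplified model in plain Java, not the contrib highlighter's code: the class and method names are hypothetical, the offsets are made-up examples, and the overlap test (a token joins the current group when its start offset is less than the group's end offset) is an assumption about what "overlapping" means here.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of grouping tokens by overlapping offsets; NOT the highlighter's actual code.
class OffsetGroupSketch {

    // Takes tokens as {startOffset, endOffset} pairs ordered by start offset and
    // merges overlapping ones; each resulting group is returned as a {start, end} pair.
    static List<int[]> groupOverlapping(int[][] tokens) {
        List<int[]> groups = new ArrayList<>();
        for (int[] token : tokens) {
            int[] last = groups.isEmpty() ? null : groups.get(groups.size() - 1);
            if (last != null && token[0] < last[1]) {
                // Token starts before the current group ends: extend the group.
                last[1] = Math.max(last[1], token[1]);
            } else {
                // Token is distinct: start a new group.
                groups.add(new int[] {token[0], token[1]});
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical bigram-style offsets: each token overlaps the next by one character.
        int[][] bigrams = {{0, 2}, {1, 3}, {2, 4}};
        // Hypothetical word-style offsets: tokens are separated by whitespace.
        int[][] words = {{0, 1}, {2, 9}, {10, 14}};

        System.out.println(groupOverlapping(bigrams).size()); // prints 1
        System.out.println(groupOverlapping(words).size());   // prints 3
    }
}
```

Under this model an analyzer that emits overlapping n-gram tokens collapses a whole run of text into one group, while an analyzer that emits non-overlapping words keeps each token separate. Printing the offsets requested above would show which case applies.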

Re: Hits document offset information? Span query or Surround?

2005-09-05 Thread markharw00d
>>I believe I have heard that Span queries provide some way to access document offset information for their hits somehow. See http://marc.theaimsgroup.com/?l=lucene-user&m=112496111224218&w=2 Faithfully selecting extracts based *exactly* on query criteria will be hard given complex queries eg

Re: Highlighter apply to Japanese

2005-09-05 Thread Chris Lu
Hi, Koji, I had the same problem as you. This is because CJK's n-gram analysis is different from single-character analysis. My workaround is to use CJKHighlighter and CJKHighlightAnalyzer in the sandbox. -- Chris Lu Lucene Search RAD on Any Database http://www.dbsight.net On 9/5/05, Koji Se