Re: How to build your custom termfreq vector and add it to the field?

2007-11-09 Thread Grant Ingersoll
Not really sure what to tell you other than you need to dig in and look at how the other Query classes are implemented. I would start with TermQuery/TermScorer. One thing I did to get to know the scoring was to go through and document it the best I could (given the time I had) as pseudocod

Re: Chinese Segmentation with Phase Query

2007-11-09 Thread Cedric Ho
On Nov 10, 2007 2:08 AM, Steven A Rowe <[EMAIL PROTECTED]> wrote: > Hi Cedric, > > On 11/08/2007, Cedric Ho wrote: > > a sentence containing characters ABC, it may be segmented into AB, C or A, > > BC. > [snip] > > In this cases we would like to index both segmentation into the index: > > > > AB o

Re: Chinese Segmentation with Phase Query

2007-11-09 Thread Cedric Ho
The CJKAnalyzer is too simple for our need. But thanks for suggesting anyway. Cheers, Cedric On Nov 9, 2007 10:43 PM, Open Study <[EMAIL PROTECTED]> wrote: > Hi Cedric > > You may try the CJKAnalyzer within the lucene sandbox. It doesn't give > a perfect solution for Chinese word segmentation, bu

Re: TermDocs.skipTo error

2007-11-09 Thread Michael Busch
Mike Streeton wrote: > I have just tried this again using the index I built with lucene 2.1 but > running the test using lucene 2.2 and it works okay, so it seems to be > something related to an index built using lucene 2.2. > > Mike > Hi Mike, does this also happen with the current trunk ver

Re: How to check which field contains Term

2007-11-09 Thread Lukasz Rzeszotarski
Thanks for your response. I will probably use this class in my project.

RE: Chinese Segmentation with Phase Query

2007-11-09 Thread Steven A Rowe
Hi Cedric, On 11/08/2007, Cedric Ho wrote: > a sentence containing characters ABC, it may be segmented into AB, C or A, BC. [snip] > In this cases we would like to index both segmentation into the index: > > AB offset (0,1) position 0 A offset (0,0) position 0 > C offset (2,2) position

Comparing Two Indexes

2007-11-09 Thread Lucene User
Hi, I wanted to compare two indexes. Please recommend an algorithm which takes all the factors into account, such as the versions of software being used by Lucene and the application, which have an effect on the index being created. We can also compare with certain fields and the text. Regards --

Obtaining the number of segments in an index?

2007-11-09 Thread Lucifer Hammer
Hi, Is there a way to get the number of segments in an index? I looked at the APIs for the reader, writer and searcher, but didn't find anything. Thanks, Lucifer

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
I have just tried this again using the index I built with lucene 2.1 but running the test using lucene 2.2 and it works okay, so it seems to be something related to an index built using lucene 2.2. Mike -Original Message- From: Mike Streeton [mailto:[EMAIL PROTECTED] Sent: 09 November 2

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
I have tried this again using Lucene 2.1 and as Erick found it works okay, I have tried it on jdk 1.6 u1 and u3 both work, but both fail when using lucene 2.2 Mike -Original Message- From: Mike Streeton [mailto:[EMAIL PROTECTED] Sent: 09 November 2007 16:05 To: java-user@lucene.apache.o

Re: - lock improvement suggestion

2007-11-09 Thread Nikolay Diakov
I see you do the wrapping in a RuntimeException trick. Perhaps you can introduce a special exception derived from RuntimeException that you would throw in that case. It would basically mean "The underlying FS does something we cannot tolerate so we fail fast." --Nikolay Michael McCandless wro

RE: TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
Erick, Sorry the numbers are just printed out for debugging when it is building the index. I will try it with lucene 2.1 and see what happens Thanks Mike -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 09 November 2007 15:59 To: java-user@lucene.apache.org Sub

Re: TermDocs.skipTo error

2007-11-09 Thread Erick Erickson
FWIW, running Lucene 2.1, Java 1.5 all I get is some numbers being printed out 0 1 2 . . . 90,000 and ran through the above 4 times or so Erick On Nov 9, 2007 5:51 AM, Mike Streeton <[EMAIL PROTECTED]> wrote: > I have posted before about a problem with TermDocs.skipTo () but never

Re: - lock improvement suggestion

2007-11-09 Thread Michael McCandless
I agree, we should not ignore the return value here. I think throwing an exception if it returns false is the right thing to do? Though, if it's a checked exception, that's not a backwards compatible change... Mike "Nikolay Diakov" <[EMAIL PROTECTED]> wrote: > I have briefly reviewed the Simpl

- lock improvement suggestion

2007-11-09 Thread Nikolay Diakov
I have briefly reviewed the SimpleFSLock of Lucene 2.1 and 2.2. I see that the lock release mechanism does not check the return value of delete: public void release() { lockFile.delete(); } On most linux-es this can never return false, however under some windows FS if someone (a virus
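The suggestion in this thread (check the return value of delete and fail fast with an exception derived from RuntimeException) could look roughly like the sketch below. It is only an illustration in plain Java, not the actual SimpleFSLock code: the class name CheckedLockRelease and the exception name LockReleaseFailedException are hypothetical, and the choice to treat an already-missing lock file as released is an assumption.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration; not part of Lucene.
final class CheckedLockRelease {

    /** Hypothetical unchecked exception: the file system refused to delete the lock. */
    public static class LockReleaseFailedException extends RuntimeException {
        public LockReleaseFailedException(String message) {
            super(message);
        }
    }

    /**
     * Deletes the lock file and fails fast if the delete did not succeed.
     * Assumption: a lock file that is already gone counts as released.
     */
    public static void release(File lockFile) {
        // File.delete() returns false both when the delete fails and when the
        // file does not exist, so only fail if the file is still present.
        if (!lockFile.delete() && lockFile.exists()) {
            throw new LockReleaseFailedException(
                "Cannot delete lock file: " + lockFile.getAbsolutePath());
        }
    }

    public static void main(String[] args) throws IOException {
        File lockFile = File.createTempFile("write", ".lock");
        release(lockFile);
        System.out.println("lock file still exists: " + lockFile.exists());
    }
}
```

Because the exception is unchecked, the signature of release() stays unchanged, which sidesteps the backwards-compatibility concern raised in the reply about checked exceptions.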

Re: Chinese Segmentation with Phase Query

2007-11-09 Thread Open Study
Hi Cedric You may try the CJKAnalyzer within the lucene sandbox. It doesn't give a perfect solution for Chinese word segmentation, but will solve the problem in your case. On Nov 9, 2007 10:59 AM, Cedric Ho <[EMAIL PROTECTED]> wrote: > Hi, > > We are having an issue while indexing Chinese Documen

Re: Create and populate a field when indexing

2007-11-09 Thread KR
Grant Ingersoll-6 wrote: > > When you are indexing the file and adding the Document, you will need > to parse out your filename per your regular expression, and then > create the appropriate field: > > Document doc = new Document() > String cat = getCategoryFromFileName(inputFileName) > doc
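As a rough sketch of the filename-parsing step described above: the helper name getCategoryFromFileName comes from the quoted message, but its body, the class name CategoryFromFileName, the filename convention (category, underscore, rest of the name) and the "unknown" fallback are all assumptions made for illustration. The poster's actual regular expression is not shown in the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; the real naming scheme is not given in the thread.
final class CategoryFromFileName {

    // Assumed convention: "<category>_<anything>", e.g. "sports_2007-11-09.txt".
    private static final Pattern CATEGORY_PATTERN =
        Pattern.compile("^([A-Za-z]+)_.*$");

    /** Returns the leading category part of the file name, or "unknown" if it does not match. */
    public static String getCategoryFromFileName(String inputFileName) {
        Matcher m = CATEGORY_PATTERN.matcher(inputFileName);
        return m.matches() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) {
        System.out.println(getCategoryFromFileName("sports_2007-11-09.txt"));
        System.out.println(getCategoryFromFileName("readme.txt"));
    }
}
```

The returned string would then be stored in its own field on the Document at indexing time, as the quoted reply describes.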

TermDocs.skipTo error

2007-11-09 Thread Mike Streeton
I have posted before about a problem with TermDocs.skipTo () but never managed to reproduce it. I have now got it to fail using the following program, please can someone try it and see if they get the stack trace: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Array index

Re: Can multiple JVMs search an index with one JVM writing to it?

2007-11-09 Thread Koji Sekiguchi
It should work. See the following FAQ: http://wiki.apache.org/lucene-java/LuceneFAQ#head-6c56b0449d114826586940dcc6fe51582676a36e regards, Koji Matt Magoffin wrote: Hello, I tried finding information about this from past mailing list emails, but couldn't find anything. I'm using Lucene 1.4 se