Re: Empty SinkTokenizer

2009-03-31 Thread Raymond Balmès
nation.com/search/document/deda4dd3f9041bee/the_order_of_fields_in_document_fields#bb26d84091aebcaa > > -Grant > > > On Mar 31, 2009, at 8:44 AM, Grant Ingersoll wrote: > > I'm going to bring this over to java-dev. >> >> -Grant >> >> On Mar 30, 2009, at 11:34 AM, Raymond Balmès

Re: Empty SinkTokenizer

2009-03-30 Thread Raymond Balmès
lucene 2.4.0 On Mon, Mar 30, 2009 at 2:18 PM, Grant Ingersoll wrote: > On Mar 30, 2009, at 4:42 AM, Raymond Balmès wrote: >> I found out that the fields are processed in alpha order... and not in creation order. Is there any reason for that?

Re: Empty SinkTokenizer

2009-03-30 Thread Raymond Balmès
IndexWriter every time? Do you ever close it? > > It'd help to see the surrounding code... > > Best > Erick > > On Sat, Mar 28, 2009 at 1:36 PM, Raymond Balmès >wrote: > > > Hi guys, > > > > I'm using a SinkTokenizer to collect some terms of

Empty SinkTokenizer

2009-03-28 Thread Raymond Balmès
Hi guys, I'm using a SinkTokenizer to collect some terms of the documents while doing the main document indexing. I attached it to a specific field (tokenized, indexed). writer = new IndexWriter(index, my_analyzer, create, new IndexWriter.MaxFieldLength(100)); doc.add(new Field("cont
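[For context, an untested sketch of the 2.4-era tee/sink setup this thread is about; treat it as pseudocode and check the exact constructors against your javadocs. The field names "content" and "special", the analyzer, and the text variable are placeholders, not from the original post.]

```java
// Sketch only (Lucene 2.4-era API, unverified signatures).
SinkTokenizer sink = new SinkTokenizer(new ArrayList());
TokenStream source = new TeeTokenFilter(
    myAnalyzer.tokenStream("content", new StringReader(text)), sink);

Document doc = new Document();
doc.add(new Field("content", source));   // main stream, tees tokens into the sink
doc.add(new Field("special", sink));     // replays whatever the tee captured

writer.addDocument(doc);
```

[Note the thread's later finding that fields are processed in alphabetical order, not creation order: if the sink field's name sorts before the source field's, the sink may be consumed before the tee'd stream has been read, which would leave it empty.]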

Re: Different analyzer per field ?

2009-03-17 Thread Raymond Balmès
field name. You can use > this for both indexing and query. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Raymond Balmès [mailto:raymond
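[The truncated reply above points at choosing an analyzer by field name. In Lucene 2.x the usual tool for this is PerFieldAnalyzerWrapper; an untested sketch follows, to be checked against the javadocs, with placeholder field names and analyzers.]

```java
// Sketch only (Lucene 2.4-era API, unverified signatures).
PerFieldAnalyzerWrapper wrapper =
    new PerFieldAnalyzerWrapper(new StandardAnalyzer()); // default analyzer
wrapper.addAnalyzer("id", new KeywordAnalyzer());        // per-field overrides
wrapper.addAnalyzer("body", new WhitespaceAnalyzer());

IndexWriter writer = new IndexWriter(index, wrapper, true,
    new IndexWriter.MaxFieldLength(100));
// Pass the same wrapper to QueryParser so search-time analysis matches.
```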

Different analyzer per field ?

2009-03-17 Thread Raymond Balmès
I was looking for calling a different analyzer for each field of a document... it looks like it is not possible. Do I have it right? -Ray-

Re: Problem building Lucene 2_4 with Ant/Eclipse

2009-03-08 Thread Raymond Balmès
> - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Raymond Balmès [mailto:raymond.bal...@gmail.com] > > Sent: Sunday, March 08, 2009 4:06 PM > > To: j

Problem building Lucene 2_4 with Ant/Eclipse

2009-03-08 Thread Raymond Balmès
I get the problem below when trying to build Lucene 2_4. I'm using Eclipse and just run Ant on the top build.xml. It is pretty weird because the core is indeed built, but for some reason the build stops there and I don't get any of the demos built, etc... Any idea what this "svnversion" program is?

Re: Why do range queries work on fields only ?

2009-03-04 Thread Raymond Balmès
ators for the normal term search that allow for this... but I'm new as you can see. http://www.jdocs.com/lucene/2.0.0/org/apache/lucene/search/RangeFilter.html Thx, -Raymond- On Tue, Mar 3, 2009 at 10:10 PM, Steven A Rowe wrote: > Hi Raymond, > > On 3/3/2009 at 1:19 PM, Raymond B

Re: Why do range queries work on fields only ?

2009-03-04 Thread Raymond Balmès
order. But this could get clumsy with large > numbers of terms. > > If you mean "at least one of index04...08 in the field" > that's just an OR clause. > > Best > Erick > > > On Tue, Mar 3, 2009 at 1:18 PM, Raymond Balmès >wrote: > > > so

Re: Why do range queries work on fields only ?

2009-03-03 Thread Raymond Balmès
sorry [index04 TO index 08] On Tue, Mar 3, 2009 at 7:18 PM, Raymond Balmès wrote: > Just a simplified view of my problem : > > A document contains the terms "index01 blabla index02 xxx yyy index03 ... > index10". I have the terms indexed in the collection. > I now w

Re: Why do range queries work on fields only ?

2009-03-03 Thread Raymond Balmès
3, 2009 at 6:33 PM, Steven A Rowe wrote: > Hi Raymond, > > On 3/3/2009 at 12:04 PM, Raymond Balmès wrote: > > The range query only works on fields (using a string compare)... is > > there any reason why it is not possible on the words of the document. > > > > The

Why do range queries work on fields only ?

2009-03-03 Thread Raymond Balmès
Hi all, The range query only works on fields (using a string compare)... is there any reason why it is not possible on the words of the document? The following query [stringa TO stringb] would just give the list of documents that contain words between those two strings. -RB-
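[Range queries do run over indexed terms, but within a single field, and the ordering is plain lexicographic string comparison. A minimal plain-Java illustration of that ordering, using the thread's index04/index08 example (the helper and term names are mine, not Lucene API):]

```java
public class TermRangeDemo {
    // True if term falls inside [lower, upper] under the same lexicographic
    // (string) ordering Lucene's range queries use on indexed terms.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(inRange("index05", "index04", "index08")); // true
        // Beware of zero padding: "index2" sorts above "index08"
        // because '2' > '0' character-wise, so it falls outside the range.
        System.out.println(inRange("index2", "index04", "index08"));  // false
    }
}
```

[This is why numeric-looking terms are usually indexed zero-padded to a fixed width.]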

Re: N-grams with numbers and Shinglefilters

2009-03-02 Thread Raymond Balmès
ay need to normalize the phrases for the search phase, so it may not work. Keep in touch, -RB- On Mon, Mar 2, 2009 at 5:23 PM, Steven A Rowe wrote: > Hi Raymond, > > On 3/2/2009 at 10:09 AM, Raymond Balmès wrote: > > suppose I have a tri-gram, what I want to do is index the tri-gram

Re: N-grams with numbers and Shinglefilters

2009-03-02 Thread Raymond Balmès
that using regex for instance. My documents look like regular HTML or PDF pages, although some of them contain those specific tri-grams. Thx, -RB- On Mon, Mar 2, 2009 at 2:37 PM, Steven A Rowe wrote: > Hi Raymond, > > On 3/1/2009, Raymond Balmès wrote: > > I'm

N-grams with numbers and Shinglefilters

2009-03-01 Thread Raymond Balmès
Hi, I'm trying to index (& search later) documents that contain tri-grams, however they have the following form: <2 digit> <2 digit> Does the ShingleFilter work with numbers in the match? Another complication: in future features I'd like to add optional digits, like [<1 digit>] <2 digit> <2 d
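[To a shingle filter, a two-digit number is just another token, so word n-grams over mixed word/number streams pose no special problem. A self-contained plain-Java sketch of the shingling idea (this is my illustration of the concept, not Lucene's ShingleFilter implementation):]

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleDemo {
    // Joins each run of n adjacent tokens with a space, the way a shingle
    // filter builds word n-grams; numeric tokens behave like any other word.
    static List<String> shingles(String[] tokens, int n) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + n <= tokens.length; i++) {
            StringBuilder sb = new StringBuilder(tokens[i]);
            for (int j = 1; j < n; j++) sb.append(' ').append(tokens[i + j]);
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles(new String[] {"ab", "12", "34"}, 3));
        // [ab 12 34]
    }
}
```

[Optional parts of the pattern, like the bracketed [<1 digit>], are a matching concern rather than a shingling one, and are better handled when the shingles are searched.]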

Re: Beginner: Specific indexing

2008-09-09 Thread Raymond Balmès
Well, that is well explained in "Lucene in Action": if you want to search files you have to build a file parser, and there is a good example given. So not really my problem. But I thought I could go through the token stream only once, whereas I have to go twice: 1. for detecting my triplets, 2. for indexi

Re: Beginner: Specific indexing

2008-09-05 Thread Raymond Balmès
I think I'm getting you. But the files I'm going to parse have many formats: PDF, HTML, Word. They don't have a particular structure, memos if you will. But the ones I'm interested in will have the triplets I described. Yes, building a TokenFilter as you suggest should do the job. I guess my initi

Re: Beginner: Specific indexing

2008-09-05 Thread Raymond Balmès
I understand your point, I did not say it was a Lucene problem but was rather checking whether my intended design was correct... basically not. Since I thought that I would first break my stream into tokens to do my special filter, I thought I could do it in one step... Interesting if you are not going

Re: Beginner: Specific indexing

2008-09-02 Thread Raymond Balmès
OK, not clear enough. I have documents in which I'm looking for 3 consecutive elements: <#1> <#2> (string1 is a predefined list). I want to disregard those without this sequence and reverse index those with these markers... it looks to me that parsing won't do the job since my documents are unst

Re: Test. Please ignore that. Fwd: How to send mail to java user

2008-09-02 Thread Raymond Balmès
I'm getting plenty of messages, but do you receive mine... please, someone, give me a reply. On Tue, Sep 2, 2008 at 11:36 AM, Leonid Maslov <[EMAIL PROTECTED]> wrote: > -- Forwarded message -- > From: Sankari Palanisamy <[EMAIL PROTECTED]> > Date: Tue, Sep 2, 2008 at 12:32 PM > Subject:

Re: Injecting additional tokens

2008-09-01 Thread Raymond Balmès
Is my subscription working? I got no reply to my previous question. Sorry for the disturbance. On Mon, Sep 1, 2008 at 10:29 PM, Markus Lux <[EMAIL PROTECTED]> wrote: > Hi, > > Assume I have a String "z-4". That would be properly indexed by my > Analyzer, > so I'd find the belonging document if I se

Beginner: Specific indexing

2008-08-30 Thread Raymond Balmès
Hi guys, Fairly new to Lucene, and just finished reading Lucene in Action. My problem is the following: I need to index only the documents, in a mass of documents, that contain the following pattern(s): <#1> <#2> (string1 is a fixed list of words, the <#x> are small numbers < 100). My idea is to simply build a
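[The "disregard documents without the sequence" step discussed in this thread can be done with a plain regex pre-pass before addDocument (or inside a custom TokenFilter, as the replies suggest). A minimal sketch of the detection part only; the marker words alpha/beta/gamma stand in for the poster's fixed word list, which is not given in the thread:]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TripletDetector {
    // Hypothetical markers standing in for the "fixed list of words".
    static final Pattern TRIPLET =
        Pattern.compile("\\b(alpha|beta|gamma)\\s+(\\d{1,2})\\s+(\\d{1,2})\\b");

    // True if the text contains at least one "<marker> <#1> <#2>" triplet,
    // where both numbers have at most two digits (i.e. are < 100).
    static boolean hasTriplet(String text) {
        Matcher m = TRIPLET.matcher(text);
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(hasTriplet("see alpha 12 7 for details")); // true
        System.out.println(hasTriplet("no markers here"));            // false
    }
}
```

[If a document passes this check, it is indexed normally; otherwise it is skipped, which matches the one-pass-to-filter, one-pass-to-index design the later replies converge on.]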
