Re: how to get newest library version?

2005-08-23 Thread Paul Elschot
On Tuesday 23 August 2005 23:45, Peter Veentjer - Anchor Men wrote: > Does anyone know how I can download the newest version of Lucene from SVN? I have been trying (even the website) but I only get timeouts. I would even be happy with a newly built jar (based on the newest sources). So I hope

Re: Lucene and Xanga.com

2005-08-23 Thread Otis Gospodnetic
Nicely done, looks pretty and seems fast. How much data is being searched there? Otis --- Monsur Hossain <[EMAIL PROTECTED]> wrote: > Hey all. We just relaunched our search feature over here at > Xanga.com; the > Blogs, Metros and Blogrings sections are powered by Lucene.NET! You > can > che

Re: post-normalization score filter

2005-08-23 Thread Chris Hostetter
It doesn't look like there were any replies to this while I've been away, so I just wanted to point out that this isn't really a practical thing to do because the scores don't have any meaningful absolute value (i.e., you can't compare the scores from one search with the scores of another). the

Lucene and Xanga.com

2005-08-23 Thread Monsur Hossain
Hey all. We just relaunched our search feature over here at Xanga.com; the Blogs, Metros and Blogrings sections are powered by Lucene.NET! You can check it out here: http://search.xanga.com/ This is only the beginning of what we want to do with search and Lucene. I want to thank everyone on th

how to get newest library version?

2005-08-23 Thread Peter Veentjer - Anchor Men
Does anyone know how I can download the newest version of Lucene from SVN? I have been trying (even the website) but I only get timeouts. I would even be happy with a newly built jar (based on the newest sources). So I hope someone can help me out so I can remove a MultiFieldQueryParser bug

Example of Field.TermVector.WITH_POSITIONS_OFFSETS usage?

2005-08-23 Thread Sean O'Connor
Hello, I am trying to work through term positions and how to get them from a collection of hits. Does setting TermVector.WITH_POSITIONS_OFFSETS to true save the start/end position of the term in the source text file? (I _think_ it does). If so, where would I start for trying to make th

Re: QueryParser not thread-safe

2005-08-23 Thread jian chen
Right. My philosophy is: make it work, then make it better. Don't waste time on something when you are not sure it would cause a performance problem. Jian On 8/23/05, Paul Elschot <[EMAIL PROTECTED]> wrote: > On Tuesday 23 August 2005 19:01, Miles Barr wrote: > > On Tue, 2005-08-23 at 13

Re: i18n query normalization

2005-08-23 Thread Ken Krugler
We have a multi-language index and we need to match accented characters with unaccented characters. For example, if a document contains: mângão, the query: mangao should match it. I guess I would have to build some sort of analyzer/tokenizer for this. I was wondering if there are

Re: i18n query normalization

2005-08-23 Thread Daniel Naber
On Tuesday 23 August 2005 19:15, John Wang wrote: > We have a multi-language index and we need to match accented > characters with unaccented characters. For example, if a document > contains: mângão, the query: mangao should match it. See ISOLatin1AccentFilter in contrib/analyzers in SVN. r
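For illustration only, the kind of folding asked about here can be approximated with the JDK alone. The sketch below is not the ISOLatin1AccentFilter named in the reply; it assumes that Unicode NFD decomposition via java.text.Normalizer is an acceptable stand-in, and the class name AccentFolding is hypothetical. Whatever folding is chosen would need to be applied both when indexing and when parsing queries for the two to match.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: approximates accent folding
// using only the JDK.
final class AccentFolding {
    private AccentFolding() {}

    // Decompose accented characters into base letter + combining marks (NFD),
    // then remove the combining marks.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

With this sketch, AccentFolding.fold("mângão") yields "mangao". Characters that have no canonical decomposition, such as ø, pass through unchanged, so a hand-written mapping table may still be needed for them.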

Re: QueryParser not thread-safe

2005-08-23 Thread Paul Elschot
On Tuesday 23 August 2005 19:01, Miles Barr wrote: > On Tue, 2005-08-23 at 13:47 -0300, [EMAIL PROTECTED] wrote: > > Hi! I've been having problems with Lucene's QueryParser; apparently it is not thread-safe. > > > > That means I can't parse queries in threads where the QueryParser object is cre

Re: QueryParser not thread-safe

2005-08-23 Thread Luke Francl
On Tue, 2005-08-23 at 12:01, Miles Barr wrote: > Using a non-threadsafe object in a threaded environment is fairly > standard in Java, just wrap it in a synchronized block. > > If you don't want all threads waiting on one query parser, create a pool > of them. Based on doing the simplest possib
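The pooling idea quoted above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not code from the thread: ParserPool and withParser are hypothetical names, the type parameter P stands in for whatever non-thread-safe parser is being shared, and nothing here is Lucene API. The simpler alternative from the same message is to wrap every call to a single shared parser in a synchronized block.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical generic pool for objects that are not thread-safe.
// P stands in for the parser type; size must be at least 1.
final class ParserPool<P> {
    private final BlockingQueue<P> idle;

    ParserPool(int size, Supplier<P> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrow a parser, run the work on it, and always hand the parser back,
    // so no two threads ever use the same instance at the same time.
    <R> R withParser(Function<P, R> work) {
        P parser;
        try {
            parser = idle.take(); // blocks until a parser is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a parser", e);
        }
        try {
            return work.apply(parser);
        } finally {
            idle.add(parser);
        }
    }
}
```

A caller would create the pool once, for example with a factory lambda that constructs a parser, and then route every parse through withParser. Whether a pool is worth the extra code over a single synchronized parser, or over simply creating a new parser per query, depends on measured contention.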

i18n query normalization

2005-08-23 Thread John Wang
Hi: We have a multi-language index and we need to match accented characters with unaccented characters. For example, if a document contains: mângão, the query: mangao should match it. I guess I would have to build some sort of analyzer/tokenizer for this. I was wondering if there a

Re: QueryParser not thread-safe

2005-08-23 Thread Miles Barr
On Tue, 2005-08-23 at 13:47 -0300, [EMAIL PROTECTED] wrote: > Hi! I've been having problems with Lucene's QueryParser; apparently it is not > thread-safe. > > That means I can't parse queries in threads where the QueryParser object is > created once and reused for each query. If I do, the resul

Re: hslf ppt files

2005-08-23 Thread Nick Burch
On Tue, 23 Aug 2005, Derya Kasapoglu wrote: Is there anybody who has the POI HSLF classes to extract text from PowerPoint files? I know the classes are on the POI site but they are not packaged in a jar! You'll need to either download it yourself from CVS and compile with ant, or grab a ni

QueryParser not thread-safe

2005-08-23 Thread jhandl
Hi! I've been having problems with Lucene's QueryParser; apparently it is not thread-safe. That means I can't parse queries in threads where the QueryParser object is created once and reused for each query. If I do, the resulting queries may have all kinds of weird problems, for example missin

Re: WhiteSpace Tokenizer question

2005-08-23 Thread Yonik Seeley
It's the QueryParser, not the Analyzer. When the query parser sees multiple tokens from what looks like a single word, it puts them in a phrase query. I think the only way to change that behavior would be to modify the QueryParser. -Yonik On 8/23/05, Dan Armbrust <[EMAIL PROTECTED]> wrote: > I w

RE: WhiteSpace Tokenizer question

2005-08-23 Thread Vanlerberghe, Luc
The query string is first parsed by QueryParser and what it believes to be single terms are then passed on to your analyzer. QueryParser only considers space, tab, \n and \r to be white space (See QueryParser.jj) QueryParser itself is not aware that '-' should be treated as white space so in your

WhiteSpace Tokenizer question

2005-08-23 Thread Dan Armbrust
I wrote a slightly modified version of the WhitespaceTokenizer that allows me to treat other characters as whitespace. My thought was that this would be an easy way to make it tokenize on characters such as "-". My tokenizer looks like this: public class CustomWhiteSpaceTokenizer extends Char

hslf ppt files

2005-08-23 Thread Derya Kasapoglu
Hi, is there anybody who has the POI HSLF classes to extract text from PowerPoint files? I know the classes are on the POI site but they are not packaged in a jar! If I download all of them myself I get version problems! So maybe someone has a jar file and can send it to me? Thanks in advance By

Re: Why is delete() part of IndexREADER?

2005-08-23 Thread Cheolgoo Kang
It's because of Lucene's index structure. IndexWriter creates a new segment (one Lucene index is composed of several segments) when a document is added and doesn't care about old indexes that already exist. So IndexWriter should not have a delete() operation for old indexes. And so, the IndexReader has cont

Re: Hierarchical Documents

2005-08-23 Thread Dan Funk
People indexing XML documents tend to deal with the same kind of problem; there is an excellent article at the URL below showing how they handled some fairly complex hierarchical queries. http://www.idealliance.org/papers/xmle02/dx_xmle02/papers/03-02-08/03-02-08.html Rohit Lodha wrote: Hi A

Re: UpdateIndex

2005-08-23 Thread Miles Barr
On Tue, 2005-08-23 at 13:53 +0200, Derya Kasapoglu wrote: > Thank you for your help!!! > > I tried it without an Analyzer! > > document.add(Field.Keyword("path", file[i].getAbsolutePath())); > > then > > Term term = new Term("path", file[i].getAbsolutePath()); > Query query = new TermQuery(term); >

Re: UpdateIndex

2005-08-23 Thread Derya Kasapoglu
Thank you for your help!!! I tried it without an Analyzer! document.add(Field.Keyword("path", file[i].getAbsolutePath())); then Term term = new Term("path", file[i].getAbsolutePath()); Query query = new TermQuery(term); reader.delete(term); That way is better :) and it works > --- Ursprüngliche N

Re: UpdateIndex

2005-08-23 Thread Derya Kasapoglu
I meant that reader.hasDeletions() returns null and reader.delete(term) returns 0. So...! I store the path in the index this way: document.add(Field.Text("pathLC", file[i].getAbsolutePath())); and I use the StandardAnalyzer. I cannot search for the path if I store it as a Keyword like this: document

Re: UpdateIndex

2005-08-23 Thread Miles Barr
On Tue, 2005-08-23 at 12:54 +0200, Derya Kasapoglu wrote: > Yes, it returns null. > But this is a little bit funny because the search is correct > and it finds the document which has changed! > So what can I do!? > > Is there a way to get the document id? It can't return null since

Re: UpdateIndex

2005-08-23 Thread Derya Kasapoglu
Yes, it returns null. But this is a little bit funny because the search is correct and it finds the document which has changed! So what can I do!? Is there a way to get the document id? > --- Original Message --- > From: Miles Barr <[EMAIL PROTECTED]> > To: java-user@lucene.a

Re: UpdateIndex

2005-08-23 Thread Miles Barr
On Tue, 2005-08-23 at 12:38 +0200, Derya Kasapoglu wrote: > I query the index for the path of the files in the directory and compare the > dates. > But I have a problem! > I can find the files which have changed but I cannot delete the document > from the index, and I don't know why! > > In the Field

Re: UpdateIndex

2005-08-23 Thread Derya Kasapoglu
Hi, I'm writing the deletion now and I do it this way: I query the index for the path of the files in the directory and compare the dates. But I have a problem! I can find the files which have changed but I cannot delete the document from the index, and I don't know why! In the Field "pathLC" is he

Re: Hierarchical Documents

2005-08-23 Thread Paul . Illingworth
I have been struggling with this sort of problem for some time and still haven't got an ideal solution. Initially I was going to go for the approach Erik has suggested for similar reasons - it allowed me to search within categories and within sub categories of those categories very simply. Un

Re: Hierarchical Documents

2005-08-23 Thread Erik Hatcher
On Aug 22, 2005, at 2:27 AM, Rohit Lodha wrote: Currently, Documents cannot contain other documents. I have a Graph of Objects (Documents) to search in. I could flatten them and search but... Is there any nice way to do it? I have used a technique of encoding a hierarchical path (like "/ ca
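The message above is cut off before it describes the technique in full, so the following is only one possible reading of "encoding a hierarchical path": store each document's position in the hierarchy as a slash-delimited string and treat "is inside category X" as a prefix match. The class name PathPrefix, both method names, and the example segments are hypothetical, and the code is plain Java rather than Lucene API.

```java
import java.util.List;

// Hypothetical sketch: hierarchy membership expressed as a string prefix test.
final class PathPrefix {
    private PathPrefix() {}

    // Join segments into "/a/b/"; the trailing slash keeps "/books/"
    // from matching a sibling such as "/bookshelf/".
    static String encode(List<String> segments) {
        return "/" + String.join("/", segments) + "/";
    }

    // True when docPath lies at or below ancestorPath in the hierarchy.
    static boolean isWithin(String docPath, String ancestorPath) {
        return docPath.startsWith(ancestorPath);
    }
}
```

In an index, the encoded path would presumably be stored as a single untokenized field so that a prefix-style query could select a whole subtree; that part is an assumption, not something the truncated message confirms.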

Re: Why is delete() part of IndexREADER?

2005-08-23 Thread Ray Tsang
I have come to peace with this problem. Basically, I think it's because you need to read/find what you are deleting first? hehe Writer just needs to write whatever it's been told to write. ray, On 8/23/05, Mikko Noromaa <[EMAIL PROTECTED]> wrote: > Hi, > > Why does IndexReader allow me to do write-

Why is delete() part of IndexREADER?

2005-08-23 Thread Mikko Noromaa
Hi, Why does IndexReader allow me to do write operations like delete? I'd think this should be part of the IndexWriter class instead. I had created a wrapper class that callers can open for either writing or searching. It creates either an IndexWriter or an IndexSearcher and stores that inside itself