RE: Chinese support

2006-01-28 Thread Zsolt
And where can I find it? Zsolt >-Original Message- >From: Ray Tsang [mailto:[EMAIL PROTECTED] >Sent: Sunday, January 29, 2006 2:14 AM >To: java-user@lucene.apache.org >Subject: Re: Chinese support > >Hi Zsolt, > >you can try to use a Chinese analyzer. > >ray, > >On 1/28/06, Zsolt <[EMAIL

RE: problem updating a document: no segments file?

2006-01-28 Thread John Powers
i feel confident in the delete sequence. i will run the things you ask for though. this does work on my laptop. the code that changed was some update method that was used in the first release. so before the only writes needed were done by this and it wholesale replaces. whereas the n

Re: Chinese support

2006-01-28 Thread Ray Tsang
Hi Zsolt, you can try to use a Chinese analyzer. ray, On 1/28/06, Zsolt <[EMAIL PROTECTED]> wrote: > Hi, > > We use lucene without any problems even for German text but with Chinese > text nothing is found. What is the best way to index and search Chinese > text? > > Zsolt
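Written Chinese has no spaces between words, so a tokenizer built around whitespace and Western word boundaries has little to split on. One common workaround, and as far as I know the approach taken by the CJK analyzer shipped with Lucene's contrib packages, is to index overlapping two-character tokens. The sketch below only illustrates that idea in plain Java. It is not Lucene code, and the class and method names (CjkBigramSketch, tokenize) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of bigram tokenization; not part of Lucene. */
final class CjkBigramSketch {

    /** True for code points that belong to the Han script. */
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    /**
     * Splits text into tokens: runs of Han characters become overlapping
     * bigrams, runs of other letters and digits become lowercased words.
     */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder run = new StringBuilder(); // current run of same-kind characters
        boolean runIsHan = false;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = isHan(cp);
            boolean wordChar = han || Character.isLetterOrDigit(cp);
            // A separator, or a switch between Han and non-Han, ends the run.
            if (!wordChar || (run.length() > 0 && han != runIsHan)) {
                flush(run, runIsHan, tokens);
            }
            if (wordChar) {
                run.appendCodePoint(cp);
                runIsHan = han;
            }
        }
        flush(run, runIsHan, tokens);
        return tokens;
    }

    /** Emits the tokens for one finished run and clears the buffer. */
    private static void flush(StringBuilder run, boolean han, List<String> out) {
        if (run.length() == 0) {
            return;
        }
        String s = run.toString();
        if (!han) {
            out.add(s.toLowerCase(Locale.ROOT));
        } else {
            int[] cps = s.codePoints().toArray();
            if (cps.length == 1) {
                out.add(s); // a lone character is kept as its own token
            }
            for (int k = 0; k + 1 < cps.length; k++) {
                out.add(new String(cps, k, 2)); // overlapping pair k, k+1
            }
        }
        run.setLength(0);
    }
}
```

With this sketch, tokenize("中文搜索 Lucene") yields the tokens 中文, 文搜, 搜索 and lucene. The same tokenization has to be applied to the query text as well, otherwise indexed terms and query terms will not line up.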

Re: Searching Textile Documents

2006-01-28 Thread Alan Chandler
On Wednesday 23 November 2005 22:50, Erik Hatcher wrote: > > Well, the smiley is because my own frankenstein blog is a servlet, > some very simple abstraction layers, velocity templates, and > Lucene... http://www.blogscene.org/erik - though I'm a very > infrequent blogger. The categories are pick

deleting duplicate documents from my index

2006-01-28 Thread gekkokid
Hi, I'm trying to delete duplicate documents from my index; the unique identifier is the document's URL (aka field "url"). My initial thought of how to accomplish this is to open the index via a reader and sort them by the document's URL and then iterate through them looking for a match with the c
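The sort-then-scan idea described here can be sketched in plain Java, independent of Lucene. The sketch assumes the "url" value of every document has already been read into an array indexed by document number (null where a document has no URL or is already deleted). The class and method names are hypothetical, and the actual deletion through the index reader is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper; only finds duplicates, it does not touch an index. */
final class DuplicateUrlSketch {

    /**
     * urls[i] is the "url" value of document number i, or null if absent.
     * Returns the document numbers that repeat an earlier document's URL,
     * so the lowest-numbered document for each URL is the one kept.
     */
    static List<Integer> duplicateDocIds(String[] urls) {
        // Collect the document numbers that actually have a URL.
        List<Integer> order = new ArrayList<>();
        for (int docId = 0; docId < urls.length; docId++) {
            if (urls[docId] != null) {
                order.add(docId);
            }
        }
        // Stable sort by URL: equal URLs stay in ascending document order.
        order.sort(Comparator.comparing((Integer id) -> urls[id]));

        // After sorting, duplicates sit next to each other.
        List<Integer> duplicates = new ArrayList<>();
        String previous = null;
        for (int docId : order) {
            String url = urls[docId];
            if (url.equals(previous)) {
                duplicates.add(docId);
            }
            previous = url;
        }
        return duplicates;
    }
}
```

The returned document numbers would then be handed to whatever delete call the reader provides. For a large index, holding every URL in memory may not be practical, in which case walking the terms of the "url" field in order is another option to consider.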

indexing URL's from parsed HTML

2006-01-28 Thread Michael Dodson
I'm new to Lucene and I'm trying to index an HTML file parsed with NekoHTML. With text between HTML tags, it's easy enough to have an overloaded getText() method which either recursively indexes all text, or which accepts the name of a tag (like "title") and only finds text between tags.

Re: encoding

2006-01-28 Thread petite_abeille
Hello, On Jan 27, 2006, at 11:44, John Haxby wrote: I've attached the perl script -- feed http://www.unicode.org/Public/4.1.0/ucd/UnicodeData.txt to it. Thanks! Works great! It's based on a slightly different principle to yours. You seem to look for things like "mumble mumble LETTER X m

Re: index concurrency & result order

2006-01-28 Thread kate
Chris Hostetter: > You don't need to use a HitCollector just to sort by a field, take a > look at the Searcher.search(Query,Sort) method instead. thanks - this is exactly what i needed. k.

RE: problem updating a document: no segments file?

2006-01-28 Thread Chris Hostetter
: this code works in a couple other boxes as is. that deleting code Are those boxes running the same OS? The same JVM? : removes the active index after this one builds in a different location. : then the searcher is told to make this newest one the current and the : old one is deleted. it eff

Chinese support

2006-01-28 Thread Zsolt
Hi, We use lucene without any problems even for German text but with Chinese text nothing is found. What is the best way to index and search Chinese text? Zsolt

Re: index concurrency & result order

2006-01-28 Thread Chris Hostetter
: secondly, the existing MySQL-based search returns documents in alphabetical : order by title, instead of by relevance. i'd like to replicate this : behaviour for the (few) people who prefer the existing system; however, i'm : not sure how to do it efficiently. i see i can pass my own HitCollect