Re: 30 million+ docs on a single server

2006-08-12 Thread Ray Tsang
i've indexed 80m records and am now up to 200m.. it can be done, and could've been done better. like the others said, architecture is important. have you considered looking into solr? i haven't kept up with it (and many of the mailing lists...), but it looks very interesting. ray, On 8/12/06, Jason

Re: Lucene as syslog storage

2006-06-18 Thread Ray Tsang
I think it ultimately depends on what you would like to do with the stored data. Would you need more full-text searches on the log, or more statistical analysis? ray, On 6/18/06, Andreas Moroder <[EMAIL PROTECTED]> wrote: Hello, I would like to write an application to browse around and se

Re: CJKAnalyzer - does it work?

2006-06-15 Thread Ray Tsang
Hi Erik, Where did you get that Chinese sentence from? That's funny! haha. ray, On 6/15/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: Rob, Your example is, hopefully, not exact since you used "C1..." which I presume was not what you originally tested with. CJKAnalyzer is working fine for me i

Re: The best Chinese Analyzer?

2006-05-08 Thread Ray Tsang
Hi Bob, In short, I use a slightly modified ChineseAnalyzer to index Chinese text. They differ mainly in the way they tokenize the text. StandardAnalyzer is intended for use w/ Latin-based languages, where each word is composed of multiple characters, and each word is separated by special markers such
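
The tokenization difference described in the message above can be illustrated with a small standalone sketch. It does not use Lucene: `TokenizeSketch`, `splitOnWhitespace`, and `perCharacter` are hypothetical names, and the per-character strategy is only an assumed approximation of how a Chinese-oriented analyzer might split text, not the poster's modified ChineseAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {
    // Delimiter-based splitting: adequate when words are separated by whitespace.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assumed alternative for unsegmented text: one token per code point,
    // skipping whitespace.
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(cp -> !Character.isWhitespace(cp))
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character word for "Chinese (language)".
        String chinese = "\u4e2d\u6587";
        // No whitespace inside the word, so the whole string is one token.
        System.out.println(splitOnWhitespace(chinese).size()); // prints 1
        // Per-character splitting yields one token per character.
        System.out.println(perCharacter(chinese).size());      // prints 2
    }
}
```

With whitespace splitting, a query for a single character would never match the one long token, which is consistent with the "nothing is found" symptom reported elsewhere in these threads.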

Re: Chinese support

2006-01-29 Thread Ray Tsang
nd where can I find it? > > Zsolt > > >-Original Message- > >From: Ray Tsang [mailto:[EMAIL PROTECTED] > >Sent: Sunday, January 29, 2006 2:14 AM > >To: java-user@lucene.apache.org > >Subject: Re: Chinese support > > > >Hi Zsolt, > >

Re: Chinese support

2006-01-28 Thread Ray Tsang
Hi Zsolt, you can try to use a Chinese analyzer. ray, On 1/28/06, Zsolt <[EMAIL PROTECTED]> wrote: > Hi, > > We use lucene without any problems even for German text but with Chinese > text nothing is found. What is the best way to index and search Chinese > text? > > Zsolt > > >

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
> > I've a bit of experience with search engines, but I'm obviously still > learning thanks to this group. > > Peter > > On 1/26/06, Ray Tsang <[EMAIL PROTECTED]> wrote: > > > > Peter, > > > > Wow, the speed up is impressive! But may I ask wh

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Peter, Wow, the speed up is impressive! But may I ask what you did to achieve 135 queries/sec prior to the JVM switch? ray, On 1/27/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Correction: make that 285 qps :) > > On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > > > > I tried the AMD64-b

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Speaking of NioFSDirectory, I thought there was one posted a while ago, is this something that can be used? http://issues.apache.org/jira/browse/LUCENE-414 ray, On 11/22/05, Doug Cutting <[EMAIL PROTECTED]> wrote: > Jay Booth wrote: > > I had a similar problem with threading, the problem turned o

Re: Lucene 1.9 release date?

2005-10-14 Thread Ray Tsang
Can we add a 1.9 release to the roadmap? Or start a 1.9 release tracker issue? ray, On 10/15/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > Olivier, > > Your best bet is to discuss this on the java-dev list. At this > point, we haven't a firm volunteer for pushing out a release. I'd be > thrille

Re: Lucene database bindings

2005-09-18 Thread Ray Tsang
I must admit that I have not downloaded the source yet. But a quick question, does it deal w/ aggregate functions and group by clauses? Thanks! Ray, On 9/17/05, markharw00d <[EMAIL PROTECTED]> wrote: > >>Basically your lucene_query function will return a true/false in one > of the query predica

Re: Search Results Clustering

2005-08-30 Thread Ray Tsang
I had similar requirements of "count" and "group by" on over 130mil records, and it's really a pain. It's currently usable but not satisfactory. Currently it's grouping at run-time by iterating through ungrouped items. It collects matching documents into a BitSet, so subsequent queries can use BitSet
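
A minimal sketch of the run-time grouping idea mentioned above, using only `java.util.BitSet`. The `groupOfDoc` array is a hypothetical in-memory stand-in for a per-document field value, and `GroupCountSketch` and `countByGroup` are illustrative names; the poster's actual implementation is not shown in the message.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class GroupCountSketch {
    // Counts matching documents per group by iterating over the set bits.
    // groupOfDoc[i] holds the (assumed) group key of document i.
    static Map<String, Integer> countByGroup(BitSet matches, String[] groupOfDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            counts.merge(groupOfDoc[doc], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] groupOfDoc = {"a", "b", "a", "c", "b", "a"};

        // Documents 0, 2 and 4 matched the first query.
        BitSet matches = new BitSet(groupOfDoc.length);
        matches.set(0);
        matches.set(2);
        matches.set(4);
        System.out.println(countByGroup(matches, groupOfDoc)); // prints {a=2, b=1}

        // A later query can reuse the cached bits by intersecting with its own filter.
        BitSet filter = new BitSet(groupOfDoc.length);
        filter.set(2);
        filter.set(3);
        filter.set(4);
        BitSet narrowed = (BitSet) matches.clone();
        narrowed.and(filter);
        System.out.println(countByGroup(narrowed, groupOfDoc)); // prints {a=1, b=1}
    }
}
```

The cost of grouping still grows with the number of matching documents, which fits the "usable but not satisfactory" assessment at this scale.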

Re: Why is delete() part of IndexREADER?

2005-08-23 Thread Ray Tsang
I have come to peace with this problem. Basically, I think it's because you need to read/find what you are deleting first? hehe Writer just needs to write whatever it's been told to write. ray, On 8/23/05, Mikko Noromaa <[EMAIL PROTECTED]> wrote: > Hi, > > Why IndexReader allows me to do write-

Re: UpdateIndex

2005-08-22 Thread Ray Tsang
This could be off topic, but I made something that updates indices that worked like the following, wonder if anybody has the same ideas? I found something like IndexAccessControl in the mailing list before. An implementation of the following uses IAC. ManagedIndex index = ManagedIndex.getInstanc

Re: hit count within categories

2005-07-31 Thread Ray Tsang
I also had a similar problem. It was essentially a 'group by'-like requirement. I used both get(fieldName) and getTermFreqVector(...), and it seemed that get(fieldName) on a page of results (say, 10 results per page) was faster than getTermFreqVector() for me. ray, On 7/29/05, mark harwood <[EMAIL

Re: Lucene and numerical fields search

2005-07-19 Thread Ray Tsang
> suggest them for inclusion to the core or contrib (by attaching them to > a new entry in Bugzilla). > > Otis > > > --- Ray Tsang <[EMAIL PROTECTED]> wrote: > > > It seems TooManyClauses is a potential problem for any query that > > expands to a series o

Re: Lucene and numerical fields search

2005-07-12 Thread Ray Tsang
It seems TooManyClauses is a potential problem for any query that expands to a series of OR'ed boolean queries (PrefixQuery, WildcardQuery, RangeQuery...). If the max is set too high, the inefficiency would make the search unusable. I kind of worked around this by creating a BitSetQuery, and exte
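
The message above is cut off before the workaround is described, so the following is only an assumed reading of the general idea: instead of expanding a prefix into one OR clause per matching term, walk the matching terms once and mark their documents in a single BitSet. The term dictionary here is a hypothetical in-memory `SortedMap`, not a Lucene index, and `PrefixBitSetSketch` and `matchPrefix` are illustrative names.

```java
import java.util.BitSet;
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixBitSetSketch {
    // postings maps each term to the ids of the documents containing it.
    // Every term starting with the prefix sorts inside [prefix, prefix + '\uffff'),
    // apart from the edge case of a term that continues with '\uffff' itself.
    static BitSet matchPrefix(SortedMap<String, int[]> postings, String prefix) {
        BitSet result = new BitSet();
        for (int[] docs : postings.subMap(prefix, prefix + Character.MAX_VALUE).values()) {
            for (int doc : docs) {
                result.set(doc); // one bit per document, however many terms match
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("2005-07-12", new int[] {0, 3});
        postings.put("2005-07-19", new int[] {1});
        postings.put("2005-08-22", new int[] {2});

        // Two terms match the prefix, but the result is a single bit set
        // rather than a two-clause boolean query.
        System.out.println(matchPrefix(postings, "2005-07")); // prints {0, 1, 3}
    }
}
```

The number of matching terms affects only the time spent in the loop, not the size of the result, so there is no clause limit to exceed.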