Table Defn and/or ER Diagram of Segment files

2011-12-16 Thread Dr. Ray Hoare
Is there an entity-relationship diagram of the segment files and/or Berkeley DB tables (with table definitions)? I'm trying to understand the segment files of Lucene. I know that a Berkeley DB can be used to store the directory, but I can't locate any ER diagram or table definitions for the DB.

RE: Lucene - Text Classification.

2009-11-09 Thread Lukas, Ray
There is one on Salmon Run that I am using.. it seems to work pretty well.. add the words "Salmon Run" to your Google search.. -Original Message- From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi Kant Sent: Monday, November 09, 2009 10:41 AM To: java-user@lucene

RE: ebook resources - including lucene in action

2009-04-21 Thread Lukas, Ray
systems deserve a certain amount of respect and justice. They are stealing and harming the very people that make our projects and the companies we work for possible. Is a simple 30 or 40 dollars in recognition for what these guys have done too much to ask? To do what is right... Ray -Original

Query against newly created index.. Do not work

2009-02-27 Thread Lukas, Ray
I can now create indexes with Nutch, and see them in Luke.. this is fantastic news, well for me it is beyond fantastic.. Now I would like to (need to) query them, and to that end I wrote the following code segment. int maxHits = 1000; NutchBean nutchBean = new N

Re: MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-06 Thread Ray
you all who are saying 2 billion docs will bring lucene to its knees are wrong... Ray. - Original Message - From: "Erick Erickson" <[EMAIL PROTECTED]> To: Sent: Thursday, March 06, 2008 10:40 PM Subject: Re: MultiSearcher to overcome the Integer.MAX_VALUE limit We

MultiSearcher to overcome the Integer.MAX_VALUE limit

2008-03-06 Thread Ray
elves ? Kind regards, Ray.

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Ray Tsang
i've indexed 80m records and now up to 200m.. it can be done, and could've been done better. like the others said, architecture is important. have you considered looking into solr? i haven't kept up with it (and many of the mailing lists...), but it looks very interesting. ray, On

Re: Lucene as syslog storage

2006-06-18 Thread Ray Tsang
I think it ultimately depends on what you would like to do with the stored data. Would you need more of full text searches on the log or more of statistical analysis? ray, On 6/18/06, Andreas Moroder <[EMAIL PROTECTED]> wrote: Hello, I would like to write an application to browse arou

Re: CJKAnalyzer - does it work?

2006-06-15 Thread Ray Tsang
Hi Erik, Where did you get that Chinese sentence from? That's funny! haha. ray, On 6/15/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: Rob, Your example is, hopefully, not exact since you used "C1..." which I presume was not what you originally tested with. CJKAnalyzer i

Re: The best Chinese Analyzer?

2006-05-08 Thread Ray Tsang
rs/src/java/org/apache/lucene/analysis/cn/ChineseTokenizer.java?rev=353930&view=markup) "C1C2C3C4" -> "C1" "C2" "C2" "C3" "C3" "C4" The most obvious result of these 3 tokenization strategies is the searc
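
For illustration only, the difference between per-character and overlapping-bigram tokenization of CJK text can be sketched in plain Java. This is a minimal sketch, not the code of Lucene's ChineseTokenizer or CJKTokenizer. The class and method names are hypothetical, and the letters in "ABCD" stand in for four CJK characters in the same way "C1C2C3C4" does in the message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
final class CjkTokenizationSketch {

    // Per-character strategy: one token per code point.
    static List<String> unigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < cps.length; i++) {
            tokens.add(new String(cps, i, 1));
        }
        return tokens;
    }

    // Bigram strategy: overlapping pairs of adjacent code points.
    // Assumption: a lone character is emitted as a single-character token.
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            tokens.add(new String(cps, 0, 1));
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(unigrams("ABCD")); // [A, B, C, D]
        System.out.println(bigrams("ABCD"));  // [AB, BC, CD]
    }
}
```

Whichever strategy is used at index time, the query text has to be tokenized the same way for its tokens to match the indexed ones.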

Re: Chinese support

2006-01-29 Thread Ray Tsang
Zsolt, It's in the lucene trunk under the contrib/ directory, you can check it out from the repository, take a look at http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/analyzers/src/java/org/apache/lucene/analysis/ ray, On 1/29/06, Zsolt <[EMAIL PROTECTED]> wrote: > A

Re: Chinese support

2006-01-28 Thread Ray Tsang
Hi Zsolt, you can try to use a Chinese analyzer. ray, On 1/28/06, Zsolt <[EMAIL PROTECTED]> wrote: > Hi, > > We use lucene without any problems even for German text but with Chinese > text nothing is found. What is the best way to index and search Chinese

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Paul, Thanks for the advice! But for the 100+ queries/sec on a 32-bit platform, did you end up applying other patches? or use different FSDirectory implementations? Thanks! ray, On 1/27/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Ray, > > The short answer is that you can make L

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Peter, Wow, the speedup is impressive! But may I ask what you did to achieve 135 queries/sec prior to the JVM switch? ray, On 1/27/06, Peter Keegan <[EMAIL PROTECTED]> wrote: > Correction: make that 285 qps :) > > On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote: >

Re: Throughput doesn't increase when using more concurrent threads

2006-01-26 Thread Ray Tsang
Speaking of NioFSDirectory, I thought there was one posted a while ago, is this something that can be used? http://issues.apache.org/jira/browse/LUCENE-414 ray, On 11/22/05, Doug Cutting <[EMAIL PROTECTED]> wrote: > Jay Booth wrote: > > I had a similar problem with threading, the

Re: Lucene 1.9 release date?

2005-10-14 Thread Ray Tsang
Can we add a 1.9 release to the roadmap? or start a 1.9 release tracker issue? ray, On 10/15/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > Olivier, > > Your best bet is to discuss this on the java-dev list. At this > point, we haven't a firm volunteer for pushing

Re: Lucene database bindings

2005-09-18 Thread Ray Tsang
I must admit that I have not downloaded the source yet. But a quick question, does it deal w/ aggregate functions and group by clauses? Thanks! Ray, On 9/17/05, markharw00d <[EMAIL PROTECTED]> wrote: > >>Basically your lucene_query function will return a true/false in one

Re: Search Results Clustering

2005-08-30 Thread Ray Tsang
er grouped records in the beginning. In addition, it feels like reading the field values from the document in order to look for group-by results is most time consuming. How does RDBMS do it? ray, On 8/31/05, kapilChhabra (sent by Nabble.com) <[EMAIL PROTECTED]> wrote: > > thanks a lot

Re: Why is delete() part of IndexREADER?

2005-08-23 Thread Ray Tsang
I have come to peace with this problem. Basically, I think it's because you need to read/find what you are deleting first? hehe Writer just needs to write whatever it's been told to write. ray, On 8/23/05, Mikko Noromaa <[EMAIL PROTECTED]> wrote: > Hi, > > Why IndexRea

Re: UpdateIndex

2005-08-22 Thread Ray Tsang
implementations of how index is structured, e.g. RotatingIndex, AlternatingIndex, that rotates document updates to different indices. Ray, On 8/23/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Yes, this is not how you should do it. > Use reader.delete(Term) method to delete docum

Re: hit count within categories

2005-07-31 Thread Ray Tsang
I also had a similar problem. It was essentially a 'group by'-like requirement. I used both get(fieldName) and getTermFreqVector(...), it seemed that get(fieldName) on a page of results (say, 10 results per page) was faster than getTermFreqVector() for me. ray, On 7/29/05, mark harwo
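
The 'group by'-like requirement described here, counting hits per category by reading a stored field from each hit on a page, can be sketched without Lucene. This is a minimal sketch under stated assumptions: each hit is modelled as a Map<String, String> of stored fields rather than a Lucene Document, and the names CategoryCounts, countByField and the field "category" are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; hits are plain maps, not Lucene Documents.
final class CategoryCounts {

    // Count how many hits carry each distinct value of the given stored field.
    // Hits that lack the field are skipped.
    static Map<String, Integer> countByField(List<Map<String, String>> hits, String fieldName) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> hit : hits) {
            String value = hit.get(fieldName);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> page = List.of(
            Map.of("title", "a", "category", "books"),
            Map.of("title", "b", "category", "music"),
            Map.of("title", "c", "category", "books"));
        System.out.println(countByField(page, "category")); // {books=2, music=1}
    }
}
```

The cost of this approach grows with the number of hits read, which may be why reading the field for a single page of results was the faster option in the message.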

Re: Lucene and numerical fields search

2005-07-19 Thread Ray Tsang
owever, It should be fine if it was rewritten from other queries. Perhaps it should get cleaned up and be made a protected class for now. ray, On 7/17/05, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Hi Ray, > > If you can share your BitSetQuery and QP modifications, feel free to >

Re: Lucene and numerical fields search

2005-07-12 Thread Ray Tsang
ad of the ones provided by the distribution. Ray On 7/12/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > > > > > Hi Mickaël, > > Take a look at the org.apache.lucene.search.DateFilter class that comes > with Lucene. This does date range filtering (I am us