I've indexed 80M records and am now up to 200M. It can be done, and could've
been done better. Like the others said, architecture is important. Have you
considered looking into Solr? I haven't kept up with it (and many of the
mailing lists...), but it looks very interesting.
ray,
On 8/12/06, Jason
I think it ultimately depends on what you would like to do with the
stored data. Would you need more full-text searches on the logs, or
more statistical analysis?
ray,
On 6/18/06, Andreas Moroder <[EMAIL PROTECTED]> wrote:
Hello,
I would like to write an application to browse around and se
Hi Erik,
Where did you get that Chinese sentence from? That's funny! haha.
ray,
On 6/15/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Rob,
Your example is, hopefully, not exact since you used "C1..." which I
presume was not what you originally tested with.
CJKAnalyzer is working fine for me i
Hi Bob,
In short, I use a slightly modified ChineseAnalyzer to index Chinese text.
The two analyzers differ mainly in the way they tokenize the text.
StandardAnalyzer is intended for use with Latin-based languages, where each
word is composed of multiple characters and words are separated by
special markers such
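For example, here is a rough sketch (sample text and field name are made up)
of comparing token output with the Lucene 1.x/2.0-era TokenStream API;
ChineseAnalyzer emits one token per Chinese character, while the contrib
CJKAnalyzer emits overlapping character bigrams:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.cn.ChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class AnalyzerCompare {
    // Print every token an analyzer produces for the given text.
    static void dump(String label, Analyzer analyzer, String text) throws Exception {
        System.out.print(label + ": ");
        TokenStream ts = analyzer.tokenStream("contents", new StringReader(text));
        for (Token t = ts.next(); t != null; t = ts.next()) {
            System.out.print("[" + t.termText() + "] ");
        }
        System.out.println();
    }

    public static void main(String[] args) throws Exception {
        String text = "我们是中国人";  // arbitrary sample sentence
        dump("StandardAnalyzer", new StandardAnalyzer(), text);
        dump("ChineseAnalyzer", new ChineseAnalyzer(), text);  // one token per character
        dump("CJKAnalyzer", new CJKAnalyzer(), text);          // overlapping bigrams
    }
}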
And where can I find it?
>
> Zsolt
>
> >-----Original Message-----
> >From: Ray Tsang [mailto:[EMAIL PROTECTED]
> >Sent: Sunday, January 29, 2006 2:14 AM
> >To: java-user@lucene.apache.org
> >Subject: Re: Chinese support
> >
> >Hi Zsolt,
> >
Hi Zsolt,
you can try to use a Chinese analyzer.
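For example, a rough sketch (index path, field name, and sample text are all
made up) of indexing and searching with the contrib CJKAnalyzer against the
Lucene 1.4/1.9-era API; the main point is to use the same Chinese-aware
analyzer at both index time and query time:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ChineseIndexExample {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new CJKAnalyzer();  // or org.apache.lucene.analysis.cn.ChineseAnalyzer

        // Index one document; path and field name are illustrative only.
        IndexWriter writer = new IndexWriter("/tmp/cn-index", analyzer, true);
        Document doc = new Document();
        doc.add(new Field("contents", "我们是中国人", Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.optimize();
        writer.close();

        // Search with the same analyzer so query text is tokenized the same way.
        IndexSearcher searcher = new IndexSearcher("/tmp/cn-index");
        Query query = new QueryParser("contents", analyzer).parse("中国");
        Hits hits = searcher.search(query);
        System.out.println(hits.length() + " hit(s)");
        searcher.close();
    }
}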
ray,
On 1/28/06, Zsolt <[EMAIL PROTECTED]> wrote:
> Hi,
>
> We use Lucene without any problems even for German text, but with Chinese
> text nothing is found. What is the best way to index and search Chinese
> text?
>
> Zsolt
>
>
>
> I've a bit of experience with search engines, but I'm obviously still
> learning thanks to this group.
>
> Peter
>
> On 1/26/06, Ray Tsang <[EMAIL PROTECTED]> wrote:
> >
> > Peter,
> >
> > Wow, the speed-up is impressive! But may I ask wh
Peter,
Wow, the speed-up is impressive! But may I ask what you did to
achieve 135 queries/sec prior to the JVM switch?
ray,
On 1/27/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> Correction: make that 285 qps :)
>
> On 1/26/06, Peter Keegan <[EMAIL PROTECTED]> wrote:
> >
> > I tried the AMD64-b
Speaking of NioFSDirectory, I thought there was one posted a while
ago, is this something that can be used?
http://issues.apache.org/jira/browse/LUCENE-414
ray,
On 11/22/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Jay Booth wrote:
> > I had a similar problem with threading, the problem turned o
Can we add a 1.9 release to the roadmap, or start a 1.9 release tracker issue?
ray,
On 10/15/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Olivier,
>
> Your best bet is to discuss this on the java-dev list. At this
> point, we haven't a firm volunteer for pushing out a release. I'd be
> thrille
I must admit that I have not downloaded the source yet. But a quick
question: does it deal with aggregate functions and group-by clauses?
Thanks!
Ray,
On 9/17/05, markharw00d <[EMAIL PROTECTED]> wrote:
> >>Basically your lucene_query function will return a true/false in one
> of the query predica
I had similar requirements of "count" and "group by" on over 130 million
records; it's really a pain. It's currently usable but not
satisfactory.
Currently it's grouping at run time by iterating through ungrouped
items. It collects matching documents into a BitSet, so subsequent
queries can use the BitSet
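For a sense of what that looks like, a rough sketch against the Lucene
1.4-era HitCollector API (the class and method names are made up, and the
per-group BitSets are assumed to be built elsewhere, e.g. one per distinct
field value):

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class GroupCounts {
    // Run the query once and remember every matching doc id in a BitSet,
    // so later count/group-by passes can reuse it without re-searching.
    public static BitSet collectMatches(Searcher searcher, Query query) throws IOException {
        final BitSet matches = new BitSet(searcher.maxDoc());
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                matches.set(doc);
            }
        });
        return matches;
    }

    // "count ... group by"-style count for one group: documents that are
    // both in the group and in the query's result set.
    public static int countForGroup(BitSet matches, BitSet groupDocs) {
        BitSet intersection = (BitSet) matches.clone();
        intersection.and(groupDocs);
        return intersection.cardinality();
    }
}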
I have made peace with this problem. Basically, I think it's
because you need to read/find what you are deleting first? hehe.
The writer just needs to write whatever it's been told to write.
ray,
On 8/23/05, Mikko Noromaa <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Why IndexReader allows me to do write-
This could be off-topic, but I made something that updates indices
that works like the following; I wonder if anybody has had the same idea?
I found something like IndexAccessControl on the mailing list before.
An implementation of the following uses IAC.
ManagedIndex index = ManagedIndex.getInstanc
I also had a similar problem. It was essentially a 'group by'-like
requirement. I used both get(fieldName) and getTermFreqVector(...);
it seemed that get(fieldName) on a page of results (say, 10 results
per page) was faster than getTermFreqVector() for me.
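For illustration, a rough sketch of the two approaches against the Lucene
1.4-era Hits API (field name and paging values are placeholders); get() needs
the field to be stored, while getTermFreqVector() needs term vectors enabled
at index time:

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.search.Hits;

public class PageFieldValues {
    // Read the group-by field from stored fields for one page of hits.
    static void viaStoredField(Hits hits, String field, int start, int pageSize) throws IOException {
        int end = Math.min(start + pageSize, hits.length());
        for (int i = start; i < end; i++) {
            String value = hits.doc(i).get(field);  // field must have been stored
            System.out.println(value);
        }
    }

    // Read the same information from term vectors for one page of hits.
    static void viaTermVector(IndexReader reader, Hits hits, String field, int start, int pageSize) throws IOException {
        int end = Math.min(start + pageSize, hits.length());
        for (int i = start; i < end; i++) {
            TermFreqVector tfv = reader.getTermFreqVector(hits.id(i), field);  // needs term vectors
            if (tfv != null) {
                System.out.println(java.util.Arrays.asList(tfv.getTerms()));
            }
        }
    }
}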
ray,
On 7/29/05, mark harwood <[EMAIL
; suggest them for inclusion to the core or contrib (by attaching them to
> a new entry in Bugzilla).
>
> Otis
>
>
> --- Ray Tsang <[EMAIL PROTECTED]> wrote:
>
> > It seems TooManyClauses is a potential problem for any query that
> > expands to a series o
It seems TooManyClauses is a potential problem for any query that
expands to a series of OR'ed boolean queries (PrefixQuery,
WildcardQuery, RangeQuery...). If the max was set too high, the
inefficiency would make the search unusable.
I kind of worked around this by creating a BitSetQuery, and exte
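The actual BitSetQuery code isn't shown here, but a similar effect can be
sketched with the Filter API of that era: enumerate the matching terms
directly and set document bits, so no BooleanQuery clauses are created at
all. Class and variable names below are made up.

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.Filter;

// Marks every document containing a term with the given prefix,
// without expanding the prefix into OR'ed BooleanQuery clauses.
public class PrefixBitSetFilter extends Filter {
    private final Term prefix;

    public PrefixBitSetFilter(Term prefix) {
        this.prefix = prefix;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        TermEnum terms = reader.terms(prefix);  // positioned at the first term >= prefix
        TermDocs docs = reader.termDocs();
        try {
            do {
                Term t = terms.term();
                if (t == null
                        || !t.field().equals(prefix.field())
                        || !t.text().startsWith(prefix.text())) {
                    break;  // past the range of matching terms
                }
                docs.seek(t);
                while (docs.next()) {
                    bits.set(docs.doc());
                }
            } while (terms.next());
        } finally {
            terms.close();
            docs.close();
        }
        return bits;
    }
}

Something like searcher.search(query, new PrefixBitSetFilter(new Term("name", "abc")))
then applies the restriction without ever hitting TooManyClauses.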