: How large is the index?
I'm not sure if I'm permitted to give out that info, but I do happen to
recall seeing this page before...
http://64.233.179.104/search?q=cache:qkHzwrcO1AAJ:www.cnetchannel.com/products/datasource.aspx+%22SKUs+in+production%22&hl=en
...so, yeah... you can draw whatever
I had similar requirements of "count" and "group by" on over 130 million
records; it's really a pain. It's currently usable but not
satisfactory.
Currently it groups at run-time by iterating through the ungrouped
items. It collects the matching documents into a BitSet, so subsequent
queries can reuse that BitSet.
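A minimal sketch of the cached-BitSet grouping described above, using only java.util.BitSet (GroupCounter and its method names are illustrative, not Lucene API): one BitSet per group value is built once by iterating the ungrouped items, and any query whose hits are also a BitSet can then be counted per group with an AND.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the cached-BitSet grouping idea: one BitSet per group
// value, built once, then reused to count per-group hits for any
// query result that is itself a BitSet.
public class GroupCounter {
    private final Map<String, BitSet> groupBits = new LinkedHashMap<>();

    // docValues[i] is the group value of document i.
    public GroupCounter(String[] docValues) {
        for (int doc = 0; doc < docValues.length; doc++) {
            groupBits.computeIfAbsent(docValues[doc], k -> new BitSet())
                     .set(doc);
        }
    }

    // Count hits per group: AND the query's BitSet with each cached one.
    public Map<String, Integer> countByGroup(BitSet queryHits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : groupBits.entrySet()) {
            BitSet and = (BitSet) queryHits.clone();
            and.and(e.getValue());
            counts.put(e.getKey(), and.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] docs = {"tv", "tv", "camera", "tv", "camera"};
        GroupCounter gc = new GroupCounter(docs);
        BitSet hits = new BitSet();
        hits.set(0); hits.set(2); hits.set(3); // docs matching some query
        System.out.println(gc.countByGroup(hits)); // {tv=2, camera=1}
    }
}
```

The AND-plus-cardinality step is cheap, so the expensive part (building the per-group BitSets) is paid only once per index generation.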
thanks a lot for your suggestion.
I'll try it and get back if need be.
Meanwhile, I gave it some thought and concluded that the best time to do the
categorization/clustering is when Lucene calculates Hits, i.e. in the Scorer.
I am not sure if I am right.
In addition to the current functionality can w
Very nice implementation and a great write up.
How large is the index?
And when you keep posting new content to the index, will you optimize the index?
--
Chris Lu
Lucene Search RAD on Any Database
http://www.dbsight.net
On 8/30/05, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> I
I'm pleased to announce that for about a month now, CNET's "Product
Listing" pages are powered by Lucene 1.4.3. These pages not only allow
users to browse CNET's catalog of tech products by category, but also to
"Filter" the lists according to category-specific Attribute Filters which
are display
Chris,
Thanks for your comments -- it's great to hear that people have had success
with very large indexes.
I'll be running on a 4-CPU (3.8GHz, 2GB RAM) Windows 2000 box, so hopefully
I'll get some advantages with the ParallelMultiSearcher... If anyone has some
metrics to post on using t
I explored the idea of Role-Based Access Control using Lucene at
http://affy.blogspot.com/2003/04/using-lucene-for-role-based-access.html.
Zach,
It probably won't help performance to split the index and then search
it on the same machine unless you search the indexes in parallel (with
a multiprocessor or multi-core machine). Even in this case, the disk
is often a bottleneck, essentially preventing the search from really
running in pa
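The parallel case could be sketched like this, with plain java.util.concurrent standing in for ParallelMultiSearcher (searchShard and the shard layout are illustrative assumptions, not Lucene calls): one task per shard, results merged when all futures complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of searching split indexes in parallel: one thread per shard,
// then merge. Only worthwhile when threads actually run concurrently
// (multiple CPUs/cores) and the disk can keep up.
public class ParallelShardSearch {

    // Stand-in for a per-shard search: returns the doc ids it matched.
    static List<Integer> searchShard(int[] shardDocs, int wanted) {
        List<Integer> hits = new ArrayList<>();
        for (int d : shardDocs) if (d % wanted == 0) hits.add(d);
        return hits;
    }

    static List<Integer> searchAll(List<int[]> shards, int wanted) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (int[] shard : shards) {
                futures.add(pool.submit(() -> searchShard(shard, wanted)));
            }
            List<Integer> merged = new ArrayList<>();
            for (Future<List<Integer>> f : futures) merged.addAll(f.get());
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> shards = new ArrayList<>();
        shards.add(new int[]{1, 2, 3, 4});
        shards.add(new int[]{5, 6, 7, 8});
        System.out.println(searchAll(shards, 2)); // [2, 4, 6, 8]
    }
}
```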
Does anyone have experience using lots of indexes simultaneously with
the multisearcher? I'm looking to index 15 distinct objects for
searching, and was thinking of creating 15 distinct indexes for better
manageability & performance (for certain searches when I know which
index to search).
Certai
: You can just assign the field B some weight when creating the index?
that implies that the field "A" being sorted on is SCORE ... which isn't
always the case.
: Is it possible to write a custom sort for a query such that the first
: N documents that match a certain additional criteria get pus
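One way to get the "first N matching documents on top" effect, sketched outside Lucene's Sort API with a plain two-level comparator (Doc and the criteria predicate are made up for illustration): criteria match is the primary key, score descending the secondary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the "push matching docs to the top" sort: a two-level
// comparator, criteria match first, then score descending.
public class CriteriaFirstSort {
    static class Doc {
        final String title; final float score;
        Doc(String title, float score) { this.title = title; this.score = score; }
    }

    static void sortCriteriaFirst(List<Doc> docs, String mustContain) {
        docs.sort((a, b) -> {
            boolean am = a.title.contains(mustContain);
            boolean bm = b.title.contains(mustContain);
            if (am != bm) return am ? -1 : 1;       // matches come first
            return Float.compare(b.score, a.score); // then by score desc
        });
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>(Arrays.asList(
            new Doc("plain result", 0.9f),
            new Doc("featured item", 0.4f),
            new Doc("another featured one", 0.7f)));
        sortCriteriaFirst(docs, "featured");
        for (Doc d : docs) System.out.println(d.title);
        // another featured one
        // featured item
        // plain result
    }
}
```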
Peter,
Check out Compass:
http://compass.sourceforge.net/
It is a layer that can integrate Hibernate and Lucene for you...
Thanks,
Zach
-Original Message-
From: Peter Gelderbloem [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 10:52 AM
To: java-user@lucene.apache.org
Subject:
As discussed in the past...
> The problem is that a jar file entry becomes an InputStream, but
> InputStream is not random access, and Lucene requires random access. So
> you need to extract the index either to disk or RAM in order to get
http://mail-archives.apache.org/mod_mbox/lucene-java-use
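The extract-to-disk step could be sketched with only java.util.jar (the toy jar and file names are invented to keep the example self-contained; a real index would then be opened via FSDirectory on the destination directory):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Sketch of the "extract the index out of the jar" workaround: Lucene
// needs random access, so index files are copied from the jar (which
// is only a stream) onto disk, where they can be opened normally.
public class JarIndexExtract {

    // Copy every entry of the jar into destDir.
    static void extract(Path jarFile, Path destDir) throws IOException {
        try (JarInputStream in =
                 new JarInputStream(Files.newInputStream(jarFile))) {
            for (JarEntry e; (e = in.getNextJarEntry()) != null; ) {
                Path out = destDir.resolve(e.getName());
                if (out.getParent() != null)
                    Files.createDirectories(out.getParent());
                Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    // Build a toy jar, extract it, report whether the file came out.
    static boolean demo() {
        try {
            Path jar = Files.createTempFile("index", ".jar");
            try (JarOutputStream out =
                     new JarOutputStream(Files.newOutputStream(jar))) {
                out.putNextEntry(new JarEntry("index/segments"));
                out.write("fake segments data".getBytes("UTF-8"));
                out.closeEntry();
            }
            Path dir = Files.createTempDirectory("unpacked");
            extract(jar, dir);
            return Files.exists(dir.resolve("index/segments"));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```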
: The obvious answer here might be to use a filter for the first
: (required) clause and then query again using that filter for the other
: terms. The problem I foresee with that solution is that I can't easily
: re-use the filters because of the sheer number of combinations of terms
: and the nee
: Suppose I cluster the results only on the 1st field i.e. I do not show
: the constituent clusters. Even in this case, I'll require around 900
: Filters [I have 900 unique terms] in memory and will have to run the same
: query 900 times, one on each Filter. I am sitting at a situation where I
: get
I didn't notice any exceptions and unfortunately I built these 2 long enough
ago that I have no logs left.
Anyway, I built 2 indexes using a process that I've built hundreds of indexes
successfully with, and these two indexes seem to contain no documents despite
being pretty large (about a gig)
The "did you mean" implementation should ideally use all of the other
words in a query as context to guide the selection of spelling
alternatives. Google appear to do this - not sure if they use the doc
content or user queries to suggest the alternatives.
I've got some colocation finding code wh
I wonder if it would further help for the spell checker to make use of
something like WordNet (for English only), where low-frequency words
are "double-checked" against WordNet before being considered correct.
Otis
--- Tom White <[EMAIL PROTECTED]> wrote:
> On 8/29/05, Chris Lu <[EMAIL PROTECTED]> wro
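The low-frequency double-check could be sketched like this, with a plain word set standing in for WordNet (the frequency map, word list, and threshold are all illustrative): a word is only flagged as a likely misspelling when it is both rare in the index and absent from the trusted list.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the "double-check low-frequency words" idea: words that are
// rare in the index AND absent from a trusted word list (WordNet would
// play this role for English) are treated as likely misspellings.
public class SpellDoubleCheck {

    static boolean likelyMisspelled(String word,
                                    Map<String, Integer> indexFreq,
                                    Set<String> trustedWords,
                                    int minFreq) {
        int freq = indexFreq.getOrDefault(word, 0);
        if (freq >= minFreq) return false;      // common in the index: trust it
        return !trustedWords.contains(word);    // rare AND unknown: suspicious
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new HashMap<>();
        freq.put("lucene", 500);
        freq.put("indx", 3);     // a typo that slipped into a few docs
        freq.put("zymurgy", 2);  // rare but a real word
        Set<String> trusted = new HashSet<>(Arrays.asList("zymurgy", "index"));

        System.out.println(likelyMisspelled("indx", freq, trusted, 10));    // true
        System.out.println(likelyMisspelled("zymurgy", freq, trusted, 10)); // false
        System.out.println(likelyMisspelled("lucene", freq, trusted, 10));  // false
    }
}
```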
Hi,
First off, I would just like to thank everyone who has contributed to
the gem we all know as Lucene.
I am thinking of using Lucene purely for text indexing and using a
persistence mechanism like Hibernate to search structured data. Would it
be a good idea to use filters that do Hibernate quer
Seema - please stop cross-posting your mails to those three e-mail
lists. java-user is the most appropriate list for your posts.
Erik
On Aug 30, 2005, at 8:07 AM, seema pai wrote:
How to use Lucene with File system Indexing on WebSphere
application server
deployed in a cluster ?
On
Another solution would be for you to create a custom TokenFilter that
split tokens at "_" characters and then a custom Analyzer that used
that filter after the StandardTokenizer.
Erik
On Aug 30, 2005, at 6:52 AM, Is, Studcio wrote:
Hello,
first of all thanks to everyone for replies a
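The splitting logic such a filter would apply can be sketched in isolation (the TokenFilter/Analyzer wiring is omitted since it is Lucene-specific; this shows just the per-token transformation the filter would perform):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of what an underscore-splitting TokenFilter would emit: each
// incoming token is broken at "_" into separate tokens.
public class UnderscoreSplit {

    static List<String> split(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (String part : t.split("_")) {
                if (!part.isEmpty()) out.add(part); // drop empty fragments
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = Arrays.asList("foo_bar", "plain", "a__b_");
        System.out.println(split(in)); // [foo, bar, plain, a, b]
    }
}
```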
Hello group,
thank you for all your discussion, suggestions and help. I thought I would run
some investigations on that source code with Lucene 1.2 and document them.
With the help of Chen I might be able to create a version that can do the
job. Perhaps we can then create some small footprint solution
When using sort there is no meaning for weight.
Aviran
http://www.aviransplace.com
-Original Message-
From: Chris Lu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 12:35 AM
To: java-user@lucene.apache.org; raymondcreel
Subject: Re: custom sort
You can just assign the field B
How to use Lucene with File system Indexing on WebSphere application server
deployed in a cluster ?
On 8/30/05, seema pai <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> My site has a large database of Television and Movie titles, in English
> and Spanish. The movie data starts from the year 1928 ti
I was going to send out the answer to this problem this morning. I found
it around 2am last night. I was mistaken in thinking that I had corrupted the
indexes. The real problem was that I had forgotten the constructor I
was using for the index writer and had it like
writer = new IndexWriter(indexlocation, new
Hello,
first of all thanks to everyone for replies and suggestions. I solved my
problem by adapting the StandardTokenizer.jj and compiling it using
javacc.
I replaced line 90:
| <ALPHANUM: (<LETTER>|<DIGIT>)+ >
with
| <ALPHANUM: (<LETTER>|<DIGIT>|"_")+ >
so that underscore is treated like alphanumeric characters. In my first
tests, it seems to work
What kind of corruption do you get? Do the files get corrupted
(unusable/unreadable), or do you get multiple items in the index?
-Original Message-
From: Eric Bressler [mailto:[EMAIL PROTECTED]
Sent: Monday, August 29, 2005 23:18
To: java-user@lucene.apache.org
Subject: Cor
On Fri, 2005-08-26 at 16:31 -0400, Thomas Lepkowski wrote:
> I have a set of index files that I'd like to distribute with my Java
> application. The only way this seems practical is to place the index files
> in a jar file. I tried this, but the search choked when I told IndexSearcher
> the inde
On 8/29/05, Chris Lu <[EMAIL PROTECTED]> wrote:
>
>
> Two approaches I can think of:
> * Use a word list(it may not be the word list you want, but it is just
> a compromise).
> * Analyze your original index, listing out all words inside.
>
>
Using a word list suffers from two problems:
1. (Cove
Hi Yonik,
thank you very much!!
Now it works very well!!
The formula "numDocs() == maxDoc() - number_of_deleted_docs" should be stated
in the API docs! :)
Thank you again!
Bye Derya
> --- Original Message ---
> From: Yonik Seeley <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Subject: