can someone explain the norm value that is stored for each field at index
time for scoring? how does it impact the index size? in lucene 1.4.3, is it
active by default, and what is the penalty of disabling it?
many thanks in advance
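for context: a norm is one byte per document per indexed field (length
normalization folded together with the index-time field boost), kept on disk
and loaded into memory, so a 10M-doc index with 5 indexed fields carries
roughly 50MB of norms. a minimal sketch of switching them off, assuming a
release that exposes Field.setOmitNorms (that setter and the Field.Store /
Field.Index form arrived after 1.4.3, so treat this as illustrative only):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class OmitNormsSketch {
    public static void main(String[] args) {
        Document doc = new Document();
        // Field.Store / Field.Index and setOmitNorms are post-1.4.3 API (assumption)
        Field body = new Field("body", "some text to index",
                               Field.Store.NO, Field.Index.TOKENIZED);
        body.setOmitNorms(true); // saves one byte per doc for this field;
                                 // loses length normalization and field boosts
        doc.add(body);
        System.out.println("norms omitted for: " + body.name());
    }
}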
hey all,
i am currently running lucene 1.4.3. i am loading all relevant searchable
data into a ramdir and storing display data in oracle.
i have read good things about mmap and was wondering about the following:
1. does it require a 'warm up' to achieve sub-second results? if so,
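for what it's worth, memory-mapped files are paged in lazily by the OS, so
the first queries hit disk; a common warm-up is to run a handful of
representative queries before taking real traffic. a sketch against the
1.4.3-era API (the field name and query strings are placeholders):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class WarmUp {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        String[] warmQueries = {"common", "terms", "from", "real", "traffic"};
        for (int i = 0; i < warmQueries.length; i++) {
            Query q = QueryParser.parse(warmQueries[i], "body", new StandardAnalyzer());
            searcher.search(q); // results ignored; the point is paging the files in
        }
    }
}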
hello all,
i would like to retrieve, at query time, the part of speech of each word
in a query.
does anyone know of an implementation of a java part-of-speech api?
thanks in advance,
i apologize in advance for the question.
i am running lucene 1.4.3 (i'd prefer not to use the keywordanalyzer)
i need to mix a user-entered query with a search on keyword-indexed
fields
how would i search for an exact phrase using a term query on a keyword field
while searchin
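a sketch of one way to do this on 1.4.3: a field indexed with Field.Keyword
is stored as a single untokenized term, so a TermQuery holding the exact
string matches the whole "phrase" without any analyzer involvement; the user
input is parsed separately and both are required clauses of a BooleanQuery
(the field names here are made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class MixedQuery {
    public static Query build(String userInput) throws Exception {
        Query parsed = QueryParser.parse(userInput, "body", new StandardAnalyzer());
        Query exact = new TermQuery(new Term("category", "exact phrase value"));
        BooleanQuery bq = new BooleanQuery();
        bq.add(parsed, true, false); // required (1.4.3 boolean-flag form)
        bq.add(exact, true, false);  // required
        return bq;
    }
}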
is it possible to take a stemmed token from an index and run some sort of
reverse porter stemming to get a logical word? the problem is that porter
stemming is very aggressive, for example: people is indexed as peopl, so
basically my question is
if i have peoples, people, both indexed as
docs
confusing
however, i will look into your impl, it sounds solid. i am currently on
lucene 1.4.3 (which classes should i look into in solr?)
comments welcome
thanks in advance!
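porter stemming is not invertible, so "reverse stemming" in practice means a
lookup table. one hedged approach: while analyzing documents, record each
stem together with the surface form that produced it, and keep the most
frequent original as the display form. a pre-generics sketch (the stemmer
hookup is left out; any stemmer producing the stem string works):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class StemDisplayMap {
    private final Map stemToCounts = new HashMap(); // stem -> (word -> Integer)

    public void record(String stem, String original) {
        Map counts = (Map) stemToCounts.get(stem);
        if (counts == null) { counts = new HashMap(); stemToCounts.put(stem, counts); }
        Integer c = (Integer) counts.get(original);
        counts.put(original, new Integer(c == null ? 1 : c.intValue() + 1));
    }

    public String displayForm(String stem) {
        Map counts = (Map) stemToCounts.get(stem);
        if (counts == null) return stem;
        String best = stem; int bestCount = -1;
        for (Iterator it = counts.entrySet().iterator(); it.hasNext();) {
            Map.Entry e = (Map.Entry) it.next();
            int c = ((Integer) e.getValue()).intValue();
            if (c > bestCount) { bestCount = c; best = (String) e.getKey(); }
        }
        return best; // record("peopl","people") twice and ("peopl","peoples")
                     // once, and displayForm("peopl") returns "people"
    }
}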
Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 6/10/06, z shalev wrote:
>
t for
lower hit counts. On my 64-bit platform, a MAX_SIZE value of 10K-20K seems
to provide optimal performance. I'm looking forward to trying this with
OpenBitSet.
Peter
On 5/29/06, z shalev wrote:
>
> i know im a little late replying to this thread, but, in my humble
hey,
i am using the parallelmultisearcher to retrieve data from a number of ram
indexes. i am calling my own search function, which calls the
indexsearcher.search method and returns the top 100 ids/scores. however,
before returning the topdocs i start a separate thread which requeries the
index and
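a rough reconstruction of that pattern on the 1.4.3 API (the background
requery step is a guess, since the message is cut off):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class TwoPhaseSearch {
    public static ScoreDoc[] searchTop100(final IndexSearcher searcher, final Query q)
            throws Exception {
        TopDocs top = searcher.search(q, null, 100); // ids + scores, no stored fields
        final ScoreDoc[] result = top.scoreDocs;
        new Thread(new Runnable() {
            public void run() {
                try {
                    searcher.search(q, null, 1000); // hypothetical follow-up requery
                } catch (Exception e) { /* log and drop */ }
            }
        }).start();
        return result; // returned before the background pass finishes
    }
}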
i know i'm a little late replying to this thread, but, in my humble opinion,
the best way to aggregate values (not necessarily terms, but whole values in
fields) is as follows:
startup stage:
for each field you would like to aggregate, create a hashmap
open an index reader and run
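a guess at what that startup pass might look like with the 1.4.3 TermEnum
API, caching value -> docFreq for one field:

import java.util.HashMap;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class FieldValueCounts {
    public static HashMap build(IndexReader reader, String field) throws Exception {
        HashMap counts = new HashMap();
        TermEnum terms = reader.terms(new Term(field, "")); // first term of the field
        try {
            while (terms.term() != null && field.equals(terms.term().field())) {
                counts.put(terms.term().text(), new Integer(terms.docFreq()));
                if (!terms.next()) break;
            }
        } finally {
            terms.close();
        }
        return counts;
    }
}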
hi all,
is there a faster way to retrieve ONLY the count of results for a query?
lucene ranks (scores) the first batch of docs and sorts them by rank; this is
functionality i don't need in certain queries, and i assume not doing it can
return the count faster than hits.length()
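one way on 1.4.3 is a bare HitCollector: it skips the collect-and-sort
top-docs machinery behind Hits, so a count-only pass is typically cheaper
(each hit is still scored on the way through). a minimal sketch:

import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class CountOnly {
    public static int count(IndexSearcher searcher, Query q) throws Exception {
        final int[] n = new int[1];
        searcher.search(q, new HitCollector() {
            public void collect(int doc, float score) { n[0]++; } // just count
        });
        return n[0];
    }
}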
i am currently adding synonyms at index time (and not expanding the query),
and i fear that there is a problem with this implementation:
is there a way to lower the score of a document if it was found due to a
synonym match and not due to a match of the word queried? from what i
understand th
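one workable layout (my suggestion, not the only one): index the synonyms
into a separate field and query both fields, with the synonym field boosted
down so synonym-only matches rank lower. sketch with made-up field names:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class SynonymQuery {
    public static Query build(String word) {
        TermQuery original = new TermQuery(new Term("body", word));
        TermQuery synonym = new TermQuery(new Term("body_syn", word));
        synonym.setBoost(0.3f); // synonym-only hits score well below direct hits
        BooleanQuery bq = new BooleanQuery();
        bq.add(original, false, false); // optional clause (1.4.3 flag form)
        bq.add(synonym, false, false);  // optional clause
        return bq;
    }
}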
I've rewritten the RAM DIR to support 64 bit (still haven't had time to add
this to lucene, hopefully in the coming months when i have a free second)
My question:
i have a machine with 4 GB RAM
i have a 3GB index file,
i successfully load the 3GB index into memory,
the
indeed currently limited to 2GB. This would not be too
hard to fix. Please file a bug report. Better yet, attach a patch.
I assume you're running a 64bit JVM. If so, then MMapDirectory might
also work well for you.
Doug
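if memory serves, in the releases of that era the directory implementation
was swappable through a system property read by FSDirectory.getDirectory;
that mechanism is an assumption here, so verify it against your version's
FSDirectory source. a sketch:

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MMapOpen {
    public static void main(String[] args) throws Exception {
        // assumption: FSDirectory.getDirectory honors this property (1.9-era trunk)
        System.setProperty("org.apache.lucene.FSDirectory.class",
                           "org.apache.lucene.store.MMapDirectory");
        Directory dir = FSDirectory.getDirectory("/path/to/index", false);
        IndexSearcher searcher = new IndexSearcher(dir);
        System.out.println("max doc: " + searcher.maxDoc());
    }
}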
z shalev wrote:
> this is in continuation of a previous email i sent
this is in continuation of a previous email i sent
i have a 6gb index containing over 12 million terms.
looking at the Lucene code RAMDirectory.java
i see an int cast of the index file size, meaning there is a 2GB limit
did i miss something?
has anyone loaded more than
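the arithmetic behind that ceiling: casting a file length past
Integer.MAX_VALUE (2^31 - 1, about 2.1GB) to int wraps negative, so any
buffer sized from the cast fails for a 3GB file:

public class IntCastDemo {
    public static void main(String[] args) {
        long threeGb = 3L * 1024 * 1024 * 1024;
        int cast = (int) threeGb; // the RAMDirectory-style narrowing cast
        System.out.println(threeGb + " -> " + cast); // 3221225472 -> -1073741824
    }
}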
-----Original Message-----
From: z shalev [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 09, 2006 12:02 AM
To: java-user@lucene.apache.org
Subject: Re: 1.4.3 and 64bit support? out of memory??
hey chris,
i will check and let you know just to make sure,
basically i see the OS allocating memory
hey chris,
i will check and let you know just to make sure,
basically i see the OS allocating memory (up to about 4GB) while loading the
indexes into memory and then crashing in the TermInfosReader class. what i
noticed was that the crash occurred when lucene tried to create a Term arra
yes,
100%
Dan Armbrust <[EMAIL PROTECTED]> wrote:
z shalev wrote:
> hi all,
>
> i've been trying to load a 6GB index on linux (16GB RAM) but am having no
> success.
>
> i wrote a program that allocates memory and it was able to allocate as much
> RAM
hi all,
i've been trying to load a 6GB index on linux (16GB RAM) but am having no
success.
i wrote a program that allocates memory and it was able to allocate as much
RAM as i requested (stopped at 12GB)
however, i am receiving the following stack trace:
JVMDUMP013I
hey all,
my team has been working for the last couple of days on integrating carrot2
into our project as a sort of src (search result clustering) solution.
i was rather impressed with the results, until i checked out vivisimo's demo
and saw a bit of a difference quality-wise,
ha
i am currently implementing lucene using multiple rmi servers as index
searchers.
has anyone done this using ejbs? (any tips?)
if so, are there any performance hits?
thanks in advance,
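for reference, plain RMI is already wired into 1.4.3: it ships
RemoteSearchable, so no EJB layer is strictly needed. a sketch of the server
side (the index path and bind name are placeholders):

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RemoteSearchable;

public class SearchServer {
    public static void main(String[] args) throws Exception {
        LocateRegistry.createRegistry(1099);          // default RMI registry port
        IndexSearcher local = new IndexSearcher("/path/to/index");
        RemoteSearchable remote = new RemoteSearchable(local); // exports over RMI
        Naming.rebind("//localhost/Searchable", remote);
    }
}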
hey chris,
i was using the hits.doc method while iterating...
you've given me some hope!! i will look into the FieldCache
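a sketch of the FieldCache route Chris is pointing at ("category" is a
placeholder field name): load all values of a field into one String[]
indexed by doc id (cached after first use) and group by array lookup instead
of re-reading stored fields through hits.doc(i):

import java.util.HashMap;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Hits;

public class GroupByField {
    public static HashMap groupCounts(IndexReader reader, Hits hits, int max)
            throws Exception {
        String[] values = FieldCache.DEFAULT.getStrings(reader, "category");
        HashMap counts = new HashMap();
        int n = Math.min(max, hits.length());
        for (int i = 0; i < n; i++) {
            String v = values[hits.id(i)]; // doc id -> value, no stored-field read
            Integer c = (Integer) counts.get(v);
            counts.put(v, new Integer(c == null ? 1 : c.intValue() + 1));
        }
        return counts;
    }
}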
Chris Hostetter <[EMAIL PROTECTED]> wrote:
: currently , i am iterating through about 200-300 of the top docs and
: creating the groups (so, as of now, the groups
thanks for the advice guys!
currently, i am iterating through about 200-300 of the top docs and creating
the groups (so, as of now, the groups are partial). my response time HAS to
be at most 500-600 milliseconds (query + groupings) or my company will
probably go with a commercial search engine
ly: to get a count per word bit-wise AND each WBS(n) with FRSBS and count
up the 1s
Jim Powers
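my reading of the step above (taking WBS(n) as a per-word bit set and FRSBS
as the full-result-set bit set, one bit per doc id, both names from the
message rather than any library):

import java.util.BitSet;

public class BitSetCount {
    public static int countForWord(BitSet wordBits, BitSet resultBits) {
        BitSet tmp = (BitSet) wordBits.clone(); // don't clobber the cached set
        tmp.and(resultBits);                    // keep docs in both sets
        return tmp.cardinality();               // result docs containing the word
    }
}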
On Sunday 29 January 2006 07:55, z shalev wrote:
> hey,
>
> i have a bit of a complex problem,
> i need to group results recieved in a result set,
> for example:
>
> my resu
hey,
i have a bit of a complex problem,
i need to group results received in a result set,
for example:
my result set returns 10,000 results
there are about 10 fields in each result document
i need to group the most frequent values appearing in each field.
if 1 of m
hey,
is there a max amount of data (in gigabytes) where lucene's performance
starts to deteriorate?
i tested with about 2GB on two instances (2 ram dirs using the
parallelmultisearcher) and performance was great;
however, i think i will need to support about 10-15 times as much
hello all,
i have an environment with a number of search instances (index searchers)
running as rmi servers, and a federator (a parallel multi searcher)
combining the results of all the instances. this is working great, and
allows us to load millions of docs into memory.
my problem is
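for reference, a sketch of that federator wiring on 1.4.3 (host names and
bind names are placeholders):

import java.rmi.Naming;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Searchable;

public class Federator {
    public static ParallelMultiSearcher connect() throws Exception {
        Searchable[] shards = new Searchable[] {
            (Searchable) Naming.lookup("//host1/Searchable"),
            (Searchable) Naming.lookup("//host2/Searchable")
        };
        return new ParallelMultiSearcher(shards); // fans queries out in parallel
    }
}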