indexing norms???

2007-02-27 Thread zzzzz shalev
can someone explain the norms that are stored for each field at index time for scoring: how they impact the index size, whether they are active by default in lucene 1.4.3, and what the penalty of disabling them is? much thanks in advance
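
For reference (not from this thread): norms cost one byte per document per indexed field, so they add up on a large index with many indexed fields. Lucene 1.4.3 itself has no per-field switch to turn them off; later releases (1.9/2.x) added Field.Index.NO_NORMS. A minimal sketch assuming that later API, with made-up field names:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Build a document whose "category" field skips norms. Omitting norms
// disables index-time length normalization and field boosts for that
// field, but saves one byte per document per field in the index.
Document doc = new Document();

Field title = new Field("title", "indexing norms explained",
                        Field.Store.YES, Field.Index.TOKENIZED);
doc.add(title);  // norms kept: length normalization still applies

Field category = new Field("category", "lucene",
                           Field.Store.YES, Field.Index.NO_NORMS);
doc.add(category);  // indexed without an analyzer and without norms (1.9+ only)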

comparing RAMDirectory to MMap

2006-12-29 Thread zzzzz shalev
hey all, i am currently running lucene 1.4.3. i am loading all relevant searchable data into a RAMDirectory and storing display data in oracle. i have read good things about MMapDirectory and was wondering about the following: 1. does it require a 'warm up' to achieve sub-second results? if so,
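
A sketch of the comparison being asked about, assuming a 1.4/2.x-era API; MMapDirectory may not ship with 1.4.3 itself, the exact RAMDirectory constructors vary by release, and the index path and query terms are made up:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class DirCompare {
    public static void main(String[] args) throws Exception {
        String path = "/data/index";   // hypothetical index location

        // Option 1: memory-map the index files. Older releases select the
        // FSDirectory implementation through this system property.
        System.setProperty("org.apache.lucene.FSDirectory.class",
                           "org.apache.lucene.store.MMapDirectory");
        Directory mmapDir = FSDirectory.getDirectory(path, false);

        // Option 2: copy the whole on-disk index into the JVM heap.
        Directory ramDir = new RAMDirectory(path);

        IndexSearcher searcher = new IndexSearcher(mmapDir);

        // A "warm up" query touches the term dictionary and posting files so
        // the OS page cache (mmap) is hot before latency is measured.
        searcher.search(new TermQuery(new Term("title", "lucene")));

        long t0 = System.currentTimeMillis();
        Hits hits = searcher.search(new TermQuery(new Term("title", "lucene")));
        System.out.println(hits.length() + " hits in "
                + (System.currentTimeMillis() - t0) + " ms");
        searcher.close();
    }
}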

part of speech tagger

2006-10-20 Thread zzzzz shalev
hello all, i would like to retrieve, during query time, the part of speech of each word in a query. does anyone know of an implementation of a java part-of-speech api? thanks in advance

TermQuery beginner's question

2006-06-25 Thread zzzzz shalev
i apologize in advance for the question. i am running lucene 1.4.3 (and prefer not to use the KeywordAnalyzer). i need to mix a user-entered query with a search on keyword-indexed fields. how would i search for an exact phrase using a TermQuery on a keyword field while searching
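
One common way to do this (a sketch, not quoted from the thread): parse the free-text part with the analyzer used for the tokenized field, and build a TermQuery directly from the keyword value so it never passes through an analyzer. Field names and values are made up; the 1.4.x BooleanQuery.add signature takes (required, prohibited) flags:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Free-text part: analyzed the same way the tokenized field was indexed.
Query userQuery = QueryParser.parse("search result clustering",
                                    "contents", new StandardAnalyzer());

// Keyword part: the exact stored value, untouched by any analyzer.
TermQuery exact = new TermQuery(new Term("category", "Search Engines"));

BooleanQuery combined = new BooleanQuery();
combined.add(userQuery, true, false);   // required
combined.add(exact, true, false);       // required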

reversing porter stemming

2006-06-16 Thread zzzzz shalev
is it possible to take a stemmed token from an index and run some sort of reverse porter stemming to get a logical word? the problem is that porter stemming is very aggressive, for example: people is indexed as peopl. so basically my question is, if i have peoples, people, both indexed as
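
The Porter transform is lossy, so there is nothing to reverse; a common workaround (not from this thread) is to record the surface forms seen for each stem at index time and pick a display form later. A minimal sketch using the old analysis API, assuming the same PorterStemFilter produces the stems found in the index:

import java.io.StringReader;
import java.util.*;
import org.apache.lucene.analysis.*;

// Map each stem to every original word that produced it, e.g.
// "peopl" -> {people, peoples}; later, display the shortest or most
// frequent form instead of the raw stem.
Map stemToForms = new HashMap();  // String stem -> Set of String forms

String[] words = {"people", "peoples", "running", "runs"};
for (int i = 0; i < words.length; i++) {
    TokenStream ts = new PorterStemFilter(
            new LowerCaseTokenizer(new StringReader(words[i])));
    Token t = ts.next();
    if (t != null) {
        String stem = t.termText();
        Set forms = (Set) stemToForms.get(stem);
        if (forms == null) {
            forms = new TreeSet();
            stemToForms.put(stem, forms);
        }
        forms.add(words[i]);
    }
}
System.out.println(stemToForms);  // {peopl=[people, peoples], run=[running, runs]}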

Re: Aggregating category hits

2006-06-10 Thread zzzzz shalev
docs confusing however, i will look into your impl, it sounds solid. i am currently on lucene 1.4.3 (which classes should i look into in solr?) comments welcomed, thanks in advance! Yonik Seeley <[EMAIL PROTECTED]> wrote: On 6/10/06, z shalev wrote: >

Re: Aggregating category hits

2006-06-10 Thread zzzzz shalev
t for lower hit counts. On my 64-bit platform, a MAX_SIZE value of 10K-20K seems to provide optimal performance. I'm looking forward to trying this with OpenBitSet. Peter On 5/29/06, z shalev wrote: > > i know im a little late replying to this thread, but, in my humble

combining two query calls in one?

2006-06-09 Thread zzzzz shalev
hey, i am using the ParallelMultiSearcher to retrieve data from a number of ram indexes. i am calling my own search function, which calls the IndexSearcher.search method and returns the top 100 ids/scores. however, before returning the TopDocs i start a separate thread which requeries the index and
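
A sketch of the pattern being described, with made-up names: the caller gets the top 100 ids/scores synchronously, and a second pass runs on a background thread so it does not add to the caller's latency.

import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.TopDocs;

public class TwoPassSearch {
    public static TopDocs searchAndRequery(final Searcher searcher,
                                           final Query query) throws Exception {
        TopDocs top = searcher.search(query, null, 100);  // synchronous first pass

        Thread requery = new Thread(new Runnable() {
            public void run() {
                try {
                    searcher.search(query, null, 100);    // background second pass
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        requery.start();

        return top;   // handed back before the background pass finishes
    }
}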

Re: Aggregating category hits

2006-05-29 Thread zzzzz shalev
i know i'm a little late replying to this thread, but, in my humble opinion, the best way to aggregate values (not necessarily terms, but whole values in fields) is as follows: startup stage: for each field you would like to aggregate, create a hashmap; open an index reader and run
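
A sketch of that startup stage, assuming untokenized (keyword) fields so each term is a whole field value; class and field names are made up. Each distinct value is mapped to a BitSet of the documents containing it, so at query time a result-set BitSet can be ANDed against each value's BitSet to get per-value counts.

import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;

public class CategoryBitSets {
    // Returns a Map of String field value -> BitSet of matching doc ids.
    public static Map build(IndexReader reader, String field) throws Exception {
        Map valueToDocs = new HashMap();
        TermEnum terms = reader.terms(new Term(field, ""));
        TermDocs termDocs = reader.termDocs();
        try {
            while (terms.term() != null && field.equals(terms.term().field())) {
                BitSet docs = new BitSet(reader.maxDoc());
                termDocs.seek(terms.term());
                while (termDocs.next()) {
                    docs.set(termDocs.doc());     // doc contains this field value
                }
                valueToDocs.put(terms.term().text(), docs);
                if (!terms.next()) break;
            }
        } finally {
            terms.close();
            termDocs.close();
        }
        return valueToDocs;
    }
}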

fastest way to get raw hit count

2006-05-29 Thread zzzzz shalev
hi all, is there a faster way to retrieve ONLY the count of results for a query? lucene ranks (scores) the first batch of docs and sorts them by rank. this is functionality i don't need for certain queries, and i assume not doing it can return the count faster than hits.length()
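
A common answer in this era (not quoted from the thread): pass a HitCollector that only increments a counter, so no Hits object is built and nothing is sorted. A minimal sketch against the 1.4.x API:

import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Count matches without building a Hits object, sorting, or loading documents.
public class HitCounter extends HitCollector {
    private int count = 0;

    public void collect(int doc, float score) {
        count++;                       // called once per matching doc
    }

    public int getCount() {
        return count;
    }

    public static int count(IndexSearcher searcher, Query query) throws Exception {
        HitCounter counter = new HitCounter();
        searcher.search(query, counter);   // streams every hit through collect()
        return counter.getCount();
    }
}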

lowering score of doc if synonyms matched (synonyms indexed)

2006-05-10 Thread zzzzz shalev
i am currently adding synonyms at index time (and not expanding the query). i fear that there is a problem with this implementation: is there a way to lower the score of a document if it was found due to a synonym match and not due to a match of the word queried? from what i understand th
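
Once synonyms are injected into the index, the postings no longer distinguish an original word from a synonym, which makes per-document down-weighting hard there. A common alternative (not from this thread) is to expand the query instead and give synonym terms a lower boost; a minimal sketch with made-up terms and a hypothetical 0.3 weight, using the 1.4.x BooleanQuery.add(required, prohibited) signature:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// The word the user typed counts at full weight; its synonym counts less,
// so documents that only match the synonym score lower.
BooleanQuery q = new BooleanQuery();

TermQuery original = new TermQuery(new Term("contents", "car"));
q.add(original, false, false);                 // optional clause, full weight

TermQuery synonym = new TermQuery(new Term("contents", "automobile"));
synonym.setBoost(0.3f);                        // hypothetical down-weight
q.add(synonym, false, false);                  // optional clause, lower weight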

RAM Directory / querying Performance issue

2006-04-26 Thread zzzzz shalev
I've rewritten the RAMDirectory to support 64 bit (still haven't had time to contribute this to lucene, hopefully in the coming months when i have a free second). my question: i have a machine with 4 GB RAM and a 3GB index file. i successfully load the 3GB index into memory, the

Re: Can Lucene load more than 2GB into RAM memory?

2006-03-16 Thread zzzzz shalev
indeed currently limited to 2GB. This would not be too hard to fix. Please file a bug report. Better yet, attach a patch. I assume you're running a 64bit JVM. If so, then MMapDirectory might also work well for you. Doug z shalev wrote: > this is in continuation of a previous email i sent

Can Lucene load more than 2GB into RAM memory?

2006-03-10 Thread zzzzz shalev
this is in continuation of a previous email i sent. i have a 6gb index containing over 12 million terms. looking at the Lucene code (RAMDirectory.java) i see an int cast of the index file size, meaning there is a 2GB limit. did i miss something? has anyone loaded more than
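
An illustration of the cast being described (not the verbatim RAMDirectory source): a Java array is indexed by int, so casting a file length to int silently overflows once the file passes Integer.MAX_VALUE (2^31 - 1 bytes, roughly 2GB).

// A 6GB segment file cast to int becomes a negative size.
long fileLength = 6L * 1024 * 1024 * 1024;
int truncated = (int) fileLength;            // overflows to a negative value
byte[] buffer = new byte[truncated];         // throws NegativeArraySizeException

// A 64-bit-friendly approach keeps the length as a long and buffers the
// file in fixed-size chunks instead of one int-sized array:
final int CHUNK = 1 << 20;                   // 1MB pieces
long remaining = fileLength;
while (remaining > 0) {
    int toRead = (int) Math.min(CHUNK, remaining);
    byte[] piece = new byte[toRead];
    // ... fill `piece` from the underlying directory input ...
    remaining -= toRead;
}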

RE: 1.4.3 and 64bit support? out of memory??

2006-03-09 Thread zzzzz shalev
Original Message----- From: z shalev [mailto:[EMAIL PROTECTED] Sent: Thursday, March 09, 2006 12:02 AM To: java-user@lucene.apache.org Subject: Re: 1.4.3 and 64bit support? out of memory?? hey chris, i will check and let you know just to make sure, basically i see the OS allocating memory

Re: 1.4.3 and 64bit support? out of memory??

2006-03-08 Thread zzzzz shalev
hey chris, i will check and let you know. just to make sure: basically i see the OS allocating memory (up to about 4GB) while loading the indexes to memory and then crashing in the TermInfosReader class. what i noticed was that the crash occurred when lucene tried to create a Term arra

Re: 1.4.3 and 64bit support? out of memory??

2006-03-08 Thread zzzzz shalev
yes, 100% Dan Armbrust <[EMAIL PROTECTED]> wrote: z shalev wrote: > hi all, > > i've been trying to load a 6GB index on linux (16GB RAM) but am having no > success. > > i wrote a program that allocates memory and it was able to allocate as much > RAM

1.4.3 and 64bit support? out of memory??

2006-03-08 Thread zzzzz shalev
hi all, i've been trying to load a 6GB index on linux (16GB RAM) but am having no success. i wrote a program that allocates memory and it was able to allocate as much RAM as i requested (stopped at 12GB). however, i am receiving the following stack trace: JVMDUMP013I

carrot2 vs. vivisimo

2006-03-05 Thread zzzzz shalev
hey all, my team has been working for the last couple of days on integrating carrot2 into our project as a sort of SRC (search result clustering) solution. i was rather impressed with the results, until i checked out vivisimo's demo and saw a bit of a difference quality-wise, ha

lucene & ejbs

2006-02-09 Thread zzzzz shalev
i am currently implementing lucene using multiple rmi servers as index searchers. has anyone done this using ejbs? (any tips?) if so, are there any performance hits? thanks in advance
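
For context, the RMI setup being described typically uses Lucene's RemoteSearchable; a minimal server-side sketch assuming the 1.4/2.x-era classes, with a made-up index path and RMI name:

import java.rmi.Naming;
import java.rmi.registry.LocateRegistry;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RemoteSearchable;

// Wrap an IndexSearcher in a RemoteSearchable and register it in an RMI
// registry so a remote federator can call it like a local Searchable.
public class SearchServer {
    public static void main(String[] args) throws Exception {
        IndexSearcher local = new IndexSearcher("/data/index");   // hypothetical path
        RemoteSearchable remote = new RemoteSearchable(local);
        LocateRegistry.createRegistry(1099);
        Naming.rebind("//localhost/searcher", remote);
        System.out.println("searcher exported over RMI");
    }
}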

RE: grouping results by fields

2006-01-30 Thread zzzzz shalev
hey chris, i was using the hits.doc method while iterating... you've given me some hope!! i will look into the FieldCache Chris Hostetter <[EMAIL PROTECTED]> wrote: : currently, i am iterating through about 200-300 of the top docs and : creating the groups (so, as of now, the groups
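
The FieldCache approach being pointed at, sketched against the 1.4.x API with made-up names: instead of hits.doc(i), which loads the stored document for every hit, the field's values are read once into an array indexed by document number.

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Hits;

// Group the top docs by a field value using FieldCache, which holds one
// String per document in memory instead of loading stored fields per hit.
public class FieldGrouper {
    public static Map groupTopDocs(IndexReader reader, Hits hits, String field,
                                   int topN) throws Exception {
        String[] values = FieldCache.DEFAULT.getStrings(reader, field);
        Map counts = new HashMap();                       // value -> Integer count
        int limit = Math.min(topN, hits.length());
        for (int i = 0; i < limit; i++) {
            String value = values[hits.id(i)];            // doc id only, no stored-field load
            Integer c = (Integer) counts.get(value);
            counts.put(value, c == null ? new Integer(1)
                                        : new Integer(c.intValue() + 1));
        }
        return counts;
    }
}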

RE: grouping results by fields

2006-01-30 Thread zzzzz shalev
thanks for the advice guys! currently, i am iterating through about 200-300 of the top docs and creating the groups (so, as of now, the groups are partial). my response time HAS to be at most 500-600 millis (query + groupings) or my company will probably go with a commercial search engine

Re: grouping results by fields

2006-01-30 Thread zzzzz shalev
ly: to get a count per word bit-wise AND each WBS(n) with FRSBS and count up the 1s Jim Powers On Sunday 29 January 2006 07:55, z shalev wrote: > hey, > > i have a bit of a complex problem, > i need to group results recieved in a result set, > for example: > > my resu
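
A sketch of the bit-set counting being described, with made-up names: WBS(n) is a per-value BitSet of the documents containing value n (built once at startup), FRSBS is the final result set as a BitSet, and the per-value hit count is the cardinality of their AND.

import java.util.BitSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ValueCounter {
    // wbs: Map of field value -> BitSet of docs containing it.
    // frsbs: BitSet of docs matched by the current query.
    public static Map countPerValue(Map wbs, BitSet frsbs) {
        Map counts = new HashMap();                    // value -> Integer count
        for (Iterator it = wbs.entrySet().iterator(); it.hasNext();) {
            Map.Entry entry = (Map.Entry) it.next();
            BitSet and = (BitSet) ((BitSet) entry.getValue()).clone();
            and.and(frsbs);                            // intersect with the result set
            counts.put(entry.getKey(), new Integer(and.cardinality()));
        }
        return counts;
    }
}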

grouping results by fields

2006-01-29 Thread zzzzz shalev
hey, i have a bit of a complex problem. i need to group results received in a result set. for example: my result set returns 10,000 results, there are about 10 fields in each result document, and i need to group the most frequent values appearing in each field. if 1 of m

data size limitation?

2006-01-19 Thread zzzzz shalev
hey, is there a max amount of data (in gigabytes) where lucene's performance starts to deteriorate? i tested with about 2 gigs on two instances (2 ram dirs using the ParallelMultiSearcher) and performance was great. however, i think i will need to support about 10-15 times as much

online incremental indexing

2006-01-07 Thread zzzzz shalev
hello all, i have an environment with a number of search instances (index searchers) running as rmi servers and a federator (a ParallelMultiSearcher) combining the results of all the instances. this is working great, and allows us to load millions of docs into memory. my problem is
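
For reference, the federator side of this setup, sketched against the 1.4/2.x-era API; host names, the RMI binding name, and the query are made up:

import java.rmi.Naming;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searchable;

// Look up each remote searcher exported via RemoteSearchable and fan the
// query out to all of them in parallel, merging the scored results.
public class Federator {
    public static void main(String[] args) throws Exception {
        Searchable s1 = (Searchable) Naming.lookup("//search1/searcher"); // hypothetical hosts
        Searchable s2 = (Searchable) Naming.lookup("//search2/searcher");

        ParallelMultiSearcher federator =
                new ParallelMultiSearcher(new Searchable[] { s1, s2 });

        Query q = QueryParser.parse("incremental indexing", "contents",
                                    new StandardAnalyzer());
        Hits hits = federator.search(q);
        System.out.println(hits.length() + " total hits across all instances");
        federator.close();
    }
}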

how to post questions?

2006-01-07 Thread zzzzz shalev