You can do everything related to search (full text or just paths) with
Lucene :-)
On Wed, Dec 2, 2009 at 11:26 PM, Stefan Trcek wrote:
> On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote:
> > On Wednesday 02 December 2009 15:50:45 archibal wrote:
> > > - optionally I want to have a central
Hello all,
Sorting consumes a huge amount of memory. Does 2.9.1/3.0 have the ability to
customize the field cache? In my case, 80% of documents are required to be sorted.
Currently the field cache is loading all records.
Is there any custom interface available to decide which documents should be loaded
in cache
Nothing changed about that in 3.0 (nor in 2.9), except that the
FieldCache is now per-segment, which makes IndexReader.reopen faster.
But you are still able to do your own sorting with your own structures; you just
have to write your own TopDocsCollector.
-
Uwe Schindler
H.-H.-Meier-Allee 63
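Uwe's suggestion (sort with your own structures instead of letting the FieldCache hold every value) can be sketched without any Lucene API: keep only the top N doc IDs in a bounded priority queue over a value array you loaded yourself. The class, method, and array names here are illustrative, not Lucene's:

```java
import java.util.PriorityQueue;

public class TopNByExternalValue {
    // Keep the N best docs ranked by values we loaded ourselves,
    // instead of relying on Lucene's FieldCache for sorting.
    static int[] topN(float[] valueByDoc, int n) {
        // Min-heap ordered by value: the head is the current worst of the kept docs.
        PriorityQueue<Integer> pq =
            new PriorityQueue<>((a, b) -> Float.compare(valueByDoc[a], valueByDoc[b]));
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            pq.offer(doc);
            if (pq.size() > n) pq.poll(); // evict the current worst
        }
        // Drain ascending, fill from the end so the best doc comes first.
        int[] result = new int[pq.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = pq.poll();
        return result;
    }

    public static void main(String[] args) {
        float[] vals = {0.2f, 0.9f, 0.1f, 0.7f};
        for (int doc : topN(vals, 2)) System.out.println(doc);
    }
}
```

Memory stays O(N) per search instead of O(maxDoc) for the cached sort values; a real TopDocsCollector would apply the same bounded-queue idea per segment.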
Maybe you can try Omnifind Yahoo Edition.
2009/12/3 Weiwei Wang
> You can do everything related to search (full text or just paths) with
> Lucene :-)
>
> On Wed, Dec 2, 2009 at 11:26 PM, Stefan Trcek wrote:
>
> > On Wednesday 02 December 2009 16:20:28 Stefan Trcek wrote:
> > > On Wednesday 02 De
This might be OT but did you consider Google Desktop Search?
Seems that somebody reported success with hacking it to allow network file
system index/search: http://www.geekzone.co.nz/content.asp?contentid=3939
Regards,
Lukas
http://blog.lukas-vlcek.com/
2009/12/3 杨建华
> Maybe you can try Omni
Thanks.
IndexDivisor means: when set to N, one in every N*termIndexInterval terms
in the index is loaded into memory.
For example, if I have 100,000 unique terms and the divisor set to 5,
then (100,000 / (5*128)) terms (== X terms) will be loaded into memory.
If termIndexInterval is
That's indeed how index divisor works, so, the memory difference
should be 100X lower, for just the RAM consumed by the terms index.
Other things consume RAM (norms, deleted docs, field cache), so maybe
those are messing up your measurements?
Mike
On Fri, Nov 27, 2009 at 5:30 AM, Danil ŢORIN wr
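The arithmetic Mike confirms can be sketched as a tiny helper; the 128 is Lucene's default termIndexInterval, and the class and method names are just for illustration:

```java
public class TermsIndexMath {
    // With index divisor N, roughly one in every (N * termIndexInterval)
    // terms is held in RAM; termIndexInterval defaults to 128.
    static long termsInMemory(long uniqueTerms, int termIndexInterval, int indexDivisor) {
        return uniqueTerms / ((long) indexDivisor * termIndexInterval);
    }

    public static void main(String[] args) {
        System.out.println(termsInMemory(100_000, 128, 1));   // 781
        System.out.println(termsInMemory(100_000, 128, 100)); // 7 -> ~100x fewer
    }
}
```

This covers only the terms index; as noted above, norms, deleted docs, and the field cache consume RAM independently of the divisor.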
I don't have norms, and I don't delete docs using IndexReader. I have switched off
sorting. I think the FieldCache is used for sorting.
I am just loading the whole index with different divisor values and calculating the
memory difference.
I have 100 million records split across databases. I loaded all the
How do you measure memory consumption?
If you pass -1 for the divisor do you still see no difference?
Can you post the output of CheckIndex on your index?
Are you sure your index has no deletions?
Mike
On Thu, Dec 3, 2009 at 5:55 AM, Ganesh wrote:
> I don't have norms, I don't delete docs usi
I don't have deletions. No norms. No sorting.
I am setting 70 MB for the IndexWriter RAM buffer but not indexing any documents.
Switched off indexing.
I have enabled TermVector for one field.
I am opening all 30 databases with different values of indexDivisor.
Below are my stats:
IndexDivisor  Memory
Hi all,
I am using Lucene 2.3.2.
When I search using the Lucene demo I am getting all the results which contain the
query. But I would like to restrict my results to the relevant matches and not
all the documents containing the query string.
Ex:
Query: how to search a string?
The response I am getting is: a
On Thu, Dec 3, 2009 at 7:15 AM, Ganesh wrote:
> Below are my stats
> IndexDivisor Memory
> -1 7 MB
> 1 486 MB
> 100 180 MB
> 1000 176 MB.
Do you simply create the IndexWriter & IndexReader, but do no
searching/indexing?
How
This sounds like an important bug fix -- could you open a Jira issue &
attach a patch? Thanks!
Mike
2009/12/2 Eirik Bjørsnøs :
> Hi,
>
> I'm using SpellChecker (in Lucene contrib) to help users of SVNSearch
> who can't type right:
>
> http://svnsearch.org/svnsearch/repos/ASF/search?logMessage=lu
I don't really understand your goal here. Lucene already does this with
its relevancy ranking. By definition, it calculates a score for each
document and ranks them in order of the score. This is NOT a simple
"is the word in the document or not"
You can read about the scoring algorithm here:
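As a rough sketch of what that scoring does, the tf and idf factors of Lucene's DefaultSimilarity (2.x/3.x) look approximately like this; the standalone class is illustrative, not the actual Lucene source:

```java
public class ScoringSketch {
    // Term frequency: repeated occurrences help, with diminishing returns.
    static double tf(int freqInDoc) {
        return Math.sqrt(freqInDoc);
    }

    // Inverse document frequency: terms appearing in fewer docs weigh more.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        // A term found in 10 of 1,000,000 docs outweighs one found in 500,000.
        System.out.println(tf(4) * idf(10, 1_000_000));
        System.out.println(tf(4) * idf(500_000, 1_000_000));
    }
}
```

So every matching document gets a score, and rare query terms dominate the ranking; the demo is already returning results in that order.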
Hi,
I have found an issue with a Lucene index locking files from a Searcher
instance, so that when a mergeSegments happens on an IndexWriter close it
actually hangs until the Searcher instance is closed.
This seems to happen only when the index is stored on a network SAN device.
When stored elsewhe
Thanks, Mike.
I am opening the reader and warming it up and then calculating the memory
consumed.
long usedMemory = runtime.totalMemory() - runtime.freeMemory();
Regards
Ganesh
- Original Message -
From: "Michael McCandless"
To:
Sent: Thursday, December 03, 2009 6:22 PM
Subject: Re:
Run System.gc() immediately before measuring memory usage.
On the Sun JVM this will force a GC (unless -XX:+DisableExplicitGC is used).
On Thu, Dec 3, 2009 at 16:30, Ganesh wrote:
> Thanks mike.
>
> I am opening the reader and warming it up and then calculating the memory
> consumed.
> long usedMemory = runt
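Combining the two suggestions (warm up, request a GC, then read heap usage), a minimal sketch might look like this; the helper name, repeat count, and sleep are arbitrary choices, not a precise measurement protocol:

```java
public class MemCheck {
    // Request GC a few times before reading heap usage, so the
    // totalMemory - freeMemory figure is less dominated by garbage.
    static long usedMemoryAfterGc() {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 3; i++) {
            System.gc();
            try {
                Thread.sleep(100); // give the collector a moment to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.println(usedMemoryAfterGc() + " bytes in use");
    }
}
```

Even so, a profiler (as suggested later in the thread) is more trustworthy than this kind of heap arithmetic.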
That's rather spooky.
It could be your SAN device hangs on delete, if another machine has
that file open? Is it possible that it does this (intentionally)?
You could make a simple standalone test case.
Different OSs have different semantics. Windows refuses to do the
delete (throws IOException
I am doing a GC before calculating the memory. Even if I set my index divisor
to 1, there seems to be no change.
Below are my stats:
IndexDivisor  Memory
-1            7 MB
1             486 MB
100           180 MB
1000          176 MB
10000         176 MB
Regards
Ganesh
- Original Message -
From: "Danil
Can you run w/ a memory profiler? I don't trust that gc is truly running.
Mike
On Thu, Dec 3, 2009 at 10:47 AM, Ganesh wrote:
> I am doing GC before calculating the memory. Even i set my indexdivisor to
> 1 but there seems to be no change.
>
> Below are my stats
> IndexDivisor Memory
> -
Maybe the command-line argument "-verbose:gc" would help to
determine if GC is running.
But you are right - a profiler would be the best way.
Benjamin
Hi,
Given a query, is there a way to learn score of some specific documents in
the index against this query? I don't want to make a global search in the
index and rank and sort all the matching documents. What I want to do is
learn the rank of a bunch of documents in the index that I can identify
Hi,
I'm using Lucene 2.9.1 patched with
http://issues.apache.org/jira/browse/LUCENE-1260
For some special reason I need to find all documents which contain at
least 1 term in a certain field.
This works by iterating the norms array, but only as long as the field
exists on every document.
For documents
This isn't easy to change; it's hardcoded, in oal.index.NormsWriter,
to 1.0, and also in SegmentReader, to 1.0 (when the field doesn't have
norms stored, but eg someone is requesting them anyway). 1.0 must
encode to 124. I suppose we could empower Similarity to define what
the "undefined norm val
It would be clumsier, but you could create a Filter by spinning
through all the terms on a field and setting the appropriate bit.
You could even do this at startup and store the filters around for
all the fields you care about, or cache them when first used.
The advantage I see here is that it wo
The Snowball Analyzer works well for certain constructs but not others. In
particular I'm having a problem with things like "colossal" vs "colossus" and
"hippocampus" vs "hippocampal".
Is there a way to customize the analyzer to include these rules?
Thanks,
-Chris
---
On Thu, Dec 3, 2009 at 2:15 PM, Michael McCandless
wrote:
> This sounds like an important bug fix -- could you open a Jira issue &
> attach a patch? Thanks!
Mike,
I've opened an issue with a patch that should be pretty trivial:
https://issues.apache.org/jira/browse/LUCENE-2108
Looking forward
Thanks! I'm glad to hear your upgrade to 3.0.0 was smooth.
Mike
2009/12/3 Eirik Bjørsnøs :
> On Thu, Dec 3, 2009 at 2:15 PM, Michael McCandless
> wrote:
>> This sounds like an important bug fix -- could you open a Jira issue &
>> attach a patch? Thanks!
>
> Mike,
>
> I've opened an issue with
Chris,
You could look at KStem to see if that does a better job.
Or perhaps WordNet can be used to get the lemma of those terms instead of using
stemming.
Finally what was I going to say... ah, yes, using synonyms may be another
way this can be handled.
Otis
--
Sematext -- http://sematext.c
I think you should be able to use 1+ FilteredQuery (with IDs of your docs) with
your main query and thus get the scores only for docs that interest you.
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
- Original Message
> From: Erdinc Yilmazel
> To: java-user@lucen
Yes, of course, but I am a beginner in using Lucene, so I couldn't find out
where, i.e. in which part of the code, this ranking is handled.
So kindly point me to the place or the code if possible.
Thanks in advance,
Dhivya
--- On Thu, 3/12/09, Erick Erickson wrote:
From: Erick Erickson
Subject: Re:
Thanks, Mike.
Please find the attached file. I ran the test for the divisor values 1, 100, 1000, 1
There is a difference from 1 to 100, but there is no difference between
100 and 1.
I created a new application in which I opened all readers and searchers and
warmed them up, slept for a minute and