Given that I have a field for which a term vector was computed and stored, and
that field is the text of a document, I'd like to rank a subset of such
documents by similarity to a given held-out document, or query, directly
using the cosine measure. How can that be done without going through
creating a query?
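One possible approach, sketched against the Lucene 2.x term vector API: pull
each document's TermFreqVector from the IndexReader, turn it into a term ->
frequency map, and compute the cosine directly. The field name "text", the
reader, and the doc ids are assumptions, and this uses raw term frequencies
with no idf weighting:

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class CosineSketch {
    // Turn a stored term vector into a term -> frequency map.
    // Note: getTermFreqVector returns null if vectors were not stored.
    static Map<String, Integer> toMap(TermFreqVector tfv) {
        Map<String, Integer> m = new HashMap<String, Integer>();
        String[] terms = tfv.getTerms();
        int[] freqs = tfv.getTermFrequencies();
        for (int i = 0; i < terms.length; i++) {
            m.put(terms[i], freqs[i]);
        }
        return m;
    }

    // Cosine over raw term frequencies; "text" is an assumed field name.
    static double cosine(IndexReader reader, int docA, int docB) throws Exception {
        Map<String, Integer> a = toMap(reader.getTermFreqVector(docA, "text"));
        Map<String, Integer> b = toMap(reader.getTermFreqVector(docB, "text"));
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer fb = b.get(e.getKey());
            if (fb != null) {
                dot += (double) e.getValue() * fb;
            }
        }
        for (int f : b.values()) {
            normB += (double) f * f;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}

Ranking the subset is then just sorting candidates by
cosine(reader, heldOutDoc, candidateDoc).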
: Subject: Stopping a runaway search, any ideas?
Hi,
I've used NumericField to store my "hour" field.
Example:
doc.add(new NumericField("hour").setIntValue(Integer.parseInt("12")));
Before, I was using a plain string Field and enumerating the values with
TermEnum, which worked fine.
Now that I'm using NumericFields, I'm not sure how to port this enumeration.
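One possible way to port it, sketched against the 2.9 API (the reader and the
"hour" field name are assumptions): NumericField also indexes lower-precision
trie terms, so the enumeration has to keep only the full-precision ones, which
for ints start with NumericUtils.SHIFT_START_INT:

TermEnum te = reader.terms(new Term("hour", ""));
try {
    do {
        Term t = te.term();
        if (t == null || !"hour".equals(t.field())) {
            break; // ran past the last "hour" term
        }
        // keep only full-precision terms (trie shift == 0)
        if (t.text().charAt(0) == NumericUtils.SHIFT_START_INT) {
            int hour = NumericUtils.prefixCodedToInt(t.text());
            System.out.println("hour=" + hour + " docFreq=" + te.docFreq());
        }
    } while (te.next());
} finally {
    te.close();
}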
On Fri, Sep 11, 2009 at 1:15 PM, wrote:
> I've been testing out "paging" the document this past week. I'm
> still working on getting a successful test and think I'm close. The
> downside was a drastic slowdown in indexing speed, and lots of
> open files, but that was expected.
You mean a slowdown?
Thanks Mike!
I've been testing out "paging" the document this past week. I'm still working
on getting a successful test and think I'm close. The downside was a drastic
slowdown in indexing speed, and lots of open files, but that was expected. I
tried with small mergeFactors and maxBufferedDocs values.
To minimize Lucene's RAM usage during indexing, you should flush after
every document, e.g. by setting ramBufferSizeMB to something tiny
(or maxBufferedDocs to 2, its minimum).
But, unfortunately, Lucene cannot flush partway through indexing one
document. I.e., the full document must be indexed into RAM before it
can be flushed.
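As a concrete sketch (2.4+ constructor; dir and analyzer are placeholders
from your own setup):

IndexWriter writer = new IndexWriter(dir, analyzer,
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setRAMBufferSizeMB(0.1); // flush on a tiny RAM budget...
writer.setMaxBufferedDocs(2);   // ...or every couple of docs (2 is the API minimum)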
Quite possibly, but shouldn't one expect Lucene's resource usage to track
the size of the problem in question? Paul's two examples below use
input files of 5 and 62 MB, hardly the size of input I'd expect to
handle in a memory-compromised environment.
bri
On Sep 11, 2009, at 7:43 AM, Glen Newton wrote:
Hi everybody,
I am using Lucene version 2.3.2 to index and search my documents.
The problem is that I have a remote search server implemented this way:
[code]
Searcher parallelSearcher;
try {
    // "searchables" is assumed from the truncated original
    parallelSearcher = new ParallelMultiSearcher(searchables);
} catch (IOException e) {
    // handle failure to reach the remote searchables
}
[/code]
Wow, that's exactly what I was looking for! In the meantime I'll use the
time-based collector.
Thanks Uwe and Mark for your help!
Daniel Shane
mark harwood wrote:
Or https://issues.apache.org/jira/browse/LUCENE-1720 offers lightweight timeout
testing at all index access stages prior to calls to Collector, e.g. it will
catch a runaway fuzzy query during its expensive term expansion phase.
Glen,
Absolutely. I think an RMFC Lucene would be great, especially for reduced-memory
or low-bandwidth client/server scenarios.
I just looked at your LuSql tool and it's just what I needed about 9 months ago
:-). I wrote a simple re-indexer that interfaces to an SQL Server 2005
database and Lucene.
Or https://issues.apache.org/jira/browse/LUCENE-1720 offers lightweight timeout
testing at all index access stages prior to calls to Collector, e.g. it will
catch a runaway fuzzy query during its expensive term expansion phase.
Paul,
I saw your last post and now understand the issues you face.
I don't think there has been any effort to produce a
reduced-memory-footprint configurable (RMFC) Lucene. With the many
mobile, embedded, and other reduced-memory devices out there, should this
perhaps be one of the areas the Lucene community looks at?
Thanks Glen!
I will take a look at your project. Unfortunately I will only have 512 MB to
1024 MB to work with, as Lucene is only one component in a larger software
system running on one machine. I agree with you on the C/C++ comment. That is
what I would normally use for memory-intensive software.
Yes: TimeLimitedCollector in 2.4.1 (and the new, non-deprecated
TimeLimitingCollector in 2.9).
Just wrap your own collector (like TopDocsCollector) with this class.
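A minimal sketch against the 2.9 API (searcher, query, and the one-second
budget are assumptions):

// Collect the top 10 hits, but abort the search after 1000 ms.
TopScoreDocCollector topDocs = TopScoreDocCollector.create(10, true);
Collector collector = new TimeLimitingCollector(topDocs, 1000);
try {
    searcher.search(query, collector);
} catch (TimeLimitingCollector.TimeExceededException e) {
    // timed out; whatever was collected before the timeout is still usable
}
ScoreDoc[] hits = topDocs.topDocs().scoreDocs;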
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
I don't think it's possible, but is there something in Lucene to cap a
search to a predefined time length, or is there a way to stop a search
when it's running for too long?
Daniel Shane
Thanks Dan!
I upgraded my JVM from .12 to .16. I'll test with that.
I've been testing by setting many IndexWriter parameters manually to see
where the best performance is. The net result was just delaying the
OOM.
The scenario is a test with an empty index. I have a 5 MB file with
800,000 unique terms.
In this project:
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html
I concatenate all the text of all the articles of a single journal into
a single text file.
This can create a text file that is 500 MB in size.
Lucene is OK indexing files this size (in parallel, even).
Paul:
My first suggestion would be to update your JVM to the latest version (or at
least .14). There were several garbage-collection-related issues resolved in
versions 10-13 (especially dealing with large heaps).
Next, your IndexWriter parameters would help figure out why you are using so
much memory.
This issue is still open. Any suggestions/help with this would be
greatly appreciated.
Thanks,
Paul
Phew :)
Mike
On Thu, Sep 10, 2009 at 8:14 PM, Jason Rutherglen wrote:
> Index locking was off, there was a bug higher up clobbering the
> index. Sorry and thanks!
>
> On Thu, Sep 10, 2009 at 4:49 PM, Michael McCandless wrote:
>> That's an odd exception. It means IndexWriter thinks 468 docs