Hi all,
I'm trying to use the new class NativeFSLockFactory, but as you can guess I
have a problem using it.
Don't know what I'm doing wrong, so here is the code:
FSDirectory dir = FSDirectory.getDirectory(indexDir, create,
NativeFSLockFactory.getLockFactory());
logger.info("Index: "+indexDir.
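For what it's worth, a minimal sketch of how I'd expect this to fit together,
using the trunk API exactly as quoted above (the path and the println are
mine):

import java.io.File;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NativeFSLockFactory;

public class OpenWithNativeLocks {
    public static void main(String[] args) throws Exception {
        File indexDir = new File("/path/to/index");  // illustrative path
        boolean create = true;
        // Same call as in the snippet: native OS file locks instead of
        // the default lock-file implementation.
        FSDirectory dir = FSDirectory.getDirectory(indexDir, create,
                NativeFSLockFactory.getLockFactory());
        System.out.println("Index: " + indexDir.getAbsolutePath());
        dir.close();
    }
}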
Resending, with the hope that the search gurus missed this.
Would really appreciate any advice on this. I would not want to reinvent
the wheel, and I am sure this is something that has been done before.
Thanks,
mek
On 10/16/06, Mek <[EMAIL PROTECTED]> wrote:
Has anyone dealt with the problem of con
Not sure if this is the case, but you said "searchers", so this might be it -
you can (and should) reuse searchers for multiple/concurrent queries.
IndexSearcher is thread-safe, so there is no need to have a different searcher
for each query. Keep using this searcher until you decide to open a new
searcher - act
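A minimal sketch of that pattern (the path and class are mine, not from the
thread):

import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearcherHolder {
    private static IndexSearcher searcher;

    public static synchronized IndexSearcher get() throws java.io.IOException {
        if (searcher == null) {
            searcher = new IndexSearcher("/path/to/index");
        }
        return searcher;
    }

    // IndexSearcher is thread-safe: many threads can call this at once.
    public static Hits search(Query q) throws java.io.IOException {
        return get().search(q);
    }
}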
Some excellent feedback guys - thanks heaps.
On my OOM issue, I think Hoss has nailed it here:
> That said: if you are seeing OOM errors when you sort by a field (but
> not when you use the docId ordering, or sort by score) then it sounds
> like you are keeping references to IndexReaders around
I tried the notion of a temporary RAMDirectory already, and the documents
parse unacceptably slowly, 8-10 seconds. Great minds think alike. Believe
it or not, I have to deal with a 7,500 page book that details Civil War
records of Michigan volunteers. The XML form is 24M, probably 16M of text
exc
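A sketch of what Hoss is pointing at, as I read it (names are mine): close
the old searcher when you cut over to a new one, so its IndexReader and the
sort caches built against it can be reclaimed.

import org.apache.lucene.search.IndexSearcher;

public class SearcherSwapper {
    private IndexSearcher current;

    // Swap in a fresh searcher and close the old one; real code would
    // also wait for in-flight searches before closing.
    public synchronized void reopen(String indexPath) throws java.io.IOException {
        IndexSearcher old = current;
        current = new IndexSearcher(indexPath);
        if (old != null) {
            old.close();  // releases the underlying IndexReader
        }
    }

    public synchronized IndexSearcher get() {
        return current;
    }
}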
I had a similar question a while ago and the answer is "you can't cheat".
According to what the guys said, this:
doc.add(new Field("field", value1, ...));
doc.add(new Field("field", value2, ...));
doc.add(new Field("field", value3, ...));
is just the same as this:
doc.add(new Field("field", value1 + value2 + value3, ...));
But go ahead and increase maxFieldLength. I'm successfully indexing
(unstored
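For completeness, a sketch of raising the cap (the value is illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class RaiseFieldLength {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), true);
        // The default is 10,000 terms per field; raise it before adding
        // documents.
        writer.setMaxFieldLength(Integer.MAX_VALUE);
        writer.close();
    }
}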
On 10/18/06, Isabel Drost <[EMAIL PROTECTED]> wrote:
Find Me wrote:
> How to eliminate near duplicates from the index? Someone suggested that I
> could look at the TermVectors and do a comparison to remove the
> duplicates.
As an alternative you could also have a look at the paper "Detecting
P
Hello-
I was wondering about the usage of IndexWriter.setMaxFieldLength().
It is limited, by default, to 10,000 terms per field. Can anyone tell me if
this is a "per field" limit or a "per uniquely named field" limit?
I.e., in the following snippet I add many words to different Fields, all w/
the
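The snippet is cut off above; as a hypothetical stand-in for what the
question describes (my reconstruction, not the poster's code):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ManyFields {
    // Is the 10,000-term cap counted per Field instance, or per field
    // name ("body" here)?
    static Document build(String chunk1, String chunk2, String chunk3) {
        Document doc = new Document();
        doc.add(new Field("body", chunk1, Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("body", chunk2, Field.Store.NO, Field.Index.TOKENIZED));
        doc.add(new Field("body", chunk3, Field.Store.NO, Field.Index.TOKENIZED));
        return doc;
    }
}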
Erick Erickson wrote:
Arbitrary restrictions by IT on the space the indexes can take up.
Actually, I won't categorically say I *can't* make this happen, but in order to
use this option, I need to be able to present a convincing case. And I
can't
do that until I've exhausted my options/creativity.
And this way it keeps fo
Erick Erickson wrote:
Here's my problem:
We're indexing books. I need to
a> return books ordered by relevancy
b> for any single book, return the number of hits in each chapter
(which, of
course, may be many pages).
1>If I index each page as a document, creating the relevance on a book
basis
is interesting, but collec
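For b>, one sketch if each page is a document carrying a "chapter" field
(all names are mine, not from the thread): collect hits with a HitCollector
and tally them per chapter through the FieldCache.

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ChapterCounts {
    // Count matching page-documents per chapter; assumes an
    // untokenized "chapter" field on every page.
    public static Map countByChapter(IndexSearcher searcher, Query query)
            throws java.io.IOException {
        IndexReader reader = searcher.getIndexReader();
        final String[] chapters =
                FieldCache.DEFAULT.getStrings(reader, "chapter");
        final Map counts = new HashMap();
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                String chapter = chapters[doc];
                Integer n = (Integer) counts.get(chapter);
                counts.put(chapter, new Integer(n == null ? 1 : n.intValue() + 1));
            }
        });
        return counts;
    }
}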
DITTO !!!
I like date truncation, but when I store a truncated date, I don't want to
retrieve the time in Greenwich, England at midnight of the date I'm
truncating in the local machine's time zone. Nothing against the Brits, it
just doesn't do me any good to know what time it was over there on th
No, but using a constant timezone is a good thing anyway, since the index
will not keep track of that info and will not really care, as long as you
always use DateTools (at index and search time).
You can always rewrite DateTools with your own timezone, but EDT is a bad
choice since it is vulnerable to daylight s
Dang it :)
Any way to set the timezone?
Emmanuel Bernard wrote:
DateTools uses GMT as the timezone
Tue Aug 01 21:15:45 EDT 2006
Wed Aug 02 02:15:45 EDT 2006
Michael J. Prichard wrote:
When I run this java code:
Long dates = new Long("1154481345000");
Date dada = new Date(dates.longV
Michael J. Prichard wrote:
> I get this output:
> Tue Aug 01 21:15:45 EDT 2006
That's August 2, 2006 at 01:15:45 GMT.
> 20060802
> Huh?! Should it be:
> 20060801
DateTools uses GMT.
Doug
DateTools uses GMT as the timezone
Tue Aug 01 21:15:45 EDT 2006
Wed Aug 02 02:15:45 EDT 2006
Michael J. Prichard wrote:
When I run this java code:
Long dates = new Long("1154481345000");
Date dada = new Date(dates.longValue());
System.out.println(dada.toString());
System.out
When I run this java code:
Long dates = new Long("1154481345000");
Date dada = new Date(dates.longValue());
System.out.println(dada.toString());
System.out.println(DateTools.dateToString(dada,
DateTools.Resolution.DAY));
I get this output:
Tue Aug 01 21:15:45 EDT 2006
200608
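If what you actually want is the local calendar day rather than the GMT one,
a workaround sketch (my own; DateTools itself has no timezone parameter that
I know of): rebuild the local year/month/day as a GMT date before handing it
to DateTools.

import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;
import org.apache.lucene.document.DateTools;

public class LocalDay {
    // Truncate 'date' to its calendar day in 'tz', re-expressed as GMT
    // so DateTools.dateToString() prints the expected digits.
    public static String localDay(Date date, TimeZone tz) {
        Calendar local = Calendar.getInstance(tz);
        local.setTime(date);
        Calendar gmt = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        gmt.clear();
        gmt.set(local.get(Calendar.YEAR), local.get(Calendar.MONTH),
                local.get(Calendar.DAY_OF_MONTH));
        return DateTools.dateToString(gmt.getTime(), DateTools.Resolution.DAY);
    }

    public static void main(String[] args) {
        Date d = new Date(1154481345000L);  // the date from this thread
        // Prints 20060801 for US Eastern rather than 20060802.
        System.out.println(localDay(d, TimeZone.getTimeZone("America/New_York")));
    }
}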
Hello All,
Lucene looks very interesting to me. I was wondering if any of you could
comment on a few questions:
1) Assuming I use a typical server such as a dual-core dual-processor Dell
2950, about how many files can Lucene index and still have a sub-two-second
search speed for a simple search
> This makes it relatively safe for people to grab a snapshot of the trunk
> with less concern about latent bugs.
> I think the concern is that if we start doing this stuff on trunk now,
> people that are accustomed to snapping from the trunk might be surprised,
> and not in a good way.
+1 on this. T
You can get Lucene 1.9.1 and make Luke use this version (you need
luke.jar, not luke-all.jar).
Version 1.9.1 contains API which was removed (as deprecated) in Lucene 2.0,
and it should still be able to read indexes created by Lucene 2.0 (correct
me if I'm wrong).
And then run Luke with com
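I believe the invocation meant here is along these lines (the classpath
layout and main class are from memory, so treat them as assumptions):

java -classpath lucene-1.9.1.jar:luke.jar org.getopt.luke.Luke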
On 17 Oct 2006, at 18:55, Andrzej Bialecki wrote:
You need to create a fuzzy signature of the document, based on term
histogram or shingles - take a look at the Signature framework in
Nutch.
There is a substantial literature on this subject - go to Citeseer
and run a search for "near duplicate
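To make the shingle idea concrete, a toy sketch (entirely mine, and far
cruder than the Nutch Signature framework): hash overlapping word n-grams
and compare the sets; documents sharing most shingles are near-duplicate
candidates.

import java.util.HashSet;
import java.util.Set;

public class Shingles {
    // Hash every overlapping run of n words into a set of shingles.
    public static Set shingles(String text, int n) {
        String[] words = text.toLowerCase().split("\\s+");
        Set result = new HashSet();
        for (int i = 0; i + n <= words.length; i++) {
            StringBuffer sb = new StringBuffer();
            for (int j = 0; j < n; j++) {
                sb.append(words[i + j]).append(' ');
            }
            result.add(new Integer(sb.toString().hashCode()));
        }
        return result;
    }

    // Jaccard overlap of two shingle sets; close to 1.0 suggests a
    // near-duplicate pair.
    public static double similarity(Set a, Set b) {
        Set inter = new HashSet(a);
        inter.retainAll(b);
        Set union = new HashSet(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }
}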
Thank you very much. I have indeed turned off the norms.
Is there any new version of Luke that I can use?
Thanks,
-Vasu
Volodymyr Bychkoviak <[EMAIL PROTECTED]> wrote:
It seems that you created your index with norms turned off and are trying
to open it with Luke, which may contain an older ver
It seems that you created your index with norms turned off and are trying
to open it with Luke, which may contain an older version of Lucene.
vasu shah wrote:
Hi,
I am getting this error when accessing my index with Luke.
No sub-file with id _1.f0 found
Does anyone have any idea about this?
Hi,
I am getting this error when accessing my index with Luke.
No sub-file with id _1.f0 found
Does anyone have any idea about this?
Any help would be appreciated.
Thanks,
-Vasu
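For context, norms usually go missing because the fields were indexed
without them; sketched against the 2.0 API (the field name is mine):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class NoNormsField {
    static Document build(String title) {
        Document doc = new Document();
        // NO_NORMS indexes the field without length-normalization data,
        // so no norms entry is written for it -- which an older
        // Luke/Lucene build may then stumble over.
        doc.add(new Field("title", title, Field.Store.YES,
                Field.Index.NO_NORMS));
        return doc;
    }
}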
Hi,
On Wed, 2006-10-18 at 19:05 +1300, Paul Waite wrote:
> No they don't want that. They just want a small number. What happens is
> they enter some silly query, like searching for all stories with a single
> common non-stop-word in them, and with the usual sort criterion of by date
> (ie. a field
: I *think* that if you reduce your result set by, say, a filter, you might
: drastically reduce what gets sorted. I'm thinking of something like this
: BooleanQuery bq = new BooleanQuery();
: bq.add(Filter for the last N days wrapped in a ConstantScoreQuery, MUST)
: bq.add(all the rest of your st
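Filling in the quoted pseudocode with concrete 2.0-era classes (the field
names, date bounds, and term are mine, so treat this as a sketch):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.RangeFilter;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;

public class FilteredSortedSearch {
    public static Hits lastNDays(IndexSearcher searcher, String minDay,
            String maxDay) throws java.io.IOException {
        BooleanQuery bq = new BooleanQuery();
        // The "last N days" filter, wrapped so it acts as a query clause.
        RangeFilter recent = new RangeFilter("date", minDay, maxDay, true, true);
        bq.add(new ConstantScoreQuery(recent), BooleanClause.Occur.MUST);
        // ...plus the rest of the user's query.
        bq.add(new TermQuery(new Term("body", "lucene")),
                BooleanClause.Occur.MUST);
        // Sort by date, newest first; the filter keeps the candidate set
        // (and hence what gets sorted) small.
        return searcher.search(bq, new Sort("date", true));
    }
}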
Your problem is out of my experience, so all I can suggest is that you
search the list archive. I know the idea of faceted searching has been
discussed by people with waaay more experience in that realm than I have
and, as I remember, there were some links provided.
I just searched for 'facete
No, you've got that right. But there's something I think you might be able
to try. Fair warning, I'm remembering things I've read on this list and my
memory isn't what it used to be
I *think* that if you reduce your result set by, say, a filter, you might
drastically reduce what gets sorted.
> > So my questions are: is there a way to prevent the IndexWriter from
> > merging, forcing it to create a new segment for each indexing batch?
>
> Already done in the Lucene trunk:
> http://issues.apache.org/jira/browse/LUCENE-672
>
> Background:
> http://www.gossamer-threads.com/lists/lucene/j
> Why go through all this effort when it's easy to make your own unique ID?
> Add a new field to each document "myuniqueid" and fill it in yourself.
> It'll never change then.
I am sorry I did not mention in my post that I am aware of this solution
but that it cannot be used for my purposes.
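For anyone else reading, the quoted suggestion spelled out against the 2.0
API (the example value is mine): store the ID untokenized so it can be looked
up exactly and, unlike Lucene's internal doc ids, never changes across merges.

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class UniqueIdField {
    static void addId(Document doc, String id) {
        // Stored for retrieval, indexed as a single token.
        doc.add(new Field("myuniqueid", id, Field.Store.YES,
                Field.Index.UN_TOKENIZED));
    }
}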