ahh, yes, sorry, the ability to read is occasionally handy... [wipes egg
off forehead]
cheers,
jed.
Michael McCandless wrote:
Actually, yes in 2.3.2: IndexReader.unlock has existed for a long time.
In 2.4.0, we moved this to IndexWriter.unlock.
Mike
Jed Wesley-Smith wrote:
not in 2.3.2 though.
Thanks Mike!
Michael McCandless wrote:
OK I'll add that (what IW does on setting an OOME) to the javadocs.
Mike
Jed Wesley-Smith wrote:
Mike,
regarding this paragraph:
"To workaround this, on catching an OOME on any of IndexWriter's
methods, you should 1) forcibly remove the write lock
(IndexWriter.unlock static method) and then
I'm not sure what *could* be easier than looping with IndexSearcher.doc(),
from doc ID 0 up to maxDoc() - 1. Of course you'll have to pay some attention to
whether you get a document back or not, and I'm not quite sure whether you'd
have to worry about getting deleted documents. But I don't think either of
I want to give more weight to some terms in the document. For example, the title of the
book should be given more weight than the contents. And we are testing over
a wide variety of Lucene queries, with quotes, w/o quotes, phrase, span,
etc.
As our system will be expecting a greater number of queries that contain
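One common way to get the title-over-contents weighting described above is a query-time boost. A minimal sketch, assuming Lucene's standard query syntax (`field:term^boost`); the field names `title` and `contents` are taken from the example above, and the helper itself is hypothetical:

```java
// Build a query string that boosts matches in the title field over the
// contents field, using Lucene's standard query syntax (field:(terms)^boost).
public class BoostedQueryBuilder {
    // Returns e.g. "title:(java)^3.0 contents:(java)" for the given user query.
    static String boostTitle(String userQuery, float titleBoost) {
        return "title:(" + userQuery + ")^" + titleBoost
             + " contents:(" + userQuery + ")";
    }

    public static void main(String[] args) {
        System.out.println(boostTitle("lucene in action", 3.0f));
    }
}
```

The resulting string can be fed to QueryParser; an index-time alternative is to set a boost on the title field when indexing, which bakes the weighting into the norms instead.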
I'll double check but I believe all the fields in my index are stored. Should
I just loop using indexSearcher.doc() or is there a faster way? Thanks.
> Date: Thu, 30 Oct 2008 16:09:47 -0400
> From: [EMAIL PROTECTED]
> To: java-user@lucene.apache.org
> Subject: Re: Read all the data from an index
Well, that's trickier than you might think. You can easily get
all the STORED data just by getting doc IDs 0 through maxDoc() - 1. But
reconstructing fields that are NOT stored is more
difficult. Luke tries, but it may be a lossy process.
Best
Erick
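The loop Erick describes can be sketched as follows. This assumes Lucene 2.4-era APIs; the directory paths and analyzer choice are placeholders, and only STORED fields survive the round trip (unstored fields, index-time boosts, and the original token streams are not recoverable this way):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Copy every live (non-deleted) document from an old index into a new one.
public class ReindexStoredFields {
    public static void main(String[] args) throws Exception {
        Directory oldDir = FSDirectory.getDirectory("/path/to/old-index");
        Directory newDir = FSDirectory.getDirectory("/path/to/new-index");
        IndexReader reader = IndexReader.open(oldDir);
        IndexWriter writer = new IndexWriter(newDir, new StandardAnalyzer(),
                true, IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (reader.isDeleted(i)) continue;  // skip deleted docs
                Document doc = reader.document(i);  // stored fields only
                writer.addDocument(doc);            // re-analyzed by the writer
            }
            writer.optimize();
        } finally {
            writer.close();
            reader.close();
        }
    }
}
```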
On Thu, Oct 30, 2008 at 3:24 PM, Dragon Fly <[EM
Hi,
I have an old index that was built a few months ago. The data that I used to
build the index has been deleted from the database. I'd like to read all the
data from the old index to build a new index. Which Lucene API calls should I
use to read all the data from the old index? Thank you i
: We improved the performance through caching the bitsets of the single
: fuzzy query/wildcard query.
: Within our logs we can see that combined queries within a BooleanQuery
: are processed sequentially. So our question is: does it make sense for
: you to parallelize the processing of the qu
Not directly, I don't think. Mark Miller contributed some
highlighting code that converts phrase queries to SpanNearQueries, I
believe, but this isn't general purpose. We probably need a
QueryParser that produces SpanQueries instead of regular Queries, I
suppose, but they aren't always
I tried the term index divisor prior to the posting and didn't see
much difference in the memory usage.
I don't think we can turn off field norms because we use boosting to
influence some content to the front of the results.
Will definitely spend some time with Solr, Terracotta, and possibly
hado
Thanks Grant for the presentation, it was very useful.
Can payload work for queries other than Term queries and Span queries? Or is
there any function to convert Query into span query?
Thanks
On Thu, Oct 23, 2008 at 4:08 PM, Grant Ingersoll <[EMAIL PROTECTED]>wrote:
> You can search the archives fo
mark harwood wrote:
Regretfully, I'm a terrible Swing programmer
I know you've raised this before, - I wasn't prompting you to do the work :)
I did make some promising in-roads into a GWT web-based version which was Apache-license friendly but ultimately I didn't want to bring in a build-time
whichever is chosen.
Just a huge thank you for making this tool available!
Great tool!
//andy
On Thu, Oct 30, 2008 at 4:06 AM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Many people ask me when the next version of Luke becomes available. It's
> almost ready, and the release should happen in about a week.
John G wrote:
I have an index with a particular document marked as deleted. If I use the
search method that returns TopDocs and that deleted document satisfies the
search criteria, will it be included in the returned TopDocs object even
though it has been marked as deleted?
Thanks in advance.
John G.
I have an index with a particular document marked as deleted. If I use the
search method that returns TopDocs and that deleted document satisfies the
search criteria, will it be included in the returned TopDocs object even
though it has been marked as deleted?
Thanks in advance.
John G.
For those attending ApacheCon in New Orleans next week, the Lucene
Search and Machine Learning Birds of a Feather (BOF) will be held
Wednesday night. Please indicate your interest at: http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08
Also, note there are a number of Lucene/Solr/Mahout tal
>>Regretfully, I'm a terrible Swing programmer
I know you've raised this before, - I wasn't prompting you to do the work :)
I did make some promising in-roads into a GWT web-based version which was
Apache-license friendly but ultimately I didn't want to bring in a build-time
dependency on the 1
mark harwood wrote:
I'd like to ask the Lucene user community what version of Lucene would be
preferable
A Swing-based one, managed in Lucene/contrib and released with every Lucene
build.
;)
I agree, this would be ideal. Regretfully, I'm a terrible Swing
programmer, so unless someone el
Andrzej Bialecki wrote:
1) Luke 2.4 release. This has the advantage of being an official stable
[...]
2) Luke 2.9-dev snapshot. This has the advantage that you get the
[...]
Of course I meant Lucene 2.4 and Lucene 2.9-dev ... sorry for the confusion.
--
Best regards,
Andrzej Bialecki
>>I'd like to ask the Lucene user community what version of Lucene would be
>>preferable
A Swing-based one, managed in Lucene/contrib and released with every Lucene
build.
;)
- Original Message
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thur
One issue with the existing field cache implementation is that it uses int
arrays to reference into the list of unique terms where short or even byte
arrays may suffice for fields with smaller numbers of unique terms.
How many unique terms do you have?
I posted some code that measures the potent
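The potential saving is easy to estimate: an int[] ordinal array costs 4 bytes per document, while a field with few unique terms could get by with byte[] or short[]. A sketch of that arithmetic (rough sizing only, not Lucene's actual FieldCache internals):

```java
// Estimate the memory for a field cache's per-document ordinal array,
// choosing the narrowest integer type that can hold the ordinal range.
public class OrdinalWidth {
    // Bytes per ordinal needed to distinguish numUniqueTerms terms.
    static int bytesPerOrdinal(int numUniqueTerms) {
        if (numUniqueTerms <= 1 << 8)  return 1; // fits in a byte[]
        if (numUniqueTerms <= 1 << 16) return 2; // fits in a short[]
        return 4;                                // needs an int[]
    }

    static long ordinalArrayBytes(int maxDoc, int numUniqueTerms) {
        return (long) maxDoc * bytesPerOrdinal(numUniqueTerms);
    }

    public static void main(String[] args) {
        // 10M docs, a field with 50k unique terms: 20 MB instead of 40 MB.
        System.out.println(ordinalArrayBytes(10_000_000, 50_000));
    }
}
```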
Hi all,
Many people ask me when the next version of Luke becomes available. It's
almost ready, and the release should happen in about a week, depending
on the situation in my daily job.
I'd like to ask the Lucene user community what version of Lucene would
be preferable to include in this Lu
Michael's got some great points (he's the Lucene master), especially
possibly turning off norms if you can, but for an index like that I'd
recommend Solr. Solr sharding can be scaled to billions (at minimum a billion
or two anyway) with few limitations (of course there are a few). Plus
it has further
Actually, yes in 2.3.2: IndexReader.unlock has existed for a long time.
In 2.4.0, we moved this to IndexWriter.unlock.
Mike
Jed Wesley-Smith wrote:
not in 2.3.2 though.
cheers,
jed.
Michael McCandless wrote:
Or you can use IndexReader.unlock.
Mike
Jed Wesley-Smith wrote:
Michael McCa
The terms index (*.tii), which is loaded entirely into RAM, can
consume an unexpectedly large amount of memory when there are an
unusually high number of terms. If you are not using compound file
format, can you look at the size of *.tii?
If this is what is affecting you, one simple wor
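To get a feel for the numbers: Lucene keeps one of every `termIndexInterval` terms in the in-RAM terms index (128 by default, adjustable via IndexWriter.setTermIndexInterval at index time). A rough sizing sketch, arithmetic only:

```java
// Rough estimate of how many terms land in the in-RAM terms index (.tii)
// for a given total term count and term index interval (default: 128).
public class TiiEstimate {
    static long indexedTerms(long totalTerms, int termIndexInterval) {
        // One out of every termIndexInterval terms is held in RAM.
        return (totalTerms + termIndexInterval - 1) / termIndexInterval;
    }

    public static void main(String[] args) {
        // A very large index at the default interval, then at 1024 (8x fewer).
        System.out.println(indexedTerms(500_000_000L, 128));
        System.out.println(indexedTerms(500_000_000L, 1024));
    }
}
```

Raising the interval shrinks the .tii proportionally at the cost of slightly slower term lookups.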
OK I'll add that (what IW does on setting an OOME) to the javadocs.
Mike
Jed Wesley-Smith wrote:
Mike,
regarding this paragraph:
"To workaround this, on catching an OOME on any of IndexWriter's
methods, you should 1) forcibly remove the write lock
(IndexWriter.unlock static method) and then
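The recovery sequence being discussed might look like the sketch below, assuming Lucene 2.4's static IndexWriter.unlock(Directory); the exact state of a writer after an OOME varies by version, so treat this as an outline rather than a recipe:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

// On OutOfMemoryError the writer's state is suspect: discard it,
// forcibly clear the write lock, and reopen a fresh writer.
public class OomeRecovery {
    static IndexWriter addWithRecovery(IndexWriter writer, Directory dir,
                                       Document doc) throws Exception {
        try {
            writer.addDocument(doc);
            return writer;
        } catch (OutOfMemoryError oome) {
            try { writer.close(); } catch (Throwable ignored) { }
            IndexWriter.unlock(dir);  // forcibly remove the write lock
            // Reopen against the existing (still consistent) index.
            return new IndexWriter(dir, new StandardAnalyzer(), false,
                    IndexWriter.MaxFieldLength.UNLIMITED);
        }
    }
}
```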
Hi Aditi,
In that case I could suggest you just index the domain name separately as
well, i.e. index the following fields: email address, domain name; instead
of just email address.
When I said reverse the tokens, you could reverse the tokens while
indexing (just flipping the text string while in
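The two preparations mentioned above can be sketched with plain string helpers (field names and layout are illustrative, not from the original thread): derive the domain for its own field, and store a reversed copy of the address so a suffix pattern like `*@example.com`, disallowed as a leading wildcard, becomes the ordinary prefix query `moc.elpmaxe@*` on the reversed field.

```java
// Compute the two extra field values suggested above: the bare domain,
// and a reversed copy of the address for suffix-as-prefix matching.
public class EmailFieldValues {
    static String domainOf(String email) {
        int at = email.lastIndexOf('@');
        return at < 0 ? "" : email.substring(at + 1);
    }

    static String reversed(String token) {
        return new StringBuilder(token).reverse().toString();
    }

    public static void main(String[] args) {
        String email = "alice@example.com";
        System.out.println(domainOf(email));   // example.com
        System.out.println(reversed(email));   // moc.elpmaxe@ecila
    }
}
```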