Here it is:
http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/analysis/shingle/ShingleMatrixFilter.html
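For reference, a rough sketch of using the plain ShingleFilter from the same
contrib package to turn a token stream into word n-grams (the analyzer, field
name and the 2.4-era next() loop here are only illustrative assumptions):

import java.io.StringReader;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.shingle.ShingleFilter;

public class ShingleSketch {
    public static void main(String[] args) throws Exception {
        // Tokenize some text the usual way first.
        TokenStream stream = new WhitespaceAnalyzer()
                .tokenStream("body", new StringReader("the quick brown fox"));
        // Wrap it so word bigrams ("the quick", "quick brown", ...) are
        // emitted alongside the single terms.
        TokenStream shingles = new ShingleFilter(stream, 2);
        for (Token t = shingles.next(); t != null; t = shingles.next()) {
            System.out.println(t.term());
        }
    }
}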
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Neha Gupta
> To: java-user@lucene.apache.org
> Sent: Thursday, June 18, 2009 1
Hello,
You may want to look at Lucene's younger brother named Solr:
http://lucene.apache.org/solr/
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: mitu2009
> To: java-user@lucene.apache.org
> Sent: Friday, June 19, 2009 12:10:42 AM
> Su
Thanks for the comments. Sounds like I will probably be ok.
-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Friday, June 19, 2009 1:50 PM
To: java-user@lucene.apache.org; java-...@lucene.apache.org
Subject: Re: caching an indexreader
On the topic of R
On the topic of RAM consumption, it seems like field caches
could return estimated RAM usage (given they're arrays of
standard Java types)? There are methods for calculating this per
platform (relatively accurately, I believe).
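Just to make the idea concrete, a back-of-the-envelope sketch of such an
estimate (the per-entry sizes and helper names below are my own assumptions,
not an existing Lucene API):

public class FieldCacheRamEstimate {
    // One 4-byte int per document plus a small array-header overhead.
    static long intCacheBytes(int maxDoc) {
        return 16L + 4L * maxDoc;
    }
    // One 8-byte long per document for a long field cache.
    static long longCacheBytes(int maxDoc) {
        return 16L + 8L * maxDoc;
    }
    public static void main(String[] args) {
        // Roughly 38 MB for an int cache over 10 million documents.
        System.out.println(intCacheBytes(10000000) / (1024 * 1024) + " MB");
    }
}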
On Fri, Jun 19, 2009 at 12:11 PM, Michael McCandless <
luc...@mikemccandless.co
On Fri, Jun 19, 2009 at 2:40 PM, Scott Smith wrote:
> In my environment, one of the concerns is that new documents are
> constantly being added (and some documents may be deleted). This means
> that when a user does a search and pages through results, it is possible
> that there are new items comi
> As I understand it, the user won't see any changes to the
index until a new Searcher is created.
Correct.
> How much memory will caching the searcher cost? Are there
other tradeoffs I need to consider?
If you're updating the index frequently (every N seconds) and
the searcher/reader is closed
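As a rough illustration of the reopen pattern being discussed (a sketch only;
the holder class and its names are mine, and IndexReader.reopen() is the 2.4
refresh call assumed here):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;

class SearcherHolder {
    private IndexReader reader;
    private IndexSearcher searcher;

    SearcherHolder(IndexReader reader) {
        this.reader = reader;
        this.searcher = new IndexSearcher(reader);
    }

    // Call this every N seconds; reopen() is cheap when nothing has changed.
    synchronized void maybeRefresh() throws IOException {
        IndexReader newReader = reader.reopen();
        if (newReader != reader) {
            IndexReader old = reader;
            reader = newReader;
            searcher = new IndexSearcher(newReader);
            old.close(); // in real code, close only once no search still uses it
        }
    }

    synchronized IndexSearcher getSearcher() {
        return searcher;
    }
}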
As I read about Filters, it seems to me that a filter is preferred for
any portion of the query string where you are setting the boost to 0
(meaning you don't want it to contribute to the relevancy score).
But, relevancy is only interesting if you are displaying the documents
in relevancy ord
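To make that concrete, a small sketch of the difference (the field and term
names are invented; the point is that the Filter clause restricts matches
without ever contributing to the score):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class FilterInsteadOfZeroBoost {
    public static TopDocs search(IndexSearcher searcher) throws Exception {
        // This clause participates in relevancy scoring.
        TermQuery scored = new TermQuery(new Term("body", "lucene"));
        // This clause only restricts the result set; it never affects the score.
        Filter restrict = new QueryWrapperFilter(
                new TermQuery(new Term("status", "published")));
        return searcher.search(scored, restrict, 10);
    }
}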
In my environment, one of the concerns is that new documents are
constantly being added (and some documents may be deleted). This means
that when a user does a search and pages through results, it is possible
that there are new items coming in which affect the search-thus changing
where items are
You're welcome!
Mike
On Fri, Jun 19, 2009 at 1:49 PM, Dmitry Lizorkin wrote:
>> Iterate over all ints from 0 .. IndexReader.maxDoc() (exclusive) and
>> call IndexReader.isDeleted?
>
> Excellent, works perfect for us!
>
> Michael, thank you very much for your help!
>
> Best regards,
> Dmitry
>
>
Iterate over all ints from 0 .. IndexReader.maxDoc() (exclusive) and
call IndexReader.isDeleted?
Excellent, works perfect for us!
Michael, thank you very much for your help!
Best regards,
Dmitry
On Fri, Jun 19, 2009 at 12:43 PM, Dmitry Lizorkin wrote:
> In the meantime, does there exist any workaround for the current version
> 2.4.1 we're using?
Iterate over all ints from 0 .. IndexReader.maxDoc() (exclusive) and
call IndexReader.isDeleted?
Open a read-only IndexReader if possible, so i
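In code, that 2.4.1 workaround would look roughly like this (a sketch; how
you open the reader and what you do with each ID is up to you):

import org.apache.lucene.index.IndexReader;

public class AllDocIds {
    public static void collect(IndexReader reader) {
        int maxDoc = reader.maxDoc();
        for (int docId = 0; docId < maxDoc; docId++) {
            if (reader.isDeleted(docId)) {
                continue; // this slot belongs to a deleted document
            }
            System.out.println(docId); // docId is a live internal Lucene ID
        }
    }
}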
Assuming your goal is to exclude deleted docs
Yes, precisely.
TermDocs td = IndexReader.termDocs(null);
That looks like exactly what we need! We'll be looking forward to the release of
v. 2.9.
In the meantime, does there exist any workaround for the current version
2.4.1 we're using?
Thank
Assuming your goal is to exclude deleted docs, in 2.9 (not yet
released) you can do this:
TermDocs td = IndexReader.termDocs(null);
and then iterate through them.
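For illustration, iterating them could look like this sketch (not taken from
any release, just the obvious loop over a TermDocs):

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermDocs;

public class AllLiveDocIds {
    public static void collect(IndexReader reader) throws Exception {
        // Per the note above, in 2.9 termDocs(null) enumerates all
        // non-deleted documents.
        TermDocs td = reader.termDocs(null);
        try {
            while (td.next()) {
                int docId = td.doc(); // internal Lucene ID of a live document
                System.out.println(docId);
            }
        } finally {
            td.close();
        }
    }
}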
Mike
2009/6/19 Dmitry Lizorkin :
> Hello!
>
> What is the appropriate way to obtain Lucene internal IDs for _all_ the
> tuples sto
Hello!
What is the appropriate way to obtain Lucene internal IDs for _all_ the
tuples stored in a Lucene index?
Thank you for your help
Dmitry
Nice Uwe,
i'll try this.
Thanks,
Galaio
On Fri, Jun 19, 2009 at 1:33 PM, Uwe Schindler wrote:
> To get the second page,
> Take:
> int hitsPerPage = 10;
> int pageOffset = 10;
> TopDocCollector collector = new TopDocCollector(hitsPerPage + pageOffset);
>
> For page third page take int pageOffset
To get the second page,
Take:
int hitsPerPage = 10;
int pageOffset = 10;
TopDocCollector collector = new TopDocCollector(hitsPerPage + pageOffset);
For the third page take int pageOffset = 20; and so on
After that your results are in hits[], for the first page in [0] to [9], the
second page in
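Putting that recipe together, a page-fetching helper might look like this
sketch (the stored field name "title" and the method shape are assumptions):

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocCollector;

public class Paging {
    public static void printPage(IndexSearcher searcher, Query query,
                                 int pageOffset, int hitsPerPage) throws Exception {
        // Collect enough hits to cover everything up to and including this page.
        TopDocCollector collector = new TopDocCollector(pageOffset + hitsPerPage);
        searcher.search(query, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        int end = Math.min(hits.length, pageOffset + hitsPerPage);
        for (int i = pageOffset; i < end; i++) {
            // Retrieve the stored fields for this page's hits only.
            System.out.println(searcher.doc(hits[i].doc).get("title"));
        }
    }
}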
Well,
I have something like that:
int hitsPerPage = 10;
IndexSearcher searcher = new IndexSearcher(this.indexPath);
TopDocCollector collector = new TopDocCollector(hitsPerPage);
Query query = new QueryParser("",
this.analizer).parse(DocumentRepositoryEntry.Fiel
The contrib/analyzers has several n-gram based tokenization and token
filter options.
On Jun 18, 2009, at 10:15 PM, Neha Gupta wrote:
Hey,
I was wondering if there is a way to read the index and generate n-grams
of words for a document in lucene? I am quite new to it and am using
pylucen
Thanks Uwe, I will see that.
Galaio
On Fri, Jun 19, 2009 at 12:36 PM, Uwe Schindler wrote:
> Hallo,
>
> Just retrieve the TopDocs for the first n documents, where n =
> offset+count,
> where offset is the first hit on the page (0-based) and count the number
> per
> page.
> To display the resu
Hallo,
Just retrieve the TopDocs for the first n documents, where n = offset+count,
where offset is the first hit on the page (0-based) and count the number per
page.
To display the results you would then just start at offset in TopDocs and
retrieve the stored field from there to offset+count.
Uw
Hi,
is there any API for paginating Hits?
For example, if I want to retrieve the hits within
a given interval.
--
Cumprimentos,
João Carlos Galaio da Silva
It's best to let IndexWriter manage the deletion of files (for exactly
this reason).
It turns out, it's perfectly fine to open an IndexWriter with
"create=true" even when IndexReaders are reading that same index.
Those open IndexReaders continue to search their point-in-time
snapshot, and then whe
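A tiny sketch of that sequence, just for illustration (the directory handling
and analyzer are placeholders, not anyone's actual code):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;

public class RecreateWhileSearching {
    public static void rebuild(Directory dir) throws Exception {
        IndexReader snapshot = IndexReader.open(dir); // point-in-time view
        // create=true: start a brand-new index in the same directory.
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        // ... add the new documents here ...
        writer.close();
        // The old reader kept searching its snapshot the whole time; it only
        // sees the new index after it is reopened.
        snapshot.close();
    }
}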
Oops, didn't read the OP quite well...
2009/6/19 Anshum :
> Exactly, it's cleaner, but you wouldn't be able to delete on the basis of
> Lucene Document ID.
>
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> d
Or have a third master index, as Joel suggests, apply all updates to that
index, only, then at the end of each batch index update run, use rsync or
equivalent to push the master index out to the 2 search servers and then
tell them to reopen their indexes.
--
Ian.
On Fri, Jun 19, 2009 at 9:23 AM
Hi,
thank you all for your answers. I have already implemented the strategies you
mentioned: delete first and re-add, or use the internal API
updateDocument(term, document).
If there were an updateDocument(internalId, theNewDocument) in the API, that
would make the process clearer.
Thanks again, for you
do they have to be kept in synch in real time?
does each server handle writes to its own index which then need to be
propagated to the other server's index?
From a simplicity point of view, to minimise the amount of self-consistency
checking that needs to happen, I would suggest even having a thi
Hi Kuro,
How did you generate your second, larger test data set?
Did you simply copy the original data set multiple times, or did you use new
pseudo-random data (words)? If the former, then you would expect a linear
increase in search time, as the number of indexed terms has not changed, just
th
Exactly, it's cleaner, but you wouldn't be able to delete on the basis of
Lucene Document ID.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw
On Fri, Jun 19, 2009 at 1:26 PM, Da
There's also IndexWriter#updateDocument(Term, Document) now, to use
that one you need to be able to uniquely identify a document using a
term (probably with an application-specific id field or something).
This method does also delete and readd the document, but it is a
somewhat cleaner api.
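For illustration, usage is along these lines (the "id" field name is an
application-level choice, not something the API requires):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class UpdateById {
    public static void update(IndexWriter writer, String id, String newBody)
            throws Exception {
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("body", newBody, Field.Store.NO, Field.Index.ANALYZED));
        // Internally this deletes any document(s) matching the term, then adds doc.
        writer.updateDocument(new Term("id", id), doc);
    }
}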
Daan
Hi,
I know a similar subject has been discussed on this list, and this is not
a "windows file system" list ;-) But maybe someone has encountered the
"thing"... and perhaps solved it!
I have a web application that indexes many documents, so I have a quite
large Lucene (2.2) index (~350 MB) managed