Thanks, Ketin, for your input. There is already a built-in HTML strip reader,
HTMLStripReader in Solr, which I am currently using to strip all HTML tags
before creating the index. This also solved my earlier problem with the
highlighter, which was highlighting HTML tags, e.g. I was searching for "
Yonik Seeley wrote:
>
> On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote:
> Most things in an inverted index are sorted (terms, matching document
> ids, term positions within a field, etc). Can you be more specific
> about what you are trying to accomplish?
>
Sorry, I mean sorting the d
On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote:
> What's the best way to maintain an index that is sorted?
Most things in an inverted index are sorted (terms, matching document
ids, term positions within a field, etc). Can you be more specific
about what you are trying to accomplish?
-Yon
Hi,
What's the best way to maintain an index that is sorted?
--
View this message in context:
http://www.nabble.com/Sorted-Index-tf4701044.html#a13438928
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
-
On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote:
> Thom Nelson wrote:
> > Check out the HashDocSet from Solr, this is the best way to cache small
> > sets of search results. In general, the Solr BitSet/DocSet classes are
> > more efficient than using the standard java.util.BitSet. You can u
On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> >
> > The easiest way would be to throw an exception from a custom hit
> > collector (and then catch it yourself and continue).
> >
>
> Cheers, I wonder if the performance penalty from throwing an exception is
> worth it
On Friday 26 October 2007 19:06, Zdeněk Vráblík wrote:
> It works if the query string ends with ~, but how to switch it on for all
> queries?
That's not supported AFAIK. You will need to iterate over the query
(recursively if it's an instance of BooleanQuery) and create a new query
where all parts ar
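
A minimal sketch of the recursive rewrite described in the reply above. The
reply is cut off before it says what each part is replaced with; this sketch
assumes the intent is to turn every plain term into a fuzzy term. The classes
QueryNode, TermNode, FuzzyNode and BooleanNode are hypothetical stand-ins so
the example runs without Lucene; they are not the Lucene Query classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query tree; these are NOT the Lucene Query classes. */
abstract class QueryNode {}

/** Exact match on one term. */
final class TermNode extends QueryNode {
    final String field;
    final String text;
    TermNode(String field, String text) { this.field = field; this.text = text; }
    @Override public String toString() { return field + ":" + text; }
}

/** Fuzzy match on one term, printed with the trailing ~ of the query syntax. */
final class FuzzyNode extends QueryNode {
    final String field;
    final String text;
    FuzzyNode(String field, String text) { this.field = field; this.text = text; }
    @Override public String toString() { return field + ":" + text + "~"; }
}

/** Group of sub-queries; occur flags (must/should/not) are omitted for brevity. */
final class BooleanNode extends QueryNode {
    final List<QueryNode> clauses;
    BooleanNode(List<QueryNode> clauses) { this.clauses = clauses; }
    @Override public String toString() {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < clauses.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(clauses.get(i));
        }
        return sb.append(')').toString();
    }
}

final class FuzzyRewriter {
    /** Returns a new tree in which every TermNode is replaced by a FuzzyNode. */
    static QueryNode toFuzzy(QueryNode query) {
        if (query instanceof BooleanNode) {
            // Recurse into each clause and rebuild the group from the results.
            List<QueryNode> rewritten = new ArrayList<>();
            for (QueryNode clause : ((BooleanNode) query).clauses) {
                rewritten.add(toFuzzy(clause));
            }
            return new BooleanNode(rewritten);
        }
        if (query instanceof TermNode) {
            TermNode term = (TermNode) query;
            return new FuzzyNode(term.field, term.text);
        }
        // Any other node type (already fuzzy, etc.) is returned unchanged.
        return query;
    }

    public static void main(String[] args) {
        List<QueryNode> inner = new ArrayList<>();
        inner.add(new TermNode("body", "solr"));
        inner.add(new TermNode("body", "index"));
        List<QueryNode> outer = new ArrayList<>();
        outer.add(new TermNode("title", "lucene"));
        outer.add(new BooleanNode(inner));
        // prints "(title:lucene~ (body:solr~ body:index~))"
        System.out.println(toFuzzy(new BooleanNode(outer)));
    }
}
```

With real Lucene queries the same traversal would also have to carry over each
clause's occur flag, which this sketch leaves out.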
Thom Nelson wrote:
>
> Check out the HashDocSet from Solr, this is the best way to cache small
> sets of search results. In general, the Solr BitSet/DocSet classes are
> more efficient than using the standard java.util.BitSet. You can use
> these independent of the rest of Solr (though I r
Yonik Seeley wrote:
>
> The easiest way would be to throw an exception from a custom hit
> collector (and then catch it yourself and continue).
>
Cheers, I wonder if the performance penalty from throwing an exception is
worth it?
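
A rough sketch of the pattern Yonik describes, for anyone weighing the cost.
DocCollector and the search loop are hypothetical stand-ins so the example
runs without Lucene; only the throw-and-catch idea comes from the thread.
Disabling the stack trace in the exception is an added assumption about where
most of the penalty would come from, not something measured here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a hit collector callback; not the Lucene class. */
interface DocCollector {
    void collect(int doc);
}

/** Unchecked exception used purely as a control-flow signal to stop collecting. */
final class EnoughHitsException extends RuntimeException {
    EnoughHitsException() {
        // No message, no cause, suppression off, stack trace not writable,
        // so constructing the exception does not capture a stack trace.
        super(null, null, false, false);
    }
}

final class EarlyExitDemo {
    /** Hypothetical search loop: feeds every matching doc id to the collector. */
    static void search(int[] matchingDocs, DocCollector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc);
        }
    }

    /** Collects at most maxHits doc ids, aborting the search once the limit is hit. */
    static List<Integer> firstHits(int[] matchingDocs, int maxHits) {
        final List<Integer> hits = new ArrayList<>();
        if (maxHits <= 0) {
            return hits;
        }
        try {
            search(matchingDocs, doc -> {
                hits.add(doc);
                if (hits.size() >= maxHits) {
                    throw new EnoughHitsException();
                }
            });
        } catch (EnoughHitsException stop) {
            // Expected: the collector asked to stop, so just carry on.
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints "[3, 8, 15]"
        System.out.println(firstHits(new int[] {3, 8, 15, 21, 42}, 3));
    }
}
```

The exception is thrown at most once per search, so whatever it costs is paid
once rather than per hit.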
--
View this message in context:
http://www.nabble.com/Exit-a
Check out the HashDocSet from Solr, this is the best way to cache small
sets of search results. In general, the Solr BitSet/DocSet classes are
more efficient than using the standard java.util.BitSet. You can use
these independent of the rest of Solr (though I recommend checking out
Solr if yo
On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote:
> I am doing a simple conjunction search for documents that do not need to be
> scored or sorted and was wondering if there is a way to stop the search from
> a hit collector when I have enough hits?
The easiest way would be to throw an except
Hi,
I am thinking about caching search results for common queries and just want
to check that for small numbers of results it would be better to store the
doc number as ints or shorts than to store a Filter with a BitSet. I guess
if your results contain fewer than 1/32 or 1/16 of the number of doc
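
The 1/32 and 1/16 figures in the message above can be checked with a little
arithmetic: a bit set needs one bit per document in the index, an int costs
32 bits per hit and a short 16 bits per hit. The sketch below works that out,
assuming object headers and the rounding of java.util.BitSet to 64-bit words
are ignored. The class and method names are made up for illustration.

```java
final class DocSetSize {
    /** Bytes for a bit set holding one bit per document in the index. */
    static long bitSetBytes(int maxDoc) {
        return ((long) maxDoc + 7) / 8;
    }

    /** Bytes for storing each hit as a 4-byte int. */
    static long intArrayBytes(int hitCount) {
        return 4L * hitCount;
    }

    /** Bytes for storing each hit as a 2-byte short. */
    static long shortArrayBytes(int hitCount) {
        return 2L * hitCount;
    }

    /** True when the int array is strictly smaller than the bit set. */
    static boolean intArrayIsSmaller(int hitCount, int maxDoc) {
        return intArrayBytes(hitCount) < bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        // Bit set: 125000 bytes regardless of how many documents match.
        System.out.println(bitSetBytes(maxDoc));
        // Break-even for ints is maxDoc / 32 = 31250 hits.
        System.out.println(intArrayIsSmaller(31_249, maxDoc)); // true
        System.out.println(intArrayIsSmaller(31_250, maxDoc)); // false
    }
}
```

Shorts only pay off when every doc number fits in 16 bits, so the 1/16 case
applies to small indexes only.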
Hi,
I am doing a simple conjunction search for documents that do not need to be
scored or sorted and was wondering if there is a way to stop the search from
a hit collector when I have enough hits? I guess I am after a hit collector
that can return a boolean determining if the search should cont
Hi all,
How could I set fuzzy search in MultiFieldQueryParser?
It works if the query string ends with ~, but how to switch it on for all queries?
I would like to search without fuzzy and if nothing is found I would
like to search with fuzzy search.
Thanks.
Regards,
Zdenek
--
Guessing your problem here too but see
http://www.htxs.nl/docs/lucene/docs/api/org/apache/lucene/demo/IndexHTML.html
It shows an approach to incremental indexing which updates an index with only
the changed files in a folder.
- Original Message
From: poojasreejith <[EMAIL PROTECTED]>
T
I'm new to Lucene and am interested in learning how enterprises deploy
multi-server installations of Lucene for large 24x7 operations.
The first question that comes to mind is: are most of the design decisions made
at development time, or can a simple server be 'grown into' something
26 okt 2007 kl. 06.31 skrev poojasreejith:
I have a folder which contains the indexed files. Suppose I want to
add one more file's indexed data to it, without deleting the
whole folder and performing the indexing for all the files again.
I want it to do only that one file and add the i
Hi All,
Is it now possible to release the memory after every search in Lucene
for 50 GB of records?
testn wrote:
>
> I think you store dateSc with full precision, i.e. with time. You should
> consider indexing just the date part, or only the resolution you really need.
> It should reduce the m
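
To illustrate the point about resolution: a coarser date format yields fewer
distinct terms for the same set of timestamps. The sketch below uses only
java.time so it runs on its own; the class name and the sample timestamps are
made up. In Lucene itself, DateTools.dateToString with a Resolution such as
DAY is the usual way to get the same effect, though that is not shown here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DateTerms {
    private static final DateTimeFormatter SECOND =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Index term that keeps the time of day, down to the second. */
    static String secondTerm(LocalDateTime t) {
        return t.format(SECOND);
    }

    /** Index term that keeps only the calendar date. */
    static String dayTerm(LocalDateTime t) {
        return t.format(DAY);
    }

    /** Number of distinct terms the timestamps produce at the chosen resolution. */
    static int distinctTerms(List<LocalDateTime> timestamps, boolean dayOnly) {
        Set<String> terms = new HashSet<>();
        for (LocalDateTime t : timestamps) {
            terms.add(dayOnly ? dayTerm(t) : secondTerm(t));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        List<LocalDateTime> stamps = Arrays.asList(
                LocalDateTime.of(2007, 10, 26, 6, 31, 0),
                LocalDateTime.of(2007, 10, 26, 19, 6, 12),
                LocalDateTime.of(2007, 10, 27, 8, 0, 0));
        System.out.println(distinctTerms(stamps, false)); // 3 terms with time
        System.out.println(distinctTerms(stamps, true));  // 2 terms by day
    }
}
```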
Hello,
I am seeing that a query with boolean queries nested in boolean queries takes
much longer than just a single boolean query when the number of hits is
fairly large. For example
+prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
is much faster than
(+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +pro
19 matches