date:20071026

Re: HTML analyzer

2007-10-26 Thread Cool Coder

Thanks Ketin for your input. There is already a built-in HTML strip reader in Solr, HTMLStripReader, which I am currently using to strip all HTML tags before creating the index. This also solved my earlier problem with the highlighter, which was highlighting HTML tags, e.g. I was searching for "

Re: Sorted Index

2007-10-26 Thread John Patterson

Yonik Seeley wrote: > > On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote: > Most things in an inverted index are sorted (terms, matching document > ids, term positions within a field, etc). Can you be more specific > about what you are trying to accomplish? > Sorry, I mean sorting the d

Re: Sorted Index

2007-10-26 Thread Yonik Seeley

On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote: > What's the best way to maintain an index that is sorted? Most things in an inverted index are sorted (terms, matching document ids, term positions within a field, etc). Can you be more specific about what you are trying to accomplish? -Yon

Sorted Index

2007-10-26 Thread John Patterson

Hi, What's the best way to maintain an index that is sorted? -- View this message in context: http://www.nabble.com/Sorted-Index-tf4701044.html#a13438928 Sent from the Lucene - Java Users mailing list archive at Nabble.com. -

Re: Cache BitSet or doc number?

2007-10-26 Thread Yonik Seeley

On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote: > Thom Nelson wrote: > > Check out the HashDocSet from Solr, this is the best way to cache small > > sets of search results. In general, the Solr BitSet/DocSet classes are > > more efficient than using the standard java.util.BitSet. You can u

Re: Exit a search when have enough results

2007-10-26 Thread Yonik Seeley

On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote: > Yonik Seeley wrote: > > > > The easiest way would be to throw an exception from a custom hit > > collector (and then catch it yourself and continue). > > > > Cheers, I wonder if the performance penalty from throwing an exception is > worth it

Re: fuzzy search MultifieldQueryParser - Lucene 2.2

2007-10-26 Thread Daniel Naber

On Friday 26 October 2007 19:06, Zdeněk Vráblík wrote: > It works if query string ends with ~, but how to switch it on for all > query? That's not supported AFAIK. You will need to iterate over the query (recursively if it's an instance of BooleanQuery) and create a new query where all parts ar
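The reply above is cut off, so the following is only a sketch of the recursive rewrite it seems to describe, assuming the goal is to replace every plain term clause with a fuzzy one. The node classes (`TermNode`, `FuzzyNode`, `BoolNode`) are hypothetical stand-ins written for this example, not Lucene classes; with Lucene itself the same walk would be applied to the parsed query object.

```java
import java.util.ArrayList;
import java.util.List;

final class FuzzyRewriteSketch {
    /** Hypothetical stand-in for a parsed query node; not a Lucene type. */
    interface Node {}

    /** Plain term clause, e.g. title:lucene */
    static final class TermNode implements Node {
        final String field;
        final String text;
        TermNode(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    /** Fuzzy term clause, rendered with the trailing ~ used in query strings. */
    static final class FuzzyNode implements Node {
        final String field;
        final String text;
        FuzzyNode(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text + "~"; }
    }

    /** Boolean combination of clauses (occur flags omitted to keep the sketch short). */
    static final class BoolNode implements Node {
        final List<Node> clauses;
        BoolNode(List<Node> clauses) { this.clauses = clauses; }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder("(");
            for (int i = 0; i < clauses.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append(clauses.get(i));
            }
            return sb.append(')').toString();
        }
    }

    /** Returns a new tree in which every TermNode is replaced by a FuzzyNode. */
    static Node toFuzzy(Node node) {
        if (node instanceof BoolNode) {
            // Recurse into nested boolean queries and rebuild them clause by clause.
            List<Node> rewritten = new ArrayList<>();
            for (Node clause : ((BoolNode) node).clauses) {
                rewritten.add(toFuzzy(clause));
            }
            return new BoolNode(rewritten);
        }
        if (node instanceof TermNode) {
            TermNode term = (TermNode) node;
            return new FuzzyNode(term.field, term.text);
        }
        // Anything else (already fuzzy, or an unknown node type) is left untouched.
        return node;
    }

    public static void main(String[] args) {
        List<Node> inner = new ArrayList<>();
        inner.add(new TermNode("body", "index"));
        List<Node> outer = new ArrayList<>();
        outer.add(new TermNode("title", "lucene"));
        outer.add(new BoolNode(inner));
        System.out.println(toFuzzy(new BoolNode(outer)));
    }
}
```

Because the rewrite builds a new tree, the original query stays available for the exact search, and the fuzzy version only needs to be run when the exact search finds nothing.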

Re: Cache BitSet or doc number?

2007-10-26 Thread John Patterson

Thom Nelson wrote: > > Check out the HashDocSet from Solr, this is the best way to cache small > sets of search results. In general, the Solr BitSet/DocSet classes are > more efficient than using the standard java.util.BitSet. You can use > these independent of the rest of Solr (though I r

Re: Exit a search when have enough results

2007-10-26 Thread John Patterson

Yonik Seeley wrote: > > The easiest way would be to throw an exception from a custom hit > collector (and then catch it yourself and continue). > Cheers, I wonder if the performance penalty from throwing an exception is worth it? -- View this message in context: http://www.nabble.com/Exit-a

Re: Cache BitSet or doc number?

2007-10-26 Thread Thom Nelson

Check out the HashDocSet from Solr, this is the best way to cache small sets of search results. In general, the Solr BitSet/DocSet classes are more efficient than using the standard java.util.BitSet. You can use these independent of the rest of Solr (though I recommend checking out Solr if yo

Re: Exit a search when have enough results

2007-10-26 Thread Yonik Seeley

On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote: > I am doing a simple conjunction search for documents that do not need to be > scored or sorted and was wondering if there is a way to stop the search from > a hit collector when I have enough hits? The easiest way would be to throw an except
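A rough illustration of the exception-based early exit suggested above. The `DocCollector` interface and the `search` loop are hypothetical stand-ins written for this example, not the Lucene API; only the control flow is the point. The signal exception is created with its stack trace disabled (a standard `RuntimeException` constructor available since Java 7), which is one way to keep the cost of throwing it small.

```java
import java.util.ArrayList;
import java.util.List;

final class EarlyExitSearch {
    /** Hypothetical stand-in for a hit collector callback; not the Lucene class. */
    interface DocCollector {
        void collect(int doc);
    }

    /** Signal used only to unwind the search loop; no stack trace is recorded. */
    static final class EnoughHits extends RuntimeException {
        EnoughHits() {
            super("enough hits", null, false, false);
        }
    }

    /** Stand-in for the search loop: feeds every matching doc to the collector. */
    static void search(int[] matchingDocs, DocCollector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc);
        }
    }

    /** Collects at most limit doc numbers, aborting the search once the limit is reached. */
    static List<Integer> firstHits(int[] matchingDocs, int limit) {
        List<Integer> hits = new ArrayList<>();
        if (limit <= 0) {
            return hits;
        }
        try {
            search(matchingDocs, doc -> {
                hits.add(doc);
                if (hits.size() >= limit) {
                    throw new EnoughHits();
                }
            });
        } catch (EnoughHits stop) {
            // Expected: the collector asked the search to stop early.
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(firstHits(new int[] {3, 7, 9, 12}, 2)); // prints [3, 7]
    }
}
```

Catching only the dedicated signal type keeps genuine failures inside the search from being swallowed.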

Cache BitSet or doc number?

2007-10-26 Thread John Patterson

Hi, I am thinking about caching search results for common queries and just want to check that for small numbers of results it would be better to store the doc number as ints or shorts than to store a Filter with a BitSet. I guess if your results contain fewer than 1/32 or 1/16 of the number of doc
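The 1/32 figure in the question can be checked with simple arithmetic: a bit set needs one bit per document in the index, while an int array needs 32 bits per hit. The sketch below is a back-of-the-envelope comparison only; the class and method names are made up for this example, and object headers and any per-entry overhead of a real cache are ignored.

```java
final class DocSetSizing {
    /** Bytes needed for a bit set covering every document in the index. */
    static long bitSetBytes(int maxDoc) {
        return ((long) maxDoc + 7) / 8;
    }

    /** Bytes needed to store each hit as a 4-byte int. */
    static long intArrayBytes(int hitCount) {
        return 4L * hitCount;
    }

    /** True when storing the doc numbers as ints takes less space than the bit set. */
    static boolean preferIntArray(int hitCount, int maxDoc) {
        return intArrayBytes(hitCount) < bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;                               // bit set: 125,000 bytes
        System.out.println(preferIntArray(10_000, maxDoc));   // 40,000 bytes, prints true
        System.out.println(preferIntArray(100_000, maxDoc));  // 400,000 bytes, prints false
    }
}
```

The same reasoning gives 1/16 for 2-byte shorts, but a short can only hold a doc number if the index is small enough for every doc number to fit in 16 bits.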

Exit a search when have enough results

2007-10-26 Thread John Patterson

Hi, I am doing a simple conjunction search for documents that do not need to be scored or sorted and was wondering if there is a way to stop the search from a hit collector when I have enough hits? I guess I am after a hit collector that can return a boolean determining if the search should cont

fuzzy search MultifieldQueryParser - Lucene 2.2

2007-10-26 Thread Zdeněk Vráblík

Hi all, How can I enable fuzzy search in MultifieldQueryParser? It works if the query string ends with ~, but how do I switch it on for every query? I would like to search without fuzzy matching first and, if nothing is found, search again with fuzzy matching. Thanks. Regards, Zdenek --

Re: lucene indexing doubts

2007-10-26 Thread mark harwood

Guessing your problem here too but see http://www.htxs.nl/docs/lucene/docs/api/org/apache/lucene/demo/IndexHTML.html It shows an approach to incremental indexing which updates an index with only the changed files in a folder. - Original Message From: poojasreejith <[EMAIL PROTECTED]> T

Scaling out Lucene /general architecture Q

2007-10-26 Thread Mankowski, Chris

I'm new to Lucene and am interested in learning how enterprises deploy multi-server installations of Lucene for large 24x7 operations. The first question that comes to mind is: are most of the design decisions made at development time, or can a simple server be 'grown into' something

Re: lucene indexing doubts

2007-10-26 Thread Karl Wettin

On 26 Oct 2007, at 06:31, poojasreejith wrote: I have a folder which contains the indexed files. so, suppose if i want to add one more indexed data into it, without deleting the whole folder and performing the indexing for all the files again. I want it to do only that one file and add the i

Re: Java Heap Space -Out Of Memory Error

2007-10-26 Thread Sebastin

Hi All, is it now possible to release the memory after every search in Lucene for 50 GB of records? testn wrote: > > I think you store dateSc with full precision i.e. with time. You should > consider to index it just date part or to the resolution you really need. > It should reduce the m

Search performance using BooleanQueries in BooleanQueries

2007-10-26 Thread Ard Schrijvers

Hello, I am seeing that a query with boolean queries in boolean queries takes much longer than just a single boolean query when the number of hits is fairly large. For example +prop1:a +prop2:b +prop3:c +prop4:d +prop5:e is much faster than (+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +pro

19 matches
