Re: Sort fields shouldn't be tokenized

2009-11-16 Thread J.J. Larrea
sort logic. - J.J. On Nov 16, 2009, at 11:38 AM, Jeff Plater wrote: Thanks - so if my sort field is a single term then I should be ok with using an analyzer (to lowercase it for example). -Jeff -Original Message- From: J.J. Larrea [mailto:j...@panix.com] Sent: Monday, November 16, 20

Re: Sort fields shouldn't be tokenized

2009-11-16 Thread J.J. Larrea
nexpected ways. Does the proviso make more sense now? - J.J. Larrea On Nov 16, 2009, at 10:36 AM, Jeff Plater wrote: I am looking at adding some sorting functionality to my application and read that Sort fields should not be tokenized - can anyone explain why? I have code that is tokeniz

Re: Storing a Lucene Index on a SAN Storage: good idea?

2009-09-27 Thread J.J. Larrea
omponent of the optimization process. - J.J. Larrea At 9:21 AM + 9/26/09, Mark Harwood wrote: >I have a client with 700 million doc index running on a SAN. The performance >is v good but this obviously depends on your choice of SAN config. In this >environment I have multiple search servers

Re: Optimizing index takes too long

2007-11-11 Thread J.J. Larrea
Hi. Here are a couple of thoughts: 1. Your problem description would be a little easier to parse if you didn't use the word "stored" to refer to fields which are not, in a Lucene sense, stored, only indexed. For example, one doesn't "store" stemmed and unstemmed versions, since stemming has ab

Re: Searching API: QueryParser vs Programatic queries

2006-05-22 Thread J.J. Larrea
At 10:15 AM +0100 5/22/06, Irving, Dave wrote: >- Is there maybe some room for more utility classes in Lucene which make >this easier? E.g: When building up a document, we don't have to worry >about running content through an analyser - but unless we use >QueryParser, there doesn't seem to be corre

Re: html parsers and numers of terms

2005-12-13 Thread J.J. Larrea
om an "index-eye" view, which can be very revealing. - J.J. At 11:36 AM -0500 12/13/05, Robert Watkins wrote: >Aha! I had, indeed, been fooled by Luke into thinking that the entities >had been converted upon analysis, but you have set me straight. > >Thanks, >-- Robe

Re: html parsers and numers of terms

2005-12-13 Thread J.J. Larrea
Beware of HTML/XML entities in your input stream! The Lucene analyzers (including StandardAnalyzer) do not interpret these representation-specific encodings, and assume the & and ; delimiters are punctuation. How they deal with punctuation depends on the specific Analyzer logic. For example,
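One way to act on this warning is to decode entities before the text reaches an analyzer. The following is a minimal sketch in plain Java, not part of Lucene: the class name `EntityDecoder` and its tiny entity table are hypothetical, and a real application would more likely rely on a proper HTML parser.

```java
import java.util.Map;

public class EntityDecoder {
    // Hypothetical, deliberately tiny table; HTML defines many more named entities.
    private static final Map<String, String> ENTITIES = Map.of(
        "amp", "&", "lt", "<", "gt", ">",
        "quot", "\"", "apos", "'", "nbsp", "\u00A0");

    /** Replaces recognized entities; anything unrecognized is copied through unchanged. */
    public static String decode(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            // Only look for a closing ';' when we are standing on '&'.
            int semi = (c == '&') ? in.indexOf(';', i + 1) : -1;
            if (semi > i + 1 && semi - i <= 10) {
                String replacement = resolve(in.substring(i + 1, semi));
                if (replacement != null) {
                    out.append(replacement);
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Handles numeric references (&#233; and &#xE9;) and the named entities in the table.
    private static String resolve(String name) {
        if (name.startsWith("#")) {
            try {
                int cp = (name.startsWith("#x") || name.startsWith("#X"))
                    ? Integer.parseInt(name.substring(2), 16)
                    : Integer.parseInt(name.substring(1));
                return Character.isValidCodePoint(cp)
                    ? new String(Character.toChars(cp)) : null;
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return ENTITIES.get(name);
    }
}
```

Calling `EntityDecoder.decode(text)` before handing `text` to the indexing code means the analyzer sees `AT&T` rather than `AT`, `amp` and `T` as separate pieces split at the delimiters.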

Merging with IndexWriter.addIndexes(...)

2005-11-28 Thread J.J. Larrea
n for merging? Any advice on this or the general application design, would be appreciated. Thanks, J.J. Larrea PS: This was tested against SVN trunk revision 329490

Re: What is a Hits object?

2005-10-05 Thread J.J. Larrea
A Hits object is essentially a cache on query results. It caches in 2 ways: 1. When a query returning Hits is requested, only the top 100 document IDs and scores are requested from the scoring system, and the ID/Score pairs are stored in a list in the Hits object. Whenever a document ID, score
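The first caching behavior described above can be sketched roughly as follows. This is an illustration in plain Java, not the Lucene `Hits` source: `LazyHits`, `ScoredDoc` and the `topN` callback are hypothetical names, and the policy of doubling the request size when a caller reads past the cached window is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class LazyHits {
    /** One document ID / score pair, as held in the cached list. */
    public static final class ScoredDoc {
        public final int id;
        public final float score;
        public ScoredDoc(int id, float score) { this.id = id; this.score = score; }
    }

    // Hypothetical stand-in for the scoring system: returns the best n results.
    private final IntFunction<List<ScoredDoc>> topN;
    private List<ScoredDoc> cached = new ArrayList<>();
    private int requested = 0;
    private int searches = 0;
    private boolean exhausted = false;

    public LazyHits(IntFunction<List<ScoredDoc>> topN) {
        this.topN = topN;
        fetch(100); // only the top 100 ID/score pairs are requested up front
    }

    // Re-runs the search for a larger window and replaces the cached list.
    private void fetch(int n) {
        cached = new ArrayList<>(topN.apply(n));
        requested = n;
        searches++;
        exhausted = cached.size() < n; // fewer than asked for: no more results exist
    }

    /** Returns the i-th result, searching again (assumed: doubled window) if needed. */
    public ScoredDoc get(int i) {
        while (i >= cached.size() && !exhausted) {
            fetch(requested * 2);
        }
        if (i < 0 || i >= cached.size()) {
            throw new IndexOutOfBoundsException("no hit at index " + i);
        }
        return cached.get(i);
    }

    /** How many times the underlying search has been executed. */
    public int searchCount() { return searches; }
}
```

Under these assumptions, reading the first 100 results costs one search, while walking linearly through a long result list triggers repeated, progressively larger searches.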

Re: Renewing IndexSearcher on index change.

2005-10-04 Thread J.J. Larrea
Oops! Yes that's correct. Thanks for catching it... - J.J. At 10:33 AM -0700 10/4/05, Chris Hostetter wrote: >: // feebly try to prevent concurrent reentry problems >: IndexWriter w = writer; >: w = null; >: try { >: w.c

Re: Renewing IndexSearcher on index change.

2005-10-04 Thread J.J. Larrea
At 6:39 PM +0200 10/4/05, Olivier Jaquemet wrote: >In every case I think I will use this to prevent any problem but why nobody >uses finalize methods? is it somehow bad to try to close things correctly that >way? Because they are not run under "brutal termination" conditions. For that you need

Re: Weird time results doing wildcard queries

2005-09-09 Thread J.J. Larrea
>just to clarify, I meant take the call to getMoreDocs(50) which is >currently in the Hits constructor, and refactor it out and into the >"Searcher.search" methods. that way the behavior is the same as before >for all existing clients, but new subclasses can change the behavior so >that the "search

Re: Weird time results doing wildcard queries

2005-09-08 Thread J.J. Larrea
At 8:01 PM -0700 9/8/05, Chris Hostetter wrote: >: Which makes me wonder whether the caching logic of Hits, optimized for >: random- rather than linear-access, and not tuneable or controllable in >: 1.4.3, should be reviewed for a subsequent release, at least the >: API-breaking 2.0. I'll wager th

Re: Weird time results doing wildcard queries

2005-09-08 Thread J.J. Larrea
least the API-breaking 2.0. I'll wager that a majority of applications do nothing other than a one-time linear retrieval of Documents from Hits, with the potential for a lot of wasted cycles for those that retrieve more than a small number. - J.J. Larrea At 4:19 PM -0700 9/8/05, Chris Hoste