Re: Sorting consumes hundreds of MBytes RAM

2008-04-25 Thread Nadav Har'El
using. Instead of using a stored field, I would recommend using *payloads*. If you store the field's value as payload on a custom term, you basically get a posting-list of the field value, which can be (theoretically, at least) efficiently skipped on one hand - and read in sequen
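
The payload idea can be pictured without Lucene at all. In this minimal sketch, the hypothetical `PayloadColumn` class stands in for the posting list of one custom term. It holds one posting per document, in increasing doc-number order, with the sort value stored as a 4-byte payload. A binary search plays the role of skipping to a single document, and a loop over all postings plays the role of reading in sequence. None of these names are Lucene API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for the posting list of one custom term whose
// payload carries each document's sort value (not Lucene's real API).
final class PayloadColumn {
    private final int[] docs;        // doc numbers, strictly increasing
    private final byte[][] payloads; // one 4-byte big-endian int per posting

    PayloadColumn(int[] docs, int[] values) {
        this.docs = docs.clone();
        this.payloads = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            payloads[i] = ByteBuffer.allocate(4).putInt(values[i]).array();
        }
    }

    // "Skip" to one document: binary search instead of a full scan.
    Integer valueFor(int doc) {
        int i = Arrays.binarySearch(docs, doc);
        return i < 0 ? null : ByteBuffer.wrap(payloads[i]).getInt();
    }

    // "Read in sequence": decode every payload in doc order.
    int[] readAll(int maxDoc, int missing) {
        int[] out = new int[maxDoc];
        Arrays.fill(out, missing);
        for (int i = 0; i < docs.length; i++) {
            out[docs[i]] = ByteBuffer.wrap(payloads[i]).getInt();
        }
        return out;
    }
}
```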

Re: problems with deleteDocuments

2007-07-07 Thread Nadav Har'El
the main text and the attachments. By the way, the method is called "deleteDocuments" - doesn't that imply that it's perfectly acceptable to delete many documents with one term? -- Nadav Har'El| Sunday, Jul 8 2007, 22 Tammuz 5767 IBM Ha
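
This sketch illustrates the behavior being argued for. A delete by term removes every document that contains the term. So if the main text and each attachment are indexed as separate documents that share a common key (the `parent` field below is a hypothetical name), a single call removes all of them. The toy index models only that behavior, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy index (hypothetical): each document maps field -> one untokenized value.
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    int add(Map<String, String> doc) {
        docs.add(doc);
        deleted.add(false);
        return docs.size() - 1;
    }

    // Like a delete by term: marks *every* live doc whose field equals the value.
    int deleteDocuments(String field, String value) {
        int count = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && value.equals(docs.get(i).get(field))) {
                deleted.set(i, true);
                count++;
            }
        }
        return count;
    }

    int numDocs() {
        int live = 0;
        for (boolean d : deleted) if (!d) live++;
        return live;
    }
}
```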

Re: indexing performance

2007-03-01 Thread Nadav Har'El
he count that has same field values. You need just the counts? And you want to do just whole-field matching, not word matching? In that case, Lucene might be overkill for you. Or, if you do use Lucene, make sure to use "keyword" (untokenized) fields, not "tokenized" fields.
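
If only the counts of identical whole-field values are needed, a hash map may indeed be enough. Here is a minimal sketch, assuming the field values are already available as strings (the class name is illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ValueCounts {
    // Counts exact, whole-field matches. Nothing is tokenized, so
    // "New York" and "new york" are different keys, much like an
    // untokenized (keyword) field.
    static Map<String, Integer> count(List<String> fieldValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : fieldValues) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }
}
```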

Re: NO_NORMS and TOKENIZED?

2007-02-15 Thread Nadav Har'El
he implications. If we also had a "TOKENIZED_NO_NORMS", why would new users accidentally use it? I guess the javadoc of this parameter could also warn against its use (something like "not recommended for general use", o

Re: NO_NORMS and TOKENIZED?

2007-01-23 Thread Nadav Har'El
should refer to setOmitNorms()? (Or I should learn to search the documentation better :-)). -- Nadav Har'El| Tuesday, Jan 23 2007, 4 Shevat 5767 IBM Haifa Research Lab |- |

NO_NORMS and TOKENIZED?

2007-01-23 Thread Nadav Har'El
e field's value *with* an Analyzer, but still disable the storing of norms (because the field length should not be considered in scoring)? Can't I do that? Was this intentional, or is this an oversight and a fifth option should be added? Thanks, Nadav. -- Nadav Har'El
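
This explains why a tokenized field without norms can be useful. In Lucene's DefaultSimilarity the length norm is 1/sqrt(numTerms), so the same match scores lower in a longer field. Omitting norms makes that factor the same (1.0) for every document. The sketch below computes only this factor. It ignores field and document boosts, and it ignores the lossy one-byte encoding Lucene applies when it stores norms. The class name is hypothetical.

```java
final class LengthNorm {
    // Same formula as DefaultSimilarity.lengthNorm: 1 / sqrt(number of terms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // With norms omitted, field length no longer influences the score.
    static float effectiveNorm(int numTerms, boolean omitNorms) {
        return omitNorms ? 1.0f : lengthNorm(numTerms);
    }
}
```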

Re: Websphere and Dark Matter

2007-01-23 Thread Nadav Har'El
On Mon, Jan 22, 2007, John Haxby wrote about "Re: Websphere and Dark Matter": > Nadav Har'El wrote: > Are you implying that the process memory shrinks, that memory is > returned to the kernel? I didn't read the page you referenced that way. > I know that if I a

Re: Websphere and Dark Matter

2007-01-16 Thread Nadav Har'El
ers as well. A good combination I once used is this: -XX:NewRatio=2 -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=30 But your mileage may vary. [1] http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp -- N
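
A simplified model approximates the effect of the two free-ratio flags. After a collection, the heap grows if less than MinHeapFreeRatio percent of it is free. It shrinks if more than MaxHeapFreeRatio percent is free, staying within the -Xms/-Xmx limits (minHeap/maxHeap below). This sketch uses a hypothetical class and sizes in arbitrary units. It ignores generations, alignment and other HotSpot details, so it illustrates the flags above rather than the JVM's exact algorithm.

```java
final class HeapSizing {
    // Capacity at which `used` units would leave about freePct% free,
    // i.e. ceil(used / (1 - freePct/100)), using integer arithmetic only.
    static long capacityFor(long used, int freePct) {
        long denom = 100 - freePct;
        return (used * 100 + denom - 1) / denom;
    }

    // Simplified model of -XX:MinHeapFreeRatio / -XX:MaxHeapFreeRatio.
    static long capacityAfterGc(long used, long capacity, int minFreePct,
                                int maxFreePct, long minHeap, long maxHeap) {
        long freeTimes100 = (capacity - used) * 100;
        long target = capacity;
        if (freeTimes100 < (long) minFreePct * capacity) {
            target = capacityFor(used, minFreePct); // too little free: grow
        } else if (freeTimes100 > (long) maxFreePct * capacity) {
            target = capacityFor(used, maxFreePct); // too much free: shrink
        }
        return Math.max(minHeap, Math.min(maxHeap, target));
    }
}
```

With the 20/30 ratios above, a heap of 1000 units holding 100 units of live data after a collection would shrink to about 143 units in this model. That shrinking is how such settings let a process give memory back.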

Re: Indexing floating point number

2006-11-02 Thread Nadav Har'El
lication, where there is barely a handful of numeric fields, this slow encoding is shadowed by the much slower process of indexing the document itself. Not to mention that what usually really matters is the speed of the search or sort, not the speed of the one-time indexing. -- Nadav Har'El
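
The excerpt does not show the encoding in question. For comparison, here is one well-known fast alternative; it is a different technique, not the author's code. Take the IEEE 754 bits of the double, flip bits so that signed-long order matches numeric order, and print a fixed-width hex string. That string then sorts lexicographically in numeric order. The class name is illustrative.

```java
final class SortableDouble {
    // Maps a double to a 16-hex-digit string whose lexicographic order
    // matches Double.compare order (negatives first, -0.0 before 0.0).
    static String encode(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative numbers, flip every bit except the sign bit, so that
        // larger magnitudes become smaller signed longs.
        bits ^= (bits >> 63) & 0x7fffffffffffffffL;
        // Flip the sign bit so unsigned (hex string) order equals signed order.
        return String.format("%016x", bits ^ Long.MIN_VALUE);
    }
}
```

The encoding is a few bit operations plus formatting, so its cost is small either way. The point above still holds that indexing the document itself usually dominates.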

Re: Indexing floating point number

2006-11-01 Thread Nadav Har'El
" OR "2.41" OR "2.42" OR "2.43" OR "2.44" (note that this is an OR of just 7 posting lists, even if this range contains thousands of distinct values). I wonder if anybody has ever done such a thing (or come up with a better solution) in Lucene. -- Nadav Har'El
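
The idea of turning a range into an OR of a few terms can be sketched as follows. The sketch assumes each value is indexed not only at full precision but also under each of its decimal prefixes. For example, 241 (2.41 scaled by 100) would also be indexed as 24 and 2, in separate fields or with a level marker so the prefixes do not collide. `PrefixRangeSplit` is a hypothetical name, and the values are fixed-width, zero-padded whole numbers.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixRangeSplit {
    // Covers the inclusive range [lo, hi] of non-negative integers, each
    // written with exactly `digits` decimal digits (zero-padded), using a
    // small set of prefix terms. With digits = 3, prefix "24" means 240..249.
    static List<String> split(long lo, long hi, int digits) {
        List<String> terms = new ArrayList<>();
        int shift = 0; // number of trailing digits dropped at this level
        while (lo <= hi) {
            if (shift + 1 == digits) { // next level would be the empty prefix
                addRange(terms, lo, hi, digits - shift);
                break;
            }
            long loUp = (lo + 9) / 10;       // first full block at next level
            long hiDown = (hi + 1) / 10 - 1; // last full block at next level
            if (loUp > hiDown) {             // no full block: finish here
                addRange(terms, lo, hi, digits - shift);
                break;
            }
            addRange(terms, lo, loUp * 10 - 1, digits - shift);     // lower edge
            addRange(terms, (hiDown + 1) * 10, hi, digits - shift); // upper edge
            lo = loUp;
            hi = hiDown;
            shift++;
        }
        return terms;
    }

    private static void addRange(List<String> terms, long from, long to, int width) {
        for (long v = from; v <= to; v++) {
            terms.add(String.format("%0" + width + "d", v));
        }
    }
}
```

For example, `split(237, 452, 3)` yields 18 terms instead of 216 single values: 237 to 239, 24 to 29, 3, 40 to 44 and 450 to 452.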

Re: HitCollector and Sort Objects

2006-07-01 Thread Nadav Har'El
I raised the idea of having a search() method which returns a Hits and calls a HitCollector, but was convinced that TopDocs+HitCollector is actually better. See: http://www.gossamer-threads.com/lists/lucene/java-dev/37277 Maybe this should be in the FAQ. -- Nadav Har'El
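
The benefit of combining TopDocs with a HitCollector is that every hit is still seen once, while only the best few are kept. Here is a minimal sketch of that collection step using a bounded min-heap. The class and method names are hypothetical, not Lucene's collector classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical collector: sees every (doc, score) pair, like a HitCollector's
// collect() callback, but keeps only the n best, like a TopDocs result.
final class TopNCollector {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int n;
    // Min-heap on score: the root is the weakest of the current top n.
    private final PriorityQueue<Hit> heap =
        new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
    private int totalHits = 0;

    TopNCollector(int n) { this.n = n; }

    void collect(int doc, float score) {
        totalHits++;
        if (n <= 0) return;
        if (heap.size() < n) {
            heap.add(new Hit(doc, score));
        } else if (score > heap.peek().score) {
            heap.poll(); // evict the weakest kept hit
            heap.add(new Hit(doc, score));
        }
    }

    int totalHits() { return totalHits; }

    // Best first.
    List<Hit> topDocs() {
        List<Hit> result = new ArrayList<>(heap);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```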

Re: Does more memory help Lucene?

2006-06-12 Thread Nadav Har'El
Otis Gospodnetic <[EMAIL PROTECTED]> wrote on 12/06/2006 04:36:45 PM: > Nadav, > > Look up one of my onjava.com Lucene articles, where I talk about > this. You may also want to tell Lucene to merge segments on disk > less frequently, which is what mergeFactor does. Thanks. Can you please point m
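
The advice to merge less often can be illustrated with a deliberately simplified model of log-style merging. This model is an assumption; Lucene's actual policy has more rules. Every `buffered` documents a segment is flushed, and whenever `mergeFactor` segments of the same size exist, they are merged into one. Counting how many document copies the merges write shows why a larger mergeFactor, or a larger in-memory buffer, means less merge I/O. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeModel {
    // Total document copies written by merges in the simplified model.
    static long docsRewritten(long numDocs, long buffered, int mergeFactor) {
        if (buffered < 1 || mergeFactor < 2) throw new IllegalArgumentException();
        List<Integer> segmentsAtLevel = new ArrayList<>();
        long rewritten = 0;
        for (long flushed = buffered; flushed <= numDocs; flushed += buffered) {
            bump(segmentsAtLevel, 0); // flush one segment of `buffered` docs
            long segmentSize = buffered;
            int level = 0;
            while (segmentsAtLevel.get(level) == mergeFactor) {
                segmentsAtLevel.set(level, 0); // merge mergeFactor segments...
                segmentSize *= mergeFactor;
                rewritten += segmentSize;      // ...rewriting all their docs
                level++;
                bump(segmentsAtLevel, level);  // into one segment a level up
            }
        }
        return rewritten;
    }

    private static void bump(List<Integer> counts, int level) {
        if (counts.size() == level) counts.add(0);
        counts.set(level, counts.get(level) + 1);
    }
}
```

In this model, 1,000 documents flushed 10 at a time cause 2,000 document rewrites with mergeFactor 10. They cause only 1,000 with mergeFactor 100, or with a buffer of 100 documents.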

Re: Does more memory help Lucene?

2006-06-12 Thread Nadav Har'El
"Michael D. Curtin" <[EMAIL PROTECTED]> wrote on 12/06/2006 03:49:53 PM: > Nadav Har'El wrote: > > > What I couldn't figure out how to use, however, was the abundant memory (2 > > GB) that this machine has. > > > > I tried playing with Inde

Does more memory help Lucene?

2006-06-12 Thread Nadav Har'El
e speed of huge merges, for example? Thanks, Nadav. -- Nadav Har'El [EMAIL PROTECTED] +972-4-829-6326 - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: Range queries

2006-05-09 Thread Nadav Har'El
"ConstantScoreRangeQuery" instead of "RangeQuery". It is still very inefficient, and you still need to remember to pad all your numbers so they sort properly *lexicographically* (e.g., 00-100), but at least you should not have exceptions any more. -- Nadav Har'El
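
Padding matters because strings compare character by character, so "20" sorts after "100". Here is a minimal sketch of fixed-width zero padding for non-negative integers. The width must fit the largest value, and negative numbers need a different encoding. The class name is hypothetical.

```java
final class PadNumbers {
    // Zero-pads a non-negative number to a fixed width so that string
    // (lexicographic) order equals numeric order.
    static String pad(long value, int width) {
        if (value < 0) throw new IllegalArgumentException("negative: " + value);
        if (width < 1) throw new IllegalArgumentException("width: " + width);
        String s = String.format("%0" + width + "d", value);
        if (s.length() > width) throw new IllegalArgumentException("too wide: " + value);
        return s;
    }
}
```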

Re: Lucene Searches VS. Relational database Queries

2006-04-12 Thread Nadav Har'El
ueryParser, but you can do it with the SpanFirstQuery: for example if we index Jason Bateman as the three tokens Jason Bateman $ then we can search for it using something like SpanQuery[] terms = { new SpanTermQuery(new Term("actor", "Jason")),
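
The end-marker trick can be simulated in plain Java to show why it enforces a whole-field match. Assume each field is indexed with a trailing "$" token. The query tokens plus "$" must then appear consecutively and end within the first (query length + 1) positions. That is roughly what a SpanFirstQuery wrapped around an in-order SpanNearQuery with slop 0 checks. This is a toy model, not Lucene code, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class ExactFieldMatch {
    static final String END = "$";

    // True if `seq` occurs consecutively in the field and the match ends at
    // or before position `end` (exclusive end, as for SpanFirstQuery).
    static boolean spanFirstNear(List<String> fieldTokens, List<String> seq, int end) {
        for (int start = 0; start + seq.size() <= fieldTokens.size(); start++) {
            if (start + seq.size() > end) break; // match would end too late
            if (fieldTokens.subList(start, start + seq.size()).equals(seq)) return true;
        }
        return false;
    }

    // Whole-field match: query tokens followed by the end marker, at the start.
    static boolean exactMatch(List<String> indexedTokens, List<String> query) {
        List<String> withEnd = new ArrayList<>(query);
        withEnd.add(END);
        return spanFirstNear(indexedTokens, withEnd, withEnd.size());
    }
}
```

Without the marker, a plain prefix check would also accept "Jason Bateman Jr".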

Re: Small field indexing and ranking

2006-04-11 Thread Nadav Har'El
pact on the index size, and it may be possible to get similar results with no impact on index size and just a small run-time slowdown by using something like SpanNearQuery, or a variation on this idea. Again, I didn't yet try to do this myself, so I'm not sure how successful that woul

Re: Small field indexing and ranking

2006-04-11 Thread Nadav Har'El
ppear there, they actually appear very close, and in this case even in order. This sort of proximity-influenced scoring is missing from Lucene's QueryParser, and I've been wondering recently how best to add it, and whether it is possible to easily do it with existing Lucene machinery
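
One way to picture proximity-influenced scoring is to find the shortest run of tokens that contains every query term, then boost the document more the tighter that run is. This toy factor uses hypothetical names and, unlike the ordered case described above, ignores term order. It is only an illustration, not something Lucene's QueryParser or SpanNearQuery computes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ProximityBoost {
    // Length of the shortest run of consecutive tokens containing every
    // query term at least once (any order), or -1 if a term is missing.
    static int minWindow(List<String> tokens, Set<String> queryTerms) {
        int best = -1;
        for (int start = 0; start < tokens.size(); start++) {
            if (!queryTerms.contains(tokens.get(start))) continue;
            Set<String> seen = new HashSet<>();
            for (int end = start; end < tokens.size(); end++) {
                if (queryTerms.contains(tokens.get(end))) seen.add(tokens.get(end));
                if (seen.size() == queryTerms.size()) {
                    int len = end - start + 1;
                    if (best < 0 || len < best) best = len;
                    break;
                }
            }
        }
        return best;
    }

    // Hypothetical boost: 1.0 when the terms are adjacent, decaying with
    // the number of extra tokens in between; 0 when a term is missing.
    static double boost(List<String> tokens, Set<String> queryTerms) {
        int w = minWindow(tokens, queryTerms);
        return w < 0 ? 0.0 : 1.0 / (1 + w - queryTerms.size());
    }
}
```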

solution: RangeQuery with floating point numbers

2006-04-09 Thread Nadav Har'El
ect
// every time, because that object is not thread safe. This may
// be a performance bottleneck.
DecimalFormat mantFormatter = new DecimalFormat(".##");
// 1234567890
String result
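
The comment's concern is that DecimalFormat is not thread-safe, yet creating one per call is costly. A common fix is one instance per thread. Here is a minimal sketch using ThreadLocal. The `.##` pattern comes from the snippet. The class name is an assumption, as is the use of Locale.ROOT symbols so the decimal separator is always '.'.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class MantissaFormat {
    // One DecimalFormat per thread: no sharing of a non-thread-safe object,
    // and no construction cost on every call.
    private static final ThreadLocal<DecimalFormat> FORMAT = ThreadLocal.withInitial(
        () -> new DecimalFormat(".##", DecimalFormatSymbols.getInstance(Locale.ROOT)));

    static String format(double mantissa) {
        return FORMAT.get().format(mantissa);
    }
}
```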

Re: Simpler QueryParser

2006-03-20 Thread Nadav Har'El
on), and the parsing will fail if these features are used (or, alternatively, think of what else you can do in this case). -- Nadav Har'El
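
Here is a minimal sketch of such a restricted parser. It assumes the goal is to accept plain words only and to reject anything Lucene's QueryParser would treat as syntax: its special characters (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \) and the AND, OR and NOT operators. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleQueryCheck {
    // Characters with special meaning in QueryParser syntax
    // (& and | cover the && and || operators).
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\";

    // Returns the plain words of the query, or throws if it uses syntax
    // that this deliberately simpler parser does not support.
    static List<String> parse(String query) {
        for (char c : query.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                throw new IllegalArgumentException("unsupported syntax: " + c);
            }
        }
        List<String> words = new ArrayList<>();
        for (String w : query.trim().split("\\s+")) {
            if (w.isEmpty()) continue;
            if (w.equals("AND") || w.equals("OR") || w.equals("NOT")) {
                throw new IllegalArgumentException("unsupported operator: " + w);
            }
            words.add(w);
        }
        return words;
    }
}
```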

Re: lucene query analysis

2006-03-15 Thread Nadav Har'El
dition to breaking up the text on white spaces, also breaks it up in other logical places (like punctuation, but not in every case), and more importantly for you, it indexes the text in lowercase. You should use StandardAnalyzer both during indexing, and du
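
A rough imitation of the two StandardAnalyzer effects mentioned above (splitting at punctuation and lowercasing) shows why queries must be analyzed the same way as the indexed text. The real analyzer has many more tokenization rules and also removes English stop words. `ToyAnalyzer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ToyAnalyzer {
    // Splits on anything that is not a letter or digit, then lowercases.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }
}
```

An unanalyzed query term such as "Lucene" never matches the indexed token "lucene". Running the query through the same analyzer fixes that.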

Efficiently updating indexed documents

2006-02-28 Thread Nadav Har'El
ir.deleteDocument(doctodelete); doctodelete=docs.doc(); } } idsReplaced.clear(); ir.close(); } I did not test this idea too much, but in some initial experiments I tried, it seems to work. -- Nadav Har
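
Here is a minimal sketch of the batching idea as the fragment suggests it, using a toy in-memory index instead of IndexReader and TermDocs. The new version of a document is added immediately and its id is recorded in `idsReplaced`. Later, one pass walks each recorded id's documents in doc-number order and deletes all but the newest. The sketch assumes newer documents always get higher doc numbers. The class and fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class BatchUpdater {
    // Toy postings: id value -> doc numbers containing it, ascending.
    final Map<String, List<Integer>> postings = new TreeMap<>();
    final Set<Integer> deleted = new HashSet<>();
    final Set<String> idsReplaced = new HashSet<>();
    private int nextDoc = 0;

    // Add the new version right away; remember that older versions must go.
    int update(String id) {
        int doc = nextDoc++;
        postings.computeIfAbsent(id, k -> new ArrayList<>()).add(doc);
        idsReplaced.add(id);
        return doc;
    }

    // Later, in one pass: for each replaced id keep only its newest document.
    void deleteOldVersions() {
        for (String id : idsReplaced) {
            Integer docToDelete = null;
            for (int doc : postings.get(id)) {
                if (deleted.contains(doc)) continue;
                if (docToDelete != null) deleted.add(docToDelete);
                docToDelete = doc; // newest seen so far survives, for now
            }
        }
        idsReplaced.clear();
    }
}
```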

Re: Get list with found words for a hit?

2006-02-27 Thread Nadav Har'El
ocument doc = hits.doc(i); TokenStream tokenStream = analyzer.tokenStream("storedContent", new StringReader(doc.get("storedContent"))); summary = highlighter.getBestFragments(to
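
Sometimes only the list of matched words is needed, not highlighted fragments. A simpler alternative is to re-split the stored text the way the analyzer did and intersect it with the query terms. Here is a minimal sketch, assuming the query terms are already lowercased and the analyzer only splits and lowercases (hypothetical class name):

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

final class FoundWords {
    // Which query terms occur in a hit's stored text? The splitting and
    // lowercasing here must match what the indexing analyzer did.
    static Set<String> find(String storedContent, Set<String> queryTerms) {
        Set<String> found = new LinkedHashSet<>();
        for (String t : storedContent.split("[^\\p{L}\\p{N}]+")) {
            String token = t.toLowerCase(Locale.ROOT);
            if (queryTerms.contains(token)) found.add(token);
        }
        return found;
    }
}
```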

Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-22 Thread Nadav Har'El
of this delete key isn't defined by Lucene, but I believe that the concept of such a key was "officially" sanctioned by Lucene with the deleteDocuments(Term) method (whose documentation even mentions the "unique ID string" scenari

Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Nadav Har'El
fied document if we search for the term again we'll find two documents. What about this idea? Does an implementation of something similar already exist? -- Nadav Har'El
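
The problem described, where the term then matches two documents, goes away if an update is always a delete by the unique key followed by an add. Here is a toy sketch of that sequence. The class is hypothetical and does not model Lucene's readers, writers or locking.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyedIndex {
    // Each entry is {key, body}; a null entry marks a deleted document.
    private final List<String[]> docs = new ArrayList<>();

    void add(String key, String body) {
        docs.add(new String[] {key, body});
    }

    void deleteDocuments(String key) {
        for (int i = 0; i < docs.size(); i++) {
            String[] d = docs.get(i);
            if (d != null && d[0].equals(key)) docs.set(i, null);
        }
    }

    // "Update" = delete every document with this key, then add the new one.
    void updateDocument(String key, String body) {
        deleteDocuments(key);
        add(key, body);
    }

    // Number of live documents carrying the key.
    int search(String key) {
        int n = 0;
        for (String[] d : docs) if (d != null && d[0].equals(key)) n++;
        return n;
    }
}
```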