Hi all.
I have a situation where a Document is constructed with a bunch of
strings and a couple of readers. An error may occur while reading from
the readers, and in these situations, we want to remove the reader and
then try to index the same document again.
I've made a test case which cre
On Apr 6, 2006, at 4:23 PM, Daniel Noll wrote:
Marvin Humphrey wrote:
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't
handle apostrophes. So, the input "I don't know" produces these
tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does to
Marvin Humphrey wrote:
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't handle
apostrophes. So, the input "I don't know" produces these tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does tokenize letter by letter, but 't' is
a stopword, s
Fisheye wrote:
HashSet terms = new HashSet();
query.rewrite(reader).extractTerms(terms);
Ok, but this delivers every term, not just a list of words the Levenshtein
algorithm produced with similarity.
I asked a similar thing in the past about term highlighting in general,
and
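A sketch of what that suggestion actually gives you, assuming the Lucene 1.9-era API: rewriting the FuzzyQuery clause by itself expands it into the concrete terms the Levenshtein matching accepted, and extractTerms on the rewritten query collects exactly those, with no terms from other clauses mixed in. (The FuzzyTermsDump class name and the 0.6f similarity threshold are illustrative, not from the original posts.)

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;

public class FuzzyTermsDump {
    // Rewrite only the fuzzy clause against the reader; rewrite() expands
    // it into a BooleanQuery over the index terms within the similarity
    // threshold, and extractTerms() then yields exactly those terms.
    public static Set expandedTerms(IndexReader reader, Term term) throws Exception {
        Query fuzzy = new FuzzyQuery(term, 0.6f);
        Query rewritten = fuzzy.rewrite(reader);
        Set terms = new HashSet();
        rewritten.extractTerms(terms);
        return terms;
    }
}
```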
: I need the count, and don't need the docs at this point. If I had a
: simple query, (e.g. "book") I can use docFreq(), and it's lightning
: fast. If I just run it as a query it's much slower. I'm just
: wondering if I did a custom scorer / similarity / hitcollector, how
: much faster than a quer
Fields, by default, are not stored. If you look at the FileDocument.java
file in the demo, you should see that the contents field is created this
way...
// Add the contents of the file to a field named "contents". Specify a Reader,
// so that the text of the file is tokenized and index
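A minimal sketch of that pattern, assuming the Lucene 1.9 Field(String, Reader) constructor (the ContentsField class name is illustrative):

```java
import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ContentsField {
    // Feed the "contents" field from a Reader: the text is tokenized and
    // indexed, but NOT stored, which is why doc.get("contents") returns
    // null for such documents at search time.
    public static Document fileDocument(File f) throws Exception {
        Document doc = new Document();
        doc.add(new Field("contents", new FileReader(f)));
        return doc;
    }
}
```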
Hi -
Is there a fast way (not easy, but speedy) of getting the count of
documents that match a query?
I need the count, and don't need the docs at this point. If I had a
simple query, (e.g. "book") I can use docFreq(), and it's lightning
fast. If I just run it as a query it's much slower. I'
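One sketch of a counting-only search, assuming the 1.x Searcher.search(Query, HitCollector) API (MatchCounter is a hypothetical name): collect() is invoked once per matching document, so incrementing a counter avoids the cost of building and sorting a Hits object.

```java
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class MatchCounter {
    // Count matches without materializing results: the collector only
    // increments a counter, skipping Hits bookkeeping and result sorting.
    public static int count(Searcher searcher, Query query) throws Exception {
        final int[] n = new int[1];
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                n[0]++;
            }
        });
        return n[0];
    }
}
```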
Dear all
I got a java.lang.NullPointerException at
java.io.StringReader.<init>(StringReader.java:33) when processing the
following code:
for (int i = 0; i < theHits.length(); i++)
{
    Document doc = theHits.doc(i);
    String contents = doc.get("contents");
    TokenStream tokenStream = analyzer.token
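The usual cause: doc.get("contents") returns null when the field was indexed but not stored, and StringReader's constructor throws the NullPointerException when handed a null string. A sketch of a guard (SafeTokenize is a hypothetical helper, not from the original post):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;

public class SafeTokenize {
    // doc.get(...) is null for fields that were indexed but not stored;
    // check before constructing the StringReader, whose constructor
    // dereferences the string and throws NPE on null.
    public static TokenStream tokenStream(Analyzer analyzer, Document doc, String field) {
        String contents = doc.get(field);
        if (contents == null) {
            return null;  // field not stored for this document
        }
        return analyzer.tokenStream(field, new StringReader(contents));
    }
}
```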
Hi all,
I'm still new to Lucene. I'm in the last year of my bachelor's degree in
Computer Science. My final thesis is about indexing and searching in Lucene
1.4.3. I've read the "Space Optimizations for Total Ranking" paper. My main
question is :
1.What search algorithm
I think it's a good idea. For an enterprise-level application, Lucene appears
too file-system-centric and too byte-sequence-oriented a technology. Just my
opinion. The Directory API is just too low-level.
I'd be OK with an RDBMS-based Directory implementation I could take and use.
But generally, I
What about using lucene just for searching (i.e., no stored fields
except maybe one "ID" primary key field), and using an RDBMS for
storing the actual "documents"? This way you're using lucene for what
lucene is best at, and using the database for what it's good at. At
least up to a point -- RDBM
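A rough sketch of that split, assuming the 1.x Hits API and a hypothetical documents(id, body) table: Lucene stores only the primary key, and the actual document body is fetched from the database per hit.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class SearchThenFetch {
    // Lucene holds only the primary key; the body lives in the RDBMS.
    // The "documents" table and "id"/"body" columns are illustrative.
    public static ResultSet fetchFirst(Searcher searcher, Query q, Connection db)
            throws Exception {
        Hits hits = searcher.search(q);
        if (hits.length() == 0) return null;
        Document doc = hits.doc(0);
        String id = doc.get("id");  // the only stored field in the index
        PreparedStatement ps = db.prepareStatement(
                "SELECT body FROM documents WHERE id = ?");
        ps.setString(1, id);
        return ps.executeQuery();
    }
}
```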
Thank you
JS
--- Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 4/6/06, John Smith <[EMAIL PROTECTED]> wrote:
> > // inherit javadocs
> > public String[] getStrings (IndexReader reader, String field)
> >
> > The string array I get back, is it guaranteed that the first non-null value I
On 4/6/06, John Smith <[EMAIL PROTECTED]> wrote:
>// inherit javadocs
> public String[] getStrings (IndexReader reader, String field)
>
> The string array I get back, is it guaranteed that the first non-null value
> I encounter in the array is the minimum value for this field and iterating
Seeing this, I worry that we'll see users creating XML strings, then
parsing them to get the desired query. I've seen this a lot with
QueryParser, but it would be even more gross to see folks do it
with the XML syntax. So, here's my community service message for the
day if you're creati
I firmly believe that clustering support should be a part of Lucene. We've
tried implementing it ourselves and so far have been unsuccessful. We tried
storing Lucene indices in a database that is the back-end repository for our
app in a clustered environment and could not overcome the indexing
On Thursday 06 April 2006 19:50, John Smith wrote:
> I have not drilled down into the implementation details too much, but
> what was the reason for getting rid of these methods in Lucene 1.9?
There is no limit on the given dates in DateTools (within the limits of
what Java's Calendar/Date c
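For reference, a minimal sketch of the DateTools replacement (the DateIndexing wrapper is illustrative): the encoded strings sort lexicographically in chronological order at the chosen resolution, so they work in range queries without the old DateField bounds.

```java
import java.util.Date;

import org.apache.lucene.document.DateTools;

public class DateIndexing {
    // DateTools encodes any Date that Java's Calendar can represent, at a
    // chosen resolution; here, day granularity for compact range queries.
    public static String encodeDay(Date d) {
        return DateTools.dateToString(d, DateTools.Resolution.DAY);
    }
}
```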
Ideally, I'd love to see an article explaining both in detail: the index
structure as well as the merge algorithm...
From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED]
Sent: Tue 3/28/2006 11:57 PM
To: java-user@lucene.apache.org
Subject: Data structure of a Luce
Hi
I need to access min and max values of a particular field in the index, as
soon as a searcher is initialized. I don't need it later. Looking at old
newsgroup mails, I found a few recommendations.
One was to keep the min and max fields external to the index. But this will
not work
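Another recommendation that comes up: since Lucene stores terms in sorted order, a TermEnum walk at searcher-initialization time can recover the bounds without keeping anything external. A sketch, assuming the 1.x IndexReader.terms(Term) API (FieldBounds is a hypothetical helper):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class FieldBounds {
    // Terms are stored sorted per field, so the first term of the field is
    // the minimum and the last one before the next field begins is the
    // maximum. reader.terms(t) positions the enum at the first term >= t.
    public static String[] minMax(IndexReader reader, String field) throws Exception {
        TermEnum terms = reader.terms(new Term(field, ""));
        String min = null;
        String max = null;
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) break;
                if (min == null) min = t.text();
                max = t.text();
            } while (terms.next());
        } finally {
            terms.close();
        }
        return new String[] { min, max };
    }
}
```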
Hi
We are in the process of upgrading Lucene from 1.2 to 1.9.
There used to be 2 methods in DateField.java in 1.2
public static String MIN_DATE_STRING()
public static String MAX_DATE_STRING()
This basically gave the minimum and the maximum dates we could index using
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't handle
apostrophes. So, the input "I don't know" produces these tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does tokenize letter by letter, but 't'
is a stopword, so the tokens are:
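A sketch that prints what StopAnalyzer actually emits for that input, assuming the 1.x TokenStream.next()/Token.termText() API (the StopTokens class is illustrative; the output depends on the default English stop list, so no token list is claimed here):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class StopTokens {
    // StopAnalyzer = LowerCaseTokenizer + StopFilter: the tokenizer splits
    // on non-letters (including the apostrophe), then the filter drops any
    // token on the default stop list, "t" among them.
    public static void main(String[] args) throws Exception {
        TokenStream ts = new StopAnalyzer()
                .tokenStream("contents", new StringReader("I don't know"));
        for (Token tok = ts.next(); tok != null; tok = ts.next()) {
            System.out.println(tok.termText());
        }
    }
}
```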
Hi,
Just wondering if there is any way to search two indexes with relations,
like in a relational database. For example, in index1 there are fields
"pid" and "content". In index2 there are fields "cid", "record", and
"pid". I want to search keyword1 in content and keyword2 in record and
they
Greets,
It looks like StopAnalyzer tokenizes by letter, and doesn't handle
apostrophes. So, the input "I don't know" produces these tokens:
don
t
know
Is that right?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Hello,
How can I configure Lucene to handle numeric range searches? (This question
has been asked 100 times, I'm sure.)
I've tried the suggestions on the SearchNumericalFields wiki page. This
seems to work for simple queries. Searching for "line:[1 to 10]" gives me
lines 1 thru 10 of the documen
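For ranges beyond single digits, the common workaround is to index numbers zero-padded to a fixed width, so the lexicographic comparison a range query performs agrees with numeric order. A sketch (the NumberPad helper and the width of 10 are illustrative):

```java
public class NumberPad {
    // Pad a non-negative number to a fixed width so that lexicographic
    // ordering matches numeric ordering: "2" sorts after "10", but
    // "0000000002" sorts before "0000000010". Index and query with the
    // padded form on both ends of the range.
    public static String pad(long n, int width) {
        StringBuffer sb = new StringBuffer(Long.toString(n));
        while (sb.length() < width) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }
}
```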
The XMLQueryParser in the contrib section also handles
Spans (as well as a few other Lucene queries/filters
not represented by the standard QueryParser).
Here's an example of a complex query from the JUnit test; the XML markup has
been stripped from this excerpt, leaving only the terms it contained:
killed
died
dead
miner miners
On Apr 6, 2006, at 8:47 AM, Michael Dodson wrote:
Can phrase queries be nested the same way boolean queries can be
nested?
Yes... using SpanNearQuery instead of PhraseQuery.
I want a user query to be translated into a boolean query (say, x
AND (y OR z)), and I want those terms to be withi
Can phrase queries be nested the same way boolean queries can be nested?
I want a user query to be translated into a boolean query (say, x AND
(y OR z)), and I want those terms to be within a certain distance of
each other (approximately within the same sentence, so the slop would
be about
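A sketch of that nesting with span queries, assuming the 1.x org.apache.lucene.search.spans API (the field name and the terms x, y, z stand in for the real ones): a SpanOrQuery nests inside a SpanNearQuery just as one BooleanQuery nests inside another.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class NestedSpans {
    // x AND (y OR z), with all matches confined to within `slop` positions
    // of each other (roughly one sentence for a small slop); inOrder=false
    // lets the clauses match in any order.
    public static SpanQuery build(String field, int slop) {
        SpanQuery x = new SpanTermQuery(new Term(field, "x"));
        SpanQuery yOrZ = new SpanOrQuery(new SpanQuery[] {
            new SpanTermQuery(new Term(field, "y")),
            new SpanTermQuery(new Term(field, "z"))
        });
        return new SpanNearQuery(new SpanQuery[] { x, yOrZ }, slop, false);
    }
}
```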
Hi,
Thanks for your suggestion. I thought about the same, but somehow it didn't
seem like such a good idea... Now that I think about it, it would take the same
I/O load (in terms of flushing many megabytes to disk) as optimizing in memory
with the FSDirectory.
Another weird thing we observed he
HashSet terms = new HashSet();
query.rewrite(reader).extractTerms(terms);
Ok, but this delivers every term, not just a list of words the Levenshtein
algorithm produced with similarity. Judging from the posts here in the thread I
opened, you guys seem to be experienced programmers, so