Hi all.
I have a situation where a Document is constructed with a bunch of
strings and a couple of readers. An error may occur while reading from
the readers, and in these situations, we want to remove the reader and
then try to index the same document again.
I've made a test case which cre
On Apr 6, 2006, at 4:23 PM, Daniel Noll wrote:
Marvin Humphrey wrote:
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't
handle apostrophes. So, the input "I don't know" produces these
tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does to
Marvin Humphrey wrote:
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't handle
apostrophes. So, the input "I don't know" produces these tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does tokenize letter by letter, but 't' is
a stopword, s
Fisheye wrote:
HashSet terms = new HashSet();
query.rewrite(reader).extractTerms(terms);
Ok, but this delivers every term, not just a list of words the Levenshtein
algorithm produced with similarity.
I asked a similar thing in the past about term highlighting in general,
and
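A sketch of what that suggestion actually gives you, assuming the Lucene 1.9-era API: rewriting the FuzzyQuery clause by itself expands it into the concrete terms the Levenshtein matching accepted, and extractTerms on the rewritten query collects exactly those, with no terms from other clauses mixed in. (The FuzzyTermsDump class name and the 0.6f similarity threshold are illustrative, not from the original posts.)

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;

public class FuzzyTermsDump {
    // Rewrite only the fuzzy clause against the reader; rewrite() expands
    // it into a BooleanQuery over the index terms within the similarity
    // threshold, and extractTerms() then yields exactly those terms.
    public static Set expandedTerms(IndexReader reader, Term term) throws Exception {
        Query fuzzy = new FuzzyQuery(term, 0.6f);
        Query rewritten = fuzzy.rewrite(reader);
        Set terms = new HashSet();
        rewritten.extractTerms(terms);
        return terms;
    }
}
```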
: I need the count, and don't need the docs at this point. If I had a
: simple query, (e.g. "book") I can use docFreq(), and it's lightning
: fast. If I just run it as a query it's much slower. I'm just
: wondering if I did a custom scorer / similarity / hitcollector, how
: much faster than a quer
Fields, by default, are not stored. If you look at the FileDocument.java
file in the demo, you should see that the contents field is created this
way...
// Add the contents of the file to a field named "contents". Specify a Reader,
// so that the text of the file is tokenized and index
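A minimal sketch of that pattern, assuming the Lucene 1.9 Field(String, Reader) constructor (the ContentsField class name is illustrative):

```java
import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ContentsField {
    // Feed the "contents" field from a Reader: the text is tokenized and
    // indexed, but NOT stored, which is why doc.get("contents") returns
    // null for such documents at search time.
    public static Document fileDocument(File f) throws Exception {
        Document doc = new Document();
        doc.add(new Field("contents", new FileReader(f)));
        return doc;
    }
}
```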
Hi -
Is there a fast way (not easy, but speedy) of getting the count of
documents that match a query?
I need the count, and don't need the docs at this point. If I had a
simple query, (e.g. "book") I can use docFreq(), and it's lightning
fast. If I just run it as a query it's much slower. I'
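One sketch of a counting-only search, assuming the 1.x Searcher.search(Query, HitCollector) API (MatchCounter is a hypothetical name): collect() is invoked once per matching document, so incrementing a counter avoids the cost of building and sorting a Hits object.

```java
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class MatchCounter {
    // Count matches without materializing results: the collector only
    // increments a counter, skipping Hits bookkeeping and result sorting.
    public static int count(Searcher searcher, Query query) throws Exception {
        final int[] n = new int[1];
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                n[0]++;
            }
        });
        return n[0];
    }
}
```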
Dear all
I got a java.lang.NullPointerException at
java.io.StringReader.<init>(StringReader.java:33) when processing the
following code:
for (int i = 0; i < theHits.length(); i++)
{
    Document doc = theHits.doc(i);
    String contents = doc.get("contents");
    TokenStream tokenStream = analyzer.token
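The usual cause: doc.get("contents") returns null when the field was indexed but not stored, and StringReader's constructor throws the NullPointerException when handed a null string. A sketch of a guard (SafeTokenize is a hypothetical helper, not from the original post):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;

public class SafeTokenize {
    // doc.get(...) is null for fields that were indexed but not stored;
    // check before constructing the StringReader, whose constructor
    // dereferences the string and throws NPE on null.
    public static TokenStream tokenStream(Analyzer analyzer, Document doc, String field) {
        String contents = doc.get(field);
        if (contents == null) {
            return null;  // field not stored for this document
        }
        return analyzer.tokenStream(field, new StringReader(contents));
    }
}
```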
Hi all,
I'm still new to Lucene. I'm in the last year of my bachelor's degree in
Computer Science. My final thesis is about indexing and searching in Lucene
1.4.3. I've read the "Space Optimizations for Total Ranking" paper. My main
question is :
1.What search algorithm
I think it's a good idea. For an enterprise-level application, Lucene appears
too file-system-centric and too byte-sequence-oriented a technology. Just my
opinion. The Directory API is just too low-level.
I'd be OK with an RDBMS-based Directory implementation I could take and use.
But generally, I
What about using lucene just for searching (i.e., no stored fields
except maybe one "ID" primary key field), and using an RDBMS for
storing the actual "documents"? This way you're using lucene for what
lucene is best at, and using the database for what it's good at. At
least up to a point -- RDBM
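A rough sketch of that split, assuming the 1.x Hits API and a hypothetical documents(id, body) table: Lucene stores only the primary key, and the actual document body is fetched from the database per hit.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

public class SearchThenFetch {
    // Lucene holds only the primary key; the body lives in the RDBMS.
    // The "documents" table and "id"/"body" columns are illustrative.
    public static ResultSet fetchFirst(Searcher searcher, Query q, Connection db)
            throws Exception {
        Hits hits = searcher.search(q);
        if (hits.length() == 0) return null;
        Document doc = hits.doc(0);
        String id = doc.get("id");  // the only stored field in the index
        PreparedStatement ps = db.prepareStatement(
                "SELECT body FROM documents WHERE id = ?");
        ps.setString(1, id);
        return ps.executeQuery();
    }
}
```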
Thank you
JS
--- Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 4/6/06, John Smith <[EMAIL PROTECTED]> wrote:
> > // inherit javadocs
> > public String[] getStrings (IndexReader reader, String field)
> >
> > The string array I get back, is it guaranteed that the first non-null value I
On 4/6/06, John Smith <[EMAIL PROTECTED]> wrote:
>// inherit javadocs
> public String[] getStrings (IndexReader reader, String field)
>
> The string array I get back, is it guaranteed that the first non-null value
> I encounter in the array is the minimum value for this field and iterating
Seeing this, I worry that we'll see users creating XML strings, then
parsing them to get the desired query. I've seen this a lot with
QueryParser, but it would be even more gross to see folks do it
with the XML syntax. So, here's my community service message for the
day if you're creati
I firmly believe that clustering support should be a part of Lucene. We've
tried implementing it ourselves and so far have been unsuccessful. We tried
storing Lucene indices in a database that is the back-end repository for our
app in a clustered environment and could not overcome the indexing
On Thursday 06 April 2006 19:50, John Smith wrote:
> I have not drilled down into the implementation details too much, but
> what was the reason for getting rid of these methods in Lucene 1.9?
There is no limit on the given dates in DateTools (within the limits of
what Java's Calendar/Date c
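For reference, a minimal sketch of the DateTools replacement (the DateIndexing wrapper is illustrative): the encoded strings sort lexicographically in chronological order at the chosen resolution, so they work in range queries without the old DateField bounds.

```java
import java.util.Date;

import org.apache.lucene.document.DateTools;

public class DateIndexing {
    // DateTools encodes any Date that Java's Calendar can represent, at a
    // chosen resolution; here, day granularity for compact range queries.
    public static String encodeDay(Date d) {
        return DateTools.dateToString(d, DateTools.Resolution.DAY);
    }
}
```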
Ideally, I'd love to see an article explaining both in detail: the index
structure as well as the merge algorithm...
From: Prasenjit Mukherjee [mailto:[EMAIL PROTECTED]
Sent: Tue 3/28/2006 11:57 PM
To: java-user@lucene.apache.org
Subject: Data structure of a Luce
Hi
I need to access min and max values of a particular field in the index, as
soon as a searcher is initialized. I don't need it later. Looking at old
newsgroup mails, I found a few recommendations.
One was to keep the min and max fields external to the index. But this will
not work
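Another recommendation that comes up: since Lucene stores terms in sorted order, a TermEnum walk at searcher-initialization time can recover the bounds without keeping anything external. A sketch, assuming the 1.x IndexReader.terms(Term) API (FieldBounds is a hypothetical helper):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

public class FieldBounds {
    // Terms are stored sorted per field, so the first term of the field is
    // the minimum and the last one before the next field begins is the
    // maximum. reader.terms(t) positions the enum at the first term >= t.
    public static String[] minMax(IndexReader reader, String field) throws Exception {
        TermEnum terms = reader.terms(new Term(field, ""));
        String min = null;
        String max = null;
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field)) break;
                if (min == null) min = t.text();
                max = t.text();
            } while (terms.next());
        } finally {
            terms.close();
        }
        return new String[] { min, max };
    }
}
```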
Hi
We are in the process of upgrading Lucene from 1.2 to 1.9.
There used to be 2 methods in DateField.java in 1.2
public static String MIN_DATE_STRING()
public static String MAX_DATE_STRING()
This basically gave the minimum and the maximum dates we could index using
I wrote:
It looks like StopAnalyzer tokenizes by letter, and doesn't handle
apostrophes. So, the input "I don't know" produces these tokens:
don
t
know
Is that right?
It's not right. StopAnalyzer does tokenize letter by letter, but 't'
is a stopword, so the tokens are:
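A sketch that prints what StopAnalyzer actually emits for that input, assuming the 1.x TokenStream.next()/Token.termText() API (the StopTokens class is illustrative; the output depends on the default English stop list, so no token list is claimed here):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class StopTokens {
    // StopAnalyzer = LowerCaseTokenizer + StopFilter: the tokenizer splits
    // on non-letters (including the apostrophe), then the filter drops any
    // token on the default stop list, "t" among them.
    public static void main(String[] args) throws Exception {
        TokenStream ts = new StopAnalyzer()
                .tokenStream("contents", new StringReader("I don't know"));
        for (Token tok = ts.next(); tok != null; tok = ts.next()) {
            System.out.println(tok.termText());
        }
    }
}
```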
Hi,
Just wondering if there is any way to search two indexes with relations,
like in a relational database. For example, in index1 there are fields
"pid" and "content". In index2 there are fields "cid", "record", and
"pid". I want to search keyword1 in content and keyword2 in record and
they
Greets,
It looks like StopAnalyzer tokenizes by letter, and doesn't handle
apostrophes. So, the input "I don't know" produces these tokens:
don
t
know
Is that right?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Hello,
How can I configure Lucene to handle numeric range searches? (This question
has been asked 100 times, I'm sure.)
I've tried the suggestions on the SearchNumericalFields wiki page. This
seems to work for simple queries. Searching for "line:[1 to 10]" gives me
lines 1 thru 10 of the documen
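For ranges beyond single digits, the common workaround is to index numbers zero-padded to a fixed width, so the lexicographic comparison a range query performs agrees with numeric order. A sketch (the NumberPad helper and the width of 10 are illustrative):

```java
public class NumberPad {
    // Pad a non-negative number to a fixed width so that lexicographic
    // ordering matches numeric ordering: "2" sorts after "10", but
    // "0000000002" sorts before "0000000010". Index and query with the
    // padded form on both ends of the range.
    public static String pad(long n, int width) {
        StringBuffer sb = new StringBuffer(Long.toString(n));
        while (sb.length() < width) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }
}
```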
The XMLQueryParser in the contrib section also handles
Spans (as well as a few other Lucene queries/filters
not represented by the standard QueryParser).
Here's an example of a complex query from the JUnit test; the XML markup has
been stripped from this excerpt, leaving only the terms it contained:
killed
died
dead
miner miners
On Apr 6, 2006, at 8:47 AM, Michael Dodson wrote:
Can phrase queries be nested the same way boolean queries can be
nested?
Yes... using SpanNearQuery instead of PhraseQuery.
I want a user query to be translated into a boolean query (say, x
AND (y OR z)), and I want those terms to be withi
Can phrase queries be nested the same way boolean queries can be nested?
I want a user query to be translated into a boolean query (say, x AND
(y OR z)), and I want those terms to be within a certain distance of
each other (approximately within the same sentence, so the slop would
be about
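A sketch of that nesting with span queries, assuming the 1.x org.apache.lucene.search.spans API (the field name and the terms x, y, z stand in for the real ones): a SpanOrQuery nests inside a SpanNearQuery just as one BooleanQuery nests inside another.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class NestedSpans {
    // x AND (y OR z), with all matches confined to within `slop` positions
    // of each other (roughly one sentence for a small slop); inOrder=false
    // lets the clauses match in any order.
    public static SpanQuery build(String field, int slop) {
        SpanQuery x = new SpanTermQuery(new Term(field, "x"));
        SpanQuery yOrZ = new SpanOrQuery(new SpanQuery[] {
            new SpanTermQuery(new Term(field, "y")),
            new SpanTermQuery(new Term(field, "z"))
        });
        return new SpanNearQuery(new SpanQuery[] { x, yOrZ }, slop, false);
    }
}
```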
Hi,
Thanks for your suggestion. I thought about the same, but somehow it didn't
seem like such a good idea... Now that I think about it, it would take the same
I/O load (in terms of flushing many megabytes to disk) as optimizing in memory
with the FSDirectory.
Another weird thing we observed he
HashSet terms = new HashSet();
query.rewrite(reader).extractTerms(terms);
Ok, but this delivers every term, not just a list of words the Levenshtein
algorithm produced with similarity. Judging from the posts here in the thread I
opened, you guys seem to be experienced programmers, so