Hello,
I am using Lucene to build an index from roughly 10 million documents,
about 4 TB in total.
After some trial runs indexing a subset of the documents, I am trying
to figure out a hosting service configuration for creating a full index
from the entire data set.
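(For reference, a minimal sketch of the IndexWriter settings that usually matter when batch-building a very large index; the index path and the values below are placeholders, not recommendations for this data set.)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class BulkIndexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path; in practice this points at a large, fast disk.
        IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), true);
        // Buffer more documents in RAM before flushing a segment to disk.
        writer.setMaxBufferedDocs(1000);
        // A higher merge factor means fewer, larger merges during the build.
        writer.setMergeFactor(30);
        // ... addDocument() loop over the source documents goes here ...
        writer.optimize();  // optional, and expensive at this scale
        writer.close();
    }
}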
On 17-Dec-07, at 11:39 AM, Beyer,Nathan wrote:
Would using Field.Index.UN_TOKENIZED be the same as tokenizing a field
into one token?
Indeed.
-Mike
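(To illustrate the point above, a small sketch; the field names are made up.)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class UntokenizedFieldSketch {
    static Document buildDoc(String value) {
        Document doc = new Document();
        // Analyzed: the analyzer splits the value into several tokens.
        doc.add(new Field("title", value,
                Field.Store.YES, Field.Index.TOKENIZED));
        // Not analyzed: the whole value is indexed as exactly one token,
        // which is effectively the same as tokenizing it into one token.
        doc.add(new Field("titleExact", value,
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        return doc;
    }
}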
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Monday, December 17, 2007 12:53 PM
To: java-user@lucene.apache.org
I'm working on a project where we are indexing content for several different
languages - English, Spanish, French and German. I have built separate
indexes for each language using the proper Analyzer for each
language (StandardAnalyzer for English, FrenchAnalyzer for French, etc.). We
have a require
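(A rough sketch of the per-language setup described above: one index per language, each opened with that language's analyzer. FrenchAnalyzer and GermanAnalyzer are in the contrib analyzers jar; the paths are placeholders.)

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class PerLanguageIndexes {
    static IndexWriter openWriter(String path, Analyzer analyzer) throws Exception {
        // One physical index per language, built with that language's analyzer.
        return new IndexWriter(path, analyzer, true);
    }

    public static void main(String[] args) throws Exception {
        IndexWriter en = openWriter("/indexes/en", new StandardAnalyzer());
        IndexWriter fr = openWriter("/indexes/fr", new FrenchAnalyzer());
        IndexWriter de = openWriter("/indexes/de", new GermanAnalyzer());
        // Spanish could use e.g. SnowballAnalyzer("Spanish") from contrib-snowball.
        // ... route each document to the writer for its language ...
        en.close(); fr.close(); de.close();
    }
}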
Hi Sirish,
A few hours ago I sent a reply to your message. If my
understanding is correct, you indexed a doc with text
as
Health and Safety
and you used phrase
Health Safety
to create a phrase query. If that is the case, this is
normal since you used StandardAnalyzer to tokenize the
input tex
Would using Field.Index.UN_TOKENIZED be the same as tokenizing a field
into one token?
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Monday, December 17, 2007 12:53 PM
To: java-user@lucene.apache.org
Subject: Re: thoughts/suggestions for analyzing/tokenizing class na
Please do not hijack the thread. When starting a new topic, do NOT
use "reply to", start an entirely new e-mail. Otherwise your topic often
gets ignored by people who are uninterested in the original thread.
Best
Erick
On Dec 17, 2007 5:57 AM, anjana m <[EMAIL PROTECTED]> wrote:
> how to i use
Either index them as a series of tokens:
org
org.apache
org.apache.lucene
org.apache.lucene.document
org.apache.lucene.document.Document
or index them as a single token, and use prefix queries (this is what
I do for reverse domain names):
classname:(org.apache org.apache.*)
Note that "class
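(A minimal sketch of the single-token approach: index the fully qualified name untokenized and search it with a term query plus a prefix query, as in the classname:(org.apache org.apache.*) example above. The field name is made up.)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.TermQuery;

public class ClassNameQuerySketch {
    public static void main(String[] args) {
        // Index time: keep the whole class name as one token.
        Document doc = new Document();
        doc.add(new Field("classname", "org.apache.lucene.document.Document",
                Field.Store.YES, Field.Index.UN_TOKENIZED));

        // Search time: exact match on "org.apache" OR any name starting with "org.apache."
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("classname", "org.apache")),
                BooleanClause.Occur.SHOULD);
        q.add(new PrefixQuery(new Term("classname", "org.apache.")),
                BooleanClause.Occur.SHOULD);
        System.out.println(q);
    }
}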
Hi anjana m,
You're going to have lots of trouble getting a response, for two reasons:
1. You are replying to an existing thread and changing the subject. Don't do
that. When you have a question, start a new thread by creating a new email
instead of replying.
2. You are not telling the list
Hi,
Do you mean that your query phrase is "Health Safety",
but docs with "Health and Safety" were returned?
If that is the case, the reason is that StandardAnalyzer
filters out "and" (also "or", "in", and others) as stop
words during indexing, and the QueryParser filters those
words out as well.
Best reg
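(A small self-contained illustration of the stop-word effect described above; the field name and text are made up.)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class StopWordSketch {
    public static void main(String[] args) throws Exception {
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        // "and" is a stop word, so both strings parse to the same phrase query,
        // which is why "Health Safety" finds documents indexed as "Health and Safety".
        Query withStopWord = parser.parse("\"Health and Safety\"");
        Query withoutStopWord = parser.parse("\"Health Safety\"");
        System.out.println(withStopWord);     // typically: contents:"health safety"
        System.out.println(withoutStopWord);  // typically: contents:"health safety"
    }
}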
I have the following code for search:
BooleanQuery bQuery = new BooleanQuery();
Query queryAuthor;
queryAuthor = new TermQuery(new Term(IFIELD_LEAD_AUTHOR,
author.trim().toLowerCase()));
bQuery.add(queryAuthor, BooleanClause.Occur.MUST);
...
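(For completeness, a sketch of how a query like the one above is usually executed; the index path and the "leadAuthor" field name are placeholders standing in for IFIELD_LEAD_AUTHOR.)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class AuthorSearchSketch {
    public static void main(String[] args) throws Exception {
        String author = "Smith";
        BooleanQuery bQuery = new BooleanQuery();
        Query queryAuthor = new TermQuery(
                new Term("leadAuthor", author.trim().toLowerCase()));
        bQuery.add(queryAuthor, BooleanClause.Occur.MUST);

        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Hits hits = searcher.search(bQuery);
        System.out.println(hits.length() + " hits");
        searcher.close();
    }
}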
Good point.
I don't want the sub-package names on their own to match.
Text (class name)
- "org.apache.lucene.document.Document"
Queries that would match
- "org.apache", "org.apache.lucene.document"
Queries that DO NOT match
- "apache", "lucene", "document"
-Nathan
-Original Message-
On 15-Dec-07, at 3:14 PM, Beyer,Nathan wrote:
I have a few fields that use package names and class names and I've
been
looking for some suggestions for analyzing these fields.
A few examples -
Text (class name)
- "org.apache.lucene.document.Document"
Queries that would match
- "org.apache" ,
I don't consider sending this kind of message to the list pollution.
It's good to take a step back from time to time and remember that
almost all of us volunteer here, even if we get paid to work w/
Lucene. I am constantly amazed at the Lucene community and what it
has to offer in the way
Hi,
I have got invaluable help from several people of this list.
Unfortunately I couldn't guess the email of some of you.
So, many thanks to all who have helped me.
Merry Christmas and a Happy New Year to you all.
(Perhaps someone comes up with a means to say 'thank you'
without 'polluting' the l
On Dec 17, 2007, at 5:14 AM, qvall wrote:
So does it mean that if my query doesn't support prefix or wild-char
queries then I don't need to use rewrite() for highlighting?
As long as the terms you want highlighted are extractable from the
Query instance, all is fine.
However, it wouldn't
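(A sketch of the rewrite-before-highlighting step being discussed; Highlighter and QueryScorer live in the contrib highlighter jar, and the analyzer, field name, and method shape here are assumptions.)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class HighlightSketch {
    // Prefix/wildcard/fuzzy queries only expose concrete terms after being
    // rewritten against the index; plain term and phrase queries can skip this.
    static String highlight(Query query, IndexReader reader, String text)
            throws Exception {
        Query rewritten = query.rewrite(reader);
        Highlighter highlighter = new Highlighter(new QueryScorer(rewritten));
        return highlighter.getBestFragment(new StandardAnalyzer(), "contents", text);
    }
}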
On Dec 17, 2007, at 3:31 AM, Helmut Jarausch wrote:
FuzzyQuery (in the 2.2.0 API) may take 3 arguments,
term, minimumSimilarity, and prefixLength
Is there any syntax to specify the 3rd argument
in a query term for QueryParser?
(I haven't found any in the current docs)
No, there isn't. But you
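(Since the query syntax has no slot for it, the prefix length has to be supplied programmatically; a minimal sketch with made-up field and values.)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;

public class FuzzyPrefixSketch {
    public static void main(String[] args) {
        // Arguments: term, minimumSimilarity, prefixLength.
        // Candidate terms must share their first 2 characters with "lucene".
        FuzzyQuery q = new FuzzyQuery(new Term("contents", "lucene"), 0.6f, 2);
        System.out.println(q);
    }
}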
Hi,
We are working with a web server and 10 search servers; these 10 servers
have index fragments on them. All available fragments of these search servers
are bound at their start-up time. A remote ParallelMultiSearcher is used
for searching on these indices. When a search request comes in, first it
l
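(Roughly the setup being described, as far as it can be reconstructed: each search server exports its fragment over RMI as a RemoteSearchable, and the web server wraps the remote Searchables in a ParallelMultiSearcher. Host names and binding names are placeholders.)

import java.rmi.Naming;

import org.apache.lucene.search.ParallelMultiSearcher;
import org.apache.lucene.search.Searchable;

public class DistributedSearchSketch {
    public static void main(String[] args) throws Exception {
        // Each search server binds at startup with something like:
        //   Naming.rebind("//localhost/fragment",
        //       new RemoteSearchable(new IndexSearcher("/path/to/fragment")));

        // The web server looks the fragments up and searches them in parallel.
        Searchable frag1 = (Searchable) Naming.lookup("//search1/fragment");
        Searchable frag2 = (Searchable) Naming.lookup("//search2/fragment");
        ParallelMultiSearcher searcher =
                new ParallelMultiSearcher(new Searchable[] { frag1, frag2 });
        // Hits hits = searcher.search(query);
        searcher.close();
    }
}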
Hey, I am not able to compile; packages are not found.
I downloaded the Lucene package.
Please help me.
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Query;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.index.IndexWriter;
import org.apach
See in Lucene FAQ:
"Are Wildcard, Prefix, and Fuzzy queries case sensitive?"
On Dec 17, 2007 11:27 AM, Helmut Jarausch <[EMAIL PROTECTED]>
wrote:
> Hi,
>
> please help I am totally puzzled.
>
> The same query, once with a direct call to FuzzyQuery
> succeeds while the same query with QueryParse
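(The relevant FAQ point: terms inside wildcard, prefix, and fuzzy queries are not run through the analyzer. By default QueryParser lowercases such terms, while a hand-built FuzzyQuery uses the term exactly as given, so the two can easily disagree when the indexed terms are not lowercase. A small made-up illustration:)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;

public class FuzzyCaseSketch {
    public static void main(String[] args) throws Exception {
        // Hand-built: the term keeps its original case.
        Query direct = new FuzzyQuery(new Term("field", "MegaBit"));

        // Parsed: expanded-term queries are lowercased by default,
        // so this becomes a fuzzy query on "megabit".
        QueryParser parser = new QueryParser("field", new StandardAnalyzer());
        Query parsed = parser.parse("MegaBit~");

        System.out.println(direct);  // e.g. field:MegaBit~0.5
        System.out.println(parsed);  // e.g. field:megabit~0.5
    }
}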
How do I use Lucene search to search files on the local system?
On Dec 17, 2007 2:11 PM, Helmut Jarausch <[EMAIL PROTECTED]>
wrote:
> Hi,
>
> according to the LiA book the FuzzyQuery distance is computed as
>
> 1 - distance / min(textlen, targetlen)
>
> Given
> def addDoc(text, writer):
>doc = D
So does it mean that if my query doesn't support prefix or wild-char
queries then I don't need to use rewrite() for highlighting?
Hi,
Please help, I am totally puzzled.
The same query, once with a direct call to FuzzyQuery
succeeds while the same query with QueryParser fails.
What am I missing?
Sorry, I'm using pylucene (with lucene-java-2.2.0-603782)
#!/usr/bin/python
import lucene
from lucene import *
lucene.initVM(luce
Hi,
according to the LiA book the FuzzyQuery distance is computed as
1 - distance / min(textlen, targetlen)
Given
def addDoc(text, writer):
    doc = Document()
    doc.add(Field("field", text,
                  Field.Store.YES, Field.Index.TOKENIZED))
    writer.addDocument(doc)
addDoc("
Hi,
FuzzyQuery (in the 2.2.0 API) may take 3 arguments,
term, minimumSimilarity, and prefixLength
Is there any syntax to specify the 3rd argument
in a query term for QueryParser?
(I haven't found any in the current docs)
Many thanks for a hint,
Helmut Jarausch
Lehrstuhl fuer Numerische Mathematik