Hello Scott!
I think your index just isn't very large. My Sharehound index of my
corporate LAN is now about 10 GB / 10 million (really small) documents, and queries
take very little time: less than a second for unsorted queries and somewhat more
for sorted ones. The machine is a P4 with 1 GB RAM. I u
I recently played around with a 2-million-document index whose docs averaged
between 2 and 10 KB. The system had 4 GB of RAM and a 3 GHz dual-core processor (not
using a parallel searcher to take advantage of the extra core)... pretty
beefy, but with 4 times the docs you're talking about. I didn't see a query
tha
Terry Steichen <[EMAIL PROTECTED]> wrote on 13/10/2006 08:01:11:
> You can just add a field to your indexed docs that always evaluates to a
> fixed value. Then you can do queries like: +doc:1 -id:test
Alternatively you can use MatchAllDocsQuery, e.g.
BooleanQuery bq = new BooleanQuery();
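To see why a bare NOT clause finds nothing, and why adding an always-matching clause fixes it, here is a toy model in plain Java. It uses no Lucene classes and is not Lucene's implementation; the class and method names are hypothetical. The required "doc:1" clause plays the role of the constant field suggested above (or of MatchAllDocsQuery).

```java
import java.util.*;

// Toy model of boolean retrieval (not Lucene's implementation): postings map
// a "field:term" string to the ids of documents containing it.
class NegativeQueryDemo {
    static final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index a document; every document also receives the constant term
    // "doc:1" (the fixed-value field trick).
    static void addDoc(int docId, String... fieldTerms) {
        postings.computeIfAbsent("doc:1", k -> new TreeSet<>()).add(docId);
        for (String ft : fieldTerms) {
            postings.computeIfAbsent(ft, k -> new TreeSet<>()).add(docId);
        }
    }

    private static Set<Integer> docsFor(String clause) {
        return postings.getOrDefault(clause, Collections.<Integer>emptySet());
    }

    // Documents must match every required clause and no prohibited clause.
    // Prohibited clauses only remove documents; with no required clause
    // there are no candidates, so a pure NOT query returns nothing.
    static Set<Integer> search(List<String> required, List<String> prohibited) {
        Set<Integer> result = new TreeSet<>();
        if (required.isEmpty()) {
            return result;
        }
        result.addAll(docsFor(required.get(0)));
        for (String clause : required) {
            result.retainAll(docsFor(clause));
        }
        for (String clause : prohibited) {
            result.removeAll(docsFor(clause));
        }
        return result;
    }
}
```

In this model, "-id:test" alone corresponds to search(emptyList, ["id:test"]) and returns an empty set, while "+doc:1 -id:test" corresponds to search(["doc:1"], ["id:test"]) and returns every document without "test" in its id field.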
Thanks Mark!
I should also mention Benoit Mercier, who worked with me so we could
understand how to expand a term and use TOKEN_MGR_DECLS.
Patrick
On 10/13/06, Mark Miller <[EMAIL PROTECTED]> wrote:
Great work Patrick. I was unfamiliar with the use of TOKEN_MGR_DECLS.
Looks like a powerful f
Submitted to JIRA as LUCENE-682.
Patrick
Grant Ingersoll wrote:
Hi Patrick,
Thanks for the work. Create a bug in JIRA and upload a patch (see svn
diff). See the Wiki for information on how to contribute.
Thanks,
Grant
-
Hi,
I want to access a Lucene index through the SRW web service and SRU. I know
that the Online Computer Library Center (OCLC) has one implementation:
http://www.oclc.org/research/announcements/2003-11-07b.htm
But it seems somewhat limited to the DSpace digital repository system (I will
check more about that).
Could yo
On 10/13/06, Serhiy Polyakov <[EMAIL PROTECTED]> wrote:
Hi,
I know I can do DB -> XML ->
Lucene, but maybe there are other solutions?
There is no need to go from DB -> XML -> Lucene. While you can write an XML
Document handler for Lucene (as is done in LIA), it would be just as easy to
writ
You can use DBSight's free version. It flattens complicated database
structures into a Lucene index. Any JDBC-supported database is
supported.
--
Chris Lu
-
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Is there any way to do highlighting with a WildcardQuery when
there is no IndexReader available? I simply want to do it with a
chunk of text, but it fails because the WildcardQuery has to be rewritten,
and rewrite() needs an IndexReader.
Code (using PyLucene-2.0.0 - can translat
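One workaround is to expand the wildcard against the terms of the text chunk itself rather than against an index. The sketch below does that in plain Java (the question used PyLucene, but the idea carries over). It is not Lucene's Highlighter API: the letter/digit tokenization, lowercase matching and the <B> markup are all assumptions. Within Lucene itself, indexing the chunk into a RAMDirectory and rewriting the query against that reader should be another route.

```java
import java.util.regex.*;

// Sketch: expand a wildcard pattern against the tokens of a text chunk
// (no IndexReader involved) and wrap each matching token in <B>...</B>.
class ChunkWildcardHighlighter {

    // Convert Lucene-style wildcards (* = any run, ? = one char) to a regex;
    // every other character is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toLowerCase().toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    static String highlight(String text, String wildcard) {
        Pattern termPattern = toRegex(wildcard);
        // Assumed tokenization: runs of letters or digits.
        Matcher tokens = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (tokens.find()) {
            out.append(text, last, tokens.start()); // copy text between tokens
            String token = tokens.group();
            if (termPattern.matcher(token.toLowerCase()).matches()) {
                out.append("<B>").append(token).append("</B>");
            } else {
                out.append(token);
            }
            last = tokens.end();
        }
        out.append(text.substring(last));
        return out.toString();
    }
}
```

For example, highlight("Lucene rocks, lucid dreams", "luc*") marks both "Lucene" and "lucid".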
Hi,
I need to index a database with Lucene. Can you suggest where to start
looking for DB-to-Lucene parsers? I will need to parse data from
MySQL, MS SQL, Oracle and several other databases. The data structure
is pretty simple: almost flat tables. I know I can do DB -> XML ->
Lucene but may b
Great work Patrick. I was unfamiliar with the use of TOKEN_MGR_DECLS. Looks
like a powerful feature for dynamic token selection. Thanks a lot,
- Mark
On 10/13/06, Patrick Turcotte <[EMAIL PROTECTED]> wrote:
Mark Miller wrote:
> Could you say in a few words what you did to accomplish this? I k
Mark Miller wrote:
Could you say in a few words what you did to accomplish this? I know that
you mentioned you used a resource bundle, but what part of the code reads
this resource bundle? What method did you use to get around the JavaCC
issues?
Basically:
* I used TOKEN_MGR_DECLS to decla
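The reply above is cut off, so the following is only a hypothetical sketch of the lookup side of such a design: a resource bundle supplies localized operator words, and a helper (the kind of thing code declared in TOKEN_MGR_DECLS could call) maps a scanned word to its operator. The bundle keys, the French words and the class names are assumptions, not taken from the LUCENE-682 patch.

```java
import java.util.*;

// Hypothetical sketch: map localized operator words to AND/OR/NOT using a
// ResourceBundle. Not the actual QueryParser grammar or patch code.
class OperatorWords {
    // Inline bundle so the sketch is self-contained; real code would more
    // likely load a properties file via ResourceBundle.getBundle(...).
    static class FrenchOperators extends ListResourceBundle {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"AND", "ET"}, {"OR", "OU"}, {"NOT", "SAUF"}};
        }
    }

    private final Map<String, String> wordToOperator = new HashMap<>();

    OperatorWords(ResourceBundle bundle) {
        // Build a reverse lookup: localized word -> operator kind.
        for (String op : new String[] {"AND", "OR", "NOT"}) {
            wordToOperator.put(bundle.getString(op), op);
        }
    }

    // Returns "AND", "OR" or "NOT" for an operator word, or null for a plain term.
    String operatorFor(String word) {
        return wordToOperator.get(word);
    }
}
```

Keeping the words in a bundle means another language only needs a new bundle, not a new grammar.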
I realize my statement of dread may be news to some; here are my references.
QueryParser not handling queries containing AND and OR
http://issues.apache.org/jira/browse/LUCENE-167
Query Parser flags clauses with explicit OR as required when followed by
explicit AND
http://issues.apache.org/jir
Had this page open for somebody else:
http://wiki.apache.org/jakarta-lucene/HowToContribute
Otis
- Original Message
From: Patrick Turcotte <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, October 13, 2006 10:40:27 AM
Subject: QueryParser syntax French Operator : DONE!
"title:Lucene author:Otis^2.0" for example.
You can also call setBoost(float) on the query object (see
http://www.lucenebook.com/search?query=setBoost for some examples).
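If the query string is assembled in code, a small helper can append the boost only where it differs from the default. This is a hypothetical sketch (the class and method names are invented); it does not escape values, so it assumes they contain no QueryParser special characters.

```java
// Hypothetical helper: builds "field:value" clauses and appends ^boost only
// when the boost differs from the default of 1.0. Values are not escaped.
class BoostedQueryString {
    static String clause(String field, String value, float boost) {
        String c = field + ":" + value;
        return boost == 1.0f ? c : c + "^" + boost;
    }

    // Reproduces the example query string above.
    static String example() {
        return clause("title", "Lucene", 1.0f) + " " + clause("author", "Otis", 2.0f);
    }
}
```

example() yields "title:Lucene author:Otis^2.0", which can then be handed to QueryParser together with the shared Analyzer.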
Otis
- Original Message
From: Vinny Ng <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, October 13
Could you say in a few words what you did to accomplish this? I know that
you mentioned you used a resource bundle, but what part of the code reads
this resource bundle? What method did you use to get around the JavaCC issues?
thanks,
-Mark
On 10/13/06, Patrick Turcotte <[EMAIL PROTECTED]> wrote:
You can just add a field to your indexed docs that always evaluates to a
fixed value. Then you can do queries like: +doc:1 -id:test
karl wettin wrote:
On 13 Oct 2006, at 09:59, tony yin wrote:
I want to search several fields using a NOT condition, but how?
for example:
I store "test" in {"id", "name"
On Oct 12, 2006, at 9:25 PM, <[EMAIL PROTECTED]> wrote:
Does the Sort function create some kind of internal cache?
Observing the heap, it seems that a full garbage collection after
calling IndexSearcher.close() still leaves a lot of memory occupied.
Yes, sorting caches, potentially a lot.
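A toy model of why memory stays occupied: the first sort on a field loads one value per document and keeps it, and the cache is keyed by the reader object, so it can only be reclaimed once nothing references that reader. This roughly mirrors how Lucene's FieldCache is keyed per reader, but it is a plain-Java sketch, not Lucene's code; the class name and int-only values are assumptions.

```java
import java.util.*;

// Toy per-reader sort cache: entries live in a WeakHashMap keyed by the
// reader, so they survive until the reader object itself is unreachable.
class ToySortCache {
    private static final Map<Object, Map<String, int[]>> cache = new WeakHashMap<>();

    static synchronized int[] getInts(Object reader, String field, int maxDoc) {
        Map<String, int[]> perReader =
            cache.computeIfAbsent(reader, r -> new HashMap<String, int[]>());
        // One slot per document; a real cache would fill it from the index.
        return perReader.computeIfAbsent(field, f -> new int[maxDoc]);
    }
}
```

Calling getInts twice with the same reader and field returns the same array, which is why repeated sorted searches are fast and why the memory is not released by a garbage collection while the reader is still referenced.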
On Oct 13, 2006, at 3:42 AM, Antony Bowesman wrote:
I am writing a framework that needs to be able to index documents
from a range of languages where just the character set of the
document is known. Has anyone looked at, or is anyone using, language
analysis to determine the language of a document
Hi Patrick,
Thanks for the work. Create a bug in JIRA and upload a patch (see
svn diff). See the Wiki for information on how to contribute.
Thanks,
Grant
On Oct 13, 2006, at 10:40 AM, Patrick Turcotte wrote:
Hello!
This may not be the best place for this message, sorry if this is
the
Hi List,
I'd like to have a query consisting of different keywords on different
fields, e.g. "title:Lucene" and "author:Otis", with the second part having
a boost of 2.
Assuming I use the same Analyzer for both parts of the query, how should
I compose my query?
Thanks a lot.
Ng
Hello!
This may not be the best place for this message, sorry if this is the
case, but since this is the result of a question I asked here, I decided
to post it here. If I'm in error, please refer me to the best procedure.
Thanks!
I've completed the desired "patch". I now have a version of t
Hello Antony,
I have a similar problem. My collection contains mainly German
documents, but some are in English and a few in French, Spanish and Latin. I
know that each language has its own stemming rules.
Language detection is not my domain. But I can imagine it could be
possible to detect the lang
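A very simple way to guess the language, so that a matching analyzer or stemmer can be chosen per document, is to count hits against small stopword lists. The sketch below is only illustrative: the word lists are tiny assumptions, Latin and Spanish are not covered, and n-gram profile methods are the usual more robust approach.

```java
import java.util.*;

// Sketch: guess a document's language by counting stopword hits.
// The stopword lists are illustrative assumptions, far too short for real use.
class StopwordLanguageGuesser {
    private static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("de", new HashSet<>(Arrays.asList("der", "die", "das", "und", "ist", "nicht")));
        STOPWORDS.put("en", new HashSet<>(Arrays.asList("the", "and", "is", "of", "not", "to")));
        STOPWORDS.put("fr", new HashSet<>(Arrays.asList("le", "la", "les", "et", "est", "pas")));
    }

    // Returns the language code with the most stopword hits, or "unknown".
    static String guess(String text) {
        String[] tokens = text.toLowerCase().split("[^\\p{L}]+");
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> e : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (e.getValue().contains(token)) hits++;
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

guess("Der Hund und die Katze") returns "de", which could then select a German analyzer for that document.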
Generally, stemming is not a method for index size reduction even though
that might be a side effect. It is very useful in search however...you would
generally want a search for skiing to also hit ski and skier (I can't spell
so don't get caught up on that). There are lots of those examples...if y
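The recall benefit can be shown with a deliberately naive suffix stripper. This is a toy, not the Porter or Snowball algorithm, and the suffix list is an assumption; it only shows that "skiing", "skier" and "skis" can collapse to the same index term as "ski".

```java
// Toy suffix stripper (NOT Porter/Snowball): reduces a few inflected forms
// to a shared stem so a query for one form matches the others.
class ToyStemmer {
    private static final String[] SUFFIXES = {"ing", "er", "s"};

    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : SUFFIXES) {
            // Keep at least 3 characters so short words are not mangled.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }
}
```

If the same stemmer runs at index time and at query time, a search for "skiing" is looked up as "ski" and hits documents containing "ski", "skier" or "skis".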
On 13 Oct 2006, at 12:32, Cam Bazz wrote:
Is there any difference in field data formats? I'd like to store
strings, numbers, and dates in fields.
I was storing everything as strings, but is there another way,
especially for storing dates?
On a low level everything is stored as terms (read
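Since values end up as string terms, numbers and dates are usually encoded so that lexicographic term order matches their natural order (important for range queries and sorting). Lucene of this era ships helpers for this, such as DateTools and NumberTools in org.apache.lucene.document; the stdlib-only sketch below just shows the idea. The 10-digit pad width and the non-negative restriction are assumptions.

```java
import java.text.SimpleDateFormat;
import java.util.*;

// Sketch: encode numbers and dates as strings whose lexicographic order
// matches their natural order.
class SortableTerms {
    // Zero-pad to a fixed width; handles non-negative values up to 10 digits.
    static String encodeNumber(long n) {
        if (n < 0) throw new IllegalArgumentException("sketch handles non-negative numbers only");
        return String.format("%010d", n);
    }

    // Day resolution in UTC, so the same instant always yields the same term.
    static String encodeDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }
}
```

Without padding, "10" sorts before "9"; with it, "0000000010" correctly sorts after "0000000009".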
On 13 Oct 2006, at 09:59, tony yin wrote:
I want to search several fields using a NOT condition, but how?
For example:
I store "test" in {"id", "name", "value", ...} fields.
Now I search for "test" NOT in "id". That's it.
Can anyone help me?
You will not get any matches looking for just a boolean NOT-claus
An EJB container will generally not give you better performance than a non-EJB
container (other than that it might be more efficient... but that will not
be because it is an EJB container). The main difference is that you will be
able to use EJBs and the other Java EE goodies that a J2EE container
pr
This won't be *really* helpful, but I remember this being discussed at some
length a while ago. You'd be able to see some good info if you searched the
list archive, probably for language
I didn't pay much attention since this isn't something I'm concerned with
lately, so I can't be much real hel
On another note... http://famestalker.com/devwik/ will be done soon... I only
The URL gives a 404 Not Found error here.
Due to a typo on my part:
http://famestalker.com/devwiki/
On 10/13/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
On Friday 13 October 2006 01:55, Mark Miller wrote:
> The
Hello,
Is there any difference in field data formats? I'd like to store
strings, numbers, and dates in fields.
I was storing everything as strings, but is there another way, especially
for storing dates?
Best Regards,
-C.B.
---
I want to search several fields using a NOT condition, but how?
For example:
I store "test" in {"id", "name", "value", ...} fields.
Now I search for "test" NOT in "id". That's it.
Can anyone help me?
--
Kind regards
Tony
===
Hi,
I am a newbie with Lucene and I am working out the best way to index email data.
An earlier poster discussed indexing attachments and suggested two alternatives:
However, there is a third alternative:
Each message/attachment is indexed as a separate Document with the email header
data included in
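The third alternative can be sketched by producing one document per message part, each carrying a copy of the header fields so that a hit on an attachment can still be filtered or displayed by sender or subject. Documents are modelled as plain maps here rather than Lucene Document objects, and the field names ("part", "body") are hypothetical.

```java
import java.util.*;

// Sketch: one "document" per message part (body or attachment), each with
// a copy of the message headers. Field names are hypothetical.
class EmailPartDocuments {
    static List<Map<String, String>> toDocuments(Map<String, String> headers, List<String> parts) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            Map<String, String> doc = new LinkedHashMap<>(headers); // copy header data
            doc.put("part", String.valueOf(i));  // 0 = message body, 1.. = attachments
            doc.put("body", parts.get(i));
            docs.add(doc);
        }
        return docs;
    }
}
```

The trade-off is some duplicated header data in the index in exchange for each attachment being a directly searchable, self-describing hit.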
Hello,
I'm new to Lucene and wanted some advice on analyzers, stemmers and language
analysis. I've got LIA, so I have read its chapters.
I am writing a framework that needs to be able to index documents from a range
of languages where just the character set of the document is known. Has anyo
On Friday 13 October 2006 01:55, Mark Miller wrote:
> There is also the Surround Query Parser in contrib by the way...I would bet
> that Paul will tell you that it does not have these issues. I can't wait to
Indeed.
> see the replies on this one...I didn't realize that the QueryParser had
> these