Re: Large index question

2006-10-13 Thread Artem Vasiliev
Hello Scott! I think your index just isn't really that large. My Sharehound indexes of my corporate LAN are about 10 GB / 10 million (really small) documents now, and queries take very little time: less than a second for unsorted queries and somewhat more for sorted ones. The machine is a P4 with 1 GB of RAM. I u

Re: Large index question

2006-10-13 Thread Mark Miller
I recently played around with a 2 million document index of docs that averaged between 2 and 10 KB. The system had 4 GB of RAM and a 3 GHz dual-core processor (not using a parallel searcher to take advantage of the extra core)... pretty beefy, but with 4 times the docs you're talking about. I didn't see a query tha

Re: advanced search

2006-10-13 Thread Doron Cohen
Terry Steichen <[EMAIL PROTECTED]> wrote on 13/10/2006 08:01:11: > You can just add a field to your indexed docs that always evaluates to a fixed value. Then you can do queries like: +doc:1 -id:test Alternatively you can use MatchAllDocsQuery, e.g. BooleanQuery bq = new BooleanQuery();

Re: QueryParser syntax French Operator : DONE!

2006-10-13 Thread Patrick Turcotte
Thanks Mark! I have to mention Benoit Mercier here, who worked with me so we could understand how to expand a term and use TOKEN_MGR_DECLS. Patrick On 10/13/06, Mark Miller <[EMAIL PROTECTED]> wrote: Great work Patrick. I was unfamiliar with the use of TOKEN_MGR_DECLS. Looks like a powerful f

Re: QueryParser syntax French Operator : DONE!

2006-10-13 Thread Patrick Turcotte
Submitted to JIRA with key LUCENE-682. Patrick. Grant Ingersoll wrote: Hi Patrick, Thanks for the work. Create a bug in JIRA and upload a patch (see svn diff). See the Wiki for information on how to contribute. Thanks, Grant

Lucene SRW/SRU

2006-10-13 Thread Serhiy Polyakov
Hi, I want to access a Lucene index through an SRW Web Service and SRU. I know that the Online Computer Library Center (OCLC) has one implementation: http://www.oclc.org/research/announcements/2003-11-07b.htm But it seems to be limited to the DSpace digital repository system (I will check more about that). Could yo

Re: DB to Lucene parsers

2006-10-13 Thread Mark Miller
On 10/13/06, Serhiy Polyakov <[EMAIL PROTECTED]> wrote: Hi, I know I can do DB -> XML -> Lucene but maybe there are other solutions? There is no need to go from DB -> XML -> Lucene. While you can write an XML Document handler for Lucene (as is done in LIA), it would be just as easy to writ
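Mark's suggestion of skipping XML can be sketched with plain JDBC: walk the ResultSet and turn each row into a column-label to value map, which would then become one Lucene Document with one Field per entry. This is a minimal sketch, assuming string values are good enough to index; the class `RowFlattener` is hypothetical, and the Lucene side (building the Document and adding it with an IndexWriter) is left out so the example needs only `java.sql`.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flattens JDBC rows into column-label -> value maps (one per future Document). */
final class RowFlattener {
    private RowFlattener() {}

    static List<Map<String, String>> flatten(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        List<Map<String, String>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 1; i <= columns; i++) {          // JDBC column indexes are 1-based
                String value = rs.getString(i);
                if (value != null) {                       // SQL NULL: nothing to index for this column
                    fields.put(meta.getColumnLabel(i), value);
                }
            }
            rows.add(fields);
        }
        return rows;
    }
}
```

With a real driver the input would come from something like `statement.executeQuery("SELECT ...")`. For large tables it would likely be better to add each Document as its row is read instead of collecting every row in memory first.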

Re: DB to Lucene parsers

2006-10-13 Thread Chris Lu
You can use DBSight's free version. It'll flatten complicated database structures into a Lucene index. Any JDBC-supported database is supported. -- Chris Lu - Instant Full-Text Search On Any Database/Application site: http://www.dbsight.net demo: http://search.dbsight.com

highlighting with WildcardQuery

2006-10-13 Thread James O'Rourke
Is there any way to do highlighting when using a WildcardQuery when there is no IndexReader available? I simply want to do it with a chunk of text, but it fails because the WildcardQuery needs to call rewrite(), and rewrite needs an IndexReader that isn't there. Code (using PyLucene-2.0.0 - can translat
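One workaround, swapped in here in place of Lucene's contrib Highlighter, is to avoid rewrite() altogether: translate the wildcard term into a regular expression (`*` as any run of characters, `?` as exactly one, following WildcardQuery's syntax) and test each token of the text chunk against it. This is a plain-Java sketch under stated assumptions: the class `WildcardMarker` is hypothetical, tokens are runs of letters and digits, matching is case-insensitive, and `<b>` tags stand in for whatever markup is wanted. A real setup would need to tokenize the text the same way the field's Analyzer did.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: highlights tokens matching a Lucene-style wildcard term, no IndexReader needed. */
final class WildcardMarker {
    private WildcardMarker() {}

    /** Converts a wildcard term (* and ?) into a case-insensitive regex; other chars are literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {                // flush pending literal text, quoted
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** Wraps every token of text that fully matches the wildcard in <b>...</b>. */
    static String highlight(String text, String wildcard) {
        Pattern term = toRegex(wildcard);
        Matcher tokens = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        StringBuffer out = new StringBuffer();
        while (tokens.find()) {
            String token = tokens.group();
            String replacement = term.matcher(token).matches() ? "<b>" + token + "</b>" : token;
            tokens.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        tokens.appendTail(out);                            // keep text after the last token
        return out.toString();
    }
}
```

For example, `highlight("Lucene indexes and the indexer", "index*")` marks "indexes" and "indexer" and leaves the other words untouched.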

DB to Lucene parsers

2006-10-13 Thread Serhiy Polyakov
Hi, I need to index a database with Lucene. Can you suggest where to start looking for DB-to-Lucene parsers? I will need to parse data from MySQL, MS SQL, Oracle and several other databases. The structure of the data is pretty simple: almost flat tables. I know I can do DB -> XML -> Lucene but may b

Re: QueryParser syntax French Operator : DONE!

2006-10-13 Thread Mark Miller
Great work Patrick. I was unfamiliar with the use of TOKEN_MGR_DECLS. Looks like a powerful feature for dynamic token selection. Thanks a lot, - Mark On 10/13/06, Patrick Turcotte <[EMAIL PROTECTED]> wrote: Mark Miller wrote: > Could you say in a few words what you did to accomplish this? I k

Re: QueryParser syntax French Operator : DONE!

2006-10-13 Thread Patrick Turcotte
Mark Miller wrote: Could you say in a few words what you did to accomplish this? I know that you mentioned you used a resource bundle, but what part of the code reads this resource bundle? What method did you use to get by the JavaCC issues? Basically: * I used TOKEN_MGR_DECLS to decla

Re: QueryParser Is Badly Broken

2006-10-13 Thread Renaud Waldura
I realize my statement of dread may be news to some; here are my references. QueryParser not handling queries containing AND and OR http://issues.apache.org/jira/browse/LUCENE-167 Query Parser flags clauses with explicit OR as required when followed by explicit AND http://issues.apache.org/jir

Re: QueryParser syntax French Operator : DONE!

2006-10-13 Thread Otis Gospodnetic
Had this page open for somebody else: http://wiki.apache.org/jakarta-lucene/HowToContribute Otis - Original Message From: Patrick Turcotte <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, October 13, 2006 10:40:27 AM Subject: QueryParser syntax French Operator : DONE!

Re: multiple field query

2006-10-13 Thread Otis Gospodnetic
"title:Lucene author:Otis^2.0" for example. You can also call setBoost(float) on the query object (see http://www.lucenebook.com/search?query=setBoost for some examples). Otis - Original Message From: Vinny Ng <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, October 13
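When the query is built in code rather than typed, the same string can be assembled from field/term/boost triples and then handed to QueryParser. The helper `BoostedQueryString` is hypothetical, only builds the string, and does not escape QueryParser special characters, so it assumes plain single-word terms.

```java
/** Hypothetical helper: builds QueryParser-syntax clauses such as author:Otis^2.0. */
final class BoostedQueryString {
    private BoostedQueryString() {}

    /** Renders field:term, appending ^boost only when the boost differs from the default 1.0. */
    static String clause(String field, String term, float boost) {
        String base = field + ":" + term;
        return boost == 1.0f ? base : base + "^" + boost;   // float prints as e.g. "2.0"
    }

    /** Joins clauses with spaces; with QueryParser's default OR operator either clause may match. */
    static String join(String... clauses) {
        return String.join(" ", clauses);
    }
}
```

`join(clause("title", "Lucene", 1.0f), clause("author", "Otis", 2.0f))` yields the string from Otis's reply. The alternative he mentions, calling setBoost(2.0f) on the author TermQuery before adding it to a BooleanQuery, avoids string building altogether.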

Re: QueryParser syntax French Operator : DONE!

2006-10-13 Thread Mark Miller
Could you say in a few words what you did to accomplish this? I know that you mentioned you used a resource bundle, but what part of the code reads this resource bundle? What method did you use to get by the JavaCC issues? thanks, -Mark On 10/13/06, Patrick Turcotte <[EMAIL PROTECTED]> wrote:

Re: advanced search

2006-10-13 Thread Terry Steichen
You can just add a field to your indexed docs that always evaluates to a fixed value. Then you can do queries like: +doc:1 -id:test karl wettin wrote: On 13 Oct 2006, at 09:59, tony yin wrote: I want to search several fields using a NOT condition, but how? For example: I store "test" in {"id", "name"

Re: Avoiding sort by date

2006-10-13 Thread Erik Hatcher
On Oct 12, 2006, at 9:25 PM, <[EMAIL PROTECTED]> wrote: Does the Sort function create some kind of internal cache? Observing the heap, it seems that a full garbage collection after calling IndexSearcher.close() still leaves a lot of memory occupied. Yes, sorting caches, potentially a lot.
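For a rough sense of how much a sort can cache: as far as I understand Lucene 2.x's FieldCache, sorting on a string (or int) field keeps about one 4-byte int per document (maxDoc entries), and a string sort additionally keeps every unique term of the field; the cache is keyed by IndexReader, so it is released only once that reader is no longer referenced. The estimator below is a back-of-the-envelope sketch under those assumptions; `SortCacheEstimate` is hypothetical, and the JVM-dependent reference size and per-String overhead are left as parameters rather than guessed.

```java
/** Hypothetical back-of-the-envelope estimate of a string-sort cache; JVM details are ignored. */
final class SortCacheEstimate {
    private SortCacheEstimate() {}

    /**
     * One int ordinal per document, plus, per unique term, a reference slot,
     * 2 bytes per char, and a caller-supplied per-String object overhead.
     */
    static long bytes(long maxDoc, long uniqueTerms, double avgTermChars,
                      int referenceBytes, int perStringOverhead) {
        long ordinals = 4L * maxDoc;
        long perTerm = referenceBytes + perStringOverhead + Math.round(2 * avgTermChars);
        return ordinals + uniqueTerms * perTerm;
    }
}
```

For example, with 10,000,000 documents the per-document ordinals alone come to 4 × 10,000,000 = 40,000,000 bytes (about 38 MiB) before any term storage; a date field at day resolution has comparatively few unique terms, so that part would likely dominate.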

Re: Analyzers and multiple languages

2006-10-13 Thread Erik Hatcher
On Oct 13, 2006, at 3:42 AM, Antony Bowesman wrote: I am writing a framework that needs to be able to index documents from a range of languages where just the character set of the document is known. Has anyone looked at or is using language analysis to determine the language of a document

Re: QueryParser syntax French Operator : DONE!

2006-10-13 Thread Grant Ingersoll
Hi Patrick, Thanks for the work. Create a bug in JIRA and upload a patch (see svn diff). See the Wiki for information on how to contribute. Thanks, Grant On Oct 13, 2006, at 10:40 AM, Patrick Turcotte wrote: Hello! This may not be the best place for this message, sorry if this is the

multiple field query

2006-10-13 Thread Vinny Ng
Hi List, I'd like to have a query consisting of different keywords on different fields, e.g. "title:Lucene" "author:Otis", with the second part having a boost value of 2. Assuming I use the same Analyzer for both parts of the query, how should I compose my query? Thanks a lot. Ng

QueryParser syntax French Operator : DONE!

2006-10-13 Thread Patrick Turcotte
Hello! This may not be the best place for this message, sorry if this is the case, but since this is the result of a question I asked here, I decided to post it here. If I'm in error, please refer me to the best procedure. Thanks! I've completed the desired "patch". I now have a version of t

Re: Analyzers and multiple languages

2006-10-13 Thread Soeren Pekrul
Hello Antony, I have a similar problem. My collection contains mainly German documents, but some in English and a few in French, Spanish and Latin. I know that each language has its own stemming rules. Language detection is not my domain, but I can imagine it could be possible to detect the lang
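One simple form of the detection Soeren imagines, sketched here as an assumption rather than anything Lucene provides, is to count how many tokens of a document appear in a short stopword list per language and pick the language with the most hits; the result could then select which stemming analyzer to use. The class `StopwordGuesser` is hypothetical and the tiny word lists are illustrative only; a serious detector would need far larger lists or a different technique.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy language guesser based on stopword hits. */
final class StopwordGuesser {
    private StopwordGuesser() {}

    // Illustrative, far-from-complete stopword samples per language.
    private static final Map<String, Set<String>> STOPWORDS = new LinkedHashMap<>();
    static {
        STOPWORDS.put("de", new HashSet<>(Arrays.asList("der", "die", "das", "und", "ist", "nicht", "mit")));
        STOPWORDS.put("en", new HashSet<>(Arrays.asList("the", "and", "is", "not", "with", "of", "to")));
        STOPWORDS.put("fr", new HashSet<>(Arrays.asList("le", "la", "les", "et", "est", "pas", "avec")));
    }

    /** Returns the language code with the most stopword hits, or "unknown" if nothing matches. */
    static String guess(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");  // split on non-letters
        String best = "unknown";
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {                          // ties keep the earlier language
                best = entry.getKey();
                bestHits = hits;
            }
        }
        return best;
    }
}
```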

Re: Analyzers and multiple languages

2006-10-13 Thread Mark Miller
Generally, stemming is not a method for index size reduction even though that might be a side effect. It is very useful in search however...you would generally want a search for skiing to also hit ski and skier (I can't spell so don't get caught up on that). There are lots of those examples...if y
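To make the skiing/ski/skier example concrete, here is a deliberately naive suffix stripper (not the Porter or Snowball stemmers Lucene actually offers) showing the idea: index and query terms are reduced to a common stem, so a query for one form matches documents containing another. The class `ToyStemmer`, its suffix list and the 3-character minimum stem are assumptions chosen for this example.

```java
import java.util.Locale;

/** Hypothetical toy stemmer: strips a few English suffixes, nothing more. */
final class ToyStemmer {
    private ToyStemmer() {}

    // Checked in order, longer suffixes first; stripped only if at least MIN_STEM chars remain.
    private static final String[] SUFFIXES = {"ing", "ers", "er", "s"};
    private static final int MIN_STEM = 3;

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }
}
```

Here "skiing", "skier" and "skis" all reduce to "ski". Real stemmers handle far more cases and sometimes disagree (Porter, for instance, leaves "skier" alone); in Lucene the stemming is done by a filter such as PorterStemFilter inside the Analyzer used at both index and query time.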

Re: lucene field data types

2006-10-13 Thread karl wettin
On 13 Oct 2006, at 12:32, Cam Bazz wrote: Is there any difference in the field data format? I'd like to store strings, numbers, and dates in fields. I was storing everything as strings, but is there another way, especially for storing dates? On a low level everything is stored as terms (read
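Because everything ends up as terms, and range queries and sorting compare terms as strings, dates and numbers are usually encoded so that string order equals value order. Lucene provides DateTools and NumberTools for this; the stdlib-only sketch below just shows the idea, assuming day resolution in UTC and non-negative integers below 10^10. The class `SortableTerms` and its methods are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helpers producing terms whose lexicographic order matches value order. */
final class SortableTerms {
    private SortableTerms() {}

    /** Formats a date as yyyyMMdd in UTC, so "19700101" < "19710101" as strings and as dates. */
    static String day(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);  // not thread-safe: one per call
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    /** Zero-pads a non-negative number to 10 digits, so "0000000009" sorts before "0000000010". */
    static String padded(long value) {
        if (value < 0 || value > 9_999_999_999L) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return String.format(Locale.ROOT, "%010d", value);
    }
}
```

Without padding, "9" sorts after "10" as a string, which is why a plain number-as-string field gives wrong range and sort results.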

Re: advanced search

2006-10-13 Thread karl wettin
On 13 Oct 2006, at 09:59, tony yin wrote: I want to search several fields using a NOT condition, but how? For example: I store "test" in {"id", "name", "value", ...} fields. Now I want to search for "test" NOT in "id". That's it. Can anyone help me? You will not get any matches looking for just a boolean NOT-claus
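Why a pure NOT query returns nothing, and why both suggested fixes (a constant field as in `+doc:1 -id:test`, or a MatchAllDocsQuery clause) work, can be modelled with sets: prohibited clauses only remove documents from whatever the positive clauses matched. The toy model below is a simplification of BooleanQuery under stated assumptions (no scoring, no SHOULD clauses, integer document ids); `BooleanModel` is hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical set-based model of required/prohibited boolean clauses. */
final class BooleanModel {
    private BooleanModel() {}

    /**
     * Intersects the MUST sets, then subtracts every MUST_NOT set. With no positive
     * clause there is nothing to subtract from, so the result is empty, as with a pure NOT query.
     */
    static Set<Integer> evaluate(List<Set<Integer>> must, List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (must.isEmpty()) {
            return result;
        }
        result.addAll(must.get(0));                          // copy, so inputs are not modified
        for (Set<Integer> docs : must.subList(1, must.size())) {
            result.retainAll(docs);
        }
        for (Set<Integer> docs : mustNot) {
            result.removeAll(docs);
        }
        return result;
    }
}
```

Using the set of all documents as the positive clause plays the role of MatchAllDocsQuery (or of the constant field); using the documents that match "test" in the other fields instead gives the query tony asked for.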

Re: a design question

2006-10-13 Thread Mark Miller
An EJB container will generally not give you better performance than a non-EJB container (other than that it might be a more efficient container... but that will not be because it is an EJB container). The main difference is that you will be able to use EJBs and the other Java EE goodies that a J2EE container pr

Re: Analyzers and multiple languages

2006-10-13 Thread Erick Erickson
This won't be *really* helpful, but I remember this being discussed at some length a while ago. You'd be able to find some good info if you searched the list archive, probably for "language". I didn't pay much attention since this isn't something I've been concerned with lately, so I can't be much real hel

Re: QueryParser Is Badly Broken

2006-10-13 Thread Mark Miller
On another note...http://famestalker.com /devwik/ will be done soon...I only The url gives a not found 404 error here. Due to a typo on my part: http://famestalker.com/devwiki/ On 10/13/06, Paul Elschot <[EMAIL PROTECTED]> wrote: On Friday 13 October 2006 01:55, Mark Miller wrote: > The

lucene field data types

2006-10-13 Thread Cam Bazz
Hello, Is there any difference in the field data format? I'd like to store strings, numbers, and dates in fields. I was storing everything as strings, but is there another way, especially for storing dates? Best Regards, -C.B.

advanced search

2006-10-13 Thread tony yin
I want to search several fields using a NOT condition, but how? For example: I store "test" in {"id", "name", "value", ...} fields. Now I want to search for "test" NOT in "id". That's it. Can anyone help me? -- Kind Regards Tony

Email and attachments

2006-10-13 Thread Antony Bowesman
Hi, I am a newbie with Lucene and I am working out the best way to index email data. An earlier poster talked about indexing attachments, with two alternatives: However, there is a third alternative: Each message/attachment is indexed as a separate Document with the email header data included in
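Reading the (truncated) third alternative as "copy the message headers into every part's Document", the layout might look like the sketch below. Documents are modelled as plain maps rather than Lucene Document objects, and the class `EmailDocs` and the field names `msgid`, `part` and `body` are hypothetical; the shared `msgid` is what would let hits on several parts be grouped back into one message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical layout: one document per message part, each repeating the header fields. */
final class EmailDocs {
    private EmailDocs() {}

    /** Part 0 is the message body; parts 1..n are the extracted attachment texts. */
    static List<Map<String, String>> split(String msgId, Map<String, String> headers,
                                           String body, List<String> attachmentTexts) {
        List<String> parts = new ArrayList<>();
        parts.add(body);
        parts.addAll(attachmentTexts);
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < parts.size(); i++) {
            Map<String, String> doc = new LinkedHashMap<>(headers);  // header fields copied into every part
            doc.put("msgid", msgId);
            doc.put("part", Integer.toString(i));
            doc.put("body", parts.get(i));
            docs.add(doc);
        }
        return docs;
    }
}
```

The trade-off is index size: headers are stored once per part, in exchange for a search on "from" or "subject" matching attachments directly.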

Analyzers and multiple languages

2006-10-13 Thread Antony Bowesman
Hello, I'm new to Lucene and wanted some advice on analyzers, stemmers and language analysis. I've got LIA, so I have read its chapters. I am writing a framework that needs to be able to index documents from a range of languages where just the character set of the document is known. Has anyo

Re: QueryParser Is Badly Broken

2006-10-13 Thread Paul Elschot
On Friday 13 October 2006 01:55, Mark Miller wrote: > There is also the Surround Query Parser in contrib by the way...I would bet > that Paul will tell you that it does not have these issues. I can't wait to Indeed. > see the replies on this one...I didn't realize that the QueryParser had > these