RE: example on RegexQuery

2008-10-24 Thread Agrawal, Aashish (IT)
Hi, I want to use Lucene for a simple search engine with regex support. I tried using RegexQuery, but it seems I am missing something. Is there a working example of using RegexQuery? Thanks, Aashish Agrawal

tag search

2008-10-24 Thread Borja Martín
Hi, I want to index a document that has a field called 'tags' that looks like this: 'foo, foo bar'. The comma is the separator for each tag, so I have one tag with the value 'foo' and another with 'foo bar'. What I want to do is to be able to retrieve the documents with a certain tag (only one tag p

Re: Combining keyword queries with database-style queries

2008-10-24 Thread Niels Ott
Erick, this RangeQuery thing looks promising. It might be a bit hacky, but it will most probably do the job in the given time and framework. Thanks a lot, Niels. Erick Erickson wrote: Well, assuming that token_count is an indexed field in your documents (i.e. not something you're computi

RE: tag search

2008-10-24 Thread Daan de Wit
Hi Borja, Try adding multiple untokenized fields named 'tag', each holding one tag. Regards, Daan > -----Original Message----- > From: Borja Martín [mailto:[EMAIL PROTECTED] > Sent: Friday, 24 October 2008 12:59 > To: java-user@lucene.apache.org > Subject: tag search > > Hi, > I want to index a

Re: tag search

2008-10-24 Thread Borja Martín
I already tried that, but with no success. Here is a snippet of what I tried: http://pastebin.com/m41f6719a Regards. Daan de Wit wrote: Hi Borja, Try to add multiple untokenized fields named 'tag', each holding one tag. Regards, Daan -----Original Message----- From: Borja Martín [mai

Re: Any Spanish analyzer available?

2008-10-24 Thread Grant Ingersoll
The Snowball stuff supports Spanish. On Oct 23, 2008, at 6:13 PM, Zhang, Lisheng wrote: Hi, Is there any Spanish analyzer available for Lucene applications? I did not see any in the Lucene 2.4.0 contrib folders. Thanks very much for your help, Lisheng

Re: tag search

2008-10-24 Thread Grant Ingersoll
You either need to write a tokenizer that breaks on commas, or you can do as Daan suggested. On Oct 24, 2008, at 6:58 AM, Borja Martín wrote: Hi, I want to index a document that has a field called 'tags' that looks like this: 'foo, foo bar'. The comma is the separator for each tag, so I have a
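Both suggestions in this thread come down to turning the raw 'tags' string into one value per tag before it reaches the index. The following is a minimal plain-Java sketch of that splitting step only; the class and method names (TagSplitter, splitTags) are hypothetical, and it deliberately does not touch the Lucene API. Each resulting value would then be added as its own untokenized 'tag' field, as Daan suggested.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits a raw tag string on commas.
class TagSplitter {

    /** Returns the trimmed, non-empty tags found in a comma-separated string. */
    static List<String> splitTags(String raw) {
        List<String> tags = new ArrayList<>();
        for (String part : raw.split(",")) {
            String tag = part.trim();
            if (!tag.isEmpty()) {
                tags.add(tag); // keeps inner spaces, so "foo bar" stays one tag
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Each printed value is what would go into a separate untokenized 'tag' field.
        for (String tag : splitTags("foo, foo bar")) {
            System.out.println(tag);
        }
    }
}
```

Because the tag 'foo bar' is kept whole, a search for the single tag 'foo' would not match it, which is the behaviour asked for in the original question.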

Re: tag search

2008-10-24 Thread Borja Martín
Sorry, I was trying to do this with the PHP implementation and thought it was a problem with the query syntax, so I sent the message to this list too. But it seems that the PHP version lacks some features. Sorry for the inconvenience. Regards. Daan de Wit wrote: Hi Borja, Try to

Re: Combining keyword queries with database-style queries

2008-10-24 Thread Erick Erickson
Hacky is in the eye of the hacker. It's hard to keep in mind that Lucene is a search engine, not a database, so whenever I find myself thinking in database terms, I'm usually making things difficult. It operates on strings, not the "usual" data types that one thinks are available in programming l
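Since terms are compared as strings, a range over a numeric field such as token_count only behaves numerically if the values are indexed in a form whose string order matches their numeric order. One common approach is fixed-width zero-padding, sketched below in plain Java. The class name, the pad helper and the width of 10 are assumptions made for illustration, and the sketch covers non-negative integers only.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: encodes non-negative ints so that
// lexicographic (string) order agrees with numeric order.
class PaddedNumbers {

    // Assumed fixed width; 10 digits is enough for any non-negative int.
    static final int WIDTH = 10;

    static String pad(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    public static void main(String[] args) {
        // Unpadded, "9" sorts after "10", so a string range from 5 to 10 would miss it.
        System.out.println("9".compareTo("10") > 0);        // prints true
        // Padded, 0000000009 sorts before 0000000010, as expected numerically.
        System.out.println(pad(9).compareTo(pad(10)) < 0);  // prints true
    }
}
```

The same padding would have to be applied both when indexing the field and when building the lower and upper bounds of the range.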

RE: Multi -threaded indexing of large number of PDF documents

2008-10-24 Thread Sudarsan, Sithu D.
Hi Glen, Mike, Grant & Mark, Thank you for the quick responses. 1. Yes, I'm looking now at ThreadPoolExecutor. Looking for sample code to improve the multi-threaded code. 2. We'll first try using as many IndexWriters as the number of cores (which is 2 CPUs x 4 cores = 8). 3. Yes, PDFBox except
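For the sample code asked about in point 1, here is a minimal sketch of fanning per-file work out over a fixed-size thread pool (Executors.newFixedThreadPool is backed by a ThreadPoolExecutor). The class name and the indexOne method are hypothetical placeholders: indexOne stands in for "extract the text of one PDF and add it to the index" and does no real parsing or Lucene work here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of multi-threaded indexing; not taken from the thread.
class ParallelIndexingSketch {

    /** Placeholder for the real per-document work; returns the number of documents indexed. */
    static int indexOne(String path) {
        return 1;
    }

    /** Submits one task per path to a fixed-size pool and returns the total indexed. */
    static int indexAll(List<String> paths, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String path : paths) {
                Callable<Integer> task = () -> indexOne(path);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits for the task and surfaces its failure, if any
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("indexing failed", e);
        } finally {
            pool.shutdown(); // stop accepting new tasks once everything is submitted
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(indexAll(Arrays.asList("a.pdf", "b.pdf", "c.pdf"), threads));
    }
}
```

Sizing the pool from availableProcessors() mirrors the "one worker per core" starting point mentioned in point 2; whether that is the best setting for PDF extraction is something to measure rather than assume.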

RE: Multi -threaded indexing of large number of PDF documents

2008-10-24 Thread Toke Eskildsen
On Fri, 2008-10-24 at 16:01 +0200, Sudarsan, Sithu D. wrote: > 4. We've tried using a larger JVM heap by defining -Xms1800m and > -Xmx1800m, but it runs out of memory. Only -Xms1080m and -Xmx1080m seem > stable. That is strange as we have 32 GB of RAM and 34 GB of swap space. > Typically no other appli

RE: Multi -threaded indexing of large number of PDF documents

2008-10-24 Thread Sudarsan, Sithu D.
There have been some earlier messages about the memory consumption of Lucene Documents on 64-bit (double that of 32-bit). We expect the index to grow very large, and we may end up maintaining more than one with different analyzers for the same data set. Hence we are concerned about the in

performance boost through multithreaded query processing?

2008-10-24 Thread pfaun
Hello, Currently we are facing the problem that some searches, especially fuzzy (term~0.6) and wildcard (*term*) searches, need some time depending on the field/search-word combination (the more terms there are, the more processing has to be done). We improved the performance through caching the bitse

Fwd: performance boost through multithreaded query processing?

2008-10-24 Thread pfaun
Hello, Currently we are facing the problem that some searches, especially fuzzy (term~0.6) and wildcard (*term*) searches, need some time depending on the field/search-word combination (the more terms there are, the more processing has to be done). We improved the performance through caching the bitse

Combining keyword queries with database-style queries

2008-10-24 Thread Niels Ott
Hi everybody, I need to query for documents not only by search terms but also by numeric values (or other general types). Let me try to explain with a hypothetical example. Assuming there is a value for the number of words in each document (or the number of person names, or whatever), I would

RE: example on RegexQuery

2008-10-24 Thread Steven A Rowe
Hi Aashish, On 10/24/2008 at 3:35 AM, Agrawal, Aashish (IT) wrote: > I want to use Lucene for a simple search engine with regex support. > I tried using RegexQuery, but it seems I am missing something. > Is there a working example of using RegexQuery? How about TestRegexQuery?:

Multiple values in field

2008-10-24 Thread agatone
Hello, I know I can store multiple values under the same field and later retrieve all of those values. But the problem I have is structure-related. When I'm reading those fields (which usually have more than one value), it sometimes happens that a field has only one value, and I cannot know if that field is m

Re: Combining keyword queries with database-style queries

2008-10-24 Thread Erick Erickson
Is this an inadvertent re-post or is there still something you're wondering about? Erick On Wed, Oct 22, 2008 at 9:14 AM, Niels Ott <[EMAIL PROTECTED]> wrote: > Hi everybody, > > I need to query for documents not only for search terms but also for > numeric values (or other general types). Let m

RE: Multi -threaded indexing of large number of PDF documents

2008-10-24 Thread Toke Eskildsen
Sudarsan, Sithu D. [EMAIL PROTECTED] wrote: > There have been some earlier messages about the memory consumption > of Lucene Documents on 64-bit (double that of 32-bit). All pointers are doubled, yes. While not a doubling in total RAM consumption, it does give a substantial overhead. > We

Re: Multiple values in field

2008-10-24 Thread Erick Erickson
I *think* what you're looking for is Document.getFields(String field), which returns a list corresponding to every Document.add() you did originally. Alternatively, you could always index a companion field that had the count of times you called Document.add() on a particular field. Best Erick

Re: Multi -threaded indexing of large number of PDF documents

2008-10-24 Thread Michael McCandless
Sudarsan, Sithu D. wrote: Hi Glen, Mike, Grant & Mark, Thank you for the quick responses. 1. Yes, I'm looking now at ThreadPoolExecutor. Looking for sample code to improve the multi-threaded code. 2. We'll first try using as many IndexWriters as the number of cores (which is 2 CPUs x 4 c

Re: Any Spanish analyzer available?

2008-10-24 Thread Marcelo Ochoa
Zhang: I have written a simple SpanishAnalyzer for the Lucene Domain Index test suites, which index Spanish Wikipedia dumps. This simple analyzer has a list of stop words and is faster than SnowballAnalyzer, which also performs stemming. You can get the code using CVS from SourceForge.net servers or

Re: Multiple values in field

2008-10-24 Thread agatone
That sounds like abuse of Document.add() :) OK, so I first add one extra "empty" value for every field I wish to mark as multi. Well, if that isn't so wrong, I'll use that :) Thanks. Erick Erickson wrote: > > I *think* what you're looking for is Document.getFields(String field), > which returns

Lucene Input/Output error

2008-10-24 Thread JulieSoko
Hello All, First of all, I'm new to Lucene and have written code using it to search over 1 to many indexes, using a user-defined query. I don't have any code on this system, so I have to type everything in here... I have the following design but am getting an Input/Output error exception which

Re: Multiple values in field

2008-10-24 Thread Erick Erickson
No, no, no... Say you have the following:
    Document doc = new Document()
    doc.add("field1", "stuff", blah, blah)
    doc.add("field1", "more stuff", blah, blah)
    doc.add("field1", "stuff and nonsense", blah, blah)
    IndexWriter.addDocument(doc)
Now, in your search code that document comes up as a hit an

Possible payload bug in lucene

2008-10-24 Thread Fatih Emekci
Hi all, I am getting the exception below when I try to read the payload data: [java] java.lang.NullPointerException [java] at org.apache.lucene.index.MultiSegmentReader$MultiTermPositions.nextPosition(MultiSegmentReader.java:631) However, if I optimize the index before reading the payloa