Re: I need 'cat???' to match 'cat' again!

2007-06-07 Thread Chris Hostetter
: Isn't RegexQuery slower than '???' at the end of a word? I've never used RegexQuery, but a quick glance at the regex javadocs indicates that some "RegexCapabilities" can optimize the cases with a fixed prefix, and JakartaRegexpCapabilities is one of those cases ... so if you construct a Rege

Re: Need Lucene Compression help -- can pay nominal fee

2007-06-07 Thread Grant Ingersoll
Have a look at http://www.gossamer-threads.com/lists/lucene/java-dev/38880?search_string=compression;#38880 The upshot is that you should compress the data yourself and then store it as a binary field (Field constructor: public Field(String name, byte[] value, Store store)). This way yo
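A minimal sketch of the "compress it yourself" half of that advice, using only java.util.zip from the JDK. The class and method names (FieldCompressionSketch, compress, decompress) are hypothetical and not part of Lucene; the resulting byte[] is what would be passed to the Field(String name, byte[] value, Store store) constructor quoted above. How much space this saves depends entirely on the data, so the 20% target mentioned in the thread is not something this sketch promises.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not a Lucene class: compresses a value before it is
// stored as a binary field and restores it after the field is read back.
class FieldCompressionSketch {

    // Deflate the input at the highest compression level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate bytes produced by compress(); rejects corrupt or truncated input.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

At indexing time the application would call compress() on the bytes of each stored value and hand the result to the binary Field constructor; at search time it would read the stored bytes back and call decompress().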

Re: I need 'cat???' to match 'cat' again!

2007-06-07 Thread Erick Erickson
Well, what you're really doing, in your example, is searching on all the terms that start with cat and are less than 7 characters long. So it seems to me that you can pick out terms yourself and assemble your own big OR clause rather than rely on Lucene's old behavior. By that, I mean use a Wild
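The term-picking step described here can be sketched without Lucene at all, assuming the index's terms are available as a sorted set of strings. PrefixTermPicker and pickTerms are hypothetical names used only for illustration. Against a real index the terms would come from Lucene's term enumeration, and each picked term would become one optional clause of a boolean query; that wiring is left out because it depends on the Lucene version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical helper: selects the terms that 'cat???' used to match,
// i.e. the prefix followed by at most maxExtra further characters.
class PrefixTermPicker {

    static List<String> pickTerms(SortedSet<String> terms, String prefix, int maxExtra) {
        List<String> picked = new ArrayList<String>();
        // In a sorted set, all terms sharing the prefix are contiguous,
        // starting at the first term >= prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // walked past the last term with this prefix
            }
            if (term.length() <= prefix.length() + maxExtra) {
                picked.add(term);
            }
        }
        return picked;
    }
}
```

For the example in this thread, pickTerms(terms, "cat", 3) keeps "cat" itself along with every term of up to six characters that starts with "cat".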

RE: How can I search over all documents NOT in a certain subset?

2007-06-07 Thread Hilton Campbell
Yes, that's actually come up. The document ids are indeed changing which is causing problems. I'm still trying to work it out myself, but any help would most definitely be appreciated. Thanks, Hilton Campbell -Original Message- From: Antony Bowesman [mailto:[EMAIL PROTECTED] Sent: Wedn

FNFE on the index

2007-06-07 Thread moraleslos
Hi, I'm encountering this error and not sure why this is happening: java.io.FileNotFoundException: /index/book/_19b87.tis (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)

Re: FNFE on the index

2007-06-07 Thread Koji Sekiguchi
Have you read the following article at Lucene FAQ? Why am I getting an IOException that says "Too many open files"? http://wiki.apache.org/lucene-java/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82 Thank you, Koji moraleslos wrote: Hi, I'm encountering this error and not sure why th

np-pandock search problem (again, with more detail)

2007-06-07 Thread John Powers
Hello, I've asked before on this issue, and I think I have more information now. I have, in a Lucene 1.4 index, some Field.Text fields stored. I've been focusing on the one called "name". In Luke 0.7, run on the command line from a jar, if I do a search for Name:"np-pandock*" I ge

Re: np-pandock search problem (again, with more detail)

2007-06-07 Thread Michael D. Curtin
John Powers wrote: Np-pandock Np-pandock-1 Np-pandock-2 Np-pandock-L Np-pandock-L1 Np-pandock-L2 I'm not positive, but I think StandardAnalyzer splits this input at the hyphens. That is, it gives the terms "Np", "pandock", "1", "2", "L", "L1", and "L2", but NOT "Np-pandoc", etc. --MD

Re: Need Lucene Compression help -- can pay nominal fee

2007-06-07 Thread Chris Hostetter
: I need to store all the attributes of the document i index as part of the index. And I need to get the size of the files as close to 20% of the original size as possible. If anyone can help with this I can pay a nominal fee. Please contact me if anyone can help. Let's be clear about somet

Re: np-pandock search problem (again, with more detail)

2007-06-07 Thread Doron Cohen
Michael D. Curtin wrote: > > Np-pandock-L1 > > Np-pandock-L2 > > I'm not positive, but I think StandardAnalyzer splits this input at the > hyphens. That is, it gives the terms "Np", "pandock", "1", "2", "L", > "L1", and "L2", but NOT "Np-pandoc", etc. I think it splits by hyphens unless the no-h

Re: np-pandock search problem (again, with more detail)

2007-06-07 Thread Michael D. Curtin
Doron Cohen wrote: I think it splits by hyphens unless the no-hyphen part has digits, so: np-pandock-a7 becomes np pandock-a7 This is for the indexing part. Wow! Do you know the thinking behind that, i.e. why a number in a hyphenated expression prevents the split? --MDC

Re: np-pandock search problem (again, with more detail)

2007-06-07 Thread Doron Cohen
"Michael D. Curtin" <[EMAIL PROTECTED]> wrote on 07/06/2007 13:30:28: > > I think it splits by hyphens unless the no-hyphen > > part has digits, so: > > np-pandock-a7 > > becomes > > np > > pandock-a7 > > This is for the indexing part. > > Wow! Do you know the thinking behind that, i.e. why

[ANN] Qsol 1.0 - First version of my proximity/precedence/customizable Query parser

2007-06-07 Thread Mark Miller
http://myhardshadow.com/qsol.php Qsol 1.0 has been released. Qsol is my very customizable query parser, i.e. customizable syntax, order of operations, etc. A handful of the features: 1. Proximity operators in the search syntax 2. Paragraph/sentence proximity searching 3. FieldBreaker for proxim

Lucene & MySql

2007-06-07 Thread Lindsey Hess
Hi, I'm new to Lucene...very new. I'd like to use Lucene to index a MySQL database (six tables, actually), and then use it to search the database in lieu of using SQL. I can't seem to find any sample code to do this, so I was hoping that someone could share some or point me in the right direc

Re: np-pandock search problem (again, with more detail)

2007-06-07 Thread Michael D. Curtin
Doron Cohen wrote: From the StandardAnalyzer javacc grammar : // floating point, serial, model numbers, ip addresses, etc. // every other segment must have at least one digit etc. <#P: ("_"|"-"|"/"|"."|",") > My understanding of this: a non-whitespace sequence is broken at eithe

Re: Lucene & MySq

2007-06-07 Thread Chris Lu
I think DBSight can be a great learning tool for Lucene. You can just use the web UI to configure for all your tables and flatten objects into Lucene's documents. -- Chris Lu - Instant Scalable Full-Text Search On Any Database/Application site: http://www.dbsight.net demo:

Re: np-pandock search problem (again, with more detail)

2007-06-07 Thread Erick Erickson
Actually, my mind kind of overloaded when I read the following from the (2.1) javadoc - Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token. - Splits words at hyphens, unless there's a number in

Re: Lucene & MySq

2007-06-07 Thread Erick Erickson
Understand that Lucene is an indexing engine. Out of the box, there's no understanding of databases etc. built in. But as Chris Lu points out, there are applications out there that do this for you. If you try to roll your own, you'll have to write some code that queries the database, and us

RE: Case Insensitive but not Tokenized

2007-06-07 Thread Anna Putnam
Hoss, The KeywordTokenizer and LowerCaseFilter worked great and was exactly what I needed. Thanks! -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 06, 2007 11:25 AM To: java-user@lucene.apache.org Subject: Re: Case Insensitive but not Tokenized

How to implement AJAX search~Lucene Search part?

2007-06-07 Thread Chris Lu
Hi, I would like to implement an AJAX search. Basically, when the user types in several characters, I will try to search the Lucene index and find all possible matching items. It seems I need to use a wildcard query like "test*" to match anything. Is this the only way to do it? It doesn't seem quite
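One alternative to running a wildcard query on every keystroke is to keep the candidate terms in a sorted in-memory set and return a capped range of it. The sketch below is not from the thread: TypeAheadSketch and suggest are hypothetical names, it uses only the JDK, and it assumes the set of suggestable terms is small enough to hold in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical type-ahead helper: returns at most 'limit' terms that start
// with what the user has typed so far.
class TypeAheadSketch {

    static List<String> suggest(NavigableSet<String> terms, String typed, int limit) {
        List<String> out = new ArrayList<String>();
        if (typed.isEmpty() || limit <= 0) {
            return out; // nothing typed yet, or no room for suggestions
        }
        // Every string starting with 'typed' sorts at or after 'typed' and
        // before 'typed' + '\uffff' (terms whose next character is '\uffff'
        // itself are ignored by this simplification).
        String upperBound = typed + Character.MAX_VALUE;
        for (String term : terms.subSet(typed, true, upperBound, false)) {
            out.add(term);
            if (out.size() == limit) {
                break;
            }
        }
        return out;
    }
}
```

Capping the result keeps the response small for short inputs such as a single letter, which is where a "test*" style wildcard expands to the most terms.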

Documentation Promotion is in Motion!

2007-06-07 Thread Grant Ingersoll
Calling all Lucene Users! You know you love Lucene for a whole variety of reasons (fast, friendly, fun, did I say fast?) so how about showing a little love back? :-) We (as in the committers and contributors) are trying out a new release mechanism whereby we are implementing a code freez

RE: How to implement AJAX search~Lucene Search part?

2007-06-07 Thread Anna Putnam
Check out http://www.brandspankingnew.net/specials/ajax_autosuggest/ajax_autosuggest_autocomplete.html It takes an XML response as input (which could be backed by Lucene). I have implemented this and it works pretty fast, although I do have a small dataset. -Anna -Original Message- F