: Isn't RegexQuery slower than '???' at the end of a
: word?
I've never used RegexQuery but a quick glance at the regex javadocs
indicates that some "RegexCapabilities" can optimize the cases with a
fixed prefix, and JakartaRegexpCapabilities is one of those cases ... so
if you construct a Rege
Have a look at
http://www.gossamer-threads.com/lists/lucene/java-dev/38880?search_string=compression;#38880
The upshot is that you should compress the data yourself and then
store it as a binary field (Field Constructor: public Field(String
name, byte[] value, Store store) ). This way yo
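
A minimal sketch of the "compress it yourself" step described above, using only java.util.zip from the JDK. The class and method names (FieldCompression, compress, decompress) are made up for illustration and are not part of Lucene; the sketch assumes the text is UTF-8 and small enough to hold in memory.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: turns text into compressed bytes and back.
public class FieldCompression {

    // Compress the text into a byte[] that could be stored as a binary field.
    public static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Reverse of compress(): inflate the stored bytes back into the original text.
    public static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream(data.length * 2);
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }
}
```

The byte[] returned by compress() is what would be handed to the binary Field constructor quoted above; after retrieving the stored bytes from a hit, decompress() restores the text.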
Well, what you're really doing, in your example, is searching
on all the terms that start with cat and are less than 7 characters
long.
So it seems to me that you can pick out terms yourself and assemble
your own big OR clause rather than rely on Lucene's old behavior.
By that, I mean use a Wild
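
A rough sketch of the "pick out the terms yourself" idea, in plain Java. It assumes the index's terms are available as a sorted set (a TreeSet stands in for the term dictionary here), and the names PrefixTermPicker and pickTerms are hypothetical. In Lucene, each picked term would then become one optional clause of the OR query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical helper: selects terms by prefix and maximum length.
public class PrefixTermPicker {

    // Returns every term that starts with `prefix` and is shorter than
    // `maxLengthExclusive` characters, in sorted order.
    public static List<String> pickTerms(SortedSet<String> terms,
                                         String prefix,
                                         int maxLengthExclusive) {
        List<String> picked = new ArrayList<>();
        // In a sorted set, all terms sharing a prefix are contiguous,
        // starting at the first term >= prefix.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // walked past the last term with this prefix
            }
            if (term.length() < maxLengthExclusive) {
                picked.add(term);
            }
        }
        return picked;
    }
}
```

For the example in this thread, pickTerms(terms, "cat", 7) keeps terms such as "cat", "cats" and "cattle" and skips "catalog" (7 characters) and anything longer.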
Yes, that's actually come up. The document ids are indeed changing which is
causing problems. I'm still trying to work it out myself, but any help
would most definitely be appreciated.
Thanks,
Hilton Campbell
-Original Message-
From: Antony Bowesman [mailto:[EMAIL PROTECTED]
Sent: Wedn
Hi,
I'm encountering this error and not sure why this is happening:
java.io.FileNotFoundException: /index/book/_19b87.tis (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
Have you read the following article at Lucene FAQ?
Why am I getting an IOException that says "Too many open files"?
http://wiki.apache.org/lucene-java/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82
Thank you,
Koji
moraleslos wrote:
Hi,
I'm encountering this error and not sure why th
Hello
I've asked before on this issue, and I think I have more information
now.
In a Lucene 1.4 index I have some Field.Text fields stored. I've
been focusing on the one called "name".
In Luke 0.7, run on the command line from a jar, if I do a search for
Name:"np-pandock*" I ge
John Powers wrote:
Np-pandock
Np-pandock-1
Np-pandock-2
Np-pandock-L
Np-pandock-L1
Np-pandock-L2
I'm not positive, but I think StandardAnalyzer splits this input at the
hyphens. That is, it gives the terms "Np", "pandock", "1", "2", "L",
"L1", and "L2", but NOT "Np-pandoc", etc.
--MD
: I need to store all the attributes of the document i index as part of the
: index. And I need to get the size of the files as close to 20% of the
: original size as possible. If anyone can help with this I can pay a nominal
: fee. Please contact me if anyone can help.
Let's be clear about somet
Michael D. Curtin wrote:
> > Np-pandock-L1
> > Np-pandock-L2
>
> I'm not positive, but I think StandardAnalyzer splits this input at the
> hyphens. That is, it gives the terms "Np", "pandock", "1", "2", "L",
> "L1", and "L2", but NOT "Np-pandoc", etc.
I think it splits by hyphens unless the no-h
Doron Cohen wrote:
I think it splits by hyphens unless the no-hyphen
part has digits, so:
np-pandock-a7
becomes
np
pandock-a7
This is for the indexing part.
Wow! Do you know the thinking behind that, i.e. why a number in a
hyphenated expression prevents the split?
--MDC
"Michael D. Curtin" <[EMAIL PROTECTED]> wrote on 07/06/2007 13:30:28:
> > I think it splits by hyphens unless the no-hyphen
> > part has digits, so:
> > np-pandock-a7
> > becomes
> > np
> > pandock-a7
> > This is for the indexing part.
>
> Wow! Do you know the thinking behind that, i.e. why
http://myhardshadow.com/qsol.php
Qsol 1.0 has been released. Qsol is my very customizable query parser
i.e. customizable syntax, order of operations, etc.
A handful of the features:
1. Proximity Operators in the search syntax
2. Paragraph/Sentence proximity searching
3. FieldBreaker for proxim
Hi,
I'm new to Lucene...very new. I'd like to use Lucene to index a MySQL database
(six tables, actually), and then use it to search the database in lieu of using
SQL. I can't seem to find any sample code to do this, so I was hoping that
someone could share some or point me in the right direc
Doron Cohen wrote:
From the StandardAnalyzer javacc grammar :
// floating point, serial, model numbers, ip addresses, etc.
// every other segment must have at least one digit
etc.
<#P: ("_"|"-"|"/"|"."|",") >
My understanding of this: a non-whitespace sequence is broken
at eithe
I think DBSight can be a great learning tool for Lucene. You can just
use the web UI to configure for all your tables and flatten objects
into Lucene's documents.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo:
Actually, my mind kind of overloaded when I read the following from
the (2.1) javadoc
- Splits words at punctuation characters, removing punctuation.
However, a dot that's not followed by whitespace is considered part of a
token.
- Splits words at hyphens, unless there's a number in
Understand that Lucene is an indexing engine. "out of the box", there's
no understanding of databases etc. built in. But as Chris Lu points out,
there are applications out there that do this for you.
If you try to roll your own, you'll have to write some code that
queries the database, and us
Hoss,
The KeywordTokenizer and LowerCaseFilter worked great and was exactly
what I needed.
Thanks!
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 06, 2007 11:25 AM
To: java-user@lucene.apache.org
Subject: Re: Case Insensitive but not Tokenized
Hi,
I would like to implement an AJAX search. Basically when user types in
several characters, I will try to search the Lucene index and find
all possible matching items.
It seems I need to use a wildcard query like "test*" to match anything.
Is this the only way to do it? It doesn't seem quite
Calling all Lucene Users!
You know you love Lucene for a whole variety of reasons (fast,
friendly, fun, did I say fast?) so how about showing a little love
back? :-)
We (as in the committers and contributors) are trying out a new
release mechanism whereby we are implementing a code freez
Check out
http://www.brandspankingnew.net/specials/ajax_autosuggest/ajax_autosuggest_autocomplete.html
It takes an XML response as input (which could be backed by lucene).
I have implemented this and it works pretty fast, although I do have a
small dataset.
-Anna
-Original Message-
F