On 5/31/05, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Matt Quail wrote:
> > I have wondered about this as well. Are there any *sure fire* ways of
> > creating (and updating) two indices so that doc numbers in one index
> > deliberately correspond to doc numbers in the other index?
>
> If you add t
Hi, Erik,
Thanks for your info.
No, I haven't tried it yet. I will give it a try and maybe produce
some Chinese/English text search demo online.
Currently I use Lucene as the indexing engine for Velocity mailing
list search. I have a demo at www.jhsystems.net.
It is yet another mailing list s
Robert,
I'm very likely going to be using DSpace and some related
technologies from the SIMILE project very soon :)
On May 31, 2005, at 5:08 PM, Tansley, Robert wrote:
Hi all,
The DSpace (www.dspace.org) currently uses Lucene to index metadata
(Dublin Core standard) and extracted full-text
Jian - have you tried Lucene's StandardAnalyzer with Chinese? It
will keep English as-is (removing stop words, lowercasing, and such)
and separate CJK characters into separate tokens also.
Erik
On May 31, 2005, at 5:49 PM, jian chen wrote:
Hi,
Interesting topic. I thought about this
Adding new terms and re-indexing the document is the desired behavior.
One (non-scalable) solution would be to parse the toString of the
termFreqVector (freq {myTermField: red/2, green/1, blue/1}) and create a
new string representation of the expanded terms: (red red green blue)
This obviously
Hi,
Interesting topic. I thought about this as well. I wanted to index
Chinese text with English, i.e., I want to treat the English text
inside Chinese text as English tokens rather than Chinese text tokens.
Right now I think maybe I have to write a special analyzer that takes
the text input, and
Hi all,
The DSpace (www.dspace.org) currently uses Lucene to index metadata
(Dublin Core standard) and extracted full-text content of documents
stored in it. Now that the system is being used globally, it needs to
support multi-language indexing.
I've looked through the mailing list archives etc. and
Is your intent to persist the changed vector somehow or just use it in
your application for the immediate search?
TermFreqVector is an interface, so if you aren't persisting, I would
write a wrapper class around the one that is returned by Lucene that has
add/set methods on it for manipulating the
have you tried the suggestion i made regarding FieldCache from the first
thread in which you asked this question?
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200505.mbox/[EMAIL PROTECTED]
: Date: Tue, 31 May 2005 11:42:46 -0700
: From: Kevin Burton <[EMAIL PROTECTED]>
: Reply-T
Andrew Boyd wrote:
How about using range query?
private Term begin, end;
begin = new Term("dateField",
DateTools.dateToString(Date.valueOf(<"backInTimeStringDate">)));
end = new Term("dateField",
DateTools.dateToString(Date.valueOf(<"farFutureStringDate">)));
Ha.. crap. That won't wor
Lucene rewrites RangeQueries into a BooleanQuery containing a bunch of
OR'd terms. If you have too many terms (dates in your case), you will
run into a TooManyClauses exception. I think the default is about
1024; you can set it with BooleanQuery.setMaxClauseCount().
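To make the clause blow-up concrete, here is a rough sketch in plain Java (no Lucene on the classpath) that imitates the expansion: every distinct indexed term inside the range becomes one OR'd clause. The class name, the helper methods, and the generated date terms are all made up for illustration; the 1024 figure is simply the default quoted above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for the rewrite step; this is not Lucene code.
class RangeExpansionSketch {
    // Default clause limit as quoted in the message above ("about 1024").
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Returns every indexed term in [begin, end], in sorted order.
    // Each returned term corresponds to one OR'd clause after the rewrite.
    static List<String> expand(SortedSet<String> indexedTerms, String begin, String end) {
        // subSet() excludes its upper bound, so use the immediate successor of
        // 'end' in String ordering to make the range inclusive.
        return new ArrayList<>(indexedTerms.subSet(begin, end + "\0"));
    }

    // True when the expansion would exceed the given clause limit.
    static boolean exceedsLimit(List<String> clauses, int maxClauses) {
        return clauses.size() > maxClauses;
    }

    public static void main(String[] args) {
        // Made-up data: 2000 distinct day-resolution terms such as "20000101".
        SortedSet<String> terms = new TreeSet<>();
        LocalDate start = LocalDate.of(2000, 1, 1);
        for (int i = 0; i < 2000; i++) {
            terms.add(start.plusDays(i).format(DateTimeFormatter.BASIC_ISO_DATE));
        }
        List<String> clauses = expand(terms, "20000101", "20051231");
        System.out.println("clauses: " + clauses.size());
        System.out.println("over limit: " + exceedsLimit(clauses, DEFAULT_MAX_CLAUSES));
    }
}
```

The point of the sketch is that the clause count depends on how many distinct terms fall inside the range, not on how many documents match. Indexing dates at a coarser resolution keeps that number down, and raising the limit with BooleanQuery.setMaxClauseCount() is the other option mentioned above.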
On 5/31/05, Kevin Burton <[EM
Matt Quail wrote:
I have a similar problem, for which ParallelReader looks like a good
solution -- except for the problem of creating a set of indices with
matching document numbers.
I have wondered about this as well. Are there any *sure fire* ways of
creating (and updating) two indices so
On Monday 30 May 2005 18:54, Andrew Boyd wrote:
> Now that the QueryParser knows about position increments has anyone
> used this to do stemming at query time and not at indexing time? I
> suppose one would need a reverse stemmer. Given the query breath it
> would need to inject breathe, breat
Andrew Boyd wrote:
How about using range query?
private Term begin, end;
begin = new Term("dateField",
DateTools.dateToString(Date.valueOf(<"backInTimeStringDate">)));
end = new Term("dateField",
DateTools.dateToString(Date.valueOf(<"farFutureStringDate">)));
RangeQuery query = new RangeQ
Only way I see to do this is to get a TermEnum for that field, and grab the
first. Then iterate until you find the last one. This is similar behavior to
the TermEnum.skipTo method. A better solution would be to record the minimum
and maximum dates in the index as you index them. Each time yo
How about using range query?
private Term begin, end;
begin = new Term("dateField",
DateTools.dateToString(Date.valueOf(<"backInTimeStringDate">)));
end = new Term("dateField",
DateTools.dateToString(Date.valueOf(<"farFutureStringDate">)));
RangeQuery query = new RangeQuery(begin, end, true)
On May 31, 2005, at 4:06 AM, Paul Libbrecht wrote:
On 30 May 05, at 22:13, Doug Hughes wrote:
Ok, so more than one keyword can be stored in a keyword field.
Interesting!
Yes, yes, yes!! You can do:
doc.add("link","xx");
doc.add("link","yy");
Well, that's not quite correct API, but
On 30 May 05, at 22:13, Doug Hughes wrote:
Ok, so more than one keyword can be stored in a keyword field.
Interesting!
Yes, yes, yes!! You can do:
doc.add("link","xx");
doc.add("link","yy");
and matches will match any of them!
I found this in the book and not in the javadoc and I'd recomm
You'd only need position-increment if using phrase-query...
otherwise... positions are pretty much ignored and you can expand the
query with an or.
Eg, I'd expand the query for breath to:
Term(breath)^2 or (Term(breathes) or Term(breathe) or Term(breathing))
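As a rough sketch of that expansion, the snippet below builds the equivalent query string in Lucene's query-parser syntax (boost with ^, alternatives joined with OR). The class, the method, and the hand-supplied variant list are hypothetical; where the variants come from (the "reverse stemmer") is left open.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper; the variant list stands in for a reverse stemmer.
class QueryExpansionSketch {
    // Boosts the original term and ORs it with the group of variants.
    static String expand(String term, List<String> variants, int boost) {
        String boosted = term + "^" + boost;
        if (variants.isEmpty()) {
            return boosted; // nothing to expand
        }
        StringJoiner alternatives = new StringJoiner(" OR ", "(", ")");
        for (String variant : variants) {
            alternatives.add(variant);
        }
        return boosted + " OR " + alternatives;
    }

    public static void main(String[] args) {
        List<String> variants = Arrays.asList("breathes", "breathe", "breathing");
        // prints: breath^2 OR (breathes OR breathe OR breathing)
        System.out.println(expand("breath", variants, 2));
    }
}
```

Boosting the term the user actually typed keeps exact matches ranked above the expanded forms.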
I am not sure you can make a phra
I have an index with a date field. I want to quickly find the minimum
and maximum values in the index.
Is there a quick way to do this? I looked at using TermInfos and
finding the first one but how do I find the last?
I also tried the new sort API and the performance was horrible :-/
Any i