On Sun, 2006-05-21 at 04:46 +0200, karl wettin wrote:
> Do I have any alternatives?
>
>
> What I really want is:
> {
> for (Classification c : myDocument) {
> doc.add(new Field(c.getFieldName(), c.tokenStreamFactory()...
> }
> indexWriter.add(doc, perFieldsAnalyzer);
> }
Patch now in Jira.
I did it like below, but used the Lucene score instead.
Will report back with results in a month or so.
On Thu, 2006-05-25 at 11:51 +0200, karl wettin wrote:
> Did anyone write some neat tool for statistical analysis of hits over
> time? I need one. And it must be fast. Was thinking something like
There is nothing special in lucene to help you do this ... it would have
to be done in your own code.
: If I have user info in 2 different sources (indexes) and want to search for
: fields on both, but the search should
: join the resulting records using a common field (user id, for example). Is
: this possible?
: How easy is it to add new fields to the documents in the index?
: Suppose that today I can search for book title and decide that including the
: author in the search would be a good idea. How easy is it to do that with
: Lucene?
Very. Whenever you add a document, you specify what fields that document contains.
On Fri, 2006-05-26 at 17:50 -0300, Leandro Saad wrote:
> Hi all. I'm very new to Lucene. All I have done is read some docs about how
> it works, which brings me to the question:
>
> How easy is it to add new fields to the documents in the index?
> Suppose that today I can search for book title and decide that including the
> author in the search would be a good idea.
: Thanks for the reply, but I don't know how to make this change in
: ISOLatin1AccentFilter.
: Can you give me some advice on this?
I've never really looked at the internals of ISOLatin1AccentFilter, but
the basic idea is to subclass it with a new TokenFilter that maintains a
one token "buffer"
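[Editorial note: the "one token buffer" idea can be sketched without the real Lucene classes. This is an illustration only -- `Tok` and the `Iterator`-based stream below are simplified stand-ins for Lucene's Token/TokenFilter API, and the Unicode-normalization accent stripping is a modern-Java substitute for whatever mapping ISOLatin1AccentFilter uses internally:]

```java
import java.text.Normalizer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Simplified stand-in for a Lucene token; real code would subclass
// org.apache.lucene.analysis.TokenFilter and use real Tokens.
class Tok {
    final String text;
    Tok(String text) { this.text = text; }
}

// Emits the accent-stripped form of each token, holding the accented
// original in a one-token buffer so it is emitted on the next call.
class BufferingAccentFilter implements Iterator<Tok> {
    private final Iterator<Tok> input;
    private final Deque<Tok> buffer = new ArrayDeque<>(); // the one-token "buffer"

    BufferingAccentFilter(Iterator<Tok> input) { this.input = input; }

    public boolean hasNext() { return !buffer.isEmpty() || input.hasNext(); }

    public Tok next() {
        if (!buffer.isEmpty()) return buffer.pollFirst(); // flush buffered original
        Tok t = input.next();
        String stripped = Normalizer.normalize(t.text, Normalizer.Form.NFD)
                                    .replaceAll("\\p{M}", "");
        if (!stripped.equals(t.text)) {
            buffer.addLast(t);          // keep the accented original for later
            return new Tok(stripped);   // emit the stripped form first
        }
        return t;
    }
}
```

[In real Lucene code the two forms would also share a position, so both "café" and "cafe" match at the same spot in the token stream.]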
Luke shows the same total index size, and yes, it appears to list all
the files. There are 997 of them, which are tough to count using that
interface with Cygwin/X.
> Also, you may want to see if you have any stale locks or the like that are
> preventing you from doing an optimize.
No lock files.
My second question is: can I join the results of multiple indexes using a
common field?
If I have user info in 2 different sources (indexes) and want to search for
fields on both, but the search should
join the resulting records using a common field (user id, for example). Is
this possible?
--
Leandro
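[Editorial note for the archives: Lucene itself -- certainly the 1.4/1.9-era API discussed in this thread -- has no cross-index join, so the usual answer is to run a search against each index separately and join the hits in application code on the shared field. A hedged sketch in modern Java; the record types and the "userId" field are hypothetical stand-ins for values read from each hit's stored fields:]

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result records from two separate searches; in real code these
// would be values read from stored Lucene fields of each hit.
record Profile(String userId, String name) {}
record Activity(String userId, String action) {}
record Joined(String userId, String name, String action) {}

class ResultJoiner {
    // Hash-join the two hit lists on the shared userId field: index one side
    // by userId, then probe it with the other side.
    static List<Joined> join(List<Profile> profiles, List<Activity> activities) {
        Map<String, Profile> byId = new HashMap<>();
        for (Profile p : profiles) byId.put(p.userId(), p);
        List<Joined> out = new ArrayList<>();
        for (Activity a : activities) {
            Profile p = byId.get(a.userId());
            if (p != null) out.add(new Joined(a.userId(), p.name(), a.action()));
        }
        return out;
    }
}
```

[The join cost is linear in the two result sets, so it is only practical when both searches return modestly sized hit lists.]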
Hi all. I'm very new to Lucene. All I have done is read some docs about how
it works, which brings me to the question:
How easy is it to add new fields to the documents in the index?
Suppose that today I can search for book title and decide that including the
author in the search would be a good idea.
On Freitag 26 Mai 2006 17:46, Mike Richmond wrote:
> I am then storing this in a stored, untokenized field named "date".
From the API docs:
The field must be indexed, but should not be tokenized, and does not need
to be stored (unless you happen to want it back with the rest of your
document data).
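[Editorial note: the reason an untokenized date field sorts correctly is that DateTools with Resolution.SECOND renders dates as fixed-width "yyyyMMddHHmmss" strings in GMT, so plain String ordering agrees with chronological ordering. A sketch of the same idea in plain modern Java, without Lucene itself -- illustration only, not Lucene's actual code:]

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Illustration only: Lucene's DateTools formats dates as fixed-width
// "yyyyMMddHHmmss" strings (GMT) so that String ordering matches date order.
class SortableDates {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);

    // Fixed-width digits mean lexicographic compare == chronological compare.
    static String toSortable(ZonedDateTime t) {
        return FMT.format(t);
    }
}
```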
It kind of sounds like those files are corrupted, but I can't say for
sure. When you look in Luke at your index (the one with all the files,
not the new one) do you see all the documents you would expect to see
with values that seem reasonable? Also, in Luke, you can see a listing
of all the
Indexing 55648 documents in a new clean directory, I see only .cfs files (+
deletable + segments). Disk usage is 65M for all of these, which means that
each message takes ~1K of index space rather than > 10K as it does in my
99GB index.
Bearing in mind that the large index has > 5 million Lucene documents
I'm running out the door, so only a quick reply... yes, you can. Look
at the subSearcher(?) method - that'll give you the index. Your
application will need to keep track of which index names correspond to
which sub-searcher indices. Check the archives for the answer too; sorry
for the short reply.
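[Editorial note: MultiSearcher concatenates the document-number ranges of its sub-searchers, and subSearcher(int) maps a merged doc number back to the sub-searcher that owns it. A hedged sketch of that bookkeeping in plain Java -- the starts[] layout mirrors the general idea, but the names here are illustrative, not Lucene's actual internals:]

```java
// Sketch of how a MultiSearcher-style class can map a merged document
// number back to the sub-index it came from. `maxDocs` would come from
// each sub-searcher's maxDoc(); all names here are hypothetical.
class SubIndexMapper {
    private final int[] starts; // starts[i] = first merged docId of sub-index i

    SubIndexMapper(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++)
            starts[i + 1] = starts[i] + maxDocs[i];
    }

    // Returns the index of the sub-searcher that owns merged doc n,
    // analogous in spirit to MultiSearcher.subSearcher(n).
    int subSearcher(int n) {
        int lo = 0, hi = starts.length - 2;
        while (lo < hi) {                 // binary search over the starts
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= n) lo = mid; else hi = mid - 1;
        }
        return lo;
    }
}
```

[With the mapping in hand, the application keeps a parallel array of index names, so a hit's name is just names[mapper.subSearcher(hit)].]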
> Note that IndexReader has a main() that will list the contents of compound
> index files.
It looks like some of my index is compound and some isn't. My not-very-well-
informed guess is that an optimize() got interrupted somewhere along the
line.
If I try to optimize the index now, it throws an exception.
I just tried to optimise my index, using the lucli command line client, and
got:
8<
lucli> optimize
Starting to optimize index.
java.io.IOException: Cannot overwrite:
/mnt/sdb1/lucene-index/index-1/_2lhqi.fnm
at
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.j
On Friday 26 May 2006 19:13, Ken Krugler wrote:
> >On Friday 26 May 2006 16:14, Michael Chan wrote:
> >> Hi,
> >>
> >> I have a 5gb index containing 2mil documents and am trying to run
> >> 1mil+ queries against it. Most of the queries are SpanQueries and it
> >> occurs to me that the search performance is quite slow when using 2, 3
When using a MultiSearcher, is there any way to get the name of the
index that a hit came from? One way would be to add the index name as
a field to each document, but I am hoping to avoid this.
Thanks,
Mike
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
Rob Staveley (Tom) wrote:
Is there a tool I can use to see how much of the index is occupied by the
different fields I am indexing?
Note that IndexReader has a main() that will list the contents of
compound index files.
Doug
--
Interesting. I am explicitly turning on the compound file format when I
start my application, but I am suspicious of my optimising thread. It
*ought* to be optimising every 30 minutes, using thread synchronisation to
prevent the writer from trying to write while optimisation takes place, but
it
It seems odd to me that, if you are using the CFS format, you would
have the .fdt, .frq and .prx files in addition to the .cfs files. My
understanding is that all files (except deletable and segments) get put
inside of the CFS file. Looking at my indices, I only have the CFS file.
Are you optimizing?
That's a really good idea, but I've only got a total of 38 fields. It is
true that some of them are empty, but that can't account for the bulk.
-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 17:50
To: java-user@lucene.apache.org
Subject: RE: Seeing wh
On Friday 26 May 2006 16:14, Michael Chan wrote:
> Hi,
>
> I have a 5gb index containing 2mil documents and am trying to run
> 1mil+ queries against it. Most of the queries are SpanQueries and it
> occurs to me that the search performance is quite slow when using 2, 3
> SpanOrQueries nested inside
> Is there anything I can learn from the index directory's file listing?
Running this nasty little BASH one-liner...
$ for i in `ls * | perl -nle 'if (/^.+(\..+)/) {print $1;}' | sort |
uniq`; do ls -l *$i | awk '{SUM = SUM + $5} END {if (SUM > 1e10) {print
"'$i': ", SUM}}'; done
... I see
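[Editorial note: the same per-extension tally can be done portably in plain Java for anyone without a Unix shell handy. A sketch only -- the index directory path is whatever your application uses:]

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Sums on-disk bytes per file extension in a directory -- a plain-Java
// equivalent of the shell one-liner above.
class ExtensionSizes {
    static Map<String, Long> sizesByExtension(File dir) {
        Map<String, Long> sums = new TreeMap<>();
        File[] files = dir.listFiles();
        if (files == null) return sums;
        for (File f : files) {
            if (!f.isFile()) continue;
            String name = f.getName();
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "(none)" : name.substring(dot);
            sums.merge(ext, f.length(), Long::sum); // accumulate bytes per extension
        }
        return sums;
    }
}
```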
: PS: I am a newbie to the mailing list - I hope I've got the etiquette right
You may have figured this out already, but please don't CC email to
multiple lucene mailing lists -- in this particular case,
[EMAIL PROTECTED] is just a legacy alias that points at [EMAIL PROTECTED] -- so
there's *really* no point.
Are you by any chance using different field names for each document -- or
do you have a wide range of field names that aren't the same for each
document? ... You mentioned indexing emails; email has a very loose header
structure that allows MTAs to add arbitrary "X" headers. Are you
converting every header into a field?
On Friday 26 May 2006 16:14, Michael Chan wrote:
> Hi,
>
> I have a 5gb index containing 2mil documents and am trying to run
> 1mil+ queries against it. Most of the queries are SpanQueries and it
> occurs to me that the search performance is quite slow when using 2, 3
> SpanOrQueries nested inside
I'm running into similar sort issues when I try to sort my results on
a date field that was created using the DateTools class as follows:
DateTools.dateToString(dateObj, DateTools.Resolution.SECOND);
I am then storing this in a stored, untokenized field named "date".
When I sort the results by date
Hi Edgar,
Are there any technical reports explaining your design and
implementation of LM on Lucene? Or which source files exactly make up the
"LM extension"?
--
Best regards,
Charlie
---
Friday, May 26, 2006, 7:36:14 AM, you wrote:
> Hi Edgar,
> While doing the integration/updating for Lucene 1.9, could you be more
> open and clear about the design
Dennis Kubes wrote:
The server is headless (i.e. no X-Windows). I've tried lucli, but that
doesn't have Luke's bells and whistles. Does Luke have a non-GUI equivalent,
Grant?
You can tunnel your X session through ssh. If that's not possible, AND
you are familiar with the Lucene API, then you can
> I can't see how Luke is going to show me what's occupying most of my
> index.
I do however notice that none of my stored fields are stored compressed.
Presumably Field.Store.COMPRESS is something that is new in Lucene 1.9 and
wasn't available in 1.4.3? However, it is still hard to see what's occupying
the space.
Hi,
I have a 5gb index containing 2mil documents and am trying to run
1mil+ queries against it. Most of the queries are SpanQueries and it
occurs to me that the search performance is quite slow when using 2, 3
SpanOrQueries nested inside a SpanNearQuery, which in turn is nested
inside another Spa
Luke is working nicely with an XWin32 demo server I just downloaded from
StarNet, with a bit of SSH tunnelling :-) [I couldn't immediately figure
out how to do it with Cygwin/X.]
However, I can't see how Luke is going to show me what's occupying most of
my index.
Or you can use ssh -X for X11 forwarding. I don't know how well it works
on Windows (you'd need some X client app), but it works great on Linux,
given enough bandwidth.
I don't believe it does. Is there any way you can mount the drive where
the index lives? Can you copy the index to someplace that allows you to
run Luke?
Otherwise, you could write a simple standalone program that dumps the
terms and their freqs from the command line. I don't think it would
be very hard.
Hi Edgar,
While doing the integration/updating for Lucene 1.9, could you be more
open and clear about the design, so that people can
1) understand it, and
2) extend it.
Just a recommendation.
Cheers,
Murat
Edgar Meij wrote:
Hi Ganesh,
We have developed a Language Modeling extension to Lucene at
The server is headless (i.e. no X-Windows). I've tried lucli, but that
doesn't have Luke's bells and whistles. Does Luke have a non-GUI equivalent,
Grant?
-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: 26 May 2006 12:41
To: java-user@lucene.apache.org
Subject: Re
Give Luke a try. Google for "Luke Lucene" and you should find it.
Otherwise check the Lucene website for a reference.
Rob Staveley (Tom) wrote:
In my index of e-mail message parts, it looks like 23K is being used up for
each indexed message part, which is way more than I'd expect.
I have a total of 37 fields per message part.
In my index of e-mail message parts, it looks like 23K is being used up for
each indexed message part, which is way more than I'd expect.
I have a total of 37 fields per message part.
I tokenize, index and do not store message part bodies.
I store a <= 300 character synopsis of each message part.
Hi Ganesh,
We have developed a Language Modeling extension to Lucene at the
University of Amsterdam. It can be found here:
http://ilps.science.uva.nl/Resources/#lm-lucen
It was built around Lucene 1.4.3, so it isn't source-compatible with
the latest Lucene version. We are currently working on
i
I am indexing e-mail in a compound index and for e-mail which is stored in
~60G (in Bzip2 compressed form), I have an index which is now 80G.
Is there a tool I can use to see how much of the index is occupied by the
different fields I am indexing?
PS: I am a newbie to the mailing list - I hope I've got the etiquette right.
Thanks for the reply, but I don't know how to make this change in
ISOLatin1AccentFilter.
Can you give me some advice on this?
2006/5/25, Chris Hostetter <[EMAIL PROTECTED]>:
I think I'm missing something here. The whole point of the
ISOLatin1AccentFilter is to replace accented characters with their
unaccented equivalents