I just did that so I could read it. :) I'll leave it up until Glen resends
or posts it somewhere...
http://www.casscostello.com/?page_id=28
On Tue, Apr 15, 2008 at 5:18 PM, Ian Holsman <[EMAIL PROTECTED]> wrote:
> Hi Glen.
> can you resend this in plain text?
> or put the HTML up on a server s
Hi Glen.
can you resend this in plain text?
or put the HTML up on a server somewhere and point to it with a brief
summary in the post?
I'd love to look at it and read it, but all those tags are making me go blind.
Glen Newton wrote:
Hardware Environment
Dedicated machine for indexing: yes
CPU: D
Characters or "terms"? (And btw: what's the difference?) The javadoc says
10,000 "terms", which I assume generally equates to "words" (and given that the
analyzer might use stemming, stop words, etc.).
Great info. Thanks again!
-AJ
- Original Message -
From: Erick Erickson
To
Well, "my way" would certainly be simpler to read six months from
now when you look at this code again.
And I'm quite sure you can add the same field multiple times, so
whatever you want.
Do note, though, that Lucene defaults to 10,000 characters in any
single field no matter which way yo
The index should be identical in these two cases as long as the
single string yields the same tokens during analysis as the
concatenation of the tokens from the separate strings.
So index size & search speed would be the same.
Mike
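A minimal sketch of the equivalence described above, in plain Java with no Lucene dependency. The `tokenize` method is a hypothetical stand-in for an analyzer (it only lowercases and splits on whitespace); a real Lucene Analyzer with stemming or stop words may behave differently, so treat this as an illustration of the condition, not of Lucene's internals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenEquivalence {
    // Hypothetical stand-in for an analyzer: lowercase, split on whitespace.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Tokens produced when each value is analyzed separately, then concatenated.
    public static List<String> tokenizeSeparately(List<String> values) {
        List<String> tokens = new ArrayList<>();
        for (String value : values) {
            tokens.addAll(tokenize(value));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> separate = tokenizeSeparately(Arrays.asList("one", "two", "three"));
        List<String> single = tokenize("one two three");
        // Same token sequence in both cases for this simple tokenizer.
        System.out.println(separate.equals(single));
    }
}
```

If the two token sequences differ for a given analyzer and input, the two approaches would not produce the same index.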
Darren Govoni wrote:
I guess I meant searching the index
On Tuesday, 15 April 2008, palexv wrote:
> I have not tokenized phrases in index.
> What query should I use?
> Simple TermQuery does not work.
Probably PhraseQuery with an argument like "java dev" (no asterisk).
> If I try to use QueryParser , what analyzer should I use?
Probably KeywordAnaly
I ended up doing this:
String docText = doc.get("body");
Field fCurAll = doc.getField("all");
if ((fCurAll != null) && (docText != null)) {
String newAll = fCurAll.stringValue() + "
I guess I meant searching the index, size of index etc.
So they would search essentially the same?
Sorry that wasn't clear from my original email.
Darren
- Original Message -
From: "Erick Erickson" <[EMAIL PROTECTED]>
To:
Sent: Tuesday, April 15, 2008 1:15 PM
Subject: Re: Which will
Test
I wouldn't worry about it too much, since there'll be overhead for you
building up the string in the first place as well. I suspect that the
time difference will be dwarfed by the indexing process. So I'd do what's
easiest first...
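For reference, building the single string is cheap. A rough sketch in plain Java (the class and method names are made up for illustration, and joining with a single space is an assumption about what the analyzer needs):

```java
import java.util.Arrays;
import java.util.List;

public class JoinFieldValues {
    // Joins the separate values into one space-delimited string,
    // skipping null or empty entries.
    public static String join(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            if (v == null || v.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The result would be added as one field value instead of three.
        System.out.println(join(Arrays.asList("one", "two", "three")));
    }
}
```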
Erick
On Tue, Apr 15, 2008 at 10:51 AM, darren <[EMAIL PROTEC
You can freely add the same field (with different text) to a doc. For
instance
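Document doc = new Document();
// Store.NO / Index.TOKENIZED are assumed here; the original post omitted the Field arguments.
doc.add(new Field("field", "this is the first", Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("field", "starting the second ", Field.Store.NO, Field.Index.TOKENIZED));
writer.addDocument(doc); // writer is an open IndexWriter
is functionally the same as
Document doc = new Document();
doc.add(new Field("field",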
Hardware Environment
Dedicated machine for indexing: yes
CPU: Dual processor dual core Xeon CPU 3.00GHz;
hyperthreading ON for 8 virtual cores
RAM: 8GB
Drive configuration: Dell EMC AX150 storage array fibre
channel
Software environment
Lucene Version: 2.3.1
Java Version: Java(TM)
Most likely B will be somewhat faster.
There is some small overhead to each field instance.
Mike
darren wrote:
Hi,
Pardon the noob question, but which approach is going to be faster
over extremely large document sets: A or B?
A) Multiple field values, Store.NO, TOKENIZED.
word: one
word: t
Hi,
Pardon the noob question, but which approach is going to be faster
over extremely large document sets: A or B?
A) Multiple field values, Store.NO, TOKENIZED.
word: one
word: two
word: three
B) Single field value, Store.NO, TOKENIZED.
word: one two three
Thanks for the tip.
Darren
On Sunday 13 April 2008 14:20:01, Grant Ingersoll wrote:
Thanks for your reply!
> > I don't want it to work more than half second on
> > reasonable sized index. Also, I don't want to hard-code exact list
> > of fields,
> > I might add them as I develop the system. Is this doable,
I'm curious how people are building the "all" Field (for searching "all of the
terms at once").
I understand using store=NO, Index=Tokenized is generally the way to add the
field, but what if I need to basically use multiple classes to build my
Document before adding it to the index (keeping th
The default is 10,000 terms (not characters), but, as Grant says, you can
change it with IndexWriter.setMaxFieldLength().
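To illustrate why words past the limit cannot be found, here is a small plain-Java simulation. It is not Lucene's code: `indexedTerms` is a hypothetical helper that keeps only the first `maxFieldLength` whitespace-separated terms, which is roughly the effect the limit has on a long field.

```java
import java.util.ArrayList;
import java.util.List;

public class MaxFieldLengthDemo {
    // Simulates a per-field term limit: terms beyond maxFieldLength are dropped.
    public static List<String> indexedTerms(String text, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (kept.size() >= maxFieldLength) {
                break; // everything after this point never reaches the index
            }
            kept.add(term);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> terms = indexedTerms("alpha beta gamma delta", 2);
        System.out.println(terms.contains("alpha")); // searchable
        System.out.println(terms.contains("delta")); // past the limit, not searchable
    }
}
```

Raising the limit before indexing (and re-indexing the affected documents) is what makes the later terms searchable.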
Erick
On Tue, Apr 15, 2008 at 6:31 AM, WATHELET Thomas
<[EMAIL PROTECTED]> wrote:
> Hi, my question is very simple.
> Is there a size limitation for the text to index?
> Because I try to i
It would help a lot if you provided a couple of examples of inputs into your
index and expected outputs for queries.
For instance, you say:
<<>>
But then in your follow-up you say
<<>>
Well, if you haven't tokenized your input streams at index time and
query time, you can't get what your first s
On IndexWriter, have a look at the setMaxFieldLength() method.
On Apr 15, 2008, at 6:31 AM, WATHELET Thomas wrote:
Hi, my question is very simple.
Is there a size limitation for the text to index?
Because I try to index a long document and the content of this one is
stored correctly into the in
Hi, my question is very simple.
Is there a size limitation for the text to index?
I ask because I am trying to index a long document. Its content is
stored correctly in the index, but it seems that the indexing stopped
in the middle of the document. I can't find any word located after the
middle.
A
Wojtek H wrote:
>Snowball stemmers are part of Lucene, but for a few languages only
>But maybe there is a better way or there are people working on
>something like that?
I use Malaga (http://home.arcor.de/bjoern-beutel/malaga/)
for lemmatization and index the result.
http://joyds1.joensuu.fi/progra
What do you mean by "that's true"? That Lucene does read all data
available in the index for this field into memory? In this case index
sharding should help, right?
On Sun, 13 Apr 2008, Otis Gospodnetic wrote:
Date: Sun, 13 Apr 2008 20:25:09 -0700 (PDT)
From: Otis Gospodnetic <[EMAIL PROTECTED
23 matches