Indeed--this is why the associated parameter is called
maxAnalyzedChars in Solr.
-Mike
On 14-Jan-08, at 2:33 PM, Mark Miller wrote:
I think you're right, and that's not the only place... the whole
handling of maxDocBytesToAnalyze in the main Highlighter class
shares this issue. I guess the id
On 7-Jan-08, at 11:49 PM, Lukas Vlcek wrote:
This would be great!
I am particularly interested in how they are going about customized
search (if
they have a plan to do it). I mean, whether they can reorder raw search
results
based on some kind of collective knowledge (which is probably kept
outsid
On 17-Dec-07, at 11:39 AM, Beyer,Nathan wrote:
Would using Field.Index.UN_TOKENIZED be the same as tokenizing a field
into one token?
Indeed.
-Mike
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Monday, December 17, 2007 12:53 PM
To: java-user
Text (class name)
- "org.apache.lucene.document.Document"
Queries that would match
- "org.apache", "org.apache.lucene.document"
Queries that DO NOT match
- "apache", "lucene", "document"
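The match behaviour described above can be obtained by indexing every dotted prefix of the qualified name as its own token, so whole prefixes match but bare components do not. A sketch of the token-generation step only (Python for brevity; the function name is hypothetical):

```python
def dotted_prefixes(name):
    """Emit every dotted prefix of a qualified name as a single token."""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

tokens = dotted_prefixes("org.apache.lucene.document.Document")
# "org.apache" is a token, but bare "apache" is not
```

A query on `"org.apache"` then hits an indexed token exactly, while `"apache"` never appears as a token on its own.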
-Nathan
-Original Message-
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Mon
On 15-Dec-07, at 3:14 PM, Beyer,Nathan wrote:
I have a few fields that use package names and class names and I've
been
looking for some suggestions for analyzing these fields.
A few examples -
Text (class name)
- "org.apache.lucene.document.Document"
Queries that would match
- "org.apache" ,
On 13-Dec-07, at 3:26 PM, Tobias Rothe wrote:
I have a quick question. I am handling huge CSV files. They start
with a key in the first column and are followed by data.
I need to randomly retrieve this data based on the key. So it is
kind of a search where I give a unique key and ideally ac
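If no full-text matching is needed, key-based random access over CSV can be sketched without a search engine at all. An illustrative Python version (assuming the data fits in memory; the function name is hypothetical):

```python
import csv
from io import StringIO

def build_key_index(csv_text):
    """Map each row's first-column key to the rest of its columns."""
    index = {}
    for row in csv.reader(StringIO(csv_text)):
        if row:
            index[row[0]] = row[1:]
    return index

data = "k1,a,b\nk2,c,d\n"
idx = build_key_index(data)
# idx["k2"] -> ["c", "d"]
```

For files too large for memory, the same idea applies with an on-disk key-value store instead of a dict.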
On 22-Nov-07, at 8:49 AM, Nicolas Lalevée wrote:
Le jeudi 22 novembre 2007, Matthijs Bierman a écrit :
Hi Nicolas,
Why can't you extend the QueryParser and override the methods you
want
to modify?
Because the query parser I would like to have is a very basic user
one, à la
Google. The s
On 6-Nov-07, at 3:02 PM, Paul Elschot wrote:
On Tuesday 06 November 2007 23:14:01 Mike Klaas wrote:
Wait--shouldn't the outer-most BooleanQuery provide most of this
speedup already (since it should be skipTo'ing between the nested
BooleanQueries and the outermost)? Is it the indir
On 29-Oct-07, at 9:43 AM, Paul Elschot wrote:
On Friday 26 October 2007 09:36:58 Ard Schrijvers wrote:
+prop1:a +prop2:b +prop3:c +prop4:d +prop5:e
is much faster than
(+(+(+(+prop1:a +prop2:b) +prop3:c) +prop4:d) +prop5:e)
where the second one is a result from BooleanQuery in
BooleanQuery
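The nested form can be flattened mechanically before the query is built. A language-agnostic sketch (Python, with nested lists standing in for nested required BooleanQueries; purely illustrative):

```python
def flatten_conjunction(clauses):
    """Hoist nested required-clause lists into one flat clause list."""
    flat = []
    for c in clauses:
        if isinstance(c, list):
            flat.extend(flatten_conjunction(c))
        else:
            flat.append(c)
    return flat

nested = [[[["prop1:a", "prop2:b"], "prop3:c"], "prop4:d"], "prop5:e"]
# -> ["prop1:a", "prop2:b", "prop3:c", "prop4:d", "prop5:e"]
```

Flattening is safe here because all clauses are required; it would not be valid across mixed MUST/SHOULD nesting.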
On 5-Oct-07, at 11:27 AM, Chris Hostetter wrote:
that's what i thought first too, and it is a problem i'd eventually
like
to tackle ... it was the part about "c" being in a different field
from
"a" and "b" that confused me ... i don't know what exactly is
being
suggested here.
I'm
On 5-Oct-07, at 10:54 AM, Chris Hostetter wrote:
: I am using a hand rolled query of the following form (implemented
with
: SpanNearQuery, not a sloppy PhraseQuery):
: a b c => +(a AND b AND c) OR "a b"~5 OR "b c"~5
:
: The obvious solution, "a b c"~5, is not applicable for my issues,
becaus
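For reference, the string form of that hand-rolled query can be generated from a token list like this (a sketch; the poster's actual implementation uses SpanNearQuery objects, not query strings):

```python
def hand_rolled(tokens, slop=5):
    """AND of all terms, OR'd with sloppy phrases over adjacent pairs."""
    conj = "+(" + " AND ".join(tokens) + ")"
    pairs = ['"%s %s"~%d' % (a, b, slop)
             for a, b in zip(tokens, tokens[1:])]
    return " OR ".join([conj] + pairs)

hand_rolled(["a", "b", "c"])
# -> '+(a AND b AND c) OR "a b"~5 OR "b c"~5'
```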
On 2-Oct-07, at 3:44 PM, Peter Keegan wrote:
I have been experimenting with payloads and BoostingTermQuery,
which I think
are excellent additions to Lucene core. Currently,
BoostingTermQuery extends
SpanQuery. I would suggest changing this class to extend TermQuery and
refactor the current v
On 13-Sep-07, at 12:37 PM, Dan Luria wrote:
What I do is:
doc1 = sourceDoc
doc2 = new Document()
foreach (Field f in doc1.getFields()) {
    doc2.add(new Field(f.name(), f.stringValue()))
}
but when I pull the fields from doc1, I never get the tokenized
field...
it just doesn't appea
On 10-Sep-07, at 8:37 PM, AnkitSinghal wrote:
But I think a query like host:example* will not work in this case.
Actually it was a typo in my question. I want to search for the above
type of
query only.
Hosts are best stored in reverse domain format:
xyz.example.com -> com.example.xyz
Then yo
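The reversal itself is trivial; a minimal sketch:

```python
def reverse_domain(host):
    """Store hosts most-significant-label first, e.g. for prefix queries."""
    return ".".join(reversed(host.split(".")))

reverse_domain("xyz.example.com")
# -> "com.example.xyz"
```

With hosts stored this way, a prefix query such as host:com.example* matches every host under example.com.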
On 10-Sep-07, at 5:59 AM, Laxmilal Menaria wrote:
Hello Everyone,
I have created an index application using Java Lucene 2.0 in Java and
Lucene.Net 2.0 in VB.NET. Both applications have the same logic. But
when I have
indexed a database with 14000 rows from both applications on the same
machine, I
sur
On 6-Sep-07, at 11:48 AM, Grant Ingersoll wrote:
On Sep 6, 2007, at 1:32 PM, Rafael Rossini wrote:
Karl, I'm aware of IndexReader.getTermFreqVector; with this I can
get all
terms of a document, but I want all terms of a document that
matched a
query.
Grant,
Yes, I think I understand.
On 6-Sep-07, at 4:41 AM, makkhar wrote:
Hi,
I have an index which contains more than 20K documents. Each
document has
the following structure:
field: ID (indexed and stored), typical value
- "1000"
field: parameterName (indexed and stored), typical value
some auxiliary helper classes) for the old contrib Highlighter.
Since the contrib Highlighter is pretty hardened at this point, I
figured that was the best way to go. Or do you mean something
different?
- Mark
Mike Klaas wrote:
Mark,
I'm still interested in integrating this into Solr-
Not to mention Lupy.
Hasn't it been relatively well-established that trying to create a
performant search engine in a dynamic interpreted language is a
show-stopper? After several failed ports of Lucene (I can add to this
my own, unreleased, attempt) I just don't see the point, except as a
On 23-Aug-07, at 2:48 AM, Barry Forrest wrote:
Hi list,
I'm trying to estimate how long it will take to index 10 million
documents.
If I measure how long it takes to index say 10,000 documents, can I
extrapolate? Will it take roughly 1000 times longer to do the
whole set?
Segment mergin
Mark,
I'm still interested in integrating this into Solr--this is a feature
that has been requested a few times. It would be easier to do so if
it were a contrib/...
thanks for the great work,
-Mike
On 27-Aug-07, at 4:21 AM, Mark Miller wrote:
I am a bit unclear about your question. The
Note that Solr is expressly designed for this kind of thing: every
time you commit, a new searcher is opened in the background, warmed,
and then swapped with the current one. It also supports autocommit
after X updates, or after the oldest update passes X milliseconds
without being commit
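In solrconfig.xml this autocommit behaviour is configured roughly as follows (element names from Solr 1.2-era configs; check the example solrconfig.xml for your version):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>   <!-- commit after this many buffered updates -->
    <maxTime>60000</maxTime>   <!-- or once the oldest update is 60s old -->
  </autoCommit>
</updateHandler>
```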
On 6-Aug-07, at 5:49 PM, Chris Lu wrote:
Seems this issue,LUCENE-834, is about query payload
https://issues.apache.org/jira/browse/LUCENE-834
Can it help on indexing speed?
That should be:
https://issues.apache.org/jira/browse/LUCENE-843
On 8/6/07, testn <[EMAIL PROTECTED]> wrote:
2.
You still have a disk seek per doc if the index can't fit in memory
(usually more costly than reading the fields).
Why not use FieldCache?
-Mike
On 2-Aug-07, at 5:41 PM, Mark Miller wrote:
If you are just retrieving your custom id and you have more stored
fields (and they are not tiny) yo
On 3-Aug-07, at 3:27 AM, Mark Miller wrote:
Also, IndexWriter probably buffers better than you would. If you
buffer a delete with IndexWriter and then, right after, add a document
that would match that delete, your latest doc will not be removed
when the buffered deletes are flushed.
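That ordering guarantee can be modeled with a toy replay loop (purely illustrative; this is not IndexWriter's actual mechanism):

```python
def apply_buffered_ops(ops):
    """Replay (op, payload) pairs; a delete removes only docs added before it."""
    docs = []
    for op, payload in ops:
        if op == "add":
            docs.append(payload)
        elif op == "delete":
            docs = [d for d in docs if payload not in d]
    return docs

ops = [("add", {"id:1"}), ("delete", "id:1"), ("add", {"id:1"})]
# the re-added doc survives because the delete was buffered before it
```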
On 1-Aug-07, at 11:34 AM, Joe Attardi wrote:
On 8/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
Use a SpanNearQuery with a slop of 0 and specify true for ordering.
What that will do is require that the segments you specify must
appear
in order with no gaps. You have to construct this your
You can boost any clause of a query:
http://lucene.apache.org/java/docs/queryparsersyntax.html
title:foo^5 header:foo^2 body:foo
On 31-Jul-07, at 1:00 PM, Askar Zaidi wrote:
I'll have to use StringBuffer and get the Explanation in it as a
String.
Then parse StringBuffer to get the scores of
On 26-Jul-07, at 10:18 AM, Rafael Rossini wrote:
Yes, I optimized, but with Solr. I don't know why, but when you
optimize
an index with Solr, it leaves you with about 15 files, instead of
the 3...
You are probably not using the compound file format. Try setting
<useCompoundFile>true</useCompoundFile>
in solrconfig.xml
On 4-Jul-07, at 5:31 AM, Ndapa Nakashole wrote:
I am considering using Lucene in my mini Grid-based search engine.
I would
like to partition my index by term as opposed to partition by
document. From
what i have read in the mailing list so far, it seems like
partition by term
is impossible
On 3-Jul-07, at 4:43 PM, Tim Sturge wrote:
Here's the explain output I currently get for "George Bush" "George
W Bush", "John Kerry" "John Denver" and "John Bush". (there are
others in between, but they follow very much the same pattern; an
enormous score for one of "John" or "Bush" and a v
Try out: http://issues.apache.org/jira/browse/LUCENE-850
If this is useful to you, be sure to add a comment to the issue.
-Mike
On 3-Jul-07, at 10:51 AM, Tim Sturge wrote:
I'm following myself up here to ask if anyone has experience or
code with a BooleanQuery that weights the terms it encou
On 19-Jun-07, at 3:39 PM, Mark Miller wrote:
I have been working on extending the Highlighter with a new Scorer
that correctly scores phrase and span queries. The highlighter is
working great for me, but could really use some more banging on.
If you have a need or an interest in a more accu
On 18-May-07, at 1:01 PM, charlie w wrote:
So now I have the idea to invert the field name and value thusly:
foo=tag ^2
bar=tag ^1.2
foobar=tag^1.8
and search "foo:tag".
Intuitively, I would expect Lucene to be optimized for searching
the values
of fields, and not really the names
On 17-May-07, at 6:43 AM, Andreas Guther wrote:
I am actually using the FieldSelector and, unless I did something
wrong, it
did not provide me any load performance improvements, which was
surprising to
me and disappointing at the same time. The only difference I could
see was
when I returned
On 4/30/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Thanks for you reply.
We are still using Lucene v1.4.3 and I'm not sure if upgrading is an option. Is
there another way of disabling length normalization/document boosts to get rid
of those files?
Why not raise the limit of open files?
On 4/18/07, William Mee <[EMAIL PROTECTED]> wrote:
I'd like to add metadata which I get *after* indexing a document's contents to
the index. To be more specific: I'm implementing shingling (detection of
near-duplicate documents) and want to add the document fingerprint (which is
based on the s
On 4/11/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 4/11/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
> Unicode characters do not map
> precisely to code points: a single character can often be represented
> via a single codepoint or a combination of two (surrogate pa
On 4/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: I have encountered a problem searching in my application because of
: inconsistent unicode normalization forms in the corpus (and the
: queries). I would like to normalize to form NFKD in an analyzer (I
: think). I was thinking about creat
On 4/10/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote:
Furthermore syntax like +(-A +B) and -(-A +B) appear to be legal to Luke,
though I have no clue what this even means in simple English.
Let me try:
+(-A +B) -> must match (-A +B) -> must contain B and must not contain A
-(-A +B) -> must
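Those readings can be checked against a toy evaluator over sets of terms (illustrative only; real Lucene semantics also involve scoring):

```python
def matches_plus_minus(doc_terms, required=(), prohibited=()):
    """MUST clauses must all be present; MUST_NOT clauses must be absent."""
    return (all(t in doc_terms for t in required)
            and not any(t in doc_terms for t in prohibited))

# +(-A +B): must contain B and must not contain A
matches_plus_minus({"B"}, required=["B"], prohibited=["A"])       # matches
matches_plus_minus({"A", "B"}, required=["B"], prohibited=["A"])  # does not
```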
On 4/2/07, Ofer Nave <[EMAIL PROTECTED]> wrote:
I'd like to be able to boost documents at search-time, and I'm not sure
how to do it.
Example:
I'm building a search engine for products (comparison shopping). Many
queries tend to indicate a category (i.e., 'digital cameras') as opposed
to a pro
On 3/28/07, Scott Oshima <[EMAIL PROTECTED]> wrote:
So I assumed a linear decay of performance as an index got bigger.
For some reason, going from an index size of 1.89 to 1.95 gigs
dramatically increased CPU across all of our servers.
I was thinking of splitting the 1.95 index into 2 separ
On 3/8/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
if the issue is that you want to be able to ship an index that people can
manipulate as much as they want, and you want to guarantee they can never
reconstruct the original docs, you're pretty much screwed ... even if you
eliminate all of the po
On 3/8/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: If you store a hash code of the word rather then the actual word you
: should be able to search for stuff but not be able to actually retrieve
that's a really great solution ... it could even be implemented as a
TokenFilter so none of your c
On 3/1/07, Saravana <[EMAIL PROTECTED]> wrote:
Does this still hold good now? Thanks for your reply.
Probably most of that still applies to some extent. However, it is
unclear whether it will speed up your application.
First thing is to find out what your bottleneck is. Looking at the
stats
On 2/16/07, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
Solved... with fixed field definitions.
Imagine a world with no search-time/index-time analyzer mismatches...
I'm sure Yonik can imagine such a world... that's what Solr
provides. Configure an analyzer for a field (or even separate
On 2/8/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:
Is there a .NET version of Solr?
Nope.
-Mike
On 11/3/06, Patrick Turcotte <[EMAIL PROTECTED]> wrote:
>
> It will make the mailing list easier to read (I am using Gmail and I do
> not have client-side filters).
That is not true.
You can have labels, and, if you look at the top of the page, right beside
the "Search the Web" button, you have
On 10/27/06, Stanislav Jordanov <[EMAIL PROTECTED]> wrote:
Have the following problem with (explicitly invoked) index optimization -
it seems to always merge all existing index segments into a single huge
segment, which is undesirable in my case.
Is there a way to force index optimization to hono
On 10/20/06, Robichaud, Jean-Philippe
<[EMAIL PROTECTED]> wrote:
3- Any ideas on how else I could do this? I'm fully open to
discussion!
How about not storing the fields at all, but storing term vectors, and
reconstructing the data from termpositions + terminfo?
-Mike
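Reconstruction from term vectors is necessarily lossy (casing, punctuation, and any unindexed tokens are gone), but the mechanics are simple. A sketch assuming (term, positions) pairs are available:

```python
def reconstruct(term_positions):
    """Rebuild an approximate token stream from (term, [positions]) pairs."""
    slots = {}
    for term, positions in term_positions:
        for pos in positions:
            slots[pos] = term
    return " ".join(slots[p] for p in sorted(slots))

vec = [("quick", [1]), ("brown", [2]), ("fox", [3]), ("the", [0])]
# -> "the quick brown fox"
```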
On 10/14/06, Jong Kim <[EMAIL PROTECTED]> wrote:
Hi,
I'm looking for a stemmer that is capable of returning all morphological
variants of a query term (to be used for high-recall search). For example,
given a query term of 'cares', I would like to be able to generate 'cares',
'care', 'cared', a
On 9/27/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
Found the reason, it is a bug IMHO.
The example should be:
A: term1^5 term2^6 term3^7
B: term1^5E-4 term2^6E-4 term3^7E-4
C: term1^0.0006 term2^0.0006 term3^0.0007
A & C are supposed to return the same rank
B is different
Since B will be parsed
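One thing worth ruling out: the exponent forms denote exactly the same floats as the plain decimals, so any rank difference would point to the query parser's handling of the boost syntax, not to float semantics. A quick illustrative check:

```python
# Scientific notation and plain decimals parse to identical IEEE doubles.
boosts_exp = [float(x) for x in ("5E-4", "6E-4", "7E-4")]
boosts_dec = [0.0005, 0.0006, 0.0007]
same = boosts_exp == boosts_dec  # True
```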