Looks far more complex than I had assumed!!!
An invariant of "non-decreasing docid per flush", if pushed to the app, could
save Lucene from handling the complex sparse data logic, no?
Lucene can keep its existing logic without major changes, detect any
out-of-order doc before every flush and emit an
Since no one answered this, I decided I'd answer it myself (in case anyone else
wanted the answer).
First, there are two types of filters you can use in an Analyzer -- Character
filters and token filters. Character filters get applied before tokenization
and token filters get applied after tok
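
To make the ordering described above concrete, here is a minimal sketch in plain Python. It does not use Lucene's API; `strip_tags`, `tokenize`, `lowercase`, and `analyze` are hypothetical stand-ins for a character filter, a tokenizer, a token filter, and an analyzer. It only illustrates that the character filter sees the raw text before it is split, while the token filter sees individual tokens afterwards.

```python
import re


def strip_tags(text: str) -> str:
    """Stand-in character filter: rewrites the raw character stream.

    Naive assumption: anything between '<' and '>' is a tag. A real
    HTML-stripping filter is considerably more careful than this.
    """
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: splits the filtered text on whitespace."""
    return text.split()


def lowercase(tokens: list[str]) -> list[str]:
    """Stand-in token filter: transforms tokens one at a time."""
    return [token.lower() for token in tokens]


def analyze(text: str) -> list[str]:
    """Character filter first, then tokenizer, then token filter."""
    return lowercase(tokenize(strip_tags(text)))


print(analyze("<b>Hello</b> World"))
```

Swapping the order (tokenizing before stripping) would leave tag fragments inside the tokens, which is why the character-level step has to run first.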
I've been reading about NRT thinking it might be good to integrate it into my
code. However, I have a question.
Suppose that the index writer and the index reader run in totally different
JVMs (i.e., they are different applications and only communicate via the disk).
Am I correct in thinking
Term, in my view, is definitely not any more of a char buffer than a plain
String. It's a unique combination of a particular field name and its text
value. If you look at its public API, the only way to mutate a Term
instance is by obtaining a reference to the underlying BytesRef, which is
itself mutab
It's CharTermAttribute in particular, but since there are many such
particular examples, at some point it becomes Lucene in general.
Perhaps the problem is on my end, in that I'm not familiar enough with
DSL style, but learning DSL concepts is not a prerequisite for Lucene.
As for the Term being
Are you critiquing CharTermAttribute in particular, or Lucene in general?
It appears CharTermAttribute is a DSL-style builder API, just like its
superinterface Appendable - does that not appear intentional and
self-explanatory? Further, I believe Term instances are meant to be
immutable hence no dire
I don't mean to sound critical, but is there a reason that the API is
not simpler?
For example, if I want to read or modify a CharTermAttribute's value, I
need to use toString() to get the value, which is very unintuitive, and
either copyBuffer() or setEmpty() and append() to set it.
Is there a reason no
Hi there
Lucene calculates the string similarity between two strings s1 and s2 according
to the formula
Similarity = Levenshtein-Distance(s1,s2)/min(Length(s1),Length(s2))
I would have thought Lucene would divide by the length of the longer string. In
particular, the above formula could - in m
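
As a rough illustration of the question above, the following sketch computes the ratio exactly as quoted (edit distance divided by the shorter length) next to the alternative of dividing by the longer length. This is not Lucene's implementation; `levenshtein`, `ratio_by_min`, and `ratio_by_max` are hypothetical helper names, and both strings are assumed to be non-empty.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ratio_by_min(s1: str, s2: str) -> float:
    """Distance divided by the shorter length, as in the quoted formula.

    Assumes both strings are non-empty.
    """
    return levenshtein(s1, s2) / min(len(s1), len(s2))


def ratio_by_max(s1: str, s2: str) -> float:
    """Distance divided by the longer length (the alternative asked about).

    Assumes both strings are non-empty.
    """
    return levenshtein(s1, s2) / max(len(s1), len(s2))


print(ratio_by_min("ab", "abcdef"), ratio_by_max("ab", "abcdef"))
```

Because the edit distance can never exceed the longer length, the second ratio always stays within [0, 1], while the first can grow past 1 when the lengths differ a lot.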
On Mon, Nov 5, 2012 at 4:37 AM, Ravikumar Govindarajan
wrote:
> Thanks Mike,
>
> Joins could be slower than docID based approach, no?
Yes: slower at search time but faster at update time (generally not a
good tradeoff... but it seems like in your case slow updates are the
problem).
> It would be
HTMLStripCharFilter runs first, before any tokenizer, strips all the tags, and
leaves all your text intact. If you have angle brackets in the text (i.e., not
tags), they will be left as is. All your other analysis code should work just
the same as if the text came from a plain text file. Which
First I'd see if omitting term frequencies, positions, and norms does what
you need; these are all things you can disable OOB...
Best
Erick
On Mon, Nov 5, 2012 at 5:26 AM, Damian Birchler
wrote:
> Hi everyone
>
> We are using Lucene to search for possible duplicates in an address
Hi everyone
We are using Lucene to search for possible duplicates in an address database.
We create an index with a document for each person in the database. Each
document has a field with one term for the first name, a field with one term
for the last name and so on. I think in this setting it
https://issues.apache.org/jira/browse/SOLR-4032
-Original message-
> From:Mark Miller
> Sent: Sat 03-Nov-2012 14:20
> To: java-user@lucene.apache.org
> Subject: Re: "read past EOF" when merge
>
> Can you file a JIRA Markus? This is probably related to the new code that
> uses Direct
Thanks Mike,
Joins could be slower than docID based approach, no?
It would be great if Lucene could incorporate an external docID after
weighing the pros & cons. Many like us would be willing to trade off search
latency to some extent, in return for the low-hanging fruit
---
Ravi
On Fri, Nov 2, 2
14 matches