Hi,
In my analyzer, the problem actually occurs for words that are preceded by
punctuation marks.
For example:
If I index the content ",Andrey Gubarev,JingGoogle,Inc." and then search for
"Andrew Gubarev", it does not work properly, since the word Andrew is
preceded by the punctuation mark ",".
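One possible workaround, sketched in plain Java rather than as a Lucene analyzer (the class and method names are hypothetical, and this is not something the thread settled on): keep each whitespace-separated token exactly as written, so dates and punctuation survive, and additionally index the letter/digit runs inside it. `,Andrey` then also yields `Andrey`, and `Gubarev,JingGoogle,Inc.` also yields `Gubarev`. Lucene's WordDelimiterFilter with its PRESERVE_ORIGINAL flag works along similar lines and may be worth checking in the documentation for your Lucene version.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not a Lucene Analyzer: shows which terms would exist if each
// whitespace token were kept as-is AND split into its letter/digit runs.
final class PreserveOriginalSplitter {

    static List<String> analyze(String text) {
        // LinkedHashSet keeps first-seen order and drops duplicate terms.
        Set<String> terms = new LinkedHashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // splitting an empty string yields a single ""
            }
            terms.add(token); // original token, punctuation included
            for (String part : token.split("[^\\p{L}\\p{N}]+")) {
                if (!part.isEmpty()) {
                    terms.add(part); // run of letters/digits between punctuation
                }
            }
        }
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        // Example content from the message above.
        System.out.println(analyze(",Andrey Gubarev,JingGoogle,Inc."));
        // prints [,Andrey, Andrey, Gubarev,JingGoogle,Inc., Gubarev, JingGoogle, Inc]
    }
}
```

The set-based result deliberately ignores token positions; a real analyzer would also have to decide which positions the extra parts get, which matters for phrase queries such as "Andrey Gubarev".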
On Thu, Oct 3, 20
Hi Ian,
In Lucene, is there any default Analyzer we can use that discards only
whitespace? It should preserve everything else: numbers, punctuation, dates.
I created my analyzer with a tokenizer whose token-character test returns
Character.isDefined(cn) && (!Character.isWhitespace(cn)).
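To illustrate why the search misses, here is a minimal plain-Java sketch (no Lucene dependency; `WhitespaceRuleDemo` and its methods are hypothetical names) that applies this token-character rule to the example content from the other message in this thread. The lower-casing step mentioned next is left out. Because a comma is a defined, non-whitespace character, it stays attached to its neighbours, so no standalone `Andrey` or `Gubarev` token is ever produced.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (not Lucene code): splits text the way a tokenizer would if its
// token-character test were Character.isDefined(c) && !Character.isWhitespace(c).
final class WhitespaceRuleDemo {

    // The rule from the message: any defined, non-whitespace code point belongs to a token.
    static boolean isTokenChar(int c) {
        return Character.isDefined(c) && !Character.isWhitespace(c);
    }

    // Collects maximal runs of token characters, walking the text code point by code point.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(",Andrey Gubarev,JingGoogle,Inc."));
        // prints [,Andrey, Gubarev,JingGoogle,Inc.]
    }
}
```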
My analyzer will use a lowe ca
You are correct that I'm using a MultiReader over multiple IndexReaders
("shards") that each contain one segment, essentially to do what Lucene does
with a single IndexReader and multiple segments. It's done this way for two
reasons:
1) By using multiple single-segment "shards", I can completely c
Hmm, I guess your IndexSearcher is backed by a MultiReader that operates
on these "shards" you're referring to, which are supposed to be
single-segment indexes? If so, this topology sounds fairly equivalent, at
least in concept and perhaps in performance as well, to the regular
case when you
Vitaly,
Thanks for your comments.
Unfortunately, thread pool task overload is not the problem. When I
extended the IndexSearcher class last night, I had it create one task per
shard (20 tasks) instead of the default, which turned out to be somewhere
around 320 (I didn't realize it created quite so
Hello,
We would like to index some documents. Each field of a document may have
multiple values, and for each (field, value) pair there are some associated
values. These associated values are only for retrieval, not for searching.
For example, a document D could have a field named A. This field has t
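To pin down the shape of the data being described, here is a hypothetical in-memory model in plain Java (not a Lucene API; `AnnotatedDoc` and the example values are made up). Only (field, value) pairs take part in matching; the associated values are simply carried along for retrieval. How best to map this onto Lucene fields is the open question here, so the sketch deliberately stops short of that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: field name -> (searchable value -> retrieve-only associated values).
final class AnnotatedDoc {
    private final Map<String, Map<String, List<String>>> fields = new LinkedHashMap<>();

    // Adds one value of a multi-valued field together with its associated values.
    // Re-adding the same value replaces its associated values.
    void add(String field, String value, List<String> associated) {
        fields.computeIfAbsent(field, f -> new LinkedHashMap<>())
              .put(value, new ArrayList<>(associated));
    }

    // "Search" side: only the (field, value) pair is consulted.
    boolean matches(String field, String value) {
        Map<String, List<String>> values = fields.get(field);
        return values != null && values.containsKey(value);
    }

    // "Retrieve" side: the associated values for one (field, value) pair.
    List<String> associatedValues(String field, String value) {
        Map<String, List<String>> values = fields.get(field);
        if (values == null || !values.containsKey(value)) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(values.get(value));
    }

    public static void main(String[] args) {
        AnnotatedDoc d = new AnnotatedDoc();                // document D
        d.add("A", "a1", Arrays.asList("x1", "x2"));        // made-up values
        d.add("A", "a2", Arrays.asList("y1"));
        System.out.println(d.matches("A", "a2"));           // true
        System.out.println(d.matches("A", "x1"));           // false: associated values are not searchable
        System.out.println(d.associatedValues("A", "a1"));  // [x1, x2]
    }
}
```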
On Wed, Oct 2, 2013 at 2:37 PM, Steven Schlansker wrote:
>
> On Oct 2, 2013, at 11:16 AM, Michael McCandless
> wrote:
>
>> In Lucene 4.5 (coming out any day now) we've switched by default to a
>> "mostly on disk" impl for doc values.
>>
>
> Awesome! Looking forward to that then.
>
>> Before tha
Matt,
I think you are mostly on track in suspecting thread pool task overload
as the possible culprit here. First, the old-school (pre-Java 7)
ThreadPoolExecutor only accepts a BlockingQueue to use internally for
worker tasks, instead of a concurrent variant (not sure why). So this
internal
On Oct 2, 2013, at 11:16 AM, Michael McCandless
wrote:
> In Lucene 4.5 (coming out any day now) we've switched by default to a
> "mostly on disk" impl for doc values.
>
Awesome! Looking forward to that then.
> Before that, you can use DiskDocValuesFormat instead.
>
> But you'll need to re-
In Lucene 4.5 (coming out any day now) we've switched by default to a
"mostly on disk" impl for doc values.
Before that, you can use DiskDocValuesFormat instead.
But you'll need to re-index (or create a new index and use
IW.addIndexes) to cut over your current index to the DiskDVFormat.
Mike McCa
Hi,
I have a search application using Lucene 4.4.0 with various BinaryDocValues and
SortedSetDocValues.
We use MMapDirectory to help keep the Java heap small / GC pause times short
and instead rely on the OS buffer cache to keep things fast, which I gather is
generally considered a "best practi
Hi again!
Here is my problem in more detail: in addition to indexing, I need the
multi-valued field to be stored as-is. But if I pass it into the analyzer as
multiple atomic tokens, only the first of them is stored.
What do I need to do to my custom analyzer to make it store all the atomic
token
Thank you very much for your time, sir; I will follow your suggestion.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Rendexing-problem-Indexing-folder-size-is-keep-on-growing-for-same-remote-folder-tp4092835p4093136.html
Sent from the Lucene - Java Users mailing list archive at
I extended the IndexSearcher last night and set it up so it would make one
task per IndexReader instead of one per AtomicReaderContext. Performance
was pretty bad, just like before, so it looks like I'm stuck merging
everything into one big segment.
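The experiment described above (one task per IndexReader/shard instead of one per AtomicReaderContext/segment) can be pictured with a plain-Java sketch. It does not use Lucene's IndexSearcher; shards are modelled as hypothetical lists of hit scores, and `PerShardSearch.topN` is an illustrative name. One Callable per shard goes to an ExecutorService via invokeAll, each returns its local top-n, and the caller merges, so the task count equals the number of shards (20 in this thread) rather than the number of segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of "one task per shard": each shard is a hypothetical list of hit scores.
final class PerShardSearch {

    static List<Double> topN(List<List<Double>> shards, int n, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Callable<List<Double>>> tasks = new ArrayList<>();
        for (List<Double> shard : shards) {
            // Exactly one task per shard, however many segments it might contain.
            tasks.add(() -> {
                List<Double> sorted = new ArrayList<>(shard);
                sorted.sort(Comparator.reverseOrder());
                return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
            });
        }
        // invokeAll blocks until every task has finished.
        List<Double> merged = new ArrayList<>();
        for (Future<List<Double>> f : pool.invokeAll(tasks)) {
            merged.addAll(f.get());
        }
        // The global top-n is always contained in the union of the per-shard top-n lists.
        merged.sort(Comparator.reverseOrder());
        return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<List<Double>> shards = Arrays.asList(
                    Arrays.asList(0.2, 0.9),
                    Arrays.asList(0.5),
                    Arrays.asList(0.7, 0.1, 0.8));
            System.out.println(topN(shards, 3, pool)); // [0.9, 0.8, 0.7]
        } finally {
            pool.shutdown();
        }
    }
}
```

Merging per-shard top-n lists is safe because any hit in the global top-n must also be in its own shard's top-n; the trade-off is only how many tasks the pool has to schedule.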
I went through the documentation for the various
Yes, as I suggested, you could search on your unique id and skip indexing
the document if it is already present. Or, as Uwe suggested, call
updateDocument instead of addDocument, again using the unique id.
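The difference between the two options can be modelled with a small in-memory stand-in (plain Java, not Lucene; the class, its methods and the `path` id field are hypothetical). Lucene's IndexWriter.updateDocument takes a Term (here, the unique id) and a document, deletes any documents containing that term and then adds the new one; the sketch mimics that with a delete-then-add on a list, whereas plain adds keep piling up copies of a re-indexed file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// In-memory stand-in (not Lucene) contrasting plain add, skip-if-present, and update-by-id.
final class UniqueIdIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Plain add: no duplicate check, so indexing the same file twice stores it twice.
    void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    boolean containsId(String idField, String id) {
        return docs.stream().anyMatch(d -> Objects.equals(id, d.get(idField)));
    }

    // Option 1: look up the unique id first and only add when it is absent (keeps the old copy).
    boolean addIfAbsent(String idField, Map<String, String> doc) {
        if (containsId(idField, doc.get(idField))) {
            return false;
        }
        addDocument(doc);
        return true;
    }

    // Option 2: delete every document with the same id, then add (models a delete-then-add update).
    void updateDocument(String idField, Map<String, String> doc) {
        String id = doc.get(idField);
        docs.removeIf(d -> Objects.equals(id, d.get(idField)));
        addDocument(doc);
    }

    List<Map<String, String>> documents() {
        return Collections.unmodifiableList(docs);
    }

    public static void main(String[] args) {
        Map<String, String> v1 = new HashMap<>();
        v1.put("path", "/share/a.txt");   // hypothetical unique id field and value
        v1.put("body", "old");
        Map<String, String> v2 = new HashMap<>(v1);
        v2.put("body", "new");

        UniqueIdIndex plain = new UniqueIdIndex();
        plain.addDocument(v1);
        plain.addDocument(v2);
        System.out.println(plain.documents().size());   // 2

        UniqueIdIndex updated = new UniqueIdIndex();
        updated.updateDocument("path", v1);
        updated.updateDocument("path", v2);
        System.out.println(updated.documents().size()); // 1
        System.out.println(updated.documents().get(0).get("body")); // new
    }
}
```

With addIfAbsent the old copy wins; with updateDocument the new one does, which is usually what you want when a file's content has changed since it was last indexed.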
--
Ian.
On Tue, Oct 1, 2013 at 6:41 PM, gudiseashok wrote:
> I am really sorry if something confused you, as I sai