Re: Multiphrase Query in Lucene 4.3

2013-10-02 Thread VIGNESH S
Hi, In my Analyzer, the problem actually occurs for words which are preceded by punctuation marks. For example: if I am indexing the content ",Andrey Gubarev,JingGoogle,Inc." and I search "Andrew Gubarev", it does not work properly, since the word Andrew is preceded by the punctuation mark ",". On Thu, Oct 3, 20

Re: Multiphrase Query in Lucene 4.3

2013-10-02 Thread VIGNESH S
Hi Ian, Is there any default Analyzer in Lucene that ignores only spaces? It should preserve everything else: numbers, punctuation, dates. I created my analyzer with a tokenizer which returns Character.isDefined(cn) && (!Character.isWhitespace(cn)). My analyzer will use a lowe ca
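
To see why a whitespace-only split trips over leading punctuation, here is a minimal sketch in plain Java (no Lucene classes). It reuses the character predicate quoted in the message and the sample content from the follow-up in this thread. The class and method names are hypothetical, and the sketch does not reproduce the rest of the poster's analyzer chain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: splits text using the
// token-character predicate quoted in the message above.
class WhitespaceOnlyTokenizer {

    // A character belongs to a token if it is defined and not whitespace.
    static boolean isTokenChar(int cp) {
        return Character.isDefined(cp) && !Character.isWhitespace(cp);
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-token character closes the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Punctuation stays glued to the neighbouring words.
        System.out.println(tokenize(",Andrey Gubarev,JingGoogle,Inc."));
    }
}
```

Because only whitespace separates tokens, the first token is ",Andrey" rather than "Andrey", so a phrase query built from the bare words would not line up with the indexed terms.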

Re: Query performance in Lucene 4.x

2013-10-02 Thread Desidero
You are correct in that I'm using a MultiReader over multiple IndexReaders ("shards") that contain one segment each to basically do what Lucene does with a single IndexReader and multiple segments. It's done this way for two reasons: 1) By using multiple single-segment "shards", I can completely c

Re: Query performance in Lucene 4.x

2013-10-02 Thread Vitaly Funstein
Hmm, I guess your IndexSearcher is backed by a MultiReader which operates on these "shards" you're referring to, which are supposed to be single-segment indexes? If so, this topology sounds fairly equivalent, at least in concept but maybe similar in performance as well, to the regular case when you

Re: Query performance in Lucene 4.x

2013-10-02 Thread Desidero
Vitaly, Thanks for your comments. Unfortunately, thread pool task overload is not the problem. When I extended the IndexSearcher class last night, I had it create one task per shard (20 tasks) instead of the default which turned out to be somewhere around 320 (I didn't realize it created quite so

Associated values for a field and its value

2013-10-02 Thread Alice Wong
Hello, We would like to index some documents. Each field of a document may have multiple values. And for each (field,value) pair there are some associated values. These associated values are just for retrieving, not searching. For example, a document D could have a field named A. This field has t

Re: DocValues formats hold large byte[][]s even when using MMapDirectory

2013-10-02 Thread Michael McCandless
On Wed, Oct 2, 2013 at 2:37 PM, Steven Schlansker wrote: > > On Oct 2, 2013, at 11:16 AM, Michael McCandless > wrote: > >> In Lucene 4.5 (coming out any day now) we've switched by default to a >> "mostly on disk" impl for doc values. >> > > Awesome! Looking forward to that then. > >> Before tha

Re: Query performance in Lucene 4.x

2013-10-02 Thread Vitaly Funstein
Matt, I think you are mostly on track with suspecting thread pool task overload as the possible culprit here. First, the old school (prior to Java 7) ThreadPoolExecutor only accepts a BlockingQueue to use internally for worker tasks, instead of a concurrent variant (not sure why). So this internal
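
The point about ThreadPoolExecutor taking a BlockingQueue, combined with the one-task-per-shard idea discussed elsewhere in this thread, can be sketched with the JDK alone. This is an illustration under assumptions, not the poster's code: the "shards" are plain int arrays standing in for per-shard searches, and the names ShardTaskSketch, searchAll and countMatches are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one task per shard on a ThreadPoolExecutor.
class ShardTaskSketch {

    // Stand-in for a per-shard search: counts occurrences of target.
    static int countMatches(int[] shard, int target) {
        int hits = 0;
        for (int value : shard) {
            if (value == target) {
                hits++;
            }
        }
        return hits;
    }

    static int searchAll(List<int[]> shards, int target, int threads) throws Exception {
        // ThreadPoolExecutor's constructors take a BlockingQueue<Runnable> as the work queue.
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS, queue);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] shard : shards) {
                // Exactly one task per shard, so the queue never holds more than shards.size() tasks.
                Callable<Integer> task = () -> countMatches(shard, target);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // wait for each shard, then merge
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Submitting a fixed, small number of tasks keeps contention on the shared queue low; whether that is the bottleneck in a given setup still has to be measured.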

Re: DocValues formats hold large byte[][]s even when using MMapDirectory

2013-10-02 Thread Steven Schlansker
On Oct 2, 2013, at 11:16 AM, Michael McCandless wrote: > In Lucene 4.5 (coming out any day now) we've switched by default to a > "mostly on disk" impl for doc values. > Awesome! Looking forward to that then. > Before that, you can use DiskDocValuesFormat instead. > > But you'll need to re-

Re: DocValues formats hold large byte[][]s even when using MMapDirectory

2013-10-02 Thread Michael McCandless
In Lucene 4.5 (coming out any day now) we've switched by default to a "mostly on disk" impl for doc values. Before that, you can use DiskDocValuesFormat instead. But you'll need to re-index (or create a new index and use IW.addIndexes) to cutover your current index to the DiskDVFormat. Mike McCa

DocValues formats hold large byte[][]s even when using MMapDirectory

2013-10-02 Thread Steven Schlansker
Hi, I have a search application using Lucene 4.4.0 with various BinaryDocValues and SortedSetDocValues. We use MMapDirectory to help keep the Java heap small / GC pause times short and instead rely on the OS buffer cache to keep things fast, which I gather is generally considered a "best practi

Re: Indexing documents with multiple field values

2013-10-02 Thread Igor Shalyminov
Hi again! Here is my problem in more detail: in addition to indexing, I need the multi-value field to be stored as-is. And if I pass it into the analyzer as multiple atomic tokens, it stores only the first of them. What do I need to do to my custom analyzer to make it store all the atomic token

Re: Rendexing problem: Indexing folder size is keep on growing for same remote folder

2013-10-02 Thread gudiseashok
Thank you very much for your time sir, I follow your suggestion.

Re: Query performance in Lucene 4.x

2013-10-02 Thread Desidero
I extended the IndexSearcher last night and set it up so it would make one task per IndexReader instead of one per AtomicReaderContext. Performance was pretty bad just like before, so it looks like I'm stuck merging everything into one big segment. I went through the documentation for the various

Re: Rendexing problem: Indexing folder size is keep on growing for same remote folder

2013-10-02 Thread Ian Lea
Yes, as I suggested, you could search on your unique id and not index if already present. Or, as Uwe suggested, call updateDocument instead of add, again using the unique id. -- Ian. On Tue, Oct 1, 2013 at 6:41 PM, gudiseashok wrote: > I am really sorry if something made you confuse, as I sai
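
The difference between adding and updating by unique id can be modelled with a few lines of plain Java. This is only an in-memory stand-in, with hypothetical names (ReindexSketch, Doc), meant to show why repeated adds make an index grow while an update keyed on the id does not. In Lucene itself the corresponding call is IndexWriter.updateDocument, which takes a Term identifying the old document plus the new document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of an index; not Lucene code.
class ReindexSketch {

    static final class Doc {
        final String id;
        final String body;

        Doc(String id, String body) {
            this.id = id;
            this.body = body;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    // Plain add: re-indexing the same file appends another copy.
    void addDocument(Doc doc) {
        docs.add(doc);
    }

    // Update: delete anything with the same unique id, then add the new version.
    void updateDocument(String id, Doc doc) {
        docs.removeIf(existing -> existing.id.equals(id));
        docs.add(doc);
    }

    int size() {
        return docs.size();
    }
}
```

Re-running an indexing job with addDocument doubles the entries for each file, whereas updateDocument keeps one entry per id no matter how often the job runs.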