Re: Too many unique terms

2013-04-27 Thread Manuel Le Normand
Hi, many thanks for the previous reply. For now I'm not able to separate out these useless words, whether they contain words or digits. I liked the idea of iterating with TermsEnum. Will it also delete the occurrences of these terms in the other file formats (termVectors etc.)? As I un

Re: Too many unique terms

2013-04-29 Thread Manuel Le Normand
On Mon, Apr 29, 2013 at 1:22 PM, Adrien Grand wrote:
> On Sat, Apr 27, 2013 at 8:41 PM, Manuel Le Normand
> wrote:
> > Hi, real thanks for the previous reply.
> > For now i'm not able to make a separation between these useless words,
> > whether they contain wor

Profiling Solr Lucene for query

2013-09-08 Thread Manuel Le Normand
Hello all, Looking at the 10% slowest queries, I see very bad performance (~60 sec per query). These queries have lots of conditions on my main field (more than a hundred), including phrase queries, and rows=1000. I only return ids, though. I can quite firmly say that this bad performance is due

Expunge deleting using excessive transient disk space

2013-09-08 Thread Manuel Le Normand
Hi again, In order to delete part of my index I run a delete-by-query that is meant to erase 15% of the docs. I added these params to the solrconfig.xml: 2 2 5000.0 10.0 15.0 The extra params were added in order to promote merging of old segments but with a restriction on the transient d

Re: Expunge deleting using excessive transient disk space

2013-09-08 Thread Manuel Le Normand
you have when you
> try the merge?
>
> Is this a typo?
>
> 2
> Note name=|
>
> Best
> Erick
>
>
> On Sun, Sep 8, 2013 at 7:26 AM, Manuel Le Normand <
> manuel.lenorm...@gmail.com> wrote:
>
> > Hi again,
> > In order to delete part of my i

Re: Profiling Solr Lucene for query

2013-09-08 Thread Manuel Le Normand
al boxes? How much memory per JVM? How
> many JVMs? How much physical memory per box?
>
> 'Cause this seems excessive time-wise for loading the info.
>
> Best
> Erick
>
>
> On Sun, Sep 8, 2013 at 7:03 AM, Manuel Le Normand <
> manuel.lenorm...@gmail.com>

Understanding FST Prefix & CheckIndex output

2013-09-22 Thread Manuel Le Normand
Hi there, I'm trying to dive deep into the inner LucenePostingFormat to see what I might do to improve query performance. I'm curious about the termBlock stats that I get from checkIndex -verbose. What does the following mean: index FST bytes - the FST size, which is the field's partition of the .

segment corruption - ArrayIndexOutOfBoundsException

2013-10-22 Thread Manuel Le Normand
Hello, My Lucene index contains 46 segments with a total of 4M docs. Lately, while running queries, I started getting occasional exceptions from this index: java.lang.ArrayIndexOutOfBoundsException at org.apache.lucene.codecs.lucene41.ForUtil.readBlock(ForUtil.java:196) at org.apache.lucene.codecs.lu

Indexing useful N-grams (phrases & entities) and adding payloads

2014-03-12 Thread Manuel Le Normand
Hi, I posted this question on the Solr mailing list but it has more to do with Lucene. I have a performance problem and a scoring problem with phrase queries: 1. Performance - phrase queries involving frequent terms are very slow due to the reading of large positions posting lists. 2. Scoring - I wan

Re: Indexing useful N-grams (phrases & entities) and adding payloads

2014-03-12 Thread Manuel Le Normand
e surrounding context / document?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Mar 12, 2014 at 5:27 AM, Manuel Le Normand
> wrote:
> > Hi,
> > I posted this question on the Solr mailing list but it has more to do
> with
> > Lucene.

Re: Question about Payloads in Lucene 4.5

2014-03-23 Thread Manuel Le Normand
Hello Rohit, We had a similar query-time bottleneck when mapping Lucene's internal ids to the uniqueKey; since we generally return only the uniqueKey to the user, we had no other use for the stored field. As you noted, every internal id --> uniqueKey id mapping requires a disk seek and as

Controlling FuzzyQuery edit type

2014-09-28 Thread Manuel Le Normand
Hello, In FuzzyQuery I see that the char transposition option can be controlled by a boolean (which, by the way, seems hardcoded and not configurable). Is it possible to control, somewhere in the code, which of the other edit types (char insertion, deletion or substitution) are allowed? Thanks, Manuel

Re: Controlling FuzzyQuery edit type

2014-09-29 Thread Manuel Le Normand
Never mind, I just wrote a custom function that outputs the edit type for each word. Thanks
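
The custom function itself was not posted to the thread. As a rough illustration only, a function that reports the edit types between a query term and a matched word could look like the sketch below. It is plain Python, not Lucene code; the names `edit_ops` and `_is_swap` are hypothetical, and it assumes the optimal string alignment variant of Damerau-Levenshtein distance (insertion, deletion, substitution, and transposition of adjacent chars), which may differ from what the original function did.

```python
def _is_swap(s: str, t: str, i: int, j: int) -> bool:
    """True if the last two chars of s[:i] and t[:j] are each other's swap."""
    return i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]


def edit_ops(source: str, target: str) -> list[str]:
    """Return the edit types of one minimal edit script from source to target.

    Hypothetical helper: optimal string alignment distance plus a backtrace.
    """
    n, m = len(source), len(target)
    # dist[i][j] = number of edits needed to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete every char of source[:i]
    for j in range(m + 1):
        dist[0][j] = j  # insert every char of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            best = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
            if _is_swap(source, target, i, j):
                best = min(best, dist[i - 2][j - 2] + 1)  # transposition
            dist[i][j] = best

    # Walk back from the bottom-right cell, recording each non-match step.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and source[i - 1] == target[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            i, j = i - 1, j - 1  # chars match, no edit
        elif _is_swap(source, target, i, j) and dist[i][j] == dist[i - 2][j - 2] + 1:
            ops.append("transposition")
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append("substitution")
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append("deletion")
            i -= 1
        else:
            ops.append("insertion")
            j -= 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    for word in ["lucnee", "lucen", "lucenes", "lucine"]:
        print(word, edit_ops("lucene", word))
```

With such a helper, matches returned by a fuzzy query could be post-filtered by edit type, for example keeping only words whose edit list contains no "substitution".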