Re: Penalize the fact that the searched term is within a word

2017-06-08 Thread Ahmet Arslan
Hi, You can completely ban within-a-word search simply by using WhitespaceTokenizer, for example. By the way, it is all about how you tokenize/analyze your text. Once you decide, you can create two versions of a single field using different analyzers. This allows you to assign different weights
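The core of the advice above is that whitespace tokenization makes each token an indivisible match unit, so a term cannot match inside a word. A minimal stdlib-only sketch of that idea (the class and method names are hypothetical, standing in for Lucene's WhitespaceTokenizer behavior):

```java
import java.util.Arrays;
import java.util.List;

public class TokenizeDemo {
    // Whitespace tokenization, analogous to Lucene's WhitespaceTokenizer:
    // the text is split on runs of whitespace and nothing else.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    // A term matches only if it equals a whole token, never a substring of one.
    static boolean matchesWholeToken(String text, String term) {
        return tokenize(text).contains(term);
    }

    public static void main(String[] args) {
        String doc = "worldwide web";
        System.out.println(matchesWholeToken(doc, "wide"));      // false: occurs only inside a token
        System.out.println(matchesWholeToken(doc, "worldwide")); // true: whole-token match
    }
}
```

In Lucene itself the analogous setup would be indexing the same text into two fields with different analyzers and weighting the whole-word field higher at query time.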

Random Index Corruption exceptions during bulk indexing

2017-06-08 Thread simon
I'm seeing a randomly occurring Index Corruption exception during a Solr data ingest. This can occur anywhere during the 7-8 hours our ingests take. I've submitted a Solr bug issue to JIRA as this is the environment I'm using, but it does look as though the error is occurring in Lucene code, so I t

Re: Lucene-6.2.1 -> impact of document removal on performance and index size

2017-06-08 Thread Erick Erickson
OK, got your concern now. Right, when docs are deleted they are only marked as deleted; the actual data is _not_ purged (yet). As you add more documents to your index, segments will get merged as part of normal processing. When segments are merged, the deleted data is expunged. So if you're contin
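The mark-then-purge behavior described above can be sketched with a toy tombstone model (hypothetical names, not the actual Lucene classes): a delete only sets a bit, and space is reclaimed only when a merge rewrites the segment without the tombstoned documents.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SegmentDemo {
    static class Segment {
        final List<String> docs = new ArrayList<>();
        final BitSet deleted = new BitSet();

        void add(String doc) { docs.add(doc); }
        void delete(int id)  { deleted.set(id); }     // mark only, data stays on disk
        int maxDoc()         { return docs.size(); }  // live + deleted
        int numDocs()        { return docs.size() - deleted.cardinality(); } // live only

        // A merge copies only the live documents into a new segment,
        // which is when the deleted data is actually expunged.
        Segment merge() {
            Segment out = new Segment();
            for (int i = 0; i < docs.size(); i++)
                if (!deleted.get(i)) out.add(docs.get(i));
            return out;
        }
    }

    public static void main(String[] args) {
        Segment seg = new Segment();
        seg.add("doc0"); seg.add("doc1"); seg.add("doc2");
        seg.delete(1);
        System.out.println(seg.maxDoc() + " / " + seg.numDocs()); // 3 / 2
        System.out.println(seg.merge().maxDoc());                 // 2: space reclaimed
    }
}
```

This is why index size does not drop immediately after a large delete: until merges (or an explicit merge of segments with deletes) happen, the tombstoned documents still occupy space.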

Penalize the fact that the searched term is within a word

2017-06-08 Thread Jacek Grzebyta
Hi, Apologies for repeating a question from the IRC room, but I am not sure if that is alive. I have no idea about how Lucene works, but I need to modify some part of the rdf4j project which depends on it. I need to use Lucene to create a mapping file based on text searching, and I found there is a followi

RE: Lucene-6.2.1 -> impact of document removal on performance and index size

2017-06-08 Thread Ludovic Bertin
Thanks Erick for your answer. We have a huge index: 700 GB, 350 million documents. We had a case of log flooding due to a bug in an application that generated 100,000,000 documents, so we deleted them, but there is no impact on index size without an optimize. I think that's normal, true? Thanks
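For a one-off cleanup after a mass delete like this, Lucene 6.x exposes IndexWriter.forceMergeDeletes(), which merges only segments containing deletes and is cheaper than a full forceMerge(1) (the old "optimize"). A minimal sketch, assuming a Lucene 6.x classpath; the index path is a placeholder:

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class ReclaimSpace {
    public static void main(String[] args) throws Exception {
        // "/path/to/index" is hypothetical; point this at the real index directory.
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/path/to/index")), cfg)) {
            // Rewrites only segments that contain deletes, reclaiming their
            // space without the full cost of writer.forceMerge(1).
            writer.forceMergeDeletes();
        }
    }
}
```

On a 700 GB index either call is I/O-heavy, so this is something to run during a quiet window rather than routinely.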