big data index in lucene

2017-03-20 Thread 380382...@qq.com
Is there someone who can help me? I have a program with 600 billion docs (about 700 TB). I want to index it with MapReduce and store it in HDFS. I suppose I can index the data with Lucene and search it in SolrCloud. I want to split the docs into 400 shards. I would like to know whether I can do so. Thanks
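As a rough sanity check on the numbers in this question, the per-shard load can be worked out directly. The sketch below assumes documents are spread evenly across shards and that each shard is a single Lucene index; the limit used is Lucene's per-index document cap (`IndexWriter.MAX_DOCS`, which is `Integer.MAX_VALUE - 128` in recent versions). The 700 TB figure is the raw data size from the question, not a measured index size.

```python
# Figures taken from the question above.
TOTAL_DOCS = 600_000_000_000   # 600 billion documents
TOTAL_DATA_TB = 700            # raw data size, not index size
SHARDS = 400

# Lucene's hard per-index document limit (Integer.MAX_VALUE - 128).
LUCENE_MAX_DOCS = 2_147_483_519

# Assumption: documents are distributed evenly across shards.
docs_per_shard = TOTAL_DOCS // SHARDS
data_tb_per_shard = TOTAL_DATA_TB / SHARDS

# Remaining room under the per-index cap for each shard.
headroom = LUCENE_MAX_DOCS - docs_per_shard

# Smallest shard count that stays under the cap (ceiling division).
min_shards = -(-TOTAL_DOCS // LUCENE_MAX_DOCS)

print(f"docs per shard:    {docs_per_shard:,}")
print(f"raw TB per shard:  {data_tb_per_shard}")
print(f"headroom per shard: {headroom:,}")
print(f"minimum shards:    {min_shards}")
```

With 400 shards each index would hold 1.5 billion documents, which is under the cap but leaves limited headroom if the distribution is skewed or the collection grows.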

Re: how to rebuild a index corrupted?

2017-03-20 Thread Michael McCandless
You can use Lucene's CheckIndex tool with the -exorcise option, but this is quite brutal: it simply drops any segment in which it detects corruption. Mike McCandless http://blog.mikemccandless.com On Mon, Mar 20, 2017 at 4:44 PM, Marco Reis wrote: > I'm afraid it's not possible to rebuild index

Re: how to rebuild a index corrupted?

2017-03-20 Thread Marco Reis
I'm afraid it's not possible to rebuild the index. It's important to maintain a backup policy because of that. On Mon, Mar 20, 2017 at 5:12 PM Cristian Lorenzetto <cristian.lorenze...@gmail.com> wrote: > lucene can rebuild index using his internal info and how ? or in have to > reinsert all in othe

how to rebuild a index corrupted?

2017-03-20 Thread Cristian Lorenzetto
Can Lucene rebuild an index using its internal info, and if so, how? Or do I have to reinsert everything some other way?

is there a event before /post commit to file

2017-03-20 Thread Cristian Lorenzetto
Does Lucene have a strategy for detecting when index files were not closed correctly? Is there a way to save a counter status when a commit is done, so I can check whether the maximum counter equals the commit counter? I might insert this code after the commit line.

RE: calculate term co-occurrence matrix

2017-03-20 Thread Allison, Timothy B.
I have code as part of LUCENE-5318 that counts terms that cooccur within a window of where your query terms appear. This makes a really useful query term recommender, and the math is dirt simple. INPUT Doc1: quick brown fox jumps over the lazy dog Doc2: quick green fox leaps over the lazy dog
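The windowed counting idea described above can be illustrated with a small stand-alone sketch. This is not the LUCENE-5318 code: it is a plain Python illustration that assumes whitespace tokenization and a symmetric window, and the function name `cooccurrence_counts` and the `window` parameter are hypothetical. It uses the two sample documents from the message.

```python
from collections import Counter

def cooccurrence_counts(docs, query_term, window=2):
    """Count terms that appear within `window` positions of `query_term`.

    Hypothetical helper: whitespace tokenization, symmetric window,
    and the query term itself is not counted.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i, token in enumerate(tokens):
            if token != query_term:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

docs = [
    "quick brown fox jumps over the lazy dog",
    "quick green fox leaps over the lazy dog",
]
counts = cooccurrence_counts(docs, "fox", window=2)
print(counts.most_common())
```

Terms that show up near the query term in many documents ("quick" and "over" here) rise to the top, which is what makes such counts usable as query term suggestions.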

RE: Limiting terms / field

2017-03-20 Thread Uwe Schindler
It is also in 4.10.3 as part of the analyzers-common module: https://lucene.apache.org/core/4_10_3/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilter.html Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de
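The effect of capping the number of tokens per field can be sketched outside Lucene. The snippet below is a Python illustration of the idea only, not the LimitTokenCountFilter implementation: the function name `limit_token_count` is hypothetical, and whitespace splitting stands in for a real analyzer. The value 10,000 mirrors the old default mentioned in this thread.

```python
from itertools import islice

def limit_token_count(tokens, max_token_count):
    """Yield at most `max_token_count` tokens from a token stream.

    Hypothetical stand-in for a token-count limit: everything after
    the cap is silently dropped and never reaches the index.
    """
    return islice(tokens, max_token_count)

# Whitespace splitting stands in for a real analyzer here.
field_text = "quick brown fox jumps over the lazy dog"
limited = list(limit_token_count(field_text.split(), 3))
print(limited)

# A field longer than the cap is truncated to exactly the cap.
long_field = ("term " * 25_000).split()
print(len(list(limit_token_count(long_field, 10_000))))
```

The design point to keep in mind is that terms beyond the cap are not searchable, so the cap trades index size for recall on long fields.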

Limiting terms / field

2017-03-20 Thread Chris Bamford
Hello, We are using Lucene 4.10.3 and are interested in limiting the number of terms per field. In the past this was set by the IndexWriter (maxFieldLength) and the default was 10K; as I understand it this is no longer the case, in fact it is now unlimited by default? Anyway, what is the bes