Re: SOLR/LUCENE 5.2.1: Solution of CharTermAtt, StartOffset, EndOffset, Position

2015-08-07 Thread Shai Erera
I think you can just write a TokenFilter which sets the PositionIncrementAttribute of every other token to 0. Then you can use StandardTokenizer and wrap it with that filter. Shai
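A minimal sketch of the filter Shai describes, assuming Lucene 5.x APIs (the class name is made up, and the "every other token" rule is taken literally — in practice you would decide which tokens to stack based on your own logic):

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Hypothetical filter: gives every second token a position increment of 0,
// so adjacent tokens (e.g. "wi" and "fi") end up sharing a position.
public final class EveryOtherZeroPosIncFilter extends TokenFilter {
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private boolean zeroNext = false;

  public EveryOtherZeroPosIncFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    if (zeroNext) {
      // Stack this token on the previous one: same position, no increment.
      posIncAtt.setPositionIncrement(0);
    }
    zeroNext = !zeroNext;
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    zeroNext = false;
  }
}
```

Wrapping a StandardTokenizer (or any Tokenizer) with this filter inside a custom Analyzer's createComponents is then all that is needed.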

SOLR/LUCENE 5.2.1: Solution of CharTermAtt, StartOffset, EndOffset, Position

2015-08-07 Thread Văn Châu
Hi, I'm looking for a solution for the following format in the solr/lucene 5.2.1 version. Example text: "fast wi fi network is down". Using solr.StandardTokenizerFactory, the positions displayed are: fast (1) -> wi (2) -> fi (3) -> network (4) -> is (5) -> down (6)

PerFieldAnalyzerWrapper does not seem to allow use of a custom analyzer

2015-08-07 Thread Bauer, Herbert S. (Scott)
I can't seem to detect any issues with the final custom analyzer declared in this code snippet (the one that attempts to use a PatternMatchingTokenizer and is initialized as sa), but it doesn't seem to be hit when I run my indexing code despite being in the map. It is indexed finally, but I assu
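For reference, wiring a custom analyzer into PerFieldAnalyzerWrapper looks roughly like this under Lucene 5.x (the field names and fallback analyzer here are hypothetical). A common pitfall is building the wrapper but then passing a different analyzer to IndexWriterConfig, in which case the per-field map is never consulted:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;

public class PerFieldExample {
  public static Analyzer buildAnalyzer() {
    Map<String, Analyzer> perField = new HashMap<>();
    // Hypothetical field: treat "id" as a single untokenized term.
    perField.put("id", new KeywordAnalyzer());
    // Fields not in the map fall through to the default analyzer given first.
    return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), perField);
  }

  public static void main(String[] args) {
    // The wrapper itself must reach the IndexWriter, or it is never used.
    IndexWriterConfig iwc = new IndexWriterConfig(buildAnalyzer());
  }
}
```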

Prioritizing BooleanQueries to Improve Performance

2015-08-07 Thread markh
If I have a BooleanQuery which has two subqueries, one fast, one slow (fastQuery) && (slowQuery) Is there a way to tell Lucene to execute the fastQuery first so it can potentially skip the slowQuery if there are no results from the fastQuery? I don't think making the slow query a filter (Occur.
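Two points are worth noting here. First, Lucene's conjunction scoring already tries to lead with the cheapest clause: each Scorer reports a cost(), and the conjunction iterates the lowest-cost iterator first, advancing the others only to confirm matches. Second, since 5.0 a non-scoring clause can be expressed with Occur.FILTER. A self-contained sketch (field names and documents are hypothetical):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class FilterClauseExample {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
    Document d = new Document();
    d.add(new StringField("color", "red", Field.Store.NO));
    d.add(new StringField("size", "large", Field.Store.NO));
    w.addDocument(d);
    w.close();

    // In 5.2.x BooleanQuery is still mutable; 5.3+ uses BooleanQuery.Builder.
    BooleanQuery bq = new BooleanQuery();
    bq.add(new TermQuery(new Term("color", "red")), BooleanClause.Occur.MUST);
    // FILTER matches like MUST but does not contribute to the score.
    bq.add(new TermQuery(new Term("size", "large")), BooleanClause.Occur.FILTER);

    DirectoryReader r = DirectoryReader.open(dir);
    TopDocs hits = new IndexSearcher(r).search(bq, 10);
    System.out.println(hits.totalHits);
    r.close();
  }
}
```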

Re: new to Lucene

2015-08-07 Thread Erick Erickson
2. Is the "Index" saved as a file or loaded into the memory? Adding to Modassar's comments: Almost all "real" implementations save the index to disk and read selected portions back into memory as needed, otherwise the data isn't permanent. In the Lucene world, I'd start with NRTCachingDirectory.
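Erick's suggestion can be sketched as follows: NRTCachingDirectory wraps an on-disk FSDirectory, caching small, freshly flushed segments in RAM while large merges still go straight to disk. The directory location and size thresholds here are assumptions:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

public class NrtDirExample {
  public static void main(String[] args) throws Exception {
    Path path = Files.createTempDirectory("idx"); // hypothetical index location
    // Segments under 5 MB from merges, up to 60 MB total, are cached in RAM.
    NRTCachingDirectory dir =
        new NRTCachingDirectory(FSDirectory.open(path), 5.0, 60.0);
    IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
    Document d = new Document();
    d.add(new TextField("contents", "hello lucene", Field.Store.NO));
    w.addDocument(d);
    // Near-real-time reader: sees the new doc without a full commit.
    DirectoryReader r = DirectoryReader.open(w, true);
    System.out.println(r.numDocs());
    r.close();
    w.close();
    dir.close();
  }
}
```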

Re: new to Lucene

2015-08-07 Thread Modassar Ather
Please see my comments in-line. 1. For the indexing of these chapters, how many fields need to be declared? Can I just declare only one field for the contents? This depends on what you need to search with. E.g. if only the plain content (chapters) is to be searched, then one indexed field is requ

new to Lucene

2015-08-07 Thread Nantha Kumar Subramaniam
Good day. I am new to Lucene and have started to explore it. I have questions. I have a book in which all the chapters are in PDF. I plan to index all these individual chapters in Lucene, using Tika for the text extraction. 1. For the indexing of these chapters, how many fields need to b
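A common layout for this chapter use case is one analyzed field for the extracted text plus a stored metadata field to identify the chapter; Tika's facade API handles the extraction in a single call. A sketch (field names and the file path are hypothetical):

```java
import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.tika.Tika;

public class ChapterDoc {
  public static Document build(String chapterId, String extractedText) {
    Document doc = new Document();
    // Stored, not tokenized: lets you retrieve and identify the chapter.
    doc.add(new StringField("chapter", chapterId, Field.Store.YES));
    // Indexed and tokenized: the single field you actually search on.
    doc.add(new TextField("contents", extractedText, Field.Store.NO));
    return doc;
  }

  public static void main(String[] args) throws Exception {
    // Tika's facade extracts plain text from a PDF in one call.
    String text = new Tika().parseToString(new File("chapter01.pdf")); // hypothetical file
    Document doc = build("chapter01", text);
  }
}
```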

Re: Mapping doc values back to doc ID (in decent time)

2015-08-07 Thread Adrien Grand
On Fri, Aug 7, 2015 at 8:30 AM, Trejkaz wrote:
> for (int ourId = 0; ourId < count; ourId++) {
>     builder.clear();
>     NumericUtils.longToPrefixCoded(ourId, 0, builder);
>     termsEnum.seekExact(builder.get());
>     postingsEnum = termsEnum.postings(null, postingsE
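A completed version of the lookup loop from this thread might look like the following under Lucene 5.x: seek the full-precision numeric term (shift 0) for each application-level id, then read the matching Lucene doc id from the postings. The field name and surrounding setup are assumptions:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.PostingsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRefBuilder;
import org.apache.lucene.util.NumericUtils;

public class IdLookup {
  // Map each application-level long id (indexed as a numeric field) back to
  // its Lucene doc id; -1 means the id was not found.
  public static int[] lookup(IndexReader reader, String field, int count) throws Exception {
    int[] docIds = new int[count];
    Terms terms = MultiFields.getTerms(reader, field);
    TermsEnum termsEnum = terms.iterator();
    BytesRefBuilder builder = new BytesRefBuilder();
    PostingsEnum postingsEnum = null;
    for (int ourId = 0; ourId < count; ourId++) {
      builder.clear();
      // Shift 0 encodes the full-precision term for this value.
      NumericUtils.longToPrefixCoded(ourId, 0, builder);
      if (termsEnum.seekExact(builder.get())) {
        postingsEnum = termsEnum.postings(null, postingsEnum);
        int doc = postingsEnum.nextDoc();
        docIds[ourId] = (doc == DocIdSetIterator.NO_MORE_DOCS) ? -1 : doc;
      } else {
        docIds[ourId] = -1;
      }
    }
    return docIds;
  }
}
```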