Re: Encryption At Rest - Using CustomAnalyzer

2018-02-05 Thread Michael Wilkowski
right now - it is quite easy to understand and straightforward when looking in the source code. I built my own version of FuzzyQuery some time ago based on the MultiTermQuery class. MW

MultiTermQuery vs multiple TermQuery'ies - is there a performance gain?

2017-05-23 Thread Michael Wilkowski
Hi, I am building an app that will create multiple term queries joined with OR (more than 100 primitive TermQuery instances). Is there a real performance gain from implementing a custom MultiTermQuery instead of simply joining multiple TermQuery clauses with OR? Regards, MW

Re: Heavy usage of final in Lucene classes

2017-01-12 Thread Michael Wilkowski
CustomAnalyzer builder and set it there. > > Alan Woodward > www.flax.co.uk > > > > On 12 Jan 2017, at 10:57, Michael Wilkowski wrote: > > > > Hi, > > I wanted to subclass StandardTokenizer to manipulate a little with > > PositionAttribute. I wanted to i

Re: Heavy usage of final in Lucene classes

2017-01-12 Thread Michael Wilkowski
...@mikemccandless.com> wrote: > I don't think it's about efficiency but rather about not exposing > possibly trappy APIs / usage ... > > Do you have a particular class/method that you'd want to remove final from? > > Mike McCandless > > http://blog.mikemccan

Heavy usage of final in Lucene classes

2017-01-11 Thread Michael Wilkowski
Hi, I sometimes wonder what the purpose is of such heavy use of "final" methods and classes in Lucene. It makes my life much harder when I want to override standard classes with some custom implementation. What comes first to my mind is runtime efficiency (compiler "knows" that this class/method will not be o

Re: Lucene performance benchmark | search throughput

2017-01-03 Thread Michael Wilkowski
My guess: more conditions = fewer documents to score and sort to return. On Mon, Jan 2, 2017 at 7:23 PM, Rajnish kamboj wrote: > Hi > > Is there any Lucene performance benchmark against certain set of data? > [i.e Is there any stats for search throughput which Lucene can provide for > a certain d

FuzzyQuery on entire set of terms

2016-10-21 Thread Michael Wilkowski
Hi, I need to implement a function that performs a fuzzy search on multiple terms such that a total edit distance of at most 2, summed over ALL terms, is allowed. For example, the query "Lucene Apache Group" with maximum distance 2 should match: "Luceni Apachi Group", "Lucen Apache Group", "Luce Apache Group" but not: Luce
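A minimal sketch of the matching rule described in this post, written in plain Java without any Lucene classes. It assumes that query and candidate terms are compared position by position, that both must have the same number of terms, and that "distance" means Levenshtein edit distance on lower-cased terms; none of this is confirmed by the post. The class name `SummedFuzzyMatch` and its methods are hypothetical, and the non-matching input in `main` is an illustrative example of my own, not the one elided from the message.

```java
import java.util.Locale;

class SummedFuzzyMatch {

    // Levenshtein distance using two rolling rows of the DP table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: terms are aligned by position and the per-term
    // distances are summed; the sum must not exceed maxTotal.
    static boolean matches(String query, String candidate, int maxTotal) {
        String[] q = query.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String[] c = candidate.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (q.length != c.length) {
            return false;
        }
        int total = 0;
        for (int i = 0; i < q.length; i++) {
            total += levenshtein(q[i], c[i]);
            if (total > maxTotal) {
                return false; // stop early once the budget is spent
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String query = "Lucene Apache Group";
        System.out.println(matches(query, "Luceni Apachi Group", 2));
        System.out.println(matches(query, "Luce Apache Group", 2));
        // Hypothetical non-match: the per-term distances sum to 3.
        System.out.println(matches(query, "Luce Apach Group", 2));
    }
}
```

This only checks one candidate string at a time. Inside Lucene the same budget idea would have to be applied while enumerating index terms, which this sketch does not attempt.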

Re: Handling multiple locale

2016-09-26 Thread Michael Wilkowski
Hi, in my opinion your system locales have nothing to do with the analyzers that you want to apply. I would not rely on system locales, as that makes the application much less portable. As for any other way: there is none. You may apply a regex query and create custom queries, but not dynamically refer

Re: Handling multiple locale

2016-09-25 Thread Michael Wilkowski
Hi, please confirm that I understand correctly: do you want to search your query within all possible locales? If yes, then my personal pattern in such a case would be to create multiple BooleanClause instances (with Occur.SHOULD, one clause per locale) and add them to one BooleanQuery. MW On Sun, Sep 25, 2016 at

Re: Using Lucene to model ownership of documents

2016-06-16 Thread Michael Wilkowski
Definitely b). I would also suggest groups and expanding user groups at user sign in time. MW On Thu, Jun 16, 2016 at 12:36 PM, Ian Lea wrote: > I'd definitely go for b). The index will of course be larger for every > extra bit of data you store but it doesn't sound like this would make much >

Re: Cache Lucene based index.

2016-05-21 Thread Michael Wilkowski
I recommend reading http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?m=1 Just use MMapDirectory and let the operating system do the rest. MW On 21 May 2016 12:42, "Prateek Singhal" wrote: > You can consider that I want to store the lucene index in some sor

Re: TermRangeQuery work not

2015-12-26 Thread Michael Wilkowski
You mixed up lowerDate and upperDate. MW On 25 Dec 2015 16:41, "kaog" wrote: > hi > I did the change of variable "ISBN, it was a mistake I did when I wrote in > the post. unfortunately still it does not work TermRangeQuery. :( > > > > -- > View this message in context: > http://

Re: Wildcard Terms and total word or phrase count

2015-11-29 Thread Michael Wilkowski
Hi Doug, your attachment is not available (likely due to security settings). Please put it on GitHub or somewhere else and provide a link to download. MW On Mon, Nov 30, 2015 at 2:29 AM, Kunzman, Douglas * < douglas.kunz...@fda.hhs.gov> wrote: > > Jack - > > Thanks a lot for taking the time to try and

Re: Wildcard Terms and total word or phrase count

2015-11-29 Thread Michael Wilkowski
It is because your index does not contain the term quar*, and this statistics function is not a query (you have to pass the exact form of the term). To count terms that meet the search criteria you may run a search query with a custom collector and count the results. Or use a normal search query returning TopDocs and jus

Re: Determine whether a MatchAllQuery or a Query with atleast one Term

2015-11-27 Thread Michael Wilkowski
Instanceof? MW On 28 Nov 2015 06:57, "Sandeep Khanzode" wrote: > Hi, > I have a question. > In my program, I need to check whether the input query is a MatchAll Query > that contains no terms, or a Query (any variant) that has at least one > term. For typical Term queries, thi

Re: Lucene auto suggest

2015-11-25 Thread Michael Wilkowski
Try some examples from stackoverflow: http://stackoverflow.com/questions/24968697/how-to-implements-auto-suggest-using-lucenes-new-analyzinginfixsuggester-api On Wed, Nov 25, 2015 at 4:18 AM, Bhaskar wrote: > Could you please some help here? > > On Mon, Nov 23, 2015 at 10:50 PM, Bhaskar wrote:

Re: does field cache support multivalue?

2015-11-19 Thread Michael Wilkowski
Yes, according to Lucene in Action book, you cannot use field cache in such situations. MW On Fri, Nov 20, 2015 at 8:41 AM, Yonghui Zhao wrote: > If I index one filed more than 1 times, it seems I can't get all values > from lucene field cache? > > right? >

Re: one large index vs many small indexes

2015-11-11 Thread Michael Wilkowski
and why separate indexes will be much more efficient. Regards, Michael Wilkowski On Wed, Nov 11, 2015 at 9:40 AM, Sascha Janz wrote: > hello, > > we must make a design decision for our system. we have many customers wich > all should use the same server. now we are thinking about to m