Re: RAMDirectory or Redis

2018-12-02 Thread Arjen van der Meijden
I doubt using Redis as directory-storage will be very good. I'd expect it to have much more latency for reads and writes compared to any of Lucene's own directories. And Lucene probably won't like it if another Lucene-instance changes that database. It may be interesting as a result-level cache th

Re: Not-indexed, Stored Thumbnails or NoSQL?

2018-12-02 Thread Arjen van der Meijden
I'd think it depends on your application. If it's a web-application and you're generating html, it may be better for the (client side) performance to have those images load via a webserver that can directly access the images as files (although you could generate the images inline with base64). If it

Re: Compressing docValues with variable length bytes[] by block of 16k ?

2015-08-09 Thread Arjen van der Meijden
On 9-8-2015 16:22, Toke Eskildsen wrote: > Robert Muir wrote: >> I am tired of repeating this: >> Don't use BINARY docvalues >> Don't use BINARY docvalues >> Don't use BINARY docvalues >> Use types like SORTED/SORTED_SET which will compress the term >> dictionary and make use of ordinals in your

Re: How to handle words that stem to stop words

2014-07-10 Thread Arjen van der Meijden
based on the stopword change frequency not on the frequency of discovery of new words that stem to stopwords. -sujit On Thu, Jul 10, 2014 at 11:57 AM, Arjen van der Meijden < acmmail...@tweakers.net> wrote: I'm reluctant to apply either solution: Emitting both tokens will likely s

Re: How to handle words that stem to stop words

2014-07-10 Thread Arjen van der Meijden
p words. Best regards, Arjen On 7-7-2014 23:06 Tri Cao wrote: I think emitting two tokens for "vans" is the right (potentially only) way to do it. You could also control the dictionary of terms that require this special treatment. Any reason makes you not happy with this approach?

How to handle words that stem to stop words

2014-07-06 Thread Arjen van der Meijden
already noticed the 'KeywordRepeatFilter' to index/search both 'vans' and 'van' and the StemmerOverrideFilter to try and prevent these cases. Are there any other solutions for these kinds of problems? Best regards, Arjen van der Meijden ---
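
The idea discussed in this thread (keeping both 'vans' and 'van' so that a word whose stem collides with a stop word is not lost) can be illustrated without Lucene. What follows is a minimal plain-Java sketch under stated assumptions, not Lucene's KeywordRepeatFilter or StemmerOverrideFilter: the class name `StemStopSketch`, the toy `stem` rule (strip one trailing "s") and the stop-word set are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StemStopSketch {
    // Hypothetical stop-word list (assumption, not taken from the thread).
    static final Set<String> STOP_WORDS = Set.of("van", "de", "der");

    // Toy stemmer (assumption): strips one trailing "s" from words longer than 3 characters.
    static String stem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    // Emits the original token plus its stem. A token is dropped only when the
    // word as typed is a stop word, so "vans" survives although it stems to "van".
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.toLowerCase().split("\\s+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue;
            }
            out.add(raw);
            String stemmed = stem(raw);
            if (!stemmed.equals(raw)) {
                out.add(stemmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("vans van der Meijden"));
    }
}
```

With this toy setup, the input "vans van der Meijden" yields the tokens vans, van and meijden: the stop-word check is applied before stemming, so only the literal stop words are removed. Emitting two tokens per stemmed word grows the token stream, which is a cost to weigh against the gain.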

Re: NewBie To Lucene || Perfect configuration on a 64 bit server

2014-05-26 Thread Arjen van der Meijden
You don't need to worry about the 1024 maxBooleanClauses, just use a TermsFilter. https://lucene.apache.org/core/4_8_0/queries/org/apache/lucene/queries/TermsFilter.html I use it for a similar scenario, where we have a data structure that determines a subset of 1.5 million documents from outsi
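
Conceptually, a filter of this kind sidesteps the clause limit because the allowed documents are treated as a set that is intersected with the query's matches, rather than as one boolean clause per ID. The following is a plain-Java sketch of that idea only, assuming small non-negative integer document IDs; it is not Lucene's TermsFilter implementation, and `IdFilterSketch`, `allowedDocs` and `applyFilter` are hypothetical names.

```java
import java.util.BitSet;

public class IdFilterSketch {
    // Builds a bitset with one bit per allowed document ID.
    // Assumption: IDs are small non-negative integers.
    static BitSet allowedDocs(int[] ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    // Keeps only the query matches that are also in the allowed set.
    // The inputs are left unmodified; the intersection is done on a copy.
    static BitSet applyFilter(BitSet queryMatches, BitSet allowed) {
        BitSet result = (BitSet) queryMatches.clone();
        result.and(allowed);
        return result;
    }

    public static void main(String[] args) {
        BitSet allowed = allowedDocs(new int[] {1, 3, 5, 7});
        BitSet matches = allowedDocs(new int[] {3, 4, 5, 6});
        System.out.println(applyFilter(matches, allowed)); // prints {3, 5}
    }
}
```

The cost of the intersection depends on the size of the bitsets, not on the number of IDs that went into the allowed set, which is why a subset of more than a million documents is not a problem in this representation.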

Re: Indexing Huge tree structure represented in a Text file

2014-04-15 Thread Arjen van der Meijden
Given that he is already using Java, simply building an object-tree based on the text file may also be possible. Although a 300MB file may turn out to be fairly large in memory consumption (possibly caused by quite a bit of object-overhead). If that turns out to consume too much memory there ar

Re: Performance measurements

2013-07-25 Thread Arjen van der Meijden
FILTER APPROACH: List<Term> terms = new ArrayList<Term>(); for (int i = 0; i < orCount; ++i) { terms.add(new Term("conn", Integer.toString(connection[i]))); } TermsFilter conns = new TermsFilter(terms); TermQuery tq = new TermQuery(new Term("name", name)); FilteredQuery

Re: Performance measurements

2013-07-25 Thread Arjen van der Meijden
On 24-7-2013 21:58 Sriram Sankar wrote: On Wed, Jul 24, 2013 at 10:24 AM, Jack Krupansky wrote: Scoring has been a major focus of Lucene. Non-scored filters are also available, but the query parsers are focused (exclusively) on scored-search. When you say "filter" do you mean a step performed

Re: StandardAnalyzer class not present in Lucene 4.2.0

2013-03-25 Thread Arjen van der Meijden
Hi Gaurav, There is a package 'lucene-analyzers-common-$version.jar' which contains the analyzers. So you should add that to your project. Best regards, Arjen On 25-3-2013 8:41, gaurav redkar wrote: Hi all, I am trying to write a simple program to add documents to index. But am unable to do

Re: Does anyone have tips on managing cached filters?

2012-11-29 Thread Arjen van der Meijden
We have something similar with documents that can be tagged (and have many other relations). But for the matter of search we have two distinctions from your approach: - We do actually index the relation's id (i.e. the tag's id) as part of the lucene-document and update the document if that relatio

Re: Improving search performance for forum search

2012-11-24 Thread Arjen van der Meijden
y. In your CustomScoreQuery, you can use the DocValues (available on AtomicReader) to score your documents. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message----- From: Arjen van der Meijden [mailto:acmmail...@tweakers.net] Sen

Re: Improving search performance for forum search

2012-11-13 Thread Arjen van der Meijden
remen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message----- From: Arjen van der Meijden [mailto:acmmail...@tweakers.net] Sent: Tuesday, November 13, 2012 8:36 AM To: java-user@lucene.apache.org Subject: Improving search performance for forum search Hi List, I'm working on a search

Improving search performance for forum search

2012-11-12 Thread Arjen van der Meijden
bit far-fetched (although it would yield the most gain). Any other tips? Best regards, Arjen van der Meijden Tweakers.net B.V. - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: Combining The results from DB and Index Regd.,

2012-11-12 Thread Arjen van der Meijden
On 13-11-2012 4:15 selvakumar netaji wrote: Hi All, We are using lucene for searching data from the database in our enterprise application. The searches would be in a single index, whose documents are indexed from two different databases A and B. The frequency of updating the database A is li

Testing whether a document is up-to-date in Lucene 4.0

2012-10-12 Thread Arjen van der Meijden
Hello List, I'm currently trying to update my Lucene 3.6-application to 4.0. Most of it works (although your migration guide is missing a few aspects I had to figure out myself), but for one fairly large database I want to check whether the Document in the Lucene database is already at the late

Re: short search terms

2012-09-26 Thread Arjen van der Meijden
Shouldn't your own application-logic handle this? Or do you want complicated query-parsing where each and every token in the query is always at most 3 characters long? I don't know if there are any easier solutions, but you could subclass the QueryParser and add your requirement to all the rel