IndexWriteConfig ignored?

2012-04-16 Thread Clemens Wyss
We limit the memory consumption of our IndexWriters by setting RAMBufferSizeMB to 5MB (IndexWriterConfig.setRAMBufferSizeMB). Inspecting a heap dump reveals that I still have writers which consume/retain more than 35MB! How come? Any help/advice appreciated Clemens ---

AW: IndexWriteConfig ignored?

2012-04-17 Thread Clemens Wyss
ings (indexed documents) If you turn on IndexWriter's infoStream, do you see output saying it's flushing a new segment because RAM is > 5.0 MB? Mike McCandless http://blog.mikemccandless.com On Mon, Apr 16, 2012 at 4:46 AM, Clemens Wyss wrote: > We limit the memory con

Limiting IndexWriters memory usage?

2012-05-02 Thread Clemens Wyss
Is there a way to limit IndexWriters memory usage? While indexing many many documents my IndexWriter occupies > 30MB in memory. Is there a way to limit this "usage"? Thx Clemens - To unsubscribe, e-mail: java-user-unsubscr...@

Best practices for multiple languages?

2011-01-18 Thread Clemens Wyss
What is the "best practice" to support multiple languages, i.e. Lucene-Documents that have multiple language content/fields? Should a) each language be indexed in a separate index/directory or should b) the Documents (in a single directory) hold the diverse localized fields? We most often will b

AW: Best practices for multiple languages?

2011-01-18 Thread Clemens Wyss
I've dealt with both, and there are different solutions to each. Which of them > is yours? > > Shai > > On Tue, Jan 18, 2011 at 7:53 PM, Clemens Wyss > wrote: > > > What is the "best practice" to support multiple languages, i.e. > > Lucene-Document

Paging with Lucene

2011-01-19 Thread Clemens Wyss
(thanks for the many answers to my initial lucene question "Best practices for multiple languages?") We shall be confronted with the following problem: due to the very dynamic access rules on our content, we shall not be able to formulate these in/as Filter(s). Hence we need to first search and

AW: Paging with Lucene

2011-01-21 Thread Clemens Wyss
d return the second or subsequent chunk > of hits. Would that not work in your case? > > An alternative is to read and cache hits from the initial search but that is > generally more complex. > > > -- > Ian. > > On Thu, Jan 20, 2011 at 7:36 AM, Clemens Wyss

Suggest search terms

2011-02-21 Thread Clemens Wyss
I'd like to suggest search terms to my users. My naïve approach would have been: After at least n characters have been typed (asynchronously) find terms in IndexReader.terms() which "match". Is there an (even) more straightforward (and possibly faster) approach to get "search term suggestions"?

AW: Suggest search terms

2011-02-22 Thread Clemens Wyss
Fernando, Uwe thanks for your suggestions. Is it possible to get the number of "hits" per term? ferrari (125) lamborghini (34) ... > -Ursprüngliche Nachricht- > Von: Fernando Wasylyszyn [mailto:ferw...@yahoo.com.ar] > Gesendet: Montag, 21. Februar 2011 21:11 > An: java-user@lucene.apache.

Searching within all fields...

2011-03-02 Thread Clemens Wyss
Looking at the Term and QueryParser classes, I always have to provide a field name. MultiFieldQueryParser requires a list of fields. But what if I just want to search within "all fields", not enumerating them? Any advice? - To

Analyzer which creates terms of one to n words

2011-04-07 Thread Clemens Wyss
Is there an analyzer which takes a text and creates search terms based on the following rules: - all single words - "two words in a row" - "three words in a row" - ... - "n words in a row" The reason is the following: I have an index which is now being analyzed using WhitespaceAnalyzer. Besides
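The rules listed in this question (single words, then runs of two, three, ... n consecutive words) can be sketched in plain Java as below. This is only an illustration with no Lucene dependency; the class and method names are hypothetical, and splitting on whitespace is an assumption taken from the WhitespaceAnalyzer mentioned in the post. Lucene ships a ShingleFilter for word n-grams of this kind, which is likely the better fit inside an analyzer chain.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration (not a Lucene Analyzer) of "one to n words in a row" terms. */
final class WordNGrams {

    /** Returns every run of 1..maxWords consecutive whitespace-separated words, ordered by start position. */
    static List<String> terms(String text, int maxWords) {
        List<String> result = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || maxWords < 1) {
            return result;
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder run = new StringBuilder();
            // Grow the run one word at a time until maxWords or the end of the text is reached.
            for (int len = 1; len <= maxWords && start + len <= words.length; len++) {
                if (len > 1) {
                    run.append(' ');
                }
                run.append(words[start + len - 1]);
                result.add(run.toString());
            }
        }
        return result;
    }
}
```

For example, WordNGrams.terms("Merlot del Ticino", 2) yields the single words plus "Merlot del" and "del Ticino".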

German*Filter, Analyzer "cutting" off letters from (french) words...

2011-04-12 Thread Clemens Wyss
I try to apply German*Filter and/or Analyzer on my index. My index contains wine names such as "Petite Arvine" ( I know, that's French ;) ). Whenever one of the German*Filter or German*Analyzer is in play the terms for "Petite Arvine" are reduced to "Petit" and "Arvin" Why so? Where have the e'

GermanFilter, Analyzer cutting off letters...

2011-04-12 Thread Clemens Wyss
-- hopefully not a double post, I retried sending this post after 5h -- I try to apply German*Filter and/or Analyzer on my index. My index contains wine names such as "Petite Arvine" ( I know, that's French ;) ). Whenever one of the German*Filter or German*Analyzer is in play the terms for "Pe

AW: German*Filter, Analyzer "cutting" off letters from (french) words...

2011-04-13 Thread Clemens Wyss
ne.apache.org > Betreff: Re: German*Filter, Analyzer "cutting" off letters from (french) > words... > > On Tue, Apr 12, 2011 at 8:46 AM, Clemens Wyss > wrote: > > Why so? Where have the e's gone? > > > > the e is being stemmed as it's a German suffix.

AW: German*Filter, Analyzer "cutting" off letters from (french) words...

2011-04-13 Thread Clemens Wyss
h) > words... > > If you only want to ignore german stopwords, you don't need to use the > german analyzer with german stemming. you can just use StandardAnalyzer > with your own stopwords set! > > On Wed, Apr 13, 2011 at 3:51 AM, Clemens Wyss > wrote: > > What

AW: German*Filter, Analyzer "cutting" off letters from (french) words...

2011-04-14 Thread Clemens Wyss
Does the StandardAnalyzer lowercase its terms? > -Ursprüngliche Nachricht- > Von: Clemens Wyss [mailto:clemens...@mysign.ch] > Gesendet: Mittwoch, 13. April 2011 13:34 > An: java-user@lucene.apache.org > Betreff: AW: German*Filter, Analyzer "cutting" off lett

AW: German*Filter, Analyzer "cutting" off letters from (french) words...

2011-04-18 Thread Clemens Wyss
> Betreff: Re: German*Filter, Analyzer "cutting" off letters from (french) > words... > > On Fri, Apr 15, 2011 at 8:48 AM, Clemens Wyss > wrote: > > Does the StandardAnalyzer lowercase its terms? > yes! > > simon > > > >> -Ursprüngliche Nach

"Umlaute" getting lost

2011-04-21 Thread Clemens Wyss
I keep my search terms in a dedicated RAMDirectory (the termIndex). In there I place all the terms of my real index. When putting the terms into the termIndex I can still see [using the debugger] the Umlaute (äöü). Unfortunately when searching the termIndex the documents no longer contain these Um

AW: "Umlaute" getting lost

2011-04-25 Thread Clemens Wyss
pril 2011 12:13 > An: java-user@lucene.apache.org > Betreff: Re: "Umlaute" getting lost > > On Sun, Apr 24, 2011 at 8:30 AM, Grant Ingersoll > wrote: > > > > On Apr 21, 2011, at 5:02 PM, Clemens Wyss wrote: > > > >> I keep my search terms in a

AW: "Umlaute" getting lost

2011-04-25 Thread Clemens Wyss
ens > -Ursprüngliche Nachricht- > Von: Grant Ingersoll [mailto:gsing...@apache.org] > Gesendet: Sonntag, 24. April 2011 08:30 > An: java-user@lucene.apache.org > Betreff: Re: "Umlaute" getting lost > > > On Apr 21, 2011, at 5:02 PM, Clemens Wyss wrot

"fuzzy prefix" search

2011-05-02 Thread Clemens Wyss
I'd like to search fuzzily but not on a full term. E.g. I have a text "Merlot del Ticino" I'd like "mer", "merr", "melo", ... to match. If I use FuzzyQuery only "merlot", "merlott" hit. What Query-combination should I use? Thx Clemens --

AW: "fuzzy prefix" search

2011-05-02 Thread Clemens Wyss
rms must match this prefix and > the > rest of each term is matched using fuzzy. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Clemens Wyss

AW: "fuzzy prefix" search

2011-05-02 Thread Clemens Wyss
Is it the combination of FuzzyQuery and Term which makes the search to go for "word boundaries"? > -Ursprüngliche Nachricht- > Von: Clemens Wyss [mailto:clemens...@mysign.ch] > Gesendet: Montag, 2. Mai 2011 14:13 > An: java-user@lucene.apache.org > Betreff:

AW: "fuzzy prefix" search

2011-05-03 Thread Clemens Wyss
archer( indexReader ); Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f, 0, 10 ); // or Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.5f); TopDocs result = searcher.search( q, 10 ); Assert.assertEquals( 1, result.totalHits ); - Clemens

AW: "fuzzy prefix" search

2011-05-03 Thread Clemens Wyss
" search > > Mer != mer. The latter will be what is indexed because StandardAnalyzer > calls LowerCaseFilter. > > -- > Ian. > > > On Tue, May 3, 2011 at 9:56 AM, Clemens Wyss > wrote: > > Sorry for coming back to my issue. Can anybody exp

AW: "fuzzy prefix" search

2011-05-03 Thread Clemens Wyss
" and "merlot"? Would it be > less than 1.5 which I reckon would be the value of length(term)*0.5 as > detailed in the javadocs? Seems unlikely, but I don't really know anything > about the Levenshtein (edit distance) algorithm as used by FuzzyQuery. > Wouldn'
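The question quoted here, whether the edit distance between "Mer" and "merlot" could be below 1.5, can be checked with a textbook Levenshtein implementation. The sketch below is plain Java and is not FuzzyQuery's own code; how FuzzyQuery turns a distance into its similarity score is not shown in the snippet and is deliberately left out. The class name is hypothetical.

```java
/** Textbook Levenshtein distance; an illustration only, not Lucene's implementation. */
final class EditDistance {

    /** Minimum number of single-character insertions, deletions and substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With this definition "mer" is three insertions away from "merlot", and the capitalised "Mer" is four edits away, so neither falls under the 1.5 mentioned in the quoted reply, whereas "merlott" is a single edit from "merlot".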

AW: "fuzzy prefix" search

2011-05-03 Thread Clemens Wyss
Betreff: AW: "fuzzy prefix" search > > Have you tried > > Query q = new FuzzyQuery( new Term( "test", "Mer" ), 0.499f); > > > Sven > > > -Ursprüngliche Nachricht- > Von: Clemens Wyss [mailto:clemens...@mysign.ch] > Gesendet

AW: AW: "fuzzy prefix" search

2011-05-03 Thread Clemens Wyss
/examples closely enough, but you may want to look at > this if you haven't done so yet. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem > search :: http://search-lucene.com/ > > > > - Original Message >

AW: AW: AW: "fuzzy prefix" search

2011-05-03 Thread Clemens Wyss
ter.Side.FRONT, 1, 4); } > > > Check out page 265 of Lucene in Action 2. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message > > F
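The fragment quoted above ends with the arguments "Side.FRONT, 1, 4", which reads like front edge n-grams of length 1 to 4. A minimal plain-Java sketch of what such grams look like for a single token follows; it does not use Lucene's edge n-gram classes, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of front edge n-grams; not Lucene's EdgeNGram implementation. */
final class FrontEdgeNGrams {

    /** Returns the prefixes of token whose length lies between minGram and maxGram (inclusive). */
    static List<String> of(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = Math.max(1, minGram); len <= maxGram && len <= token.length(); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }
}
```

Indexing these prefixes for "merlot" gives "m", "me", "mer" and "merl", so a user typing "mer" hits an exact indexed term instead of needing a prefix or fuzzy query.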

AW: AW: AW: AW: "fuzzy prefix" search

2011-05-03 Thread Clemens Wyss
just an example. Stick another tokenizer in there, like > WhitespaceTokenizer in there, for example. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem > search :: http://search-lucene.com/ > > > > - Original Message >

Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Clemens Wyss
Given that I have 3 documents with exactly one field and the fields have the following contents: This is a moon The moon is bright moon If I analyze these documents they all hit on "moon". But how do I need to analyze/search my index in order to have the following "sort order": moon The moon is b

AW: Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Clemens Wyss
ing of a field/document > > What is the problem you're trying to solve? I'm wondering if this is an XY > problem. See: > http://people.apache.org/~hossman/#xyproblem > > Best > Erick > > On Wed, May 4, 2011 at 3:16 AM, Clemens Wyss > wrote: > > G

AW: Higher scoring if term is at the beginning of a field/document

2011-05-04 Thread Clemens Wyss
want to do this? > What is the use-case you're trying to solve? Is relevance not what you want? > Are you just experimenting? > > The statement of *what* you want to do is clear, but I don't know an easy way to > do that. Perhaps there's a better approach to solving the un

AW: AW: AW: AW: AW: "fuzzy prefix" search

2011-05-04 Thread Clemens Wyss
are after. > See how Solr uses it here: > http://search- > lucene.com/c/Solr:/src/java/org/apache/solr/analysis/EdgeNGramTokenizer > Factory.java||EdgeNGramTokenizer > > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem > search ::

Using Solr's (Auto)suggest with plain lucene

2011-05-05 Thread Clemens Wyss
I have implemented my index (in fact it's a pluggable indexing API) in "plain Lucene". I tried to implement a term suggestion mechanism on my own, but am not too happy with it so far. At http://search-lucene.com/m/0QBv41ssGlh/suggestion&subj=Auto+Suggest I have seen Solr's auto suggestion for search terms.

AW: Using Solr's (Auto)suggest with plain lucene

2011-05-06 Thread Clemens Wyss
> FSTLookupTest -- you can populate FSTLookup manually with terms/ phrases > from your index and then use the resulting automaton for suggestions. > > Dawid > > On Thu, May 5, 2011 at 2:54 PM, Clemens Wyss > wrote: > > > I have implemented my index (in fact it'

AW: Using Solr's (Auto)suggest with plain lucene

2011-05-06 Thread Clemens Wyss
> normalization at the time you query for suggestions. > 3. "http://search-lucene.com/m/586gA4ccL11". I have no idea. > > Dawid > > On Fri, May 6, 2011 at 11:06 AM, Clemens Wyss > wrote: > > > I have come across TSTLookup. > > In which jar do I find FSTL

AW: Using Solr's (Auto)suggest with plain lucene

2011-05-06 Thread Clemens Wyss
ll get (cased) suggestions back. > If you need cased suggestions, but provide normalized (lowercased) prefixes > you'll get nothing, although such a feature would be relatively easy to > implement based on the automaton code currently in the SVN. > > Dawid > > On Fri, May

AW: AW: AW: AW: AW: "fuzzy prefix" search

2011-05-06 Thread Clemens Wyss
sition, and found after a while that I didn't > really > want to go back FWIW. > > Best > Erick > > On Thu, May 5, 2011 at 2:26 AM, Clemens Wyss > wrote: > > What I am looking for is the autosuggestion implemented here (@solr) > > > > http://search-

Is there kind of a "NullAnalyzer" ?

2011-05-09 Thread Clemens Wyss
i.e. an analyzer which takes the field to be analyzed as is into the index...? The fields I am trying to index have a max length of 3 words and I don't want to match sub terms of these fields. - To unsubscribe, e-mail: java-user

AW: Is there kind of a "NullAnalyzer" ?

2011-05-09 Thread Clemens Wyss
Thx! > -Ursprüngliche Nachricht- > Von: Federico Fissore [mailto:feder...@fissore.org] > Gesendet: Montag, 9. Mai 2011 09:52 > An: java-user@lucene.apache.org > Betreff: Re: Is there kind of a "NullAnalyzer" ? > > Clemens Wyss, il 09/05/2011 09:42, ha scri

AW: Is there kind of a "NullAnalyzer" ?

2011-05-09 Thread Clemens Wyss
> eMail: u...@thetaphi.de > > > -Original Message- > > From: Clemens Wyss [mailto:clemens...@mysign.ch] > > Sent: Monday, May 09, 2011 9:43 AM > > To: java-user@lucene.apache.org > > Subject: Is there kind of a "NullAnalyzer" ? > > > > i.

boosting fields

2011-06-02 Thread Clemens Wyss
I have a minimal unit test in which I add three documents to an index. The documents have two fields "year" and "description". doc1(year = "2007", desc = "text with 2007 and 2009") doc2(year = "2009", desc = "text with 2007 and 2009") doc3(year = "2008", desc = "text with 2007 and 2009") To searc

negative wildcard query

2011-06-29 Thread Clemens Wyss
Say I have a document with field "f1". How can I search for Documents which do not have "test" in field "f"? I tried: -f: *test* f: -*test* f: NOT *test* but no luck. Using the WildcardQuery class... Any advice? Thx Clemens - To unsubscri

AW: negative wildcard query

2011-06-29 Thread Clemens Wyss
rom, eg a MatchAllDocsQuery. > > karl > > 29 jun 2011 kl. 17.25 skrev Clemens Wyss: > > > Say I have a document with field "f1". How can I search Documents which > have not "test" in field "f" > > I tried: > > -f: *test* >

AW: negative wildcard query

2011-06-30 Thread Clemens Wyss
), filter, 10 ); The filter never ever lets any documents through...when calling result = indexSearcher.search( new WildcardQuery( new Term( "description", "*happy*" ) ), 10 ); I have hits... > -Ursprüngliche Nachricht- > Von: Clemens Wyss [mailto:clemens...

AW: negative wildcard query

2011-06-30 Thread Clemens Wyss
> H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Clemens Wyss [mailto:clemens...@mysign.ch] > > Sent: Thursday, June 30, 2011 9:44 AM > > To: java-user@lucene.apache.org > >

implicit closing of an IndexWriter

2011-07-26 Thread Clemens Wyss
Under which circumstances is an IndexWriter "implicitly" closed? I have an IndexWriter member in one of my helper classes which is opened in the constructor. I never ever close this member explicitly. Nevertheless I encounter AlreadyClosedExceptions when writing through the IndexWriter ...

AW: implicit closing of an IndexWriter

2011-07-26 Thread Clemens Wyss
I am using Lucene 3.3 > -Ursprüngliche Nachricht- > Von: Mark Miller [mailto:markrmil...@gmail.com] > Gesendet: Dienstag, 26. Juli 2011 16:05 > An: java-user@lucene.apache.org > Betreff: Re: implicit closing of an IndexWriter > > > On Jul 26, 2011, at 9:5

AW: implicit closing of an IndexWriter

2011-07-26 Thread Clemens Wyss
", > the original JVM problem itself is still there and cannot be fixed (if you > interrupt threads). > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > >

AW: implicit closing of an IndexWriter

2011-07-26 Thread Clemens Wyss
Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -Original Message- > > From: Clemens Wyss [mailto:clemens...@mysign.ch] > > Sent: Tuesday, July 26, 2011 4:27 PM > > To: java-user@lucene.apache.org > > Subject: AW: implicit closing of an IndexWr

AW: implicit closing of an IndexWriter

2011-07-26 Thread Clemens Wyss
Ok, I just read the java doc ... Is there a possibility to just revert the pending writes of an IR? > -Ursprüngliche Nachricht- > Von: Clemens Wyss [mailto:clemens...@mysign.ch] > Gesendet: Dienstag, 26. Juli 2011 17:25 > An: java-user@lucene.apache.org > Betreff: AW: impl

AlreadySetException ?

2011-10-24 Thread Clemens Wyss
I am seeing this stack trace in my logs: org.apache.lucene.util.SetOnce$AlreadySetException: The object cannot be set twice! at org.apache.lucene.util.SetOnce.set(SetOnce.java:69) at org.apache.lucene.index.MergePolicy.setIndexWriter(MergePolicy.java:271) at org.apache.luc

AW: AlreadySetException ?

2011-10-24 Thread Clemens Wyss
Writers. > > Clone it before, if you want to use it multiple times. > > > > Uwe > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > >> --

AW: AlreadySetException ?

2011-10-24 Thread Clemens Wyss
>Chuck Norris ... with his swiss army knife ... ;) Greetings from Switzerland - Clemens-having-a-swiss-army-knife-too > -Ursprüngliche Nachricht- > Von: Dawid Weiss [mailto:dawid.we...@gmail.com] > Gesendet: Montag, 24. Oktober 2011 17:01 > An: java-user@lucene.apache.org > Betreff: Re: A

Alternative for WildcardQuery with leading *

2012-12-07 Thread Clemens Wyss DEV
In order to provide suggestions our query also includes a "WildcardQuery with a leading *", which, of course, has a HUGE performance impact :-( E.g. Say we have indexed "vacancyplan", then if a user typed "plan" he should also be offered "vacancyplan" ... How can this feature be implemented wit

AW: Alternative for WildcardQuery with leading *

2012-12-07 Thread Clemens Wyss DEV
d) and then convert the query *plan to nalp* :). You can also index the suffixes of words, e.g. vacancyplan, acancyplan, cancyplan and so forth, and then convert the query *plan to plan. Note that it increases the lexicon ! Shai On Fri, Dec 7, 2012 at 11:16 AM, Clemens Wyss DEV wrote: > In
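The two tricks described in this reply (index reversed terms and turn *plan into nalp*, or index every suffix and turn *plan into plan) can be illustrated with a few lines of plain Java. This is a sketch of the string transformations only, not of the Lucene token filters one would wire into an analyzer; all names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of the reversed-term and suffix tricks for leading wildcards. */
final class LeadingWildcardTricks {

    /** Reverses a term, e.g. for indexing into a dedicated "reversed" field. */
    static String reverse(String term) {
        return new StringBuilder(term).reverse().toString();
    }

    /** Rewrites a leading-wildcard pattern such as "*plan" into the prefix pattern "nalp*". */
    static String toReversedPrefixPattern(String pattern) {
        if (!pattern.startsWith("*")) {
            throw new IllegalArgumentException("expected a leading '*': " + pattern);
        }
        return reverse(pattern.substring(1)) + "*";
    }

    /** Lists every suffix of a term, longest first; "*plan" then becomes an exact lookup of "plan". */
    static List<String> suffixes(String term) {
        List<String> result = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            result.add(term.substring(start));
        }
        return result;
    }
}
```

The reversed form of "vacancyplan" starts with "nalp", so a cheap prefix query finds it. The suffix variant stores one extra term per character of each word, which is the lexicon growth the reply warns about.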

porting a cutsom Analyzer from 3.6 -> 4.0

2012-12-09 Thread Clemens Wyss DEV
I have a CustomAnalyzer which overrides "public final TokenStream tokenStream ( String fieldName, Reader reader )": @Override public final TokenStream tokenStream ( String fieldName, Reader reader ) { boolean fieldRequiresExactMatching = IndexManager.getInstance().isExactMatchField( fieldName );

Lucene (4.0), junit, failed to delete _0_nrm.cfs

2012-12-09 Thread Clemens Wyss DEV
I am (also) running lucene unit tests. In the teardown-method(@After) I (try to) delete the complete directory-folder. Unfortunately this does not always work. If not, the file _0_nrm.cfs (or _0.fdx) is the first to cause problems, i.e. is being "locked"... I do explicitly close the writers/read

AW: Lucene (4.0), junit, failed to delete _0_nrm.cfs

2012-12-09 Thread Clemens Wyss DEV
@lucene.apache.org Betreff: Re: Lucene (4.0), junit, failed to delete _0_nrm.cfs Can you post the source code for your test case? Mike McCandless http://blog.mikemccandless.com On Sun, Dec 9, 2012 at 11:45 AM, Clemens Wyss DEV wrote: > I am (also) running lucene unit tests. > > In the teardo

AW: Lucene (4.0), junit, failed to delete _0_nrm.cfs

2012-12-09 Thread Clemens Wyss DEV
, and extends LuceneTestCase, using newDirectory and so on. if you have files still open this will fail the test and give you a stacktrace of where you initially opened the file. On Sun, Dec 9, 2012 at 12:28 PM, Clemens Wyss DEV wrote: > Hi Mike, > unfortunately not. When I run the unit t

[lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-13 Thread Clemens Wyss DEV
I am facing the following stack trace: java.lang.NullPointerException: null at java.io.File.<init>(File.java:305) ~[na:1.6.0_26] at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:80) ~[lucene-core.jar:4.6.0 1543363 - simon - 2013-11-19 11:05:50]

AW: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Clemens Wyss DEV
> But if I close it, given that it is shared by multiple threads, I will need to > check each time >before doing the search if IndexReader is still open, correct? You can make use of IndexReader#incRef/#decRef , i.e. ir.incRef(); try { Or maybe SearcherManager http://blog.mikemccandless.com/2011/09
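The incRef/decRef idea mentioned in this reply is ordinary reference counting: each searching thread takes a reference before using the shared reader and releases it in a finally block, and the resource is only freed when the last reference is gone. The sketch below shows that pattern with a hypothetical stand-in class in plain Java. It is not Lucene's IndexReader, and the tryIncRef/decRef/isClosed names and the string results are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a shared, reference-counted reader; not Lucene's IndexReader. */
final class RefCountedResource {

    // The creator holds the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed;

    /** Takes a reference unless the resource has already been released. */
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    /** Drops a reference; the last one out marks the resource as closed. */
    void decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            closed = true; // a real reader would free its files and buffers here
        } else if (remaining < 0) {
            throw new IllegalStateException("decRef called more often than incRef");
        }
    }

    boolean isClosed() {
        return closed;
    }

    /** Usage pattern: acquire, use inside try, always release in finally. */
    static String searchSafely(RefCountedResource reader) {
        if (!reader.tryIncRef()) {
            return "reader already closed";
        }
        try {
            return "searched";
        } finally {
            reader.decRef();
        }
    }
}
```

With this shape, the owner can call decRef() to "close" the reader at any time; searches already in flight keep it alive until they finish, and later callers get a clean refusal instead of an exception mid-search.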

AW: Issue with Lucene 3.6.1 and MMapDirectory

2014-05-14 Thread Clemens Wyss DEV
Not closing an IndexReader most probably (to say the least) results in a memory leak -> OOM > But if I close it, given that it is shared by multiple threads, I will >need to check each time before doing the search if IndexReader is still open, >correct? You can make use of IndexReader#incRef/#decRef ,

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-15 Thread Clemens Wyss DEV
e Nachricht----- Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Gesendet: Dienstag, 13. Mai 2014 18:23 An: java-user@lucene.apache.org Betreff: [lucene 4.6] NPE when calling IndexReader#openIfChanged I am facing the following stacktrace: java.lang.NullPointerException: null at jav

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-18 Thread Clemens Wyss DEV
ttp://blog.mikemccandless.com On Wed, May 14, 2014 at 2:16 AM, Clemens Wyss DEV wrote: > Tracked this down a little bit more: > Lucene40LiveDocsFormat#readLiveDocs calls > IndexFileNames#fileNameForGeneration > If I get this right, param 'gen' seems to be -1. > Gen is being gat

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-18 Thread Clemens Wyss DEV
g, 18. Mai 2014 16:51 An: Lucene Users Betreff: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged But what is the output of "java -fullversion"? Mike McCandless http://blog.mikemccandless.com On Sun, May 18, 2014 at 5:24 AM, Clemens Wyss DEV wrote: >> What java versio

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-18 Thread Clemens Wyss DEV
e :) Possibly a concurrency/timing issue? -Ursprüngliche Nachricht----- Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Gesendet: Montag, 19. Mai 2014 07:37 An: java-user@lucene.apache.org Betreff: AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged Sorry for being imprecise java ve

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Clemens Wyss DEV
ing files directly from the index directory yourself between reopens? Mike McCandless http://blog.mikemccandless.com On Mon, May 19, 2014 at 1:36 AM, Clemens Wyss DEV wrote: > Sorry for being imprecise > java version "1.6.0_26" > Java(TM) SE Runtime Environment (build 1

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-19 Thread Clemens Wyss DEV
ommit in order to see changes. What if I were to search right after deleteAll? -Ursprüngliche Nachricht- Von: Michael McCandless [mailto:luc...@mikemccandless.com] Gesendet: Montag, 19. Mai 2014 11:05 An: Lucene Users Betreff: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged O

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-05-21 Thread Clemens Wyss DEV
cene 4.6] NPE when calling IndexReader#openIfChanged On Mon, May 19, 2014 at 6:14 AM, Clemens Wyss DEV wrote: > Mike, > first of all thanks for all your input, I really appreciate (as much as I > like reading your blog). You're welcome! >> Hmm, but you swap these files over w

AW: Analyzing suggester for many fields

2014-06-11 Thread Clemens Wyss DEV
Unfortunately the link provided by Goutham is no longer valid. Anybody still got the code? -Ursprüngliche Nachricht- Von: Goutham Tholpadi [mailto:gtholp...@gmail.com] Gesendet: Donnerstag, 29. August 2013 06:21 An: java-user@lucene.apache.org Betreff: Re: Analyzing suggester for many fiel

AW: Analyzing suggester for many fields

2014-06-12 Thread Clemens Wyss DEV
liche Nachricht- Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Gesendet: Mittwoch, 11. Juni 2014 12:57 An: java-user@lucene.apache.org Betreff: AW: Analyzing suggester for many fields Unfortunately the link provided by Goutham is no longer valid. Anybody still got the code? -Ursprüng

AW: Analyzing suggester for many fields

2014-06-12 Thread Clemens Wyss DEV
enter InputIteratorWrapper ;) i.e. new InputIteratorWrapper(tfit ) -Ursprüngliche Nachricht- Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Gesendet: Donnerstag, 12. Juni 2014 16:01 An: java-user@lucene.apache.org Betreff: AW: Analyzing suggester for many fields trying to re-build

AW: AW: Analyzing suggester for many fields

2014-06-12 Thread Clemens Wyss DEV
I'm doing something similar, adding weighting as some function of doc freq (and using Scala). Cheers, Neil On 13/06/14 00:19, Clemens Wyss DEV wrote: > enter InputIteratorWrapper ;) i.e. new InputIteratorWrapper(tfit ) > > -Ursprüngliche Nachricht----- > Von: Clemens Wyss DEV [mailt

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-06-13 Thread Clemens Wyss DEV
anged On Wed, May 21, 2014 at 3:17 AM, Clemens Wyss DEV wrote: >> Can you just decrease IW's ramBufferSizeMB to relieve the memory pressure? > +1 > Is there something alike for IndexReaders? No, although you can take steps during indexing to reduce the RAM required during sear

fuzzy/case insensitive AnalyzingSuggester )

2014-06-13 Thread Clemens Wyss DEV
Looking for an AnalyzingSuggester which supports - fuzziness - case insensitivity - small (in-memory) footprint (*) (*)Just tried to "hand" my big IndexReader (see other post " [lucene 4.6] NPE when calling IndexReader#openIfChanged") into JaspellLookup. Got an OOM. Is there any (Jaspell)Lookup im

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-06-13 Thread Clemens Wyss DEV
| --- Does this help? -----Ursprüngliche Nachricht- Von: Michael McCandless [mailto:luc...@mikemc

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-06-15 Thread Clemens Wyss DEV
rsprüngliche Nachricht- Von: Michael McCandless [mailto:luc...@mikemccandless.com] Gesendet: Freitag, 13. Juni 2014 15:48 An: Lucene Users Betreff: Re: [lucene 4.6] NPE when calling IndexReader#openIfChanged On Fri, Jun 13, 2014 at 8:53 AM, Clemens Wyss DEV wrote: > Thanks a lot! >>

AW: [lucene 4.6] NPE when calling IndexReader#openIfChanged

2014-06-16 Thread Clemens Wyss DEV
- does this mean 5485824 bytes, or ~5.2 MB? This is probably "correct", meaning this is the RAM to hold the terms index. But I can't see from your heap dump output where the other ~51.3 MB is being used by StandardDirectoryReader.

IndexWriter#updateDocument(Term, Document)

2014-06-18 Thread Clemens Wyss DEV
I would like to perform a batch update on an index. In order to omit duplicate entries I am making use of IndexWriter#updateDocument(Term, Document) open an IndexWriter; foreach( element in elementsToBeUpdatedWhichHaveDuplicates ) { doc = element.toDoc(); indexWriter.updateDocument( uniqueTermFor

AW: IndexWriter#updateDocument(Term, Document)

2014-06-19 Thread Clemens Wyss DEV
case is supposed to work; if it doesn't it's a bad bug :) Can you reduce it to a small example? Mike McCandless http://blog.mikemccandless.com On Wed, Jun 18, 2014 at 10:08 AM, Clemens Wyss DEV wrote: > I would like to perform a batch update on an index. In order to omit >

AW: IndexWriter#updateDocument(Term, Document)

2014-06-19 Thread Clemens Wyss DEV
http://blog.mikemccandless.com On Thu, Jun 19, 2014 at 12:54 PM, Clemens Wyss DEV wrote: > directory = new SimpleFSDirectory( indexLocation ); IndexWriterConfig > config = new IndexWriterConfig(Version.LUCENE_47, new > WhitespaceAnalyzer( Version.LUCENE_47 )); indexWriter = new > Index

AW: fuzzy/case insensitive AnalyzingSuggester )

2014-06-20 Thread Clemens Wyss DEV
Sorry for re-asking. Has anyone implemented an AnalyzingSuggester which - is fuzzy - is case insensitive (or must/should this be implemented by the analyzer?) - does infix search [- has a small memory footprint] -Ursprüngliche Nachricht- Von: Clemens Wyss DEV [mailto:clemens

AW: fuzzy/case insensitive AnalyzingSuggester )

2014-06-22 Thread Clemens Wyss DEV
Having control over which suggester is used when, and how its specific suggestions are merged into the final result list, helps improving the user experience, at least with our use cases. Cheers, Oli -Original Message- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Friday, J

QueryParserUtil, big query with wildcards -> runs endlessly and produces heavy load

2014-06-26 Thread Clemens Wyss DEV
The following "testcase" runs endlessly and produces VERY heavy load. ... String query = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut " + "labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et

[suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-26 Thread Clemens Wyss DEV
Is it possible to fetch the terms of a FilterAtomicReader in order to provide suggestions from a subset of all documents in an index? So my target is to "provide suggestions from a subset of all documents in an index". Note: I have an "equal" discussion ongoing in the solr-mailinglist. But I th

AW: [suggestions] fetch terms from a FilterAtomicReader(subclass)?

2014-10-27 Thread Clemens Wyss DEV
Betreff: Re: [suggestions] fetch terms from a FilterAtomicReader(subclass)? On 10/27/2014 07:32 AM, Clemens Wyss DEV wrote: > Is it possible to fetch the terms of a FilterAtomicReader in order to provide > suggestions from a subset of all documents in an index? Yes, it is possible. I do it by f

"batch-update"-pattern, NoMergeScheduler?

2014-12-22 Thread Clemens Wyss DEV
One of our indexes is updated completely quite frequently -> "batch update" or "re-index". When that happens, more than 2 million documents are added/updated to/in the very index. This creates an immense IO load on our system. Does it make sense to set merge scheduler to NoMergeScheduler (and/or MergePolicy

Looking for docs that have certain fields empty (an/or not set)

2015-01-07 Thread Clemens Wyss DEV
Say I wanted to find documents which have no content in "field1" (or documents that have no field 'field1'), wouldn't that be the following query? -field1:[* TO *] Thanks for your help Clemens - To unsubscribe, e-mail: java-user-

AW: Looking for docs that have certain fields empty (an/or not set)

2015-01-07 Thread Clemens Wyss DEV
query processing to handle this case, but not in the main query parser. Best, Erick On Wed, Jan 7, 2015 at 8:14 AM, Clemens Wyss DEV wrote: > Say I wanted to find documents which have no content in "field1" (or > documents that have no field 'field1'), wouldn't

AW: Looking for docs that have certain fields empty (an/or not set)

2015-01-07 Thread Clemens Wyss DEV
TO *] (That's asterisk:asterisk -field1:[* TO *] in case the silly list interprets the asterisks as markup) There's some special magic in filter query processing to handle this case, but not in the main query parser. Best, Erick On Wed, Jan 7, 2015 at 8:14 AM, Clemens Wyss DEV wrote: > Sa

howto: handle temporal visibility of a document?

2015-01-12 Thread Clemens Wyss DEV
We have documents that are not always visible (visiblefrom-visibleto). In order to not have to query the originating object of the document whether it is currently visible (after the query), we'd like to put metadata into the documents, so that the visibility can be determined at query-time (by

AW: howto: handle temporal visibility of a document?

2015-01-12 Thread Clemens Wyss DEV
) OR ( visiblefrom:[* TO ] AND visibleto:[ TO *]) -Ursprüngliche Nachricht- Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Gesendet: Montag, 12. Januar 2015 09:40 An: java-user@lucene.apache.org Betreff: howto: handle temporal visibility of a document? We have documents that are not always vi

AW: AW: howto: handle temporal visibility of a document?

2015-01-12 Thread Clemens Wyss DEV
me, use maxlong. -Mike On 1/12/15 4:23 AM, Clemens Wyss DEV wrote: > I'll add/start with my proposal ;) > > Document-meta fields: > + visiblefrom [long] > + visibleto [long] > > Query or query filter: > (*:* -visiblefrom:[* TO *] AND -visibleto:[* TO *]) OR (*:* > -

RE: RE: howto: handle temporal visibility of a document?

2015-01-12 Thread Clemens Wyss DEV
ry would then be: visiblefrom:[0 TO ] AND visibleto:[ TO ] And a rather Solr'y question, nevertheless I ask it here: I intended to use this very query as query filter (qf), but I guess it doesn't make sense because '' changes at every call ;) -Ursprüngliche Nachrich
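The range query in this message has lost its endpoint placeholders in the archive. Assuming the blank endpoints stand for the current time (an assumption, since the snippet does not show them), and using the defaults suggested in the thread (0 for an open start, the maximum long for an open end), the visibility rule amounts to the check sketched below. This is plain Java with hypothetical names, not a Lucene filter.

```java
/** Plain-Java illustration of the visiblefrom/visibleto rule; names are hypothetical. */
final class Visibility {

    // Defaults for documents without a start or end, as suggested in the thread.
    static final long OPEN_START = 0L;
    static final long OPEN_END = Long.MAX_VALUE;

    /** A document is visible when visibleFrom <= now <= visibleTo; null means "not restricted". */
    static boolean isVisible(Long visibleFrom, Long visibleTo, long now) {
        long from = visibleFrom == null ? OPEN_START : visibleFrom;
        long to = visibleTo == null ? OPEN_END : visibleTo;
        return from <= now && now <= to;
    }
}
```

Storing the defaults at index time means every document has both fields, so a single pair of range clauses covers all cases and the "field not set" branches of the earlier proposal are no longer needed.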

AW: fuzzy/case insensitive AnalyzingSuggester )

2015-01-24 Thread Clemens Wyss DEV
e there are good (but less popular) prefix hits. Having control over which suggester is used when, and how its specific suggestions are merged into the final result list, helps improving the user experience, at least with our use cases. Cheers, Oli -Original Message- From: Clemens Wyss D

LowercaseFilter, preserveOriginal?

2015-01-27 Thread Clemens Wyss DEV
Why does the LowerCaseFilter, as opposed to the ASCIIFoldingFilter, have no preserveOriginal argument? I very much like preserveOriginal="true" when applying the ASCIIFoldingFilter for (German) suggestions

AW: LowercaseFilter, preserveOriginal?

2015-01-27 Thread Clemens Wyss DEV
ereas I'd like to have the original only ... -Ursprüngliche Nachricht- Von: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Gesendet: Dienstag, 27. Januar 2015 09:08 An: java-user@lucene.apache.org Betreff: LowercaseFilter, preserveOriginal? Why does the LowerCaseFilter, as opposed to

[tika] ForkParser, Lost connection to a forked server process

2015-02-17 Thread Clemens Wyss DEV
Sorry for cross-posting, but the tika-ml does not seem to be too "lively": I am trying to make use of the ForkParser. Unfortunately I am getting „Lost connection to a forked server process“ for an (encrypted) pdf which I can extract „in-process“. Extracting the document "in-process" takes appro
