Re: synonym question

2022-03-14 Thread Bernd Fehling
Hello, just a guess: have you tried escaping the space in your multi-word terms with a backslash? isoweek,iso\ week Regards Bernd On 14.03.22 at 15:54, Trevor Nicholls wrote: I have technical data which I am querying with Lucene; one of the features of the content is that a large number of te
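
The backslash suggestion assumes that the synonym file parser treats `\` as an escape character. The following is a minimal sketch of an escape-aware splitter. It is a hypothetical helper, not Solr's actual parser code. Only unescaped commas separate entries, and a backslash makes the next character literal, so `iso\ week` becomes the entry "iso week". Whether that entry then counts as one token or two depends on the analyzer applied to the synonym rules, not on the escape.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: split a synonym rule line on unescaped separators. */
class SynonymRuleSplitter {

    static List<String> splitUnescaped(String rule, char separator) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < rule.length(); i++) {
            char c = rule.charAt(i);
            if (escaped) {
                current.append(c);      // character after a backslash is taken literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;         // drop the backslash, keep the next character
            } else if (c == separator) {
                parts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString().trim()); // last entry; whitespace around entries is trimmed
        return parts;
    }
}
```

For example, `splitUnescaped("isoweek,iso\\ week", ',')` (Java string literal) yields the two entries "isoweek" and "iso week".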

IndexWriter updateDocument is removing doc from index

2018-03-15 Thread Bernd Fehling
While writing some tools to build and maintain lucene indexes I noticed some strange behavior during testing. A doc disappears from lucene index while using IndexWriter updateDocument. The API of lucene 6.4.2 states: "Updates a document by first deleting the document(s) containing term and then ad
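
The quoted Javadoc defines updateDocument as delete-then-add. The toy in-memory model below implements only that contract and can serve as a baseline when comparing against what a freshly opened reader returns. It uses plain Java collections, the class and helper names are hypothetical, and segments, buffering and reader refresh are ignored. In real Lucene the term is matched against the indexed terms. An id field that went through an analyzer may therefore not match the raw string passed to updateDocument.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy model (not Lucene code): delete docs whose field equals the term, then add the new doc. */
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Hypothetical helper: build a document from alternating field/value pairs. */
    static Map<String, String> doc(String... fieldValuePairs) {
        Map<String, String> d = new HashMap<>();
        for (int i = 0; i + 1 < fieldValuePairs.length; i += 2) {
            d.put(fieldValuePairs[i], fieldValuePairs[i + 1]);
        }
        return d;
    }

    void addDocument(Map<String, String> d) {
        docs.add(d);
    }

    void updateDocument(String field, String value, Map<String, String> d) {
        docs.removeIf(existing -> value.equals(existing.get(field))); // 1) delete by exact term
        docs.add(d);                                                   // 2) then add the new doc
    }

    List<Map<String, String>> find(String field, String value) {
        return docs.stream()
                .filter(existing -> value.equals(existing.get(field)))
                .collect(Collectors.toList());
    }
}
```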

Re: howto get LongPoint stored

2017-10-25 Thread Bernd Fehling
ongField, isn't it? Regards Bernd On 25.10.2017 at 12:17, Alan Woodward wrote: > Hi Bernd, > > You add a separate StoredField with the same name. > >> On 25 Oct 2017, at 11:11, Bernd Fehling >> wrote: >> >> With Lucene 6.6.2 I'm tryin

howto get LongPoint stored

2017-10-25 Thread Bernd Fehling
With Lucene 6.6.2 I'm trying to get a LongPoint value indexed and stored. Old code: LegacyLongField dateField = new LegacyLongField("modified", lastModified, Field.Store.YES); Because LegacyLongField is deprecated I tried LongPoint. New code: LongPoint dateField = new LongPoint("modified", last
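
Alan Woodward's reply in this thread is to add a separate StoredField with the same name. With Lucene 6.x that means adding both `new LongPoint("modified", lastModified)` and `new StoredField("modified", lastModified)` to the document. The reason is that LongPoint only indexes the value as a point, for exact and range queries, and has no Field.Store option. The JDK-only sketch below illustrates the kind of encoding that gets indexed instead: the sign bit is flipped and the bytes are written big-endian, so unsigned byte order matches signed long order. As far as I know this matches Lucene's NumericUtils.longToSortableBytes, but treat it as an illustration, not Lucene's code.

```java
/** Sketch: sortable byte encoding of a long (sign bit flipped, most significant byte first). */
class SortableLong {

    static byte[] encode(long value) {
        long flipped = value ^ 0x8000000000000000L; // negative values now sort below positive ones
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (flipped >>> (56 - 8 * i)); // big-endian byte order
        }
        return out;
    }

    static long decode(byte[] bytes) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (bytes[i] & 0xFFL);
        }
        return v ^ 0x8000000000000000L; // undo the sign flip
    }

    /** Unsigned lexicographic comparison, the order in which the encoded bytes sort. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }
}
```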

Re: Issue with installing PyLucene 6.5.0

2017-10-24 Thread Bernd Fehling
les (of the downloaded > conda package and the one in the source package) and they are identical. > So, I think the issue should be somewhere else, otherwise I would face the > same error while trying with conda-forge. No? > > Amin > > > On Tue, Oct 24, 2017 at 8:05 AM, Bern

Re: Issue with installing PyLucene 6.5.0

2017-10-23 Thread Bernd Fehling
Hi Amin, PRIxMAX is a "C" conversion specifier macro for the integer type uintmax_t. It looks like a bug in jcc3. The original code is: sprintf(buffer, "%0*"PRIxMAX, (int) hexdig, hash); It could be that a space between '"' and PRIxMAX is missing. A quick fix for testing could be either enter a spac

questions about xxxGraphFilter

2017-02-27 Thread Bernd Fehling
Now we have a SynonymGraphFilter but also need other filters to be graph-aware. I was already thinking about a ShingleGraphFilter. But if a ShingleGraphFilter outputs a graph and is located before SynonymGraphFilter where is the advantage? The SynonymGraphFilter cannot consume arbitrary graphs. Do
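
A graph-aware filter has to deal with token streams where positionIncrement and positionLength describe a graph rather than a flat sequence. The stdlib model below converts each token into a from/to node pair and enumerates every path through the graph. The Token class is hypothetical and not Lucene's attribute API. The example tokens are hand-written for an illustrative rule `wtf => what the fudge` and were not produced by Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy token graph: a token starts at its position and spans posLen positions. */
class TokenGraph {

    static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Enumerate every path from the first node to the last; assumes at least one token. */
    static List<String> paths(List<Token> tokens) {
        int n = tokens.size();
        int[] from = new int[n];
        int[] to = new int[n];
        int pos = -1;
        int end = 0;
        for (int i = 0; i < n; i++) {
            pos += tokens.get(i).posInc;         // start node of this token
            from[i] = pos;
            to[i] = pos + tokens.get(i).posLen;  // end node of this token
            end = Math.max(end, to[i]);
        }
        List<String> out = new ArrayList<>();
        walk(tokens, from, to, from[0], end, "", out);
        return out;
    }

    private static void walk(List<Token> tokens, int[] from, int[] to, int node, int end,
                             String prefix, List<String> out) {
        if (node == end) {
            out.add(prefix.trim());
            return;
        }
        for (int i = 0; i < tokens.size(); i++) {
            if (from[i] == node) { // follow every arc leaving this node
                walk(tokens, from, to, to[i], end, prefix + " " + tokens.get(i).term, out);
            }
        }
    }
}
```

With the tokens what(1,1), wtf(0,3), the(1,1), fudge(1,1), happened(1,1), the two paths are "what the fudge happened" and "wtf happened". A filter that ignores positionLength sees only a flat list and cannot tell these paths apart.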

Re: SynonymFilterFactory deprecated since 6.4.0

2017-02-13 Thread Bernd Fehling
, the API docs say "...Injecting synonyms – here, synonyms of a token should be added after that token..." But, as I already mentioned, the synonyms are added before the token. Are the docs outdated? Regards Bernd On 13.02.2017 at 17:31, Michael McCandless wrote: > On Mon,

Re: SynonymFilterFactory deprecated since 6.4.0

2017-02-13 Thread Bernd Fehling
moving the SPF filters in your test? Or otherwise > simplify your test so it's closer to what my test case is doing? > > Mike McCandless > > http://blog.mikemccandless.com > > On Mon, Feb 13, 2017 at 7:52 AM, Michael McCandless > wrote: >> Thanks Bernd; I'

Re: SynonymFilterFactory deprecated since 6.4.0

2017-02-13 Thread Bernd Fehling
If you use only > whitespace tokenizer and SGF does the issue reproduce? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Fri, Feb 10, 2017 at 10:07 AM, Bernd Fehling > wrote: >> Example for position end and positionLength of SGF. >> >

Re: SynonymFilterFactory deprecated since 6.4.0

2017-02-10 Thread Bernd Fehling
cCandless: > On Thu, Feb 9, 2017 at 2:40 AM, Bernd Fehling > wrote: >> I tried SynonymGraphFilter with my setup and it works right away. >> It paid off that I made some modifications to my filters while >> testing 6.3 with my setup. > > Good! > >>

Re: SynonymFilterFactory deprecated since 6.4.0

2017-02-08 Thread Bernd Fehling
; SynonymGraphFilter will produce a correct graph (unlike SynonymFilter) > and the Lucene query parsers (not sure about Solr's query parser fork) > will correctly detect the graph and create the right query. > > Mike McCandless > > http://blog.mikemccandless.com > > >

Re: SynonymFilterFactory deprecated since 6.4.0

2017-02-07 Thread Bernd Fehling
ep doing what you are doing today, you should switch > to SynonymGraphFilter followed by FlattenGraphFilter: it will make the > same tokens as the current SynonymFilter, but will necessarily be > buggy in the multi-token case. > > Mike McCandless > > http://blog.mikemccandless.

SynonymFilterFactory deprecated since 6.4.0

2017-02-07 Thread Bernd Fehling
I just tried Solr 6.4.1 and noticed that SynonymFilterFactory is deprecated, as reported in the logs. I hope that this is just to note that there is also an alternative SynonymGraphFilterFactory now available. And _not_ that SynonymFilterFactory will disappear, because it runs my multi-word Synon

Re: enhancement for SynonymFilter

2016-11-18 Thread Bernd Fehling
On 18.11.2016 at 08:58, Bernd Fehling wrote: > Hi Mike, > > let me explain. > > First, after looking deeper inside I noticed that the Filters are used > like a stack and called backwards. So the first incrementToken goes > to the last filter in the chain. That one also us
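
The "stack" Bernd describes is the decorator structure of an analysis chain. Each filter wraps its input, and the consumer calls incrementToken() on the outermost (last) filter, which first pulls from the filter before it. The minimal stdlib model below records the call order. The Stage interface is hypothetical and not Lucene's TokenStream.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy pull-based chain: calls enter at the last stage and travel back toward the source. */
class PullChain {

    interface Stage {
        String next(); // returns null when exhausted
    }

    static Stage source(List<String> words, List<String> trace) {
        Iterator<String> it = words.iterator();
        return () -> {
            trace.add("tokenizer");
            return it.hasNext() ? it.next() : null;
        };
    }

    static Stage filter(String name, Stage input, UnaryOperator<String> op, List<String> trace) {
        return () -> {
            trace.add(name);             // this stage is entered first...
            String token = input.next(); // ...then it pulls from its input
            return token == null ? null : op.apply(token);
        };
    }
}
```

If the chain is built as tokenizer, lowercase, suffix and `next()` is called on the suffix stage, the calls are recorded in the order suffix, lowercase, tokenizer. The transformations themselves are still applied in chain order on the way back.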

Re: enhancement for SynonymFilter

2016-11-18 Thread Bernd Fehling
ou know it > spanned "wow", "that's", "funny". > > Mike McCandless > > http://blog.mikemccandless.com > > > On Thu, Nov 17, 2016 at 10:22 AM, Bernd Fehling > wrote: >> Currently I'm tackling a problem with SynonymFilter wh

enhancement for SynonymFilter

2016-11-17 Thread Bernd Fehling
Currently I'm tackling a problem with SynonymFilter while going from 4.10.4 to 6.3.0. For a special solution I need to know whether a word (or multi-word term) produces synonyms in SynonymFilter. Therefore I suggest enhancing SynonymFilter with "hasSynonyms". A workaround would be to buffer all
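
The proposed hasSynonyms information could be modelled as a by-product of the matching step. The sketch below is hypothetical. The set of rule inputs (space-joined), the maxWords limit and the returned flags are illustrative assumptions. It uses a plain greedy longest-match scan, whereas Lucene's SynonymFilter matches against an FST.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Hypothetical: flag every input word that is covered by a (multi-word) synonym match. */
class SynonymMatcher {

    static boolean[] hasSynonyms(List<String> words, Set<String> ruleInputs, int maxWords) {
        boolean[] flags = new boolean[words.size()];
        int i = 0;
        while (i < words.size()) {
            int matched = 0;
            // try the longest candidate first, e.g. "new york city" before "new york"
            for (int len = Math.min(maxWords, words.size() - i); len >= 1; len--) {
                String key = String.join(" ", words.subList(i, i + len));
                if (ruleInputs.contains(key)) {
                    matched = len;
                    break;
                }
            }
            if (matched > 0) {
                Arrays.fill(flags, i, i + matched, true); // all words of the match have synonyms
                i += matched;
            } else {
                i++;
            }
        }
        return flags;
    }
}
```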

Re: no concurrent merging?

2016-08-04 Thread Bernd Fehling
why. I think you should ask >> on the solr-user list? >> >> Or maybe try to change your deletes to be by Term instead of Query? >> >> Mike McCandless >> >> http://blog.mikemccandless.com >> >> On Thu, Aug 4, 2016 at 7:03 AM, Bernd Fehling

no concurrent merging?

2016-08-04 Thread Bernd Fehling
While increasing the indexing load with version 5.5.3, I see threads where one merging thread is blocking the other merging threads. Is this really concurrent merging? Bernd "Lucene Merge Thread #6" - Thread t@40280 java.lang.Thread.State: BLOCKED at org.apache.lucene.index.IndexWriter.mergeMiddle(I

Re: BufferedUpdateStreams breaks high performance indexing

2016-08-04 Thread Bernd Fehling
eted queries are when you delete by query, but I don't think DIH would > be doing that unless you asked it to ... maybe a Solr user/dev knows better? > > Mike McCandless > > http://blog.mikemccandless.com > > On Fri, Jul 29, 2016 at 3:21 AM, Bernd Fehling < > bern

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-29 Thread Bernd Fehling
Can you revert that and re-test? > > I'm not sure why DIH is using updateDocument instead of addDocument ... > maybe ask on the solr-user list? > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Jul 28, 2016 at 10:07 AM, Bernd Fehling < > bernd.feh

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-28 Thread Bernd Fehling
g > https://issues.apache.org/jira/browse/LUCENE-6161 > > Have you changed any IndexWriterConfig settings from defaults? > > What are your unique id fields like? How many bytes in length? > > Mike McCandless > > http://blog.mikemccandless.com > > On Thu, Jul 28,

BufferedUpdateStreams breaks high performance indexing

2016-07-28 Thread Bernd Fehling
While trying to get higher performance for indexing it turned out that BufferedUpdateStreams is breaking indexing performance. public synchronized ApplyDeletesResult applyDeletesAndUpdates(...) At IndexWriterConfig I have setRAMBufferSizeMB=1024 and the Lucene 4.10.4 API states: "Determines the a

remove duplicates from MultiPhraseQuery

2014-07-29 Thread Bernd Fehling
Hi list, can anyone give some hints about removing duplicates from a MultiPhraseQuery? I get the list with: List<Term[]> termarray = ((MultiPhraseQuery) myquery).getTermArrays(); But the Lucene javadocs only have add, no remove or delete. My only idea so far is to build a temporary MultiPhraseQuery and it
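
Because MultiPhraseQuery offers no remove method, one route is to read getTermArrays() (together with getPositions()), drop the duplicates, and build a new query. The dedup step itself needs nothing from Lucene. In this sketch plain strings stand in for Term objects; Term implements equals/hashCode, so the same approach should carry over. Rebuilding the query is left out: add(Term[], int) in 4.x, or, as far as I know, MultiPhraseQuery.Builder in later versions. The sketch only removes repeated terms within each position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

/** Remove duplicate terms inside each position's array, keeping first-seen order. */
class TermArrayDedup {

    static List<String[]> dedupe(List<String[]> termArrays) {
        List<String[]> result = new ArrayList<>(termArrays.size());
        for (String[] terms : termArrays) {
            // LinkedHashSet drops repeats but preserves the original order
            LinkedHashSet<String> unique = new LinkedHashSet<>(Arrays.asList(terms));
            result.add(unique.toArray(new String[0]));
        }
        return result;
    }
}
```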

why different type attributes?

2013-09-27 Thread Bernd Fehling
This question might be stupid, but why are there different type attributes? We have , , , ... but also "word", "shingle", ... Why not , , ...??? Is there a deeper logic behind this or just historically grown and not yet unified? Regards Bernd --

Re: Lucene 4.3.1 CheckIndex limitation 100 trillion tokens?

2013-08-08 Thread Bernd Fehling
Hi Tom, I just noticed that you have Linux with a 2.6 kernel. Have you already enabled the -XX:+UseLargePages performance option, and is it actually in use? Solaris 9 has it on by default, but on Linux HugePages must be enabled first. http://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html Just an id

Lucene version naming of index files

2013-03-14 Thread Bernd Fehling
Hi list, a stupid question about the naming of the index files. While using Lucene (and Solr) 4.2 I still see files with "Lucene41" in the name. This is somewhat confusing if Lucene 4.x produces files with "Lucene4y". Does this also mean that indexes built with 4.2 or 4.3 are fully compatible with 4.1? R

Re: com.sun.jdi.InvocationException occurred invoking method

2012-11-14 Thread Bernd Fehling
/4123628/com-sun-jdi-invocationexception-occurred-invoking-method Regards Bernd On 14.11.2012 14:19, Robert Muir wrote: > On Wed, Nov 14, 2012 at 4:04 AM, Bernd Fehling > wrote: >> Hi list, >> while walking through the code with debugger (eclipse juno) I get

Re: com.sun.jdi.InvocationException occurred invoking method

2012-11-14 Thread Bernd Fehling
-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] >> Sent: Wednesday, November 14, 2012 1:18 PM >> To: java-user@lucene.apache.org >> Su

Re: com.sun.jdi.InvocationException occurred invoking method

2012-11-14 Thread Bernd Fehling
While inspecting the content of topDocs.ScoreDoc I see 4 variables: - doc - fields - score - shardIndex But ScoreDoc only knows about 3 (doc, score, shardIndex). Is this the problem? Regards Bernd On 14.11.2012 13:04, Bernd Fehling wrote: > Hi list, > while walking through the cod

com.sun.jdi.InvocationException occurred invoking method

2012-11-14 Thread Bernd Fehling
Hi list, while walking through the code with the debugger (Eclipse Juno) I get the following: com.sun.jdi.InvocationException occurred invoking method. This happens while trying to inspect org.apache.lucene.search.ScoreDoc. So the debugger seems to have a problem with the toString() of ScoreDoc.java, which looks

Re: content disappears in the index

2012-11-13 Thread Bernd Fehling
gt; easy custom filter to create though >> >> FWIW, >> Erick >> >> >> On Tue, Nov 13, 2012 at 7:02 AM, Robert Muir wrote: >> >>> On Mon, Nov 12, 2012 at 10:47 PM, Bernd Fehling >>> wrote: >>>> By the way, why does Tri

Re: content disappears in the index

2012-11-12 Thread Bernd Fehling
what I want. > > Found in a fortune cookie according to legend: > "A programmer had a problem. He solved it with regular expressions. Now he > has two problems". > > > > > On Mon, Nov 12, 2012 at 9:04 AM, Bernd Fehling < > bernd.fehl...@uni-bielefeld.de> wr

Re: content disappears in the index

2012-11-12 Thread Bernd Fehling
moen, Ingar ; Hauklien, Øystein ; Hedalen, Trond ; Kvam, Erik" --> "brennmoeningarhauk" Now this explains the sorting (shit in --> shit out). But why is the first string reduced to "a"? A wrong regular expression? Bernd On 12.11.2012 14:51, Bernd Feh

Re: content disappears in the index

2012-11-12 Thread Bernd Fehling
box and you should see which of the > steps does the translation. Although changing it to "a" is really weird, > it's almost certainly something you've defined in the indexing analysis > chain. > > FWIW, > Erick > > > On Mon, Nov 12, 2012 at 8:19 A
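
Erick's advice is to check each step of the chain to see which one produces the unexpected value. When the steps are simple string transformations, such as regex replacements, the same idea can be tried outside Solr. The sketch below is hypothetical: the step names and regexes in the usage are made up for illustration and are not taken from Bernd's schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Apply named string steps in insertion order and record the value after each one. */
class StepTracer {

    static List<String> trace(String input, LinkedHashMap<String, UnaryOperator<String>> steps) {
        List<String> log = new ArrayList<>();
        String value = input;
        for (Map.Entry<String, UnaryOperator<String>> step : steps.entrySet()) {
            value = step.getValue().apply(value);
            log.add(step.getKey() + " -> " + value); // intermediate result of this step
        }
        return log;
    }
}
```

With a "lowercase" step followed by a "lettersOnly" step (`replaceAll("[^\\p{L}]", "")`), the input "Kvam, Erik" is logged as "kvam, erik" and then "kvamerik". A step that unexpectedly collapses the value stands out immediately in such a log.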

content disappears in the index

2012-11-12 Thread Bernd Fehling
Hi list, a user reported wrong sorting of our search service running on solr. While chasing this issue I traced it back through lucene into the index. I have a text field for sorting (stored,indexed,tokenized,omitNorms,sortMissingLast) and three docs with author names. If I trace at org.apache.lu

Re: howto run CheckIndex on huge index size

2012-08-15 Thread Bernd Fehling
post from docs or create a copy of the > page inside lucene's distribution! > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> Fr

Re: howto run CheckIndex on huge index size

2012-08-15 Thread Bernd Fehling
se-lucenes-mmapdirectory-on-64bit.html > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message- >> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] >

howto run CheckIndex on huge index size

2012-08-15 Thread Bernd Fehling
I'm trying to run CheckIndex as a separate tool on a large index to get useful information about the number of terms, number of tokens, ... but I always get an OOM exception. I already have JAVA_OPTS -d64 -Xmx25g -Xms25g -Xmn6g. Any idea how to use CheckIndex on an index of this size? Opening index @ /srv/www/solr/sol

Re: change of API Javadoc interface functionality in 4.0.x

2012-07-19 Thread Bernd Fehling
LUCENE-4237 - add ant task to generate optionally ALL javadocs https://issues.apache.org/jira/browse/LUCENE-4237 On 19.07.2012 07:59, Robert Muir wrote: > On Thu, Jul 19, 2012 at 1:53 AM, Bernd Fehling > wrote: >> ... >> Robert Muir added a comment - 12/Apr/12 16:24 >

Re: change of API Javadoc interface functionality in 4.0.x

2012-07-18 Thread Bernd Fehling
... Robert Muir added a comment - 12/Apr/12 16:24 We can save 10MB with this patch, which nukes the 'index'. I guarantee you nobody will miss it. Just click this thing and see how useless it is (since its every method etc in all of lucene). ... Yeah, "nobody will miss it" and "see how useless it i

change of API Javadoc interface functionality in 4.0.x

2012-07-18 Thread Bernd Fehling
Dear developers, while upgrading from 3.6.x to 4.x I have to rewrite some of my code and search for the new methods and/or classes. In 3.6.x and older versions the API Javadoc interface had an "Index" which made it easy to find the appropriate methods. The button to call the "Index" was located in

query switched filters

2011-10-04 Thread Bernd Fehling
Dear list, I need query-switched filters (to turn filters on and off by a query parameter). I've already sent my idea to the Solr list and asked for opinions, but got no complaints from there. http://lucene.472066.n3.nabble.com/skipping-parts-of-query-analysis-for-some-queries-td3382239.h

Re: Lucene Architecture/Documentation Site

2011-08-30 Thread Bernd Fehling
Hi Vineet, nice site and documentation, but what is the point of "sign-up" and "login"? Regards Bernd On 30.08.2011 22:28, Vineet Sinha wrote: Hey guys, We have been working hard on building a helpful site for Lucene Architecture and Documentation. We have been updating the content and work

Re: questions about fieldCache

2011-06-22 Thread Bernd Fehling
. You could try attaching to the Solr instance with jConsole and use that to trigger garbage collections to see what that could tell you... Best Erick On Tue, Jun 21, 2011 at 8:39 AM, Bernd Fehling wrote: Currently I'm using version 3.2. I already used 4.x some months ago but there was to

Re: questions about fieldCache

2011-06-21 Thread Bernd Fehling
c). Best Erick On Tue, Jun 21, 2011 at 5:32 AM, Bernd Fehling wrote: I'm trying to understand the logic of/behind fieldCache. Who wrote this piece of code or knows it well? Why is it under the hood of jetty? I see FieldCache$StringIndex with - f_dccollection - f_d

questions about fieldCache

2011-06-21 Thread Bernd Fehling
I'm trying to understand the logic of/behind fieldCache. Who wrote this piece of code or knows it well? Why is it under the hood of jetty? I see FieldCache$StringIndex with - f_dccollection - f_dcyear - f_dctype but also - dctitle --> f_dctitle --> f_dccreator - title --> f_

Re: QueryValidator

2011-05-05 Thread Bernd Fehling
punctuation, and try again 3) rewrite the query, quoting all punctuation, and try again would that work for you? On 5/5/2011 3:26 AM, Bernd Fehling wrote: Dear list, I need a QueryValidator and don't mind writing one but don't want to reinvent the wheel in case there is already some

QueryValidator

2011-05-05 Thread Bernd Fehling
Dear list, I need a QueryValidator and don't mind writing one, but I don't want to reinvent the wheel in case something already exists. Is this the right list for discussing a QueryValidator, or does it belong to SOLR? What do I mean by a QueryValidator? I am thinking of something like valida
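
One cheap pre-check is to verify that parentheses and double quotes are balanced before parsing. If they are not, the fallback suggested in the reply is to quote the punctuation and try again. The sketch below is hypothetical. To my knowledge its escaped character set mirrors what Lucene's QueryParser.escape handles, but for real use, calling that method and catching the parser's ParseException is the safer route.

```java
/** Hypothetical pre-validation for user-entered query strings. */
class QueryPrecheck {

    // \ + - ! ( ) : ^ [ ] " { } ~ * ? | & /
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** True if parentheses nest correctly and double quotes are paired (backslash escapes honored). */
    static boolean looksBalanced(String q) {
        int depth = 0;
        boolean inQuotes = false;
        for (int i = 0; i < q.length(); i++) {
            char c = q.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')') {
                if (--depth < 0) {
                    return false;
                }
            }
        }
        return depth == 0 && !inQuotes;
    }

    /** Backslash-escape every special character so the text is searched literally. */
    static String escape(String q) {
        StringBuilder sb = new StringBuilder(q.length());
        for (char c : q.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Keep well-formed queries as they are, otherwise fall back to a literal search. */
    static String prepare(String q) {
        return looksBalanced(q) ? q : escape(q);
    }
}
```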

Re: questions about the index

2011-05-03 Thread Bernd Fehling
o removing replicateAfter startup removes the write.lock when starting with an optimized index and replication on a master. To solve this tiny issue I would recommend also sending an optimize after sending a commit if the index has state optimize=true. Bernd On 03.05.2011 09:22, Bernd Fehling wrote

Re: questions about the index

2011-05-03 Thread Bernd Fehling
be somewhere around the DeletionPolicy... Regards, Bernd On 02.05.2011 17:45, Michael McCandless wrote: On Mon, May 2, 2011 at 9:17 AM, Bernd Fehling wrote: Dear list, some questions about the index. (questions go to the lucene list because it is more about the index itself) First my

questions about the index

2011-05-02 Thread Bernd Fehling
Dear list, some questions about the index. (questions go to the lucene list because it is more about the index itself) First my results from CheckIndex: Segments file=segments_l6 numSegments=1 version=FORMAT_3_1 [Lucene 3.1] Checking only these segments: _79s: 1 of 1: name=_79s docCount=28146

RE: which unicode version is supported with lucene

2011-02-27 Thread Bernd Fehling
rt the bugs in Jetty to Jetty itself > > Thanks, > Uwe! > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Bernd Fehling [mailto:bernd.fehl...@uni-b

Re: which unicode version is supported with lucene

2011-02-27 Thread Bernd Fehling
Hi Robert, thanks to you and Yonik for looking into this. As soon as Apache jira is back online I will try your jetty version and give feedback. Regards, Bernd > On Fri, Feb 25, 2011 at 9:09 AM, Bernd Fehling > wrote: > > Hi Yonik, > > > > good point, yes we are using J

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
:09 AM, Bernd Fehling > wrote: >> Hi Yonik, >> >> good point, yes we are using Jetty. >> Do you know if Tomcat has this limitation? > > Tomcat's defaults are worse - you need to configure it to use UTF-8 by > default for URLs. > Once you do, it passes

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
Hi Yonik, good point, yes we are using Jetty. Do you know if Tomcat has this limitation? Regards, Bernd On 25.02.2011 14:54, Yonik Seeley wrote: > On Fri, Feb 25, 2011 at 8:48 AM, Bernd Fehling > wrote: >> So Solr trunk should already handle Unicode above BMP for field
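
The servlet container issue can be reproduced with the JDK alone. A character outside the BMP travels in a URL as four percent-encoded UTF-8 bytes, and only decoding those bytes as UTF-8 gives the character back. For Tomcat this is usually configured with URIEncoding="UTF-8" on the HTTP Connector; that detail is general knowledge, not taken from this thread. The class name and the choice of U+1D11E are illustrative.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Round-trip a supplementary character through URL encoding with different charsets. */
class UrlCharsetDemo {

    // U+1D11E MUSICAL SYMBOL G CLEF, written as its UTF-16 surrogate pair
    static final String CLEF = new String(Character.toChars(0x1D11E));

    static String encodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Encoding CLEF gives "%F0%9D%84%9E". Decoding that with "UTF-8" restores the single character, while decoding it with "ISO-8859-1" yields four unrelated Latin-1 characters. That is the kind of damage a container with the wrong URL charset does before Solr ever sees the query.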

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
3 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -Original Message- >> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de] >> Sent: Friday, February 25, 2011 2:19 PM >> To: simon.willna...@gmail.com >> Cc: java-user@lucene.apache.org

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
utf-8 code. Regards, Bernd On 25.02.2011 13:43, Simon Willnauer wrote: > On Fri, Feb 25, 2011 at 1:02 PM, Bernd Fehling > wrote: >> Hi Simon, >> >> thanks for the details. >> >> My platform supports and uses code above BMP (0x1 and up). >> So t

Re: which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
ll be available? Regards, Bernd On 25.02.2011 12:04, Simon Willnauer wrote: > Hey Bernd, > > On Fri, Feb 25, 2011 at 11:23 AM, Bernd Fehling > wrote: >> Dear list, >> >> a very basic question about lucene, which version of >> unicode can be handled (indexed and s

which unicode version is supported with lucene

2011-02-25 Thread Bernd Fehling
Dear list, a very basic question about lucene, which version of unicode can be handled (indexed and searched) with lucene? It looks like lucene can only handle the very old Unicode 2.0 but not the newer 3.1 version (4 byte utf-8 unicode). Is that true? Regards, Bernd --
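
The "4 byte UTF-8" characters in this question are the code points above U+FFFF, outside the Basic Multilingual Plane. In Java, and therefore in any char-based API, such a character is stored as a surrogate pair of two chars. Length and offset surprises usually come from that difference. The following JDK-only illustration uses a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;

/** Size of a string in UTF-16 chars, Unicode code points, and UTF-8 bytes. */
class SupplementaryInfo {

    static int[] sizes(String s) {
        return new int[] {
            s.length(),                               // UTF-16 code units (chars)
            s.codePointCount(0, s.length()),          // Unicode code points
            s.getBytes(StandardCharsets.UTF_8).length // bytes when encoded as UTF-8
        };
    }
}
```

For a single supplementary character such as U+20000, the three counts are 2, 1 and 4. For plain ASCII text, all three counts are equal.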

Re: index files naming

2011-01-03 Thread Bernd Fehling
Hi Simon, thanks a lot for your good explanation. Best wishes, Bernd On 03.01.2011 13:51, Simon Willnauer wrote: > Hey Bernd, > > On Mon, Jan 3, 2011 at 1:35 PM, Bernd Fehling > wrote: >> Dear list, >> >> some questions about the names of the index files.

index files naming

2011-01-03 Thread Bernd Fehling
Dear list, some questions about the names of the index files. With an older Lucene/Solr 4.x version from trunk my index looks like: _2t1.fdt _2t1.fdx _2t1.fnm _2t1.frq _2t1.nrm _2t1.prx _2t1.tii _2t1.tis segments_2 segments.gen With the most recent version from trunk it looks like: _3a9.fdt _3a9.fd
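
The prefixes such as `_2t1` and `_3a9` are segment names. As far as I know, Lucene derives them from a counter written in base 36 (Character.MAX_RADIX), and the N in segments_N is a commit generation in the same radix. The small sketch below decodes both; the class and method names are hypothetical. It only handles names of the form `_name.ext` or `_name_suffix.ext`.

```java
/** Decode Lucene-style index file names whose counters are written in base 36. */
class IndexFileNames36 {

    /** "_2t1.fdt" -> 3637, the counter behind segment "_2t1". */
    static long segmentCounter(String fileName) {
        String base = fileName.substring(1); // drop the leading '_'
        int end = base.length();
        int dot = base.indexOf('.');
        if (dot >= 0) {
            end = dot;
        }
        int underscore = base.indexOf('_'); // e.g. "_79s_1.del" carries an extra suffix
        if (underscore >= 0 && underscore < end) {
            end = underscore;
        }
        return Long.parseLong(base.substring(0, end), Character.MAX_RADIX);
    }

    /** "segments_2" -> 2, "segments_l6" -> 762. */
    static long commitGeneration(String fileName) {
        return Long.parseLong(fileName.substring("segments_".length()), Character.MAX_RADIX);
    }
}
```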

Re: not indexing analyzed field

2010-11-28 Thread Bernd Fehling
t any result! > > I'd suggest to read a book about Lucene/Solr first :-) > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -Original Message- >> From: Bernd

Re: not indexing analyzed field

2010-11-26 Thread Bernd Fehling
ng the encoded value, > just don't > store it. > You can still search on the encoded value if in that case > > Which is a way of saying that I don't know, off the top of my > head, how > you'd > index one thing and store the result of analysis... > >

Re: not indexing analyzed field

2010-11-26 Thread Bernd Fehling
y", > your > display would be something like "run on empti". > > And if you're doing pure lucene, you can see this by enumerating the terms > in your > dcdocid field. > > Best > Erick > > On Fri, Nov 26, 2010 at 2:10 AM, Bernd Fehling <

Re: not indexing analyzed field

2010-11-25 Thread Bernd Fehling
tokens go of course through your analyzer and the returned tokens > are indexed as terms. Where is the problem? > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -Original Message--

Re: not indexing analyzed field

2010-11-25 Thread Bernd Fehling
irst, I'd be sure the value in question is in the document just before > sending it to be added to your index to see if the value you think > is in there really is. Something like Document.get() and see if > > Best > Erick > > On Thu, Nov 25, 2010 at 8:08 AM, Bernd Feh

not indexing analyzed field

2010-11-25 Thread Bernd Fehling
I used KeywordAnalyzer and KeywordTokenizer as templates for a new analyzer. The analyzer works fine but the result never reaches the index. My analyzer is called in "DocInverterPerField.processFields" with "stream.incrementToken()". ... try { boolean hasMoreTokens = stream.incrementToken();
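
For reference, the consumer side of Lucene's TokenStream API follows a fixed workflow: reset(), then incrementToken() until it returns false, then end() and close(). The stdlib-only model below runs a keyword-style tokenizer through that workflow. The class is hypothetical and not Lucene's KeywordTokenizer. It shows one easy mistake with reused streams: the "done" flag has to be cleared in reset(), otherwise a reused instance emits no token at all for the next document. Whether that is the cause in this case cannot be told from the excerpt.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy keyword tokenizer following the reset / incrementToken / end / close workflow. */
class ToyKeywordTokenizer {
    private String input = "";
    private boolean done;
    private String term; // stands in for CharTermAttribute

    void setReader(String text) {
        this.input = text;
    }

    void reset() {
        done = false; // must be cleared, or a reused instance emits nothing
    }

    boolean incrementToken() {
        if (done) {
            return false;
        }
        done = true;
        term = input; // the whole input becomes a single token
        return true;
    }

    String term() {
        return term;
    }

    void end() { }   // nothing to finalize in this toy

    void close() { } // nothing to release in this toy

    /** Consumer side, mirroring the order in which an indexer drives a TokenStream. */
    static List<String> consume(ToyKeywordTokenizer ts, String text) {
        List<String> tokens = new ArrayList<>();
        ts.setReader(text);
        ts.reset();
        while (ts.incrementToken()) {
            tokens.add(ts.term());
        }
        ts.end();
        ts.close();
        return tokens;
    }
}
```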