Hi again,
I found the difference causing the slowdown: the NRTCachingDirectory#doCacheWrite method.
With the 8.8 implementation it is slow; with the 8.3 version it is fast.
Hope it helps,
Markus
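For readers following along: a rough, stdlib-only sketch of the kind of decision doCacheWrite makes (cache small flush/merge output in RAM, send large files straight to disk). The class shape, method signature, and thresholds below are assumptions for illustration, not Lucene's actual code.

```java
// Hypothetical sketch of the caching decision in NRTCachingDirectory#doCacheWrite:
// small newly written files are kept in a RAM cache, large ones bypass it.
// Thresholds and names are illustrative assumptions, not the Lucene source.
public class CacheWriteSketch {
    private final long maxMergeSizeBytes;
    private final long maxCachedBytes;
    private long cachedBytes; // bytes currently held in the RAM cache

    public CacheWriteSketch(double maxMergeSizeMB, double maxCachedMB) {
        this.maxMergeSizeBytes = (long) (maxMergeSizeMB * 1024 * 1024);
        this.maxCachedBytes = (long) (maxCachedMB * 1024 * 1024);
    }

    /** Returns true if a file of the given estimated size should go to the RAM cache. */
    public boolean doCacheWrite(long estimatedSizeBytes) {
        return estimatedSizeBytes <= maxMergeSizeBytes
            && cachedBytes + estimatedSizeBytes <= maxCachedBytes;
    }
}
```

If the 8.8 version of this decision sends more writes to the slower path, that alone could explain the difference Markus measured.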
-Original Message-
From: Gietzen, Markus
Sent: Wednesday, 19
fine. Now 8.8 performs as fast as 8.3!
I will check the differences and apply them step by step to find out which
change causes the slowdown.
I’ll report here.
Bye,
Markus
From: Michael McCandless
Sent: Wednesday, 19 May 2021 13:39
To: Lucene Users ; Gietzen, Markus
Subject: Re
WindowsNativeDispatcher.CreateFile0
At the end of the mail I added two example stack traces that show this behavior.
Does anyone have an idea what change might cause this, or whether I need to do
something different in 8.8 compared to 8.3?
Thanks for any help,
Markus
Here is an example stacktrace that is causing such a try
Hello Michael,
For the case of normalizing ü to ue, take a look at the german normalizer [1].
Regards,
Markus
[1]
https://lucene.apache.org/core/7_6_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
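To illustrate the mapping idea being discussed, here is a small stdlib sketch of ue-style German transliteration (ü to ue, ä to ae, ö to oe, ß to ss). Note this is only an illustration of the concept; the linked GermanNormalizationFilter has its own, more nuanced rules.

```java
// Sketch of ue-style German character folding (not the Lucene filter itself):
// umlauts become two-letter digraphs, ß becomes ss, everything else passes through.
public class GermanFoldSketch {
    public static String fold(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (char c : in.toCharArray()) {
            switch (c) {
                case 'ä': out.append("ae"); break;
                case 'ö': out.append("oe"); break;
                case 'ü': out.append("ue"); break;
                case 'Ä': out.append("Ae"); break;
                case 'Ö': out.append("Oe"); break;
                case 'Ü': out.append("Ue"); break;
                case 'ß': out.append("ss"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```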
-Original message-
> From:Ralf Heyde
>
Hello Adrian,
I opened LUCENE-8741 ClassCastException in ValueSource$ScoreAndDoc.
Thanks,
Markus
https://issues.apache.org/jira/browse/LUCENE-8741
-Original message-
> From:Adrien Grand
> Sent: Tuesday 26th March 2019 18:58
> To: Lucene Users Mailing List
> Subjec
Is this a known issue?
Thanks!
Markus
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Please let me know.
Thanks,
Markus
-Original message-
> From:Markus Jelsma
> Sent: Friday 8th February 2019 11:08
> To: java-user@lucene.apache.org
> Subject: Query-of-Death Lucene/Solr 7.6
>
> Hello,
>
> While working on SOLR-12743, using 7.6 on two nodes
produces just a 9 MB toString() for the query.
I could not find anything like this in Jira. I did think of LUCENE-8479 and
LUCENE-8531, but those were about graphs; this problem looked related, though.
Existing issue? New bug?
Many thanks,
Markus
ps. in Solr i even got an
Hello Baris,
The expand parameter defaults to true, so you should not have to add both
rules. If you are using Solr, you can easily check it in the analysis tab. If
not, printing the resulting Query object works as well.
Regards,
Markus
-Original message-
> From:baris
Hello Baris,
Check out the filter factory and the map parser for a more low level example:
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymGraphFilterFactory.java
https://github.com/apache/lucene-solr/blob/master/lucene/a
larissues" with "similar issues" (and vice
versa) you might want to check out DictionaryCompoundWordTokenFilter and/or
HyphenationCompoundWordTokenFilter. Although English hardly uses compound
words, the token filters still do their job quite nicely.
Regards,
Markus
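To make the decompounding idea concrete, a toy stdlib sketch of a greedy dictionary split, the concept behind DictionaryCompoundWordTokenFilter (the real filter keeps the original token and adds the subwords as extra tokens; this sketch only shows the splitting):

```java
import java.util.*;

// Toy dictionary-based decompounding: greedily consume the longest dictionary
// word at each position; if no clean split exists, keep the original token.
public class DecompoundSketch {
    public static List<String> split(String token, Set<String> dict) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < token.length()) {
            int end = -1;
            // find the longest dictionary word starting at pos
            for (int i = token.length(); i > pos; i--) {
                if (dict.contains(token.substring(pos, i))) { end = i; break; }
            }
            if (end < 0) return Collections.singletonList(token); // no clean split
            parts.add(token.substring(pos, end));
            pos = end;
        }
        return parts;
    }
}
```

With a dictionary containing "similar" and "issues", the run-together token "similarissues" splits back into its two words, which is exactly the matching behavior described above.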
-Origin
Query could i borrow, and what not?
Or, if there is a better way, should i instead try to add payload support to an
extended SynonymQuery? Would that be easier? And how should i do that?
What would be the best way to tackle this issue?
Many thanks,
Markus
-Original message-
> From:Al
would also cause both clauses
to score if they match.
So, how can i transform a SynonymQuery into something that i can wrap into
PayloadScoreQuery on Lucene/Solr 7.x?
Many thanks,
Markus
ere any real solutions to this problem? Removing the
RemoveDuplicates filter looks really silly.
Many thanks!
Markus
Sorry, i would if i were on Github, but i am not.
Thanks again!
Markus
-Original message-
> From:Uwe Schindler
> Sent: Saturday 16th September 2017 12:45
> To: java-user@lucene.apache.org
> Subject: RE: German decompounding/tokenization with Lucene?
>
> Send a pull re
Hello Uwe,
Thanks for getting rid of the compounds. The dictionary can be smaller; it
still has about 1500 duplicates. It is also unsorted.
Regards,
Markus
-Original message-
> From:Uwe Schindler
> Sent: Saturday 16th September 2017 12:16
> To: java-user@lucene.apache.org
which
CharFilter provides. But that won't allow you to set TypeAttribute. Perhaps i
am missing something completely and am stupid, probably :)
Thanks,
Markus
-Original message-
> From:Tommaso Teofili
> Sent: Wednesday 14th June 2017 23:49
> To: java-user@lucene.apache.org
&
Hello Erick, no worries, i recognize you two.
I will take a look at your references tomorrow. Although i am still fine with
eight bits, i cannot spare more than one. If Lucene allows us to pass longer
bitsets to the BytesRef, it would be awesome and easy to encode.
Thanks!
Markus
mited to 8 bits. Although we can
easily fit our reduced treebank in there, we also use single bits to signal for
compound/subword, and stemmed/unstemmed and some others.
Hope this helps.
Regards,
Markus
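The scheme described above, a reduced tag set plus single-bit flags in one payload byte, can be sketched in plain Java. The bit layout here is an assumption for illustration; the actual mapping is whatever the application chooses.

```java
// Packing a reduced POS tag plus flag bits into one payload byte:
// low 6 bits carry the tag (0..63), the two high bits flag
// compound/subword and stemmed/unstemmed. Layout is illustrative.
public class PayloadByteSketch {
    public static final int FLAG_SUBWORD = 1 << 6;
    public static final int FLAG_STEMMED = 1 << 7;

    /** tag must fit in 6 bits (0..63). */
    public static byte pack(int tag, boolean subword, boolean stemmed) {
        if (tag < 0 || tag > 63) throw new IllegalArgumentException("tag out of range");
        int b = tag;
        if (subword) b |= FLAG_SUBWORD;
        if (stemmed) b |= FLAG_STEMMED;
        return (byte) b;
    }

    public static int tag(byte payload)         { return payload & 0x3F; }
    public static boolean subword(byte payload) { return (payload & FLAG_SUBWORD) != 0; }
    public static boolean stemmed(byte payload) { return (payload & FLAG_STEMMED) != 0; }
}
```

This is also why the 8-bit budget gets tight: every flag bit halves the number of tag values that still fit.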
-Original message-
> From:Erik Hatcher
> Sent: Wednesday 14th June 2017
use
spans and phrase queries to find chunks of multiple POS-tags.
This would be the first approach i can think of. Treating them as regular
tokens enables you to use regular search for them.
Regards,
Markus
-Original message-
> From:José Tomás Atria
> Sent: Wednesday 14t
Ok, we decided not to implement PositionLengthAttribute for now, because either
it is badly applied (how could one even misapply that attribute?), or Solr's
QueryBuilder has a weird way of dealing with it, or... well.
Thanks,
Markus
-Original message-
> From:Markus Jelsma
> S
Hello again, apologies for cross-posting and having to get back to this
unsolved problem.
Initially i thought this was a problem i have with, or in, Lucene. Maybe not, so
is this a problem in Solr? Has anyone here seen this problem before?
Many thanks,
Markus
-Original message
query
time seems to be a problem.
Any thoughts on this issue? Is it a bug? Do i not understand
PositionLengthAttribute? Why does it affect term/document matching? At query
time but not at index time?
Many thanks,
Markus
---
official, the second is old but maybe still relevant. Please note this is
usually not to be used in production.
Regards,
Markus
-Original message-
> From:Anthony Van
> Sent: Wednesday 8th February 2017 22:51
> To: java-user@lucene.apache.org
> Subject: Lucene
>
> Good
Yes, they should be the same unless the field is indexed with shingles, in
which case order matters.
Markus
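To see why order matters with shingles, a minimal stdlib sketch of what a shingle filter produces (word n-grams over the token stream); the real Lucene ShingleFilter has more options, this only shows the idea:

```java
import java.util.*;

// Word-level shingling: emit every run of `size` adjacent tokens joined by a space.
// Reordered input tokens yield different shingle terms, so matching changes.
public class ShingleSketch {
    public static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }
}
```

"quick brown fox" produces the shingles "quick brown" and "brown fox"; swap the first two words and neither shingle survives, which is exactly the order dependence mentioned above.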
-Original message-
> From:Julius Kravjar
> Sent: Monday 16th January 2017 18:20
> To: java-user@lucene.apache.org
> Subject: question
>
> May I have one que
ays
the length of the original term. So if a user queries for a singular term, the
whole plural (original) is highlighted.
Am i missing something? Bug?
Thanks,
Markus
s T and Z are somehow lowercased by the query
parser.
I feel incredibly stupid, so many thanks in advance!
Markus
does?
What class has CharacterUtils been renamed to? Is it still usable by
extending parties?
Thanks,
Markus
-Original message-
> From:Uwe Schindler
> Sent: Wednesday 21st September 2016 13:30
> To: java-user@lucene.apache.org
> Subject: RE: Upgrade 6.2
.util
Is there a Jira I have missed?
Many thanks,
Markus
Thanks for pointing out that issue. It also explains other errors.
Markus
-Original message-
> From:Uwe Schindler
> Sent: Wednesday 31st August 2016 11:32
> To: java-user@lucene.apache.org
> Cc: 'Michael McCandless'
> Subject: RE: LowerCaseFilter gone in 6.2
Hello - i'm upgrading a project that uses Lucene to 6.2.0 and get the compile
error that LowerCaseFilter does not exist. And, so it seems, the JavaDoc is
gone too. I've checked CHANGES.txt and there is no mention of it, not even in
the API changes section.
Any ideas?
Thanks,
Mar
doesn't exceed docCount.
I'd like to try DFISimilarity and ClassicSimilarity as well, but for some
reason the unit tests do not accept the similarity defined in the test's
schema.xml?!
Thanks!
Markus
-Original message-
> From:Ahmet Arslan
> Sent: Tuesday
computed from:
1.0 = termFreq=1.0
1.2 = parameter k1
0.75 = parameter b
2.0 = avgFieldLength
2.56 = fieldLength
What am i doing wrong? Or did i catch a bug?
Thanks,
Markus
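As a sanity check on the explain values above, the standard BM25 term-frequency normalization can be recomputed by hand (this is the textbook tf part only, not the full score with IDF and boosts):

```java
// BM25 term-frequency normalization:
// tfNorm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
public class Bm25Check {
    public static double tfNorm(double freq, double k1, double b,
                                double fieldLength, double avgFieldLength) {
        return freq * (k1 + 1)
             / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength));
    }
}
```

Plugging in the explain values (freq=1.0, k1=1.2, b=0.75, fieldLength=2.56, avgFieldLength=2.0) gives roughly 0.897, which is useful when checking whether a suspicious score is a bug or just the formula.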
Hi - if you don't want specific words passed through a stemmer, you need to
supply a CharArraySet with exclusions as the second argument to its constructor.
Markus
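The exclusion-set idea is simple enough to sketch with the stdlib: words in the set bypass the stemmer entirely. The trivial suffix-stripping "stem" below is a stand-in for a real stemmer, not the analyzer's actual algorithm.

```java
import java.util.*;

// Exclusion-set sketch: excluded words pass through unstemmed,
// everything else goes through a (here: toy) stemmer.
public class ExclusionStemSketch {
    public static String stem(String word, Set<String> exclusions) {
        if (exclusions.contains(word)) return word; // pass through unstemmed
        if (word.endsWith("s")) return word.substring(0, word.length() - 1);
        return word;
    }
}
```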
-Original message-
> From:Dwaipayan Roy
> Sent: Monday 14th March 2016 15:31
> To: java-user@lucene.apache
Thanks, i missed that! Glad it's already resolved.
Markus
-Original message-
> From:Ishan Chattopadhyaya
> Sent: Thursday 21st January 2016 12:01
> To: java-user@lucene.apache.org
> Subject: Re: Jira issue for possibly transient resource issue, or a Lucene or
> JVM b
Hi - we get the above issue as well sometimes. I've noticed Lucene-dev mails
on this issue [1] but i couldn't find a corresponding Jira issue. Any pointer
to that one?
Many thanks,
Markus
[1]
http://mail-archives.apache.org/mod_mbox/lucene-dev/201601.mbox/%3CCAPsWd+OWZpRLXCyX
ition()? Is
rewrite going to be called at some point where i can return a new Query object
with decreased boost?
Thanks,
Markus
-Original message-
> From:Adrien Grand
> Sent: Thursday 17th December 2015 14:40
> To: solr-user ; java-user@lucene.apache.org
> Subject: Re: propag
unit test at the
point i want to retrieve docs and assert their positions in the result set:
ScoreDoc[] docs = searcher.search(spanfirstquery, 10).scoreDocs;
I am probably missing something but any ideas to share?
Many thanks!
Markus
With kind regards,
Markus Boese
> Hi Markus,
what is the logic behind your query parser?
How is the query expected to be rewritten?
I've never seen that kind of rewritten query, but if you tell us what you
are expecting to rewrite, maybe it would be easier to hel
abcd[ , 1] +f:1'
Could anyone explain what Lucene wants to tell me with '[ ,1]'?
I know Lucene supports range queries, but those contain something like
'[1 TO 4]', with no comma included...
--
Regards,
Markus Boese
Hi,
I sometimes get FileNotFoundExceptions from the recovery of a core in my
log. Does anyone know the reason for this? As I understand Solr, this
should not happen.
Markus
2015-08-04
15:06:07,646|INFO|mpKPXpbUwp|org.apache.solr.update.UpdateLog|Starting to
buffer updates. FSUpdateLog
" - scoring works fine
"Tetra*" - here, I get all the same scores.
I am building an auto-suggest, based on ontology terms. Scoring is crucial
there, and also, that I find parts of words.
Markus
Simplified test code:
public void simple(String inp) throws IOException
{
try
{
t an
identical score of:
1.4142135
What could be the problem?
Some of my code:
...
FieldType ft = new FieldType();
ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
ft.setStored(true);
ft.setTokenized(true);
Field f = new Field(name, value, ft);
f.setBoost(0.001f);
doc.add(f);
...
Markus
Hi Uwe,
You're right. Although using the analysis package won't hurt the index, this
case is evidence that it's a bad thing, especially if no backport is made. I'll
port my code to use the updated API of 5.0.
Thanks guys,
Markus
-Original message-
> Fro
Maven against the most recent release of Solr and/or Lucene. If that
stays a problem we may have to build stuff against branch_4x instead.
Thanks,
Markus
-Original message-
> From:Uwe Schindler
> Sent: Thursday 30th January 2014 11:18
> To: java-user@lucene.apache.org
> Su
.x we must override
that specific method: analyzer is not abstract and does not override abstract
method createComponents(String,Reader) in Analyzer :)
So, any hints on how to deal with this thing? Wait for 4.x backport of 5388, or
do something clever like <...> fill in the blanks.
Man
nyone here that can
shed some light on this?
Thanks,
Markus
bits without copying
the rest of the stuff around?
Thanks,
Markus
https://issues.apache.org/jira/browse/SOLR-4032
-Original message-
> From:Mark Miller
> Sent: Sat 03-Nov-2012 14:20
> To: java-user@lucene.apache.org
> Subject: Re: "read past EOF" when merge
>
> Can you file a JIRA Markus? This is probably related
No this is not using NFS but EXT3 on SSD.
Thanks
-Original message-
> From:Michael McCandless
> Sent: Fri 02-Nov-2012 16:22
> To: java-user@lucene.apache.org
> Subject: Re: "read past EOF" when merge
>
> On Fri, Nov 2, 2012 at 6:53 AM, Markus Jelsma
&
nHandler$3.write(ReplicationHandler.java:932)
Markus
-Original message-
> From:Michael McCandless
> Sent: Fri 02-Nov-2012 11:46
> To: java-user@lucene.apache.org
> Subject: Re: "read past EOF" when merge
>
> Are you able to reproduce the corruption?
&
-
> From:Thomas Matthijs
> Sent: Thu 04-Oct-2012 15:55
> To: java-user@lucene.apache.org
> Subject: Re: Highlighter IOOBE with modified
> HyphenationCompoundWordTokenFilter
>
> And to include the code
>
> On Thu, Oct 4, 2012 at 3:52 PM, Markus Jelsma
> wrote:
> &
eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> .
>
> Anyone to point me in the right direction? I've checked the LIA book on how
> to manipulate the tokenstream and thought it
to
manipulate the tokenstream and thought it should be alright. My analysis tests
also yield good results, nothing strange to be found. Or could it be an error
in the highlighter that only now shows up?
Thanks,
Markus
You should ask on the Droids list but there's some activity in Jira. And did
you consider Apache Nutch?
On Tuesday 23 August 2011 10:17:50 Li Li wrote:
> hi all
> I am interested in vertical crawlers. But it seems this project is not
> very active. Its last update was 11/16/2009
---
> [X] ASF Mirrors (linked in our release announcements or via the Lucene
> website)
>
> [] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [X] I/we build them from source via an SVN/Git checkout.
>
> [] Other (someone in your company mirrors them internally or via a
> downst
and greetings,
Markus
Ian Lea
An
java-user
First of all, thanks for your response.
But how can that be true if a search-term without a wildcard (and the
highlighting of the results) works fine?
Greetings,
Markus
Ian Lea
adClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:336)
... 19 more
Anyone got
?
Thanks,
Markus
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
On 12.06.2010 at 13:57, Ahmet Arslan wrote:
I am using lucene 3.0.1. I use a MultiFieldQueryParser with
a GermanAnalyzer. In my index are some values among others
one document with the title "bauer". I append to every word
in my query a ~0.8 (here I am not sure if this is the way to
do it). If I
the "er" if I am not using the fuzzy
parameter. Can someone please tell me in a few words why? How can I do a
fuzzy search which also finds exact matches?
Thanks,
Markus
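For what it's worth, classic fuzzy matching interprets a value like ~0.8 as a minimum similarity derived from edit distance. Here is a sketch of that interpretation (my reading of the threshold semantics, not the query parser's actual code); note an exact match has distance 0 and similarity 1.0, so it should always clear the threshold:

```java
// Similarity-threshold fuzzy matching sketch:
// similarity = 1 - editDistance / max(len(a), len(b)); match if >= minSimilarity.
public class FuzzySketch {
    public static boolean matches(String a, String b, double minSimilarity) {
        int d = editDistance(a, b);
        double sim = 1.0 - (double) d / Math.max(a.length(), b.length());
        return sim >= minSimilarity;
    }

    // Standard dynamic-programming Levenshtein distance.
    static int editDistance(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) dp[i][0] = i;
        for (int j = 0; j <= b.length(); j++) dp[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                dp[i][j] = Math.min(
                    Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1),
                    dp[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return dp[a.length()][b.length()];
    }
}
```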
index was searched.
I tried to get all allowed document ids (there's a field for the id) and
put them into a BooleanQuery (id1 or id2, ...), but then I get a
BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
So how can I restrict my search results with lucene?
Hi Guy,
I think that isn't a problem related to fields. I experienced this kind of
error caused by a limitation of the underlying file system. The problem was
that I had too many InputStreams open that were never closed. Please
check that in your code and tell us if it worked.
M
;z4" at indexing
time. There may also be several other characters that could be deleted in a
new token.
How could I manage that? Is there any predefined Tokenizer/Filter for this?
Or am I wrong and there is a better way to get this done?
Thanks.
--
Markus
'm also providing the
second best results as alternative (did you mean x or y?).
The results have been very good so far, thanks again!
- Markus
r I should properly start a separate thread ...
Does anyone have advice on how to approach this kind of problem? Is it
appropriate for / can it be solved with Lucene? Am I right here on this list
anyway? :)
thanks for any feedback,
- Markus
eadings in HTML
documents, e.g. title:(term)^8 h1:(term)^7 ... h6:(term)^2
content:(term)^1 . I was wondering if this is actually necessary. The
number of existing h1 to h6 fields with content decreases with the
amount of documents. To give the fields title and h1, which are the most
used ones anyway,
Jonathan,
what should I say, I'm feeling like an idiot now. Of course you're
right. This actually solves the issue ;)
thanks, and sorry for wasting your time,
- Markus
Jonathan O'Connor wrote:
Markus,
As I'm sure you know, "sucht" is also an inflection of "suc
meanings in German (Suche = the
search, Sucht = addiction).
Is there a way to tune the stemmer, are there alternatives available,
or should I look for another stemmer for the German language?
thanks for any pointers,
- Markus
---
and "AND" all queries for that key too.
I'm just wondering whether, all in all, that's a good idea or not, and
what else I could do.
- Markus
would only add a new parameter to the Vector and then
dispatch it to the method based on its signature.
thanks,
- Markus
which ones.
- Markus
is
key and the client can only access documents with his key.
The goal is not the ultimate security solution, but to avoid running
multiple Lucene instances on the machines.
Is this a good idea, or would someone recommend another
practice?
t
is site again and I can't find an example
on how it works to actually create the tokens myself and pass them to
the searcher.
Any help would be appreciated.
thanks,
- Markus
er idea and maybe I just overlooked a
public interface to the stemmer output? Or am I approaching the whole
highlight-search-term problem from the wrong direction?
thanks for any pointers,
- Markus
> >
> Sure. Simply index reversed words.
>
Since I do not have much experience with Lucene, can you explain it in more
detail for me? THX!
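The "index reversed words" trick works like this: index each token reversed (typically in a parallel field), then rewrite a leading-wildcard query such as "*haus" into a trailing-wildcard query "suah*" against that field. A minimal sketch of the rewriting step (the field setup and query execution are left out):

```java
// Sketch of the reversed-words trick for leading wildcards:
// "*foo" against the normal field becomes "oof*" against a reversed-token field.
public class ReverseWildcardSketch {
    public static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    /** Rewrites a leading-wildcard pattern "*foo" into "oof*" for the reversed field. */
    public static String rewriteLeadingWildcard(String pattern) {
        if (!pattern.startsWith("*")) throw new IllegalArgumentException("expected leading *");
        return reverse(pattern.substring(1)) + "*";
    }
}
```

Trailing wildcards are cheap because the index can seek to a term prefix, which is exactly what the reversal buys you for leading wildcards.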
--
---
It is possible to search with the "*" and "?" wildcards at the
end and in the middle of a search string, but not at the beginning. Is there a
way to do this?
--
I am looking for a search engine for our intranet, and so I am dealing with Lucene.
I have read the FAQ and some postings, I have gained some first experience with it,
and now I have some questions.
1. Is Lucene a suitable search engine for an intranet search? I've experimented
with POI and PDFBox for indexing Word/Ex
On 6/13/05, Andy Roberts <[EMAIL PROTECTED]> wrote:
> On Monday 13 Jun 2005 13:18, Markus Wiederkehr wrote:
> > I see, the list of exceptions makes this a lot more complicated than I
> > thought... Thanks a lot, Erik!
> >
>
> I expect you'll need to do some
I see, the list of exceptions makes this a lot more complicated than I
thought... Thanks a lot, Erik!
Markus
On 6/13/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>
> On Jun 13, 2005, at 7:08 AM, Markus Wiederkehr wrote:
> > I work on an application that has to index OCR text
may be completely wrong...
Markus
On 6/13/05, Stanislav Jordanov <[EMAIL PROTECTED]> wrote:
> Hi guys,
> Building some huge index (about 500,000 docs totaling to 10megs of plain
> text) we've run into the following problem:
> Most of the time the IndexWriter process
not
stored get lost.
So is there any way to preserve fields that were not stored?
Reconstructing these fields is too expensive in my application.
Thanks in advance,
Markus
something like that at http://www.lucenebook.com/.
Thanks in advance,
Markus
If you add the documents in the same order to both indexes and perform
> the same deletions on both indexes then they'll have the same numbers.
Would it be possible to write an IndexReader that combines two indexes
by a common field, for example a document ID? And how performant wo
" IndexReader subclass that generates termDoc
> lists on the fly by looking in an external database. This would require
> a mapping between Lucene document ids and external document IDs. A
> FieldCache, as described above, could serve that purpo
not in the other(s) the link between them
gets lost. How do I prevent this?
Markus
in advance,
Markus