Now it is getting clearer.
"pos" (aka position) starts at "-1" and its highest number is the
last "node id" of the graph.
"pos" minus "positionLength" is the starting "node id" of the arc.
Is the tokenStream after each filter always a valid graph?
E.g. ShingleFilter with query "natural fores
Hi Mike,
Thanks for the response,
Sounds like I was using it incorrectly by specifying the SynonymGraphFilter
at query time AND SynonymGraphFilter followed by FlattenGraphFilter at
index time.
I need to specify one or the other.
J.D.
J.D. Corbin
Senior Research Engineer
Advanced Computing &
Hi J.D.,
First you need to decide if it's OK to do all your syns at search
time. It results in slower queries, and different scoring, yet
correct multi-token results, vs. index time.
If that is OK, then you should not use any syn filter at index time,
and use only SynonymGraphFilter at search ti
Hi, we have 4 Solr instances running.
We are using SolrCloud for indexing HBase table column names.
Each column in HBase ends up as a document in Solr, which has resulted in
over 2 billion documents in Solr.
The primary goal is to search the column names.
We have 4 shards for the collection, queries a
Hi,
I am looking for some guidance on the proper use of the SynonymGraphFilter
in Lucene (6.4.1).
Below is how I am implementing the analyzers for the index and query sides.
I don't see a lot of examples on the proper usage of the SynonymGraphFilter
so was hoping that someone (Michael McCandless?
Hi,
I cannot reproduce this with Solr 5.5.4 (coming out soon). With a completely
empty ~/.ivy/cache dir it builds and is able to download everything.
This error is in most cases caused by stale lock files (*.lck) in the IVY Cache.
Uwe
-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://ww
Attach &debug=query to the URL when you fire this query and you'll see
exactly how it parses which should help you diagnose the problem. Some
places to look:
1> There are options that treat lower case operators as valid.
Normally, Solr only treats 'AND' as an operator not 'and' but this can
be ove
I have noticed odd behavior in the query
"Conceal and Carry"
I have legal customers who need to find exactly this phrase because, as
you know, it refers to a specific set of gun laws.
However, this query is not behaving like a traditional quoted query - my
assumption is that a quoted string is
On Mon, Feb 13, 2017 at 9:04 AM, Bernd Fehling
wrote:
> Am I confused by the naming of pos, positionIncrement, offset, positionLength,
> start and end between Lucene and Solr?
"pos" is just accumulating the positionIncrement values, starting from
-1. I don't think Solr's analysis UI would chang
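To make the accumulation concrete, here is a minimal plain-Java sketch. It does not use Lucene; `Tok` and `TokenPositions.arcs` are hypothetical names made up for illustration. It starts "pos" at -1 and adds each token's positionIncrement. It also assumes the usual Lucene token-graph convention that a token leaves the node numbered pos and arrives at the node numbered pos + positionLength.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one token's attributes; this is not a Lucene class.
final class Tok {
    final String term;
    final int positionIncrement;
    final int positionLength;

    Tok(String term, int positionIncrement, int positionLength) {
        this.term = term;
        this.positionIncrement = positionIncrement;
        this.positionLength = positionLength;
    }
}

final class TokenPositions {
    // Returns one {fromNode, toNode} pair per token, in stream order.
    static List<int[]> arcs(List<Tok> tokens) {
        List<int[]> result = new ArrayList<>();
        int pos = -1; // accumulation starts at -1
        for (Tok t : tokens) {
            pos += t.positionIncrement;
            // Assumed convention: the arc runs from pos to pos + positionLength.
            result.add(new int[] {pos, pos + t.positionLength});
        }
        return result;
    }
}
```

For example, a stream of "naturwald" (increment 1, length 2), "natural" (increment 0, length 1) and "forest" (increment 1, length 1) gives the arcs 0 to 2, 0 to 1 and 1 to 2, so the single-token synonym spans the same two nodes as the two-token phrase.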
After drawing the graph I must admit it looks correct, including all values.
Am I confused by the naming of pos, positionIncrement, offset, positionLength,
start and end between Lucene and Solr?
OK, the SynonymGraphFilter is ONLY for Lucene, right?
But how are you going to build the multi-word s
On Mon, Feb 13, 2017 at 6:39 AM, Oliver Mannion wrote:
> I'd like to construct an Automaton to prefix match against a large set of
> strings. I gather a RunAutomaton is immutable, thread safe and faster than
> Automaton.
That's correct.
> Are there any other differences between the three Autom
Unfortunately, I cannot reproduce the problem with a straight Lucene
test case. I added this test case to TestSynonymGraphFilter.java:
https://gist.github.com/mikemccand/318459ca507742052688e2fe800a10dd
And when I run it, it produces the correct token graph:
TOKEN: naturwald
offset: 0-1
Thanks Bernd; I'll see if I can make a test case from this.
Mike McCandless
http://blog.mikemccandless.com
On Mon, Feb 13, 2017 at 5:00 AM, Bernd Fehling
wrote:
> My very simple and small sysonym_test.txt has only one line:
> naturwald, natural\ forest, forêt\ naturelle, natürlicher\ wald
>
>
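As a rough illustration of how a line like the one quoted above breaks down into multi-token synonyms, here is a plain-Java sketch. It is not Solr's actual synonym parser; `SynonymLine.parse` is a hypothetical helper. It assumes that entries are separated by unescaped commas, that a backslash keeps the next character literally, and that each entry is then split on whitespace the way a whitespace tokenizer would split it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene or Solr.
final class SynonymLine {
    // Returns one token list per comma-separated synonym entry.
    static List<List<String>> parse(String line) {
        List<List<String>> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character as-is
            } else if (c == ',') {
                addEntry(entries, current);       // unescaped comma ends the entry
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addEntry(entries, current);
        return entries;
    }

    // Trims the raw entry and splits it into tokens on whitespace.
    private static void addEntry(List<List<String>> entries, StringBuilder raw) {
        String entry = raw.toString().trim();
        if (!entry.isEmpty()) {
            entries.add(Arrays.asList(entry.split("\\s+")));
        }
    }
}
```

Under these assumptions the quoted line yields four entries: one single-token entry ("naturwald") and three two-token entries, which is why the filter has to emit a token graph rather than a flat stream.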
Hi there,
I'd like to construct an Automaton to prefix match against a large set of
strings. I gather a RunAutomaton is immutable, thread safe and faster than
Automaton. Are there any other differences between the three Automaton
classes, for example, in memory usage?
Would the general approach
My very simple and small sysonym_test.txt has only one line:
naturwald, natural\ forest, forêt\ naturelle, natürlicher\ wald
If I only use WT (WhitespaceTokenizer) and SGF (with WhitespaceTokenizer)
the result is:
WT text start end positionLength type position
natural 0