Oh I was simply explaining that the Lucene Automaton API uses "int" labels,
and so if you want an automaton operating in byte space, you just need to
ensure those ints only use the range supported by unsigned bytes (0 - 255).
Mike McCandless
http://blog.mikemccandless.com
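
A minimal sketch of the point above, in plain Java with no Lucene dependency. Java's `byte` type is signed, so each payload byte has to be widened to its unsigned value before it can serve as an int transition label in the 0-255 range. The class and method names are hypothetical, chosen only for illustration.

```java
public class ByteLabels {
    /** Widens each signed Java byte to an int label in the range 0..255. */
    public static int[] toLabels(byte[] payload) {
        int[] labels = new int[payload.length];
        for (int i = 0; i < payload.length; i++) {
            // Equivalent to payload[i] & 0xFF; a plain (int) cast would yield
            // negative labels for bytes 0x80..0xFF.
            labels[i] = Byte.toUnsignedInt(payload[i]);
        }
        return labels;
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0xCA, (byte) 0xFE, 0x01};
        for (int label : toLabels(payload)) {
            System.out.println(label); // 202, 254, 1
        }
    }
}
```

Labels produced this way stay inside the unsigned byte range, which is the only constraint described above for an automaton that operates in byte space.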
On Mon, Oct 2, 2017 at
Mike, could you clarify what you meant by the int comment at the end of
your last message? I fail to see the significance of having multibyte
transition labels for the format of the payloads the automaton will run
on...
Thanks!
Jta
On Mon, Oct 2, 2017, 12:41 Cristian Lorenzetto <
cristian.lorenz
It sounds like a good way :) Maybe the code to develop it is not so huge. Thanks
for the suggestions :)
2017-10-02 12:27 GMT+02:00 Michael McCandless :
I'm not sure this is exactly what you are asking, but Lucene's terms are
already byte[] (default UTF-8 encoded from char[] terms), and the automata
that are created for searching (e.g. by WildcardQuery, PrefixQuery,
FuzzyQuery, AutomatonQuery) are also byte based (see the crazy
UTF32ToUTF8.java con
> Preface: I don't know how the automaton is implemented deep inside Lucene,
Well, you can take a look, it's open source. :) There are two
different finite state automata inside Lucene: one is pretty much a
"read-only" transducer from unique input sequences (of bytes) into an
output. This is the FST
*to @Uwe Schindler*
Thanks, it is very interesting :)
*to @Dawid*
Preface: I don't know how the automaton is implemented deep inside Lucene,
but (considering the automaton is built on the fly when the index is already
present) I imagine that the automaton scans the lexicons/tokens
present in th
> Hi, is it possible to create an Automaton in Lucene by parsing a byte array
> rather than a string?
Can you state what problem you are trying to solve? This seems to be a
question stripped of a more general context -- why do you need those
byte-based automata?
Dawid
--
Hi,
You can create your own parser and create the Automaton out of it. There are
many APIs to add different types of accept states. You can then execute it
using AutomatonQuery. E.g., the Regex or Wildcard parsers are creating
automatons programmatically, first parsing the string and then c
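
The parse-then-build approach described here can be sketched in plain Java without Lucene. The hex pattern syntax (`??` for any byte) and the class `HexPatternAutomaton` are assumptions made up for this illustration; they are not part of Lucene. The parser turns each token into one transition whose label range lies within 0-255, and the matcher walks those transitions over a byte array.

```java
public class HexPatternAutomaton {
    // State i moves to state i + 1 on any label in [min[i], max[i]].
    // The state after the last transition is the only accept state.
    private final int[] min;
    private final int[] max;

    private HexPatternAutomaton(int[] min, int[] max) {
        this.min = min;
        this.max = max;
    }

    /** Parses a pattern such as "CA FE ?? 01"; "??" matches any single byte. */
    public static HexPatternAutomaton parse(String pattern) {
        String[] tokens = pattern.trim().split("\\s+");
        int[] min = new int[tokens.length];
        int[] max = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals("??")) {
                // Wildcard: one transition covering the whole unsigned byte range.
                min[i] = 0;
                max[i] = 255;
            } else {
                int label = Integer.parseInt(tokens[i], 16);
                if (label < 0 || label > 255) {
                    throw new IllegalArgumentException("not a byte: " + tokens[i]);
                }
                min[i] = label;
                max[i] = label;
            }
        }
        return new HexPatternAutomaton(min, max);
    }

    /** Runs the automaton over the input, one unsigned byte label per step. */
    public boolean accepts(byte[] input) {
        if (input.length != min.length) {
            return false; // the accept state is reached only after all transitions
        }
        for (int state = 0; state < input.length; state++) {
            int label = input[state] & 0xFF;
            if (label < min[state] || label > max[state]) {
                return false;
            }
        }
        return true;
    }
}
```

With Lucene itself, the same loop would presumably call `Automaton.createState()`, `addTransition(source, dest, min, max)`, `setAccept(state, true)` and `finishState()` instead of filling arrays. The result could then be passed to an `AutomatonQuery` constructed in binary mode, though the exact constructor arguments depend on the Lucene version in use.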