Re: Binary Automaton

2017-10-04 Thread Michael McCandless
Oh I was simply explaining that the Lucene Automaton API uses "int" labels, and so if you want an automaton operating in byte space, you just need to ensure those ints only use the range supported by unsigned bytes (0 - 255). Mike McCandless http://blog.mikemccandless.com On Mon, Oct 2, 2017 at
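A minimal sketch of the point above, in plain Java with no Lucene dependency: transition labels are stored as ints, but only values 0 to 255 are accepted, and each signed Java byte is mapped to its unsigned value with `b & 0xFF`. The class `ByteLabelSketch` and all of its methods are hypothetical names made up for illustration; this is not Lucene's Automaton implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy automaton (NOT Lucene's API): int labels limited to the unsigned byte range.
final class ByteLabelSketch {
    // transitions.get(state) maps an int label (0..255) to the destination state.
    private final List<Map<Integer, Integer>> transitions = new ArrayList<>();
    private final List<Boolean> accept = new ArrayList<>();

    // Adds a new non-accepting state and returns its id; the first state created is the start state.
    int createState() {
        transitions.add(new HashMap<>());
        accept.add(false);
        return transitions.size() - 1;
    }

    void setAccept(int state) {
        accept.set(state, true);
    }

    // Labels are ints, but anything outside 0..255 is rejected so the automaton stays in byte space.
    void addTransition(int source, int dest, int label) {
        if (label < 0 || label > 255) {
            throw new IllegalArgumentException("label out of unsigned byte range: " + label);
        }
        transitions.get(source).put(label, dest);
    }

    // Runs the automaton over raw bytes, starting at state 0 (assumes at least one state exists).
    boolean run(byte[] input) {
        int state = 0;
        for (byte b : input) {
            Integer next = transitions.get(state).get(b & 0xFF); // signed byte -> 0..255
            if (next == null) {
                return false;
            }
            state = next;
        }
        return accept.get(state);
    }

    // Builds a linear automaton that accepts exactly the given byte sequence.
    static ByteLabelSketch forExactBytes(byte[] bytes) {
        ByteLabelSketch a = new ByteLabelSketch();
        int state = a.createState();
        for (byte b : bytes) {
            int next = a.createState();
            a.addTransition(state, next, b & 0xFF);
            state = next;
        }
        a.setAccept(state);
        return a;
    }
}
```

The `& 0xFF` step matters because Java bytes are signed: without it, the byte 0xFF would become the label -1 instead of 255.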

Re: Binary Automaton

2017-10-02 Thread José Tomás Atria
Mike, could you clarify what you meant by the int comment at the end of your last message? I fail to see the significance of having multibyte transition labels for the format of the payloads the automaton will run on... Thanks! Jta On Mon, Oct 2, 2017, 12:41 Cristian Lorenzetto < cristian.lorenz

Re: Binary Automaton

2017-10-02 Thread Cristian Lorenzetto
It sounds like a good way :) Maybe the code to develop it is not so huge. Thanks for the suggestions :) 2017-10-02 12:27 GMT+02:00 Michael McCandless : > I'm not sure this is exactly what you are asking, but Lucene's terms are > already byte[] (default UTF-8 encoded from char[] terms), and the automat

Re: Binary Automaton

2017-10-02 Thread Michael McCandless
I'm not sure this is exactly what you are asking, but Lucene's terms are already byte[] (default UTF-8 encoded from char[] terms), and the automata that are created for searching (e.g. by WildcardQuery, PrefixQuery, FuzzyQuery, AutomatonQuery) are also byte based (see the crazy UTF32ToUTF8.java con
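To illustrate why a character-based automaton has to be translated before it can match byte[] terms, here is a small sketch in plain Java showing how a term's characters become UTF-8 bytes, and how those bytes map to int labels in the 0 to 255 range. The class `Utf8TermBytesSketch` and its method `utf8Labels` are hypothetical names for illustration only; this is not the conversion code Lucene uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (NOT part of Lucene): shows a term as UTF-8 byte labels.
final class Utf8TermBytesSketch {
    // Encodes the term as UTF-8 and returns each byte as an unsigned int label (0..255).
    static int[] utf8Labels(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            labels[i] = bytes[i] & 0xFF; // signed byte -> 0..255
        }
        return labels;
    }
}
```

For example, `utf8Labels("a")` yields the single label 0x61, while `utf8Labels("\u00e9")` (the character é) yields two labels, 0xC3 and 0xA9. One transition in character space can therefore correspond to a path of several transitions in byte space.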

Re: Binary Automaton

2017-09-30 Thread Dawid Weiss
> Preface: I don't know how the automaton is implemented deep inside Lucene, Well, you can take a look, it's open source. :) There are two different finite state automata inside Lucene: one is pretty much a "read-only" transducer from unique input sequences (of bytes) into an output. This is the FST

Re: Binary Automaton

2017-09-30 Thread Cristian Lorenzetto
*to @Uwe Schindler* Thanks, it is very interesting :) *to @Dawid* Preface: I don't know how the automaton is implemented deep inside Lucene, but (considering the automaton is built on the fly when the index is already present) I imagine that the automaton is scanning the lexicons/tokens present in th

Re: Binary Automaton

2017-09-30 Thread Dawid Weiss
> Hi, is it possible to create an Automaton in Lucene by parsing not a string > but a byte array? Can you state what problem you are trying to solve? This seems to be a question stripped of a more general context -- why do you need those byte-based automata? Dawid --

RE: Binary Automaton

2017-09-30 Thread Uwe Schindler
Hi, You can create your own parser and create the Automaton out of it. There are many APIs to add different types of accept states. You can then execute it using AutomatonQuery. E.g., the Regex or Wildcard parsers create automatons programmatically, first parsing the string and then c