Re: Spaces in regular expressions

2016-02-15 Thread Jack Krupansky
You can index two parallel fields: one tokenized the way a programming-language lexer would (identifiers, operators), and one that uses the keyword tokenizer so that each line is kept as a single term. You have to decide whether to treat each line as a separate Lucene document or treat each source file as a multivalued field, one value per sou
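
The two-field idea above can be sketched in plain Python, with no Lucene involved. This is only an illustration of the concept, assuming the "one document per line" option. The names `code_tokens`, `keyword_term`, and `index_source` and the token pattern are hypothetical and do not correspond to any Lucene API.

```python
import re

# Hypothetical lexer-style pattern: identifiers, integer literals, or any
# single non-space punctuation character (a stand-in for "operators").
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\w\s]")


def code_tokens(line):
    """Tokenize a line roughly as a programming-language lexer would."""
    return TOKEN_RE.findall(line)


def keyword_term(line):
    """Keep the whole line as one term, as a keyword tokenizer would."""
    return line


def index_source(source):
    """Build one record per non-blank line, carrying both parallel fields."""
    return [
        {"tokens": code_tokens(line), "line": keyword_term(line)}
        for line in source.splitlines()
        if line.strip()
    ]


docs = index_source("A = foo () {\nreturn 1;\n")
for doc in docs:
    print(doc["tokens"], "|", repr(doc["line"]))
```

In this sketch the lexer-style field yields the same tokens whether or not the source line contains spaces, while the whole-line field keeps the spacing intact, so a pattern that mentions whitespace has something to match against.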

Re: Spaces in regular expressions

2016-02-15 Thread Kudrettin Güleryüz
Since the documents are source code, I am considering matching on operators too. With the whitespace analyzer, A=foo(){ would be a single term, while A = foo () { would be five terms. Different documents can contain different combinations of the identifiers and operators in this example. A regexp query like /A\s
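
The term counts in the message above can be checked with a small Python sketch. It approximates the whitespace analyzer with `str.split()` and assumes that a regexp query is evaluated against individual indexed terms, as Lucene regexp queries are. `HYPOTHETICAL_PATTERN` is an illustrative pattern made up for this sketch; it is not the query from the message, which is cut off in the archive.

```python
import re


def whitespace_terms(text):
    """Split on runs of whitespace, approximating a whitespace analyzer."""
    return text.split()


def any_term_matches(pattern, text):
    """Return True if any single term fully matches the pattern."""
    regex = re.compile(pattern)
    return any(regex.fullmatch(term) for term in whitespace_terms(text))


compact = "A=foo(){"
spaced = "A = foo () {"

# Illustrative pattern only (an assumption, not the original query).
HYPOTHETICAL_PATTERN = r"A\s*=\s*foo.*"

print(whitespace_terms(compact))   # one term
print(whitespace_terms(spaced))    # five terms
print(any_term_matches(HYPOTHETICAL_PATTERN, compact))
print(any_term_matches(HYPOTHETICAL_PATTERN, spaced))
```

Under these assumptions the pattern matches the compact spelling, where everything sits in one term, but not the spaced spelling, because no single whitespace-delimited term contains both `A` and `foo`.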