You can index two parallel fields: one tokenized the way a programming-language
lexer would (identifiers, operators), and one that uses the keyword tokenizer
so that each line is kept as a single term. You then have to decide whether to
treat each line as a separate Lucene document or to treat each source file as
a multivalued field, one value per
sou
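
As a rough illustration of the two parallel fields suggested above, here is a minimal sketch in plain Python. It is not Lucene code: the token pattern and the names code_tokens, line_values and build_fields are hypothetical stand-ins for a code-aware tokenizer and for a keyword tokenizer applied once per line.

```python
import re

# Assumed token pattern (not a Lucene API): identifiers, integer literals,
# then any single character that is neither a word character nor whitespace.
CODE_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\w\s]")


def code_tokens(source):
    """Split source text into identifier, number, and operator tokens."""
    return CODE_TOKEN.findall(source)


def line_values(source):
    """Keep every non-blank line whole, one value per line."""
    return [line for line in source.splitlines() if line.strip()]


def build_fields(source):
    """Return both views of the same source file, keyed by a made-up field name."""
    return {
        "code_tokens": code_tokens(source),
        "lines": line_values(source),
    }


if __name__ == "__main__":
    sample = "A=foo(){\n  return 1;\n}"
    fields = build_fields(sample)
    print(fields["code_tokens"])
    print(fields["lines"])
```

Whether the "lines" values end up as one multivalued field per file or as one document per line is the design decision mentioned above; the sketch only shows what each field would hold.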
Since the documents are source code, I am considering matching on operators
too. Using the whitespace analyzer, A=foo(){ would be a single term, while
A = foo () { would be five terms. Different documents can contain different
combinations of the identifiers and operators in the example. A regexp query like
/A\s
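
The term counts quoted for the whitespace analyzer can be checked with a small sketch. This assumes that splitting on runs of whitespace, as Python's str.split() does, is a fair approximation of that analyzer; whitespace_terms is a hypothetical helper, not a Lucene call.

```python
def whitespace_terms(text):
    """Split on runs of whitespace only; operators stay glued to their neighbours."""
    return text.split()


compact = whitespace_terms("A=foo(){")
spaced = whitespace_terms("A = foo () {")

print(len(compact), compact)  # 1 ['A=foo(){']
print(len(spaced), spaced)    # 5 ['A', '=', 'foo', '()', '{']
```

The same statement produces different terms depending only on spacing, which is why matching on operators is awkward with this analyzer.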