RE: Spaces in regular expressions

2016-02-26 Thread Uwe Schindler
@gmail.com]
> Sent: Thursday, February 25, 2016 7:58 PM
> To: java-user@lucene.apache.org
> Subject: Re: Spaces in regular expressions
>
> Thank you, I had looked at that article a little, some time ago. I was thinking I may have to change some lower level Lucene classes to be

Re: Spaces in regular expressions

2016-02-25 Thread Kudrettin Güleryüz
Thank you, I had looked at that article a little, some time ago. I was thinking I may have to change some lower level Lucene classes to be able to work like that. Plus, I don't have much of a clue whether that would break things. I am primarily looking for a Lucene solution at this point. On Thu, Feb 25, 20

Re: Spaces in regular expressions

2016-02-25 Thread Greg Bowyer
Possibly not helpful but some time ago Russ Cox implemented a code search at Google. His design is documented here https://swtch.com/~rsc/regexp/regexp4.html On Wed, Feb 24, 2016, at 08:01 AM, Kudrettin Güleryüz wrote: > I appreciate the pointers Jack. More on that, where can I read more on > ena

Re: Spaces in regular expressions

2016-02-24 Thread Kudrettin Güleryüz
I appreciate the pointers Jack. More on that, where can I read more on enabling full regexp support on indexed source code documents using Lucene? Any suggestions regarding cases where developers implemented this kind of capability using Lucene/Solr/ElasticSearch/... would be more than welcome. T

Re: Spaces in regular expressions

2016-02-15 Thread Jack Krupansky
You can have two parallel fields, one tokenized as a programming language would (identifiers, operators) and one using the keyword tokenizer for each line. You have to decide whether to treat each line as a separate Lucene document or treat each source file as a multivalued field, one value per sou
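The two-field idea above can be sketched in plain Python. This is only an illustration, not Lucene code: the `TOKEN` pattern, the `code_tokens` and `index_line` helpers, and the `tokens`/`raw` field names are all hypothetical, and the sketch assumes one document per source line.

```python
import re

# Hypothetical token pattern (an assumption, not Lucene's tokenizer):
# an identifier, an integer literal, or one punctuation character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\w\s]")

def code_tokens(line):
    """Split a source line into identifiers and operators, ignoring spacing."""
    return TOKEN.findall(line)

def index_line(line):
    """Build the two parallel 'fields' for one source line."""
    return {
        "tokens": code_tokens(line),  # tokenized like a programming language
        "raw": line,                  # whole line kept as a single value
    }

variants = ["A = foo() {", "A=foo(){", "A= foo(){"]
docs = [index_line(v) for v in variants]

for doc in docs:
    print(doc["tokens"], repr(doc["raw"]))
```

With a tokenizer of this kind, the differently spaced variants yield the same token sequence, while the `raw` value still preserves the original spacing for whole-line pattern matching.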

Re: Spaces in regular expressions

2016-02-15 Thread Kudrettin Güleryüz
Since the documents are source code, I am considering matching on operators too. Using the whitespace analyzer, A=foo(){ would be a single term, while A = foo () { would be five terms. Different documents can have a different combination of the identifiers and operators in the example. A regexp query like /A\s
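The term counts mentioned above can be checked with a rough stand-in for whitespace tokenization. Python's `str.split()` is used here as an approximation of a whitespace analyzer; `whitespace_terms` is a hypothetical helper, not a Lucene call.

```python
def whitespace_terms(text):
    """Approximate a whitespace analyzer: split on runs of whitespace only."""
    return text.split()

compact = whitespace_terms("A=foo(){")
spaced = whitespace_terms("A = foo () {")

print(len(compact), compact)
print(len(spaced), spaced)
```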

Re: Spaces in regular expressions

2016-02-13 Thread Jack Krupansky
Just to be clear, the whitespace tokenizer would treat "A=foo(){" as a single token. I presume you want "A" and "foo" to be separate terms. You still haven't indicated what regex you were considering. Try explaining your query in plain English. I mean, do you want to search for two keywords with a

Re: Spaces in regular expressions

2016-02-13 Thread Kudrettin Güleryüz
As mentioned, the document is source code. As you know, all of the statements below are equivalent: A = foo() { A=foo(){ A= foo(){ ... With the standard whitespace analyzer in action, the statements I want to match can span one to five terms in this case. If the spacing were fixed, I could go with either a phrase search or rege

Re: Spaces in regular expressions

2016-02-13 Thread Jack Krupansky
Obviously you wouldn't need to do a regex for simple terms like foo and bar - just use simple terms and a quoted phrase to match "foo bar". If you really do need to do complex pattern regexes and match across adjacent terms, your best bet is to keep a copy of the source text in a separate string (not

RE: Spaces in regular expressions

2016-02-13 Thread Uwe Schindler
Hi, That's very easy to explain: Regexp queries only work on terms, you already said it in your introduction. There is no phrase query in Lucene that accepts regular expressions. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > ---
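A small simulation of why a pattern containing whitespace cannot match when it is applied term by term. This uses Python's `re` module rather than Lucene's regexp syntax, and assumes whitespace tokenization; the example pattern and text are made up for illustration.

```python
import re

# Illustrative pattern: two words separated by any amount of whitespace.
pattern = re.compile(r"foo\s+bar")

line = "foo   bar"
terms = line.split()  # what a whitespace tokenizer would index

# A term-level regexp has to match one whole term at a time.
term_hits = [t for t in terms if pattern.fullmatch(t)]

# Matching the untokenized line (for example, a keyword-style field) works.
line_hit = pattern.fullmatch(line) is not None

print(terms, term_hits, line_hit)
```

No single term contains whitespace, so the term-level match finds nothing, while the same pattern matches the line when it is kept as one untokenized value.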