@gmail.com]
> Sent: Thursday, February 25, 2016 7:58 PM
> To: java-user@lucene.apache.org
> Subject: Re: Spaces in regular expressions
Thank you, I had looked at that article a little, some time ago. I was
thinking I may have to change some lower level Lucene classes to be able to
work like that. Plus, I don't have much idea whether that would break things.
I am primarily looking for a Lucene solution at this point.
On Thu, Feb 25, 20
Possibly not helpful, but some time ago Russ Cox implemented code
search at Google.
His design is documented here: https://swtch.com/~rsc/regexp/regexp4.html
On Wed, Feb 24, 2016, at 08:01 AM, Kudrettin Güleryüz wrote:
I appreciate the pointers Jack. More on that, where can I read more on
enabling full regexp support on indexed source code documents using Lucene?
Any suggestions regarding cases where developers implemented this kind of
capability using Lucene/Solr/ElasticSearch/... would be more than welcome.
T
You can have two parallel fields, one tokenized as a programming language
would (identifiers, operators) and one using the keyword tokenizer for each
line. You have to decide whether to treat each line as a separate Lucene
document or treat each source file as a multivalued field, one value per
sou
Since the documents are source code, I am considering matching on operators too.
Using the whitespace analyzer, A=foo(){ would be a single term, while A = foo () {
would be five terms. Different documents can contain different combinations
of the identifiers and operators in the example. A regexp query like
/A\s
Just to be clear, the whitespace tokenizer would treat "A=foo(){" as a
single token. I presume you want "A" and "foo" to be separate terms.
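To illustrate the difference, here is a minimal sketch in plain Python (not Lucene) of a tokenizer that separates identifiers from operators. The TOKEN pattern and the code_tokens name are hypothetical, chosen only for this example; a real programming language would need a richer lexer.

```python
import re

# Hypothetical token pattern: an identifier, an integer literal, or any
# single non-space symbol. This is an assumption for illustration only.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def code_tokens(line):
    """Split a line of source code into identifier and operator tokens."""
    return TOKEN.findall(line)

# Whitespace splitting keeps the unspaced statement as one token.
print("A=foo(){".split())

# The code-aware split gives the same tokens regardless of spacing.
print(code_tokens("A=foo(){"))
print(code_tokens("A = foo () {"))
```

With tokens like these, "A" and "foo" become separate terms no matter how the statement was spaced.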
You still haven't indicated what regex you were considering. Try explaining
your query in plain English. I mean, do you want to search for two keywords
with a
As mentioned, the documents are source code. As you know, all of the statements
below are equivalent:
A = foo() {
A=foo(){
A= foo(){
...
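As a rough illustration of how the term count depends on spacing, the following plain Python sketch splits each variant on whitespace. It only approximates what a whitespace analyzer would produce and does not use Lucene; whitespace_terms is a hypothetical helper name.

```python
# The three equivalent statements listed above.
variants = ["A = foo() {", "A=foo(){", "A= foo(){"]

def whitespace_terms(line):
    """Approximate whitespace tokenization by splitting on runs of spaces."""
    return line.split()

for statement in variants:
    terms = whitespace_terms(statement)
    print(len(terms), terms)
```

The same statement yields a different number of terms in each case, which is why a single term-level query cannot cover all spacings.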
With the standard whitespace analyzer in action, the statements I want to match
can span anywhere from one to five terms in this case. If the spacing were fixed,
I could go with either a phrase search or rege
Obviously you wouldn't need a regex for simple terms like foo and bar
- just use simple terms and a quoted phrase to match "foo bar". If you really
do need to do complex pattern regexes and match across adjacent terms, your
best bet is to keep a copy of the source text in a separate string (not
Hi,
That's very easy to explain: Regexp queries only work on individual terms, as you
already said in your introduction. There is no phrase query in Lucene that accepts
regular expressions.
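A small sketch in plain Python (not Lucene) may make the consequence concrete. It assumes terms come from whitespace splitting and that a regexp must match a whole term, modelled here with re.fullmatch. The pattern and the helper names are hypothetical.

```python
import re

# Hypothetical pattern that tolerates optional spaces around the operators.
pattern = re.compile(r"A\s*=\s*foo\(\)\s*\{")

def matches_any_term(pattern, terms):
    """True if the pattern matches at least one whole term."""
    return any(pattern.fullmatch(term) for term in terms)

def matches_line(pattern, line):
    """True if the pattern matches the untokenized line as a whole."""
    return pattern.fullmatch(line) is not None

spaced = "A = foo() {"
unspaced = "A=foo(){"

print(matches_any_term(pattern, spaced.split()))    # no single term matches
print(matches_any_term(pattern, unspaced.split()))  # the single term matches
print(matches_line(pattern, spaced))                # the whole line matches
```

Under these assumptions the pattern only succeeds at the term level when the statement happens to be one term, while matching against the untokenized line works for either spacing.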
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de