We still rely on Apache ORO in OFBiz, specifically
jakarta-oro-2.0.8.jar (see http://jakarta.apache.org/site/news/news-2010-q3.html#20100901.2).

While working on https://issues.apache.org/jira/browse/OFBIZ-5395 (out of scope
here), I noted the same benchmark reference as Bruno.
See the end of the OFBIZ-5395 description for this note and a remark about the ORO cache: "with its cache, CompilerMatcher is more than an interesting alternative to regular Java regex."

Jacques

On 31/01/2015 18:29, Benson Margulies wrote:
On Sat, Jan 31, 2015 at 12:22 PM, Bruno P. Kinoshita
<brunodepau...@yahoo.com.br> wrote:
Hi Benson!
I wouldn't be able to help at the moment, but some years ago I had a
performance issue in a Nutch crawler with regexes [1] and found out about
another library, which I think is the one you mentioned. Are you talking about ORO?
Yes, I believe I'm referring to ORO. I'm not really even looking for
help. I am looking to see if there is enough interest to justify
exploring pushing the code into the ASF. We did benchmarks, it's
faster than built-in Java in a variety of cases. We are precluded from
using GPL, so we didn't look seriously at OpenRegex. We want to have a
system where outsiders can supply any regex they like and we don't
have to worry about one of our servers being eaten by it.
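
To make the "server being eaten" concern concrete, here is a minimal sketch of one common stopgap with the JRE's own engine: wrapping the input in a CharSequence that checks a deadline on every character read. This is not the tcl-regex-java approach; the names BoundedRegex, DeadlineCharSequence, RegexTimeoutException and matchesWithin are made up for illustration, and the trick assumes that java.util.regex reads its input through charAt.

```java
import java.util.regex.Pattern;

final class BoundedRegex {
    /** Hypothetical exception type: thrown when a match exceeds its time budget. */
    static class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    /** CharSequence view that checks a deadline on every character read. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // Assumption: the regex engine reads the input through charAt,
            // so a runaway match keeps passing through this check.
            if (System.nanoTime() - deadlineNanos >= 0) {
                throw new RegexTimeoutException("regex match exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Full match of input against pattern, aborted once budgetMillis has elapsed. */
    static boolean matchesWithin(Pattern pattern, CharSequence input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).matches();
    }
}
```

A caller would use something like BoundedRegex.matchesWithin(userPattern, text, 50) and treat RegexTimeoutException as "pattern rejected". This only caps the damage: the thread stays busy until the deadline passes, which is why an engine without the blow-up in the first place is more attractive.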


I ended up changing the regex and never had a chance to play with ORO or other
libraries to see if they had any advantage over the JRE's regex API.
Recently I had another performance problem, this time with an Apache Hive
SerDe, and fixed it by changing the storage format and
simplifying the regex.
Have you done any performance comparison between your code and other libraries?
More or less like this [2]? Maybe this library could be used as an alternative
in Nutch, Commons Crawl or in other projects where performance is important.
Lastly, I'm using OpenRegex (GPLv3) [3] in a project, in combination with Apache OpenNLP. 
It is a "regular expression language and engine" that users can use to match 
string and NLP tags. For example:
<string='My Company'> <lemma='be'> <postag='RB'>* (<adjective>: <postag='JJ'>)
Here <lemma='be'> matches any form of "be" (is/was/were/etc.), <postag='RB'>* matches zero or more
adverbs, and the last part of the expression captures a token under the name "adjective" (JJ is the
Penn Treebank part-of-speech tag for adjectives).
Not sure if your library will work only with text or will support other
approaches too. OpenRegex has some TODOs in the GitHub Wiki but hasn't been
updated in a while. Maybe if your library could work similarly to OpenRegex, it
could be incorporated in Apache OpenNLP too. Even the LanguageTool team
showed some interest in experimenting with it [4].
Just food for thought :-)
Bruno

[1] https://issues.apache.org/jira/browse/NUTCH-1014
[2] http://tusker.org/regex/regex_benchmark.html
[3] https://github.com/knowitall/openregex
[4] http://sourceforge.net/p/languagetool/mailman/languagetool-devel/thread/69f229c0a58d3245d511dafaa82feafc%40danielnaber.de/#msg31280519

  From: Benson Margulies <bimargul...@gmail.com>
  To: Commons Developers List <dev@commons.apache.org>
  Sent: Saturday, January 31, 2015 1:58 PM
  Subject: Anyone interested in regular expressions, again?

So, once upon a time, there was a regex library here. It was retired,
presumably on the grounds that it was rendered obsolete by the JRE's
native support.

However, the JRE's regular expressions have a pretty severe problem:
they have unbounded (or at least very, very bad) execution time for
some combinations of data and regex.
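
For anyone who has not run into this, a minimal sketch of the failure mode with java.util.regex follows. The pattern "(a+)+$" and the class and method names are illustrative choices, not taken from any of the projects discussed here. How steeply the time grows depends on the JRE version, and newer runtimes may cope better with this particular pattern, so treat the numbers it prints as indicative only.

```java
import java.util.regex.Pattern;

class BacktrackingDemo {
    // Nested quantifiers: a textbook trigger for heavy backtracking.
    static final Pattern NESTED = Pattern.compile("(a+)+$");

    // n copies of 'a' followed by '!' so that the overall match has to fail.
    static String hostileInput(int n) {
        StringBuilder sb = new StringBuilder(n + 1);
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.append('!').toString();
    }

    // Elapsed time of one match attempt, in nanoseconds.
    static long timeMatchNanos(String input) {
        long start = System.nanoTime();
        NESTED.matcher(input).matches();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Each step adds only two characters to the input.
        for (int n = 10; n <= 24; n += 2) {
            long micros = timeMatchNanos(hostileInput(n)) / 1_000;
            System.out.println("n=" + n + " -> " + micros + " us");
        }
    }
}
```

On a backtracking engine the match has to try many ways of splitting the run of 'a' characters between the inner and outer quantifier before it can report failure, which is where the time goes.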

To cope with this, we ported the Henry Spencer regular expression
library (as found in TCL) from C to Java.

Thus: https://github.com/basis-technology-corp/tcl-regex-java

Is anyone interested in this? Give or take the possible IP muddle of
the original C Code, I could grant it easily.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



