Re: StandardTokenizer is slowing down highlighting a lot

2007-07-26, Stanislaw Osinski
> If anyone is interested, I could prepare a JFlex based Analyzer equivalent (to the extent possible) to current StandardAnalyzer, which might offer nice indexing and highlighting speed-ups. +1. I think a lot of people would be interested in a faster StandardAnalyzer. I've attached a

2007-07-25, Stanislaw Osinski
On 25/07/07, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 7/25/07, Stanislaw Osinski <[EMAIL PROTECTED]> wrote: > JavaCC is slow indeed. JavaCC is a very fast parser for a large document... the issue is small fields and JavaCC's use of an exception for flow control at the end of a value. As JVMs

2007-07-25, Yonik Seeley
On 7/25/07, Stanislaw Osinski <[EMAIL PROTECTED]> wrote: JavaCC is slow indeed. JavaCC is a very fast parser for a large document... the issue is small fields and JavaCC's use of an exception for flow control at the end of a value. As JVMs have advanced, exception-as-control-flow has gotten com
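
To make Yonik's point concrete: if the end of every value is signalled by throwing, a tokenizer pays for one exception per field, which is negligible for one large document but adds up over many short fields. The sketch below is a minimal illustration under that assumption. It is plain Java, not JavaCC-generated code, and the classes ExceptionAtEndSource, SentinelSource and EndOfInputDemo are hypothetical. Both loops count whitespace-separated tokens; only the end-of-input signal differs.

```java
import java.io.EOFException;

// Hypothetical character source that signals end of input by throwing,
// in the style the thread attributes to JavaCC's generated code.
final class ExceptionAtEndSource {
    private final String text;
    private int pos = 0;
    ExceptionAtEndSource(String text) { this.text = text; }
    char next() throws EOFException {
        if (pos >= text.length()) throw new EOFException();
        return text.charAt(pos++);
    }
}

// Hypothetical character source that signals end of input with -1 (no exception object).
final class SentinelSource {
    private final String text;
    private int pos = 0;
    SentinelSource(String text) { this.text = text; }
    int next() {
        return pos < text.length() ? text.charAt(pos++) : -1;
    }
}

final class EndOfInputDemo {
    // The loop ends only when next() throws, so every field costs one exception.
    static int countWithException(String text) {
        ExceptionAtEndSource src = new ExceptionAtEndSource(text);
        int tokens = 0;
        boolean inToken = false;
        try {
            while (true) {
                char c = src.next();
                if (Character.isWhitespace(c)) { inToken = false; }
                else if (!inToken) { inToken = true; tokens++; }
            }
        } catch (EOFException end) {
            // Normal termination path: reached once per field.
        }
        return tokens;
    }

    // Same token counting, but the loop condition tests for the -1 sentinel.
    static int countWithSentinel(String text) {
        SentinelSource src = new SentinelSource(text);
        int tokens = 0;
        boolean inToken = false;
        for (int c = src.next(); c != -1; c = src.next()) {
            if (Character.isWhitespace((char) c)) { inToken = false; }
            else if (!inToken) { inToken = true; tokens++; }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String field = "short title field";
        System.out.println(countWithException(field)); // 3
        System.out.println(countWithSentinel(field));  // 3
    }
}
```

By default, constructing a Throwable records the current stack trace, which is a large part of why exception-based termination is costly; how much this matters per field depends on the JVM, so measure before drawing conclusions.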

2007-07-25, Grant Ingersoll
On Jul 25, 2007, at 7:19 AM, Stanislaw Osinski wrote: Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really limited by JavaCC speed. You cannot shave much more performance out of the grammar as it is already about as simple as it gets. JavaCC is slow indeed. We used it for

2007-07-25, Stanislaw Osinski
I am sure a faster StandardAnalyzer would be greatly appreciated. I'm increasing the priority of that task then :) StandardAnalyzer appears widely used and horrendously slow. Even better would be a StandardAnalyzer that could have different recognizers enabled/disabled. For example, dropping
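
The quoted suggestion of a StandardAnalyzer with recognizers that can be switched on or off could look roughly like the sketch below. It is plain Java, not the Lucene Tokenizer API, and the Recognizer values (APOSTROPHE, DECIMAL) and the ConfigurableTokenizer class are hypothetical names chosen for illustration, not StandardAnalyzer's actual token types. The base rule splits on anything that is not a letter or digit. Each enabled recognizer lets one kind of punctuation stay inside a token.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical optional recognizers; the names are illustrative only.
enum Recognizer { APOSTROPHE, DECIMAL }

final class ConfigurableTokenizer {
    private final EnumSet<Recognizer> enabled;

    ConfigurableTokenizer(EnumSet<Recognizer> enabled) {
        this.enabled = EnumSet.copyOf(enabled);
    }

    // Splits on non-alphanumerics, lowercasing each token.
    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c) || joinsToken(text, i)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    // True if the punctuation at position i should stay inside the current token.
    // Disabled recognizers are never consulted, so they cost nothing per character.
    private boolean joinsToken(String text, int i) {
        if (i == 0 || i + 1 >= text.length()) return false;
        char prev = text.charAt(i - 1), c = text.charAt(i), next = text.charAt(i + 1);
        if (c == '\'' && enabled.contains(Recognizer.APOSTROPHE)) {
            return Character.isLetter(prev) && Character.isLetter(next);   // e.g. o'reilly
        }
        if (c == '.' && enabled.contains(Recognizer.DECIMAL)) {
            return Character.isDigit(prev) && Character.isDigit(next);     // e.g. 3.50
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "O'Reilly paid 3.50";
        // [o'reilly, paid, 3.50]
        System.out.println(new ConfigurableTokenizer(EnumSet.allOf(Recognizer.class)).tokenize(text));
        // [o, reilly, paid, 3, 50]
        System.out.println(new ConfigurableTokenizer(EnumSet.noneOf(Recognizer.class)).tokenize(text));
    }
}
```

The design choice here is that each recognizer is an independent check guarded by a set lookup. In a generated scanner the equivalent would more likely be separate grammar variants, since a compiled DFA cannot cheaply toggle rules at run time.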

2007-07-25, Mark Miller
I would be very interested. I have been playing around with Antlr to see if it is any faster than JavaCC, but haven't seen great gains in my simple tests. I had not considered trying JFlex. I am sure a faster StandardAnalyzer would be greatly appreciated. StandardAnalyzer appears widely used a

2007-07-25, Stanislaw Osinski
Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really limited by JavaCC speed. You cannot shave much more performance out of the grammar as it is already about as simple as it gets. JavaCC is slow indeed. We used it for a while for Carrot2, but then (3 years ago :) switched to JF
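
JFlex compiles a lexical specification into a DFA-based scanner, so the per-character work is essentially a character-class lookup followed by a transition-table lookup. The toy scanner below sketches that idea by hand. It is not JFlex output and not StandardAnalyzer's grammar, and the states and token types (ALNUM, NUM) are hypothetical. It recognizes runs of digits as NUM and any letter/digit run containing a letter as ALNUM, using longest match.

```java
import java.util.ArrayList;
import java.util.List;

// A toy table-driven DFA scanner; states and token types are hypothetical.
final class TableDrivenScanner {
    private static final int LETTER = 0, DIGIT = 1, OTHER = 2;
    private static final int START = 0, ALNUM = 1, NUM = 2, END = -1;

    // NEXT[state][charClass] gives the next state, or END when the current token is complete.
    private static final int[][] NEXT = {
        /* START */ { ALNUM, NUM,   START },
        /* ALNUM */ { ALNUM, ALNUM, END   },
        /* NUM   */ { ALNUM, NUM,   END   },
    };

    private static int charClass(char c) {
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    // Returns tokens tagged with their type, e.g. "<NUM>123".
    static List<String> scan(String text) {
        List<String> out = new ArrayList<>();
        int state = START, tokenStart = 0;
        for (int i = 0; i <= text.length(); i++) {
            // Treat end of input as an OTHER character so the last token is flushed.
            int cls = i < text.length() ? charClass(text.charAt(i)) : OTHER;
            int next = NEXT[state][cls];
            if (next == END) {
                out.add((state == NUM ? "<NUM>" : "<ALNUM>") + text.substring(tokenStart, i));
                // END only happens on OTHER, which START would skip anyway,
                // so the current character does not need to be re-scanned.
                state = START;
            } else {
                if (state == START && next != START) tokenStart = i;
                state = next;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // [<ALNUM>abc, <NUM>123, <ALNUM>x9, <ALNUM>42z]
        System.out.println(scan("abc 123 x9 42z"));
    }
}
```

A real generated scanner also handles backtracking to the last accepting state, packed tables and buffered reading; this sketch only shows why the inner loop can stay small.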

2007-07-19, Michael Stoppelman
On 7/19/07, Mark Miller <[EMAIL PROTECTED]> wrote: I think it goes without saying that a semi-complex NFA or DFA is going to be quite a bit slower than, say, breaking on whitespace. Not that I am against such a warning. This is true for those very familiar with the code base and the Tokenizer s

2007-07-19, Mark Miller
I think it goes without saying that a semi-complex NFA or DFA is going to be quite a bit slower than, say, breaking on whitespace. Not that I am against such a warning. To support my point on writing a custom solution that is more exact towards your needs: If you just remove the recognizer i

2007-07-18, Michael Stoppelman
Might be nice to add a line of documentation to the highlighter on the possible performance hit if one uses StandardAnalyzer which probably is a common case. Thanks for the speedy response. -M On 7/18/07, Mark Miller <[EMAIL PROTECTED]> wrote: Unfortunately, StandardAnalyzer is slow. StandardA

2007-07-18, Mark Miller
Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really limited by JavaCC speed. You cannot shave much more performance out of the grammar as it is already about as simple as it gets. You should first see if you can get away without it and use a different Analyzer, or if you can re-