Re: Extracting tokens from an expression and matching an object against that expression without parsing twice

Andrus Adamchik Mon, 17 Nov 2014 04:15:13 -0800

> It's not easy to explain properly why I need the tokens; the general reason 
> is that the preexisting application, written long ago by several other 
> persons, is designed to use them, and changing its design would be too big an 
> undertaking.


Yeah, I still don't understand why the code would care to poke inside the 
parser and deal directly with tokens.

> I will see if I can use Andrus' pointers to extract the tokens from the 
> Expression instance.

I am afraid you won't find any *tokens* in an Expression instance. Expression 
is just a tree of objects that can be used to evaluate stuff. If you need it to 
match something, you can. But a parsed expression is devoid of any links to the 
original lexical structure. 
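To see what an Expression tree offers in place of tokens, here is a sketch that walks a tree depth-first. So that it runs on its own, it uses a hypothetical stand-in `Node` class rather than Cayenne itself. A real walk would recurse over Cayenne's `getOperandCount()` and `getOperand(int)` instead. The output lists each operator before its operands, not in source order, and it has no whitespace or parentheses. The tree records structure, not the original lexical layout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parsed expression node. Cayenne's Expression
// exposes its children through getOperandCount() and getOperand(int);
// the field names here are illustrative only.
final class Node {
    final String label;          // operator or path name, e.g. "=" or "myField"
    final List<Object> operands; // child Nodes or plain leaf values

    Node(String label, Object... operands) {
        this.label = label;
        this.operands = Arrays.asList(operands);
    }
}

final class TreeWalk {
    // Pre-order walk: record the node's label, then visit each operand.
    static void collect(Object item, List<String> out) {
        if (item instanceof Node) {
            Node node = (Node) item;
            out.add(node.label);
            for (Object child : node.operands) {
                collect(child, out);
            }
        } else {
            out.add(String.valueOf(item)); // leaf value such as a literal
        }
    }
}
```

If `myField=0` is modeled as `new Node("=", new Node("myField"), 0)`, the walk yields `=`, `myField`, `0`. That is a flattened view of the tree, not a reconstruction of the lexer's token stream.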

Andrus



> On Nov 17, 2014, at 11:46 AM, Davide Vecchi <d...@amc.dk> wrote:
> 
> Thanks for your inputs.
> 
> I'm probably showing my technological age here, but I certainly admit that I 
> have this tendency to avoid repeating complex operations as a matter of 
> principle when it's known in advance that the second process will produce 
> exactly the same result as the first one. When I catch myself doing that I 
> always feel that my design is not OK.
> 
> However, in this case I am quite sure I need to get rid of the double parsing, 
> although I have not strictly demonstrated that it is the 
> cause of the slowdown. It's more like a qualified (in my opinion) guess, 
> reinforced by the fact that method Expression.fromString(String) has a TODO 
> saying "TODO: cache expression strings, since this operation is pretty slow" 
> (I'm using version 3.0.2). So it looks like the Cayenne coders too had 
> reasons to worry to some extent about optimization in this area.
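The TODO quoted above points at a common remedy: parse each distinct expression string once and reuse the resulting object. The helper below is a hypothetical sketch, not part of Cayenne. It takes the parse step as a `java.util.function.Function`, so with Cayenne one could pass `Expression::fromString` (this assumes Java 8). It removes repeated parsing of the same string, but it does not supply tokens.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: caches the result of parsing each distinct
// expression string, so the expensive parse runs once per string.
// With Cayenne it might be built as
//   new ParsedExpressionCache<>(Expression::fromString)
final class ParsedExpressionCache<E> {

    private final Map<String, E> cache = new ConcurrentHashMap<>();
    private final Function<String, E> parser;

    ParsedExpressionCache(Function<String, E> parser) {
        this.parser = parser;
    }

    // Returns the cached parse result, calling the parser only on a miss.
    // The parser must not return null, because computeIfAbsent stores nothing for null.
    E get(String expressionString) {
        return cache.computeIfAbsent(expressionString, parser);
    }

    int size() {
        return cache.size();
    }
}
```

With a single long-lived instance, `cache.get(where).match(object)` parses `where` only the first time it is seen. `ConcurrentHashMap` keeps this safe if several threads match objects at once. The unbounded map assumes the set of distinct expression strings stays small.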
> 
> I just used JVisualVM to profile the execution, and two of the methods where 
> by far the most time is spent are Expression.fromString(String) and 
> ExpressionParser.getNextToken(). Since I have to cut down the processing 
> time, I have to focus on them first.
> 
> The situation here is that I modified a preexisting application which was 
> doing some basic parsing, and after creating the tokens from the parsing it 
> was using them to match the expression against objects. That parsing is basic 
> in that it can only parse simple expressions; e.g. it doesn't support 
> grouping with parentheses.
> 
> My changes consisted of removing that parsing code from the application and 
> replacing it with calls to Cayenne, because we need real parsing. Of course 
> the parsing done by Cayenne is way more powerful and that might be the real 
> and fair reason why it takes longer, but even if this is the case it's 
> important for me not to do that parsing twice.
> 
> It's not easy to explain properly why I need the tokens; the general reason 
> is that the preexisting application, written long ago by several other 
> persons, is designed to use them, and changing its design would be too big an 
> undertaking. Since all that needs to be improved is the parsing and matching 
> I thought I'd just use a powerful tool to replace only those parts.
> 
> I will see if I can use Andrus' pointers to extract the tokens from the 
> Expression instance.
> 
> 
> 
> -----Original Message-----
> From: Andrus Adamchik [mailto:and...@objectstyle.org] 
> Sent: Sunday, November 16, 2014 14:57
> To: user@cayenne.apache.org
> Subject: Re: Extracting tokens from an expression and matching an object 
> against that expression without parsing twice
> 
> I second John's assessment. 
> 
> BTW, what are the tokens for? Do you actually need access to the 
> lexical structure of the String? A parsed Expression object is itself a 
> tree and gives you access to its own structure, either directly 
> ('getOperand(int)') or via the 'traverse' and 'transform' methods.
> 
> Andrus
> 
>> On Nov 14, 2014, at 9:54 PM, John Huss <johnth...@gmail.com> wrote:
>> 
>> This looks like a serious micro-optimization.  Is the performance of 
>> this really that critical?  Have you demonstrated that this is your 
>> application's crucial hot spot?
>> 
>> On Fri, Nov 14, 2014 at 7:35 AM, Davide Vecchi <d...@amc.dk> wrote:
>> 
>>> Hi all,
>>> 
>>> I have an expression in a string, and I use Cayenne to parse the 
>>> expression into tokens, which are needed for a specific purpose.
>>> 
>>> However in addition to having the tokens I also need to evaluate an 
>>> object against that expression, to see if that object matches the 
>>> expression.
>>> 
>>> My problem is that the way I'm doing it causes the parsing to be done 
>>> twice on the same expression, and I would like to avoid parsing the 
>>> same expression twice.
>>> 
>>> I'm doing the token creation like this:
>>> 
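>>> -----------------------------------
>>> import java.io.Reader;
>>> import java.io.StringReader;
>>> import java.util.ArrayList;
>>> import java.util.List;
>>> import org.apache.cayenne.exp.parser.ExpressionParser;
>>> import org.apache.cayenne.exp.parser.ExpressionParserConstants;
>>> import org.apache.cayenne.exp.parser.Token;
>>> 
>>> String where = "myField=0";
>>> 
>>> Reader reader = new StringReader(where);
>>> 
>>> ExpressionParser parser = new ExpressionParser(reader);
>>> 
>>> List<Token> tokens = new ArrayList<>();
>>> 
>>> Token token = parser.getNextToken();
>>> 
>>> // The JavaCC-generated parser signals end of input with an EOF token
>>> // rather than null, so check the token kind to end the loop.
>>> while (token.kind != ExpressionParserConstants.EOF) {
>>> 
>>>    tokens.add(token);
>>> 
>>>    token = parser.getNextToken();
>>> }
>>> -----------------------------------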
>>> 
>>> I'm doing the object matching like this:
>>> 
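>>> -----------------------------------
>>> import org.apache.cayenne.exp.Expression;
>>> 
>>> String where = "myField=0";
>>> 
>>> // Parses the string again, this time into an expression tree
>>> Expression expression = Expression.fromString(where);
>>> 
>>> // 'object' is the object being tested against the expression
>>> boolean matches = expression.match(object);
>>> -----------------------------------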
>>> 
>>> The call to Expression.fromString in the object matching 
>>> operation parses the string, even though the same expression 
>>> has already been parsed during token creation.
>>> 
>>> Is there a way to redesign this process in order to get the tokens 
>>> and also match an object against the expression without parsing the 
>>> same expression twice?
>>> 
>>> For example, I believe that the call to Expression.fromString must 
>>> have created the tokens, because it has parsed the string. So I 
>>> thought I could reverse the order and do the object matching first, 
>>> keep the Expression instance created in that process and use it to 
>>> extract the tokens. But I can't see how to extract the tokens from an 
>>> Expression instance instead of from an ExpressionParser instance as I'm 
>>> currently doing.
>>> 
>>> Or another possibility could be that I keep creating the tokens 
>>> first, and then I match my object against them, instead of against 
>>> the string expression that generated those tokens. But I can't see 
>>> how to match an object against tokens.
>>> 
>>> So I'm looking for some ideas.
>>> 
>>> Thanks in advance.
>>> 
>>> Davide Vecchi
>>> 
> 
>
