"Andrew E" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
Hi all

I've written a python program that adds orders into our order routing
simulation system. It works well, and has a syntax along these lines:

 ./neworder --instrument NOKIA --size 23 --price MARKET --repeats 20

etc

However, I'd like to add a mode that will handle, say:

 ./neworder buy 23 NOKIA at MKT x 20

I could enter several orders either by running multiple times, or use a
comma-separated approach, like:

 ./neworder buy 23 NOKIA at MKT on helsinki, sell 20 NOKIA at market on
helsinki

The thing about this is that it's a "tolerant" parser, so all of these
should also work:

 # omit words like "at", "on"
 ./neworder buy 23 NOKIA mkt helsinki

 # take any symbol for helsinki
 ./neworder buy 23 mkt helsinki

 # figure out that market=helsinki
 ./neworder buy 23 NOKIA at market price


I've started writing a simple state-based parser, usage like:

 class NaturalLanguageInstructionBuilder:

   def parse( self, arglist ):
     """Given a sequence of args, return an Instruction object"""
     ...
     return Instruction( instrument, size, price, ... )


 class Instruction:
   """encapsulate a single instruction to buy, sell, etc"""

   def __init__( self, instrument, size, price, ... ):
     ...


This doesn't work yet, but I know with time I'll get there.

Question is - is there a module out there that will already handle this
approach?

Thanks for any suggestions :)

There's NLTK (on SourceForge), which has already been mentioned. However, it's a teaching tool, not a production natural language parser.

I'd suggest you step back from the problem and take a wider
view. Parsing natural language, in all its variations, is an unsolved
research problem, and part of what has given Artificial Intelligence
something of a black eye.

Your problem is, however, much simpler than the general one:
you've got a limited number of commands which pretty much
all follow the VO (verb operands) pattern.
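Because every command is a verb followed by its operands, a command
line holding several comma-separated orders (as in the question) can be
cut into one clause per order before any real parsing happens. A minimal
sketch, assuming the shell has already split the line into words; the
name split_clauses is made up for illustration:

```python
def split_clauses(args):
    """Split a flat argument list into one word list per order."""
    clauses, current = [], []
    for arg in args:
        # The shell leaves commas glued to words ("helsinki,"),
        # so split each argument on commas as well.
        for i, piece in enumerate(arg.split(",")):
            # Every comma inside (or at the end of) an argument
            # closes the clause collected so far.
            if i > 0 and current:
                clauses.append(current)
                current = []
            if piece:
                current.append(piece)
    if current:
        clauses.append(current)
    return clauses
```

Each resulting list can then be handed to whatever parses a single
order, e.g. split_clauses(sys.argv[1:]).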

You've also got a lot of words from limited and disjoint
vocabularies that can be used to drive the parse. In your example,
one of 'buy' or 'sell' is required to start a clause,
MKT is one of maybe a half dozen
qualifiers that specify other information that must be present,
there are a limited number of exchanges, and the number of
shares seems to be the only number present.
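To make that concrete, here is a rough sketch of a vocabulary-driven
parse of a single clause. The word lists and the name parse_clause are
assumptions for illustration only; the real vocabularies would come from
your order system. It does not handle the "x 20" repeat syntax or
numeric limit prices, both of which would break the "only number is the
size" rule.

```python
# Hypothetical vocabularies, for illustration only.
VERBS = {"buy", "sell"}
NOISE = {"at", "on", "price"}      # filler words that carry no information
MARKET_PRICE = {"mkt", "market"}
EXCHANGES = {"helsinki"}


def parse_clause(words):
    """Classify each word by the vocabulary it belongs to."""
    order = {"side": None, "size": None, "instrument": None,
             "price": None, "exchange": None}
    for word in words:
        token = word.lower()
        if token in NOISE:
            continue
        if token in VERBS:
            order["side"] = token
        elif token.isdigit():
            # Assumes the share count is the only number in the clause.
            order["size"] = int(token)
        elif token in MARKET_PRICE:
            order["price"] = "MARKET"
        elif token in EXCHANGES:
            order["exchange"] = token
        else:
            # Anything unrecognised is taken to be the instrument.
            order["instrument"] = word
    if order["side"] is None:
        raise ValueError("a clause needs 'buy' or 'sell'")
    return order
```

Since each word is recognised by what it is rather than where it sits,
"buy 23 NOKIA at MKT on helsinki" and "buy 23 NOKIA mkt helsinki" give
the same result, and fields that were left out simply stay None for the
caller to default.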

I'd also take a bit of advice from the XP community: don't
write the library first, wait until you've got at least three
working examples so you know the services that the
library really needs to support.

John Roth

Andrew

-- http://mail.python.org/mailman/listinfo/python-list
