Hi all
I've written a Python program that adds orders to our order routing simulation system. It works well, and has a syntax along these lines:
./neworder --instrument NOKIA --size 23 --price MARKET --repeats 20
etc
However, I'd like to add a mode that will handle, say:
./neworder buy 23 NOKIA at MKT x 20
I could enter several orders either by running it multiple times or by using a comma-separated approach, like:
./neworder buy 23 NOKIA at MKT on helsinki, sell 20 NOKIA at market on helsinki
The thing about this is that it's a "tolerant" parser, so all of these should also work:
# omit words like "at", "on"
./neworder buy 23 NOKIA mkt helsinki
# take any symbol for helsinki
./neworder buy 23 mkt helsinki
# figure out that market=helsinki
./neworder buy 23 NOKIA at market price
I've started writing a simple state-based parser, usage like:
class NaturalLanguageInstructionBuilder:
    def parse( self, arglist ):
        """Given a sequence of args, return an Instruction object"""
        ...
        return Instruction( instrument, size, price, ... )

class Instruction:
    """encapsulate a single instruction to buy, sell, etc"""
    def __init__( self, instrument, size, price, ... ):
        ...
This doesn't work yet, but I know with time I'll get there.
Question is - is there a module out there that will already handle this approach?
Thanks for any suggestions :)
There's NLTK (on SourceForge), which has already been mentioned. However, it's a teaching tool, not a real production natural language parser.
I'd suggest you step back from the problem and take a wider view. Parsing natural language, in all its variations, is an unsolved research problem, and it is part of what has given Artificial Intelligence something of a black eye.
Your problem is, however, much simpler than the general one: you've got a limited number of commands which pretty much all follow the VO (verb operands) pattern.
You've also got a lot of words from limited and disjoint vocabularies that can be used to drive the parse. In your example, one of 'buy' or 'sell' is required to start a clause, MKT is one of maybe a half dozen qualifiers that specify other information that must be present, there are a limited number of exchanges, and the number of shares seems to be the only number present.
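To make that concrete, here is a rough sketch of a vocabulary-driven parse, not a finished design. Everything in it is an assumption for illustration: the word lists (SIDES, MARKET_WORDS, EXCHANGES, NOISE), the Order class, and the rule that a number following "x" is a repeat count while the first other number is the size. Limit prices and forms like "x20" written as one word are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical vocabularies; the real lists would come from your simulator.
SIDES = {"buy", "sell"}
MARKET_WORDS = {"mkt", "market"}
EXCHANGES = {"helsinki"}
NOISE = {"at", "on", "price"}  # filler words the parser may skip


@dataclass
class Order:
    """Hypothetical stand-in for the Instruction class in the question."""
    side: str
    size: int
    instrument: Optional[str] = None
    price: Optional[str] = None
    exchange: Optional[str] = None
    repeats: int = 1


def split_clauses(args: List[str]) -> List[List[str]]:
    """Split argv-style words into clauses wherever a comma appears."""
    clauses, current = [], []
    for arg in args:
        # A comma may be glued to a word ("helsinki,"), so pad it first.
        for piece in arg.replace(",", " , ").split():
            if piece == ",":
                if current:
                    clauses.append(current)
                current = []
            else:
                current.append(piece)
    if current:
        clauses.append(current)
    return clauses


def parse_clause(words: List[str]) -> Order:
    """Classify each word by which vocabulary it belongs to."""
    side = size = instrument = price = exchange = None
    repeats = 1
    expect_repeats = False  # set after seeing "x"
    for word in words:
        token = word.lower()
        if token in SIDES:
            side = token
        elif token.isdecimal():
            if expect_repeats:
                repeats = int(token)
            elif size is None:
                size = int(token)
            else:
                raise ValueError(f"unexpected number: {word}")
            expect_repeats = False
        elif token == "x":
            expect_repeats = True
        elif token in MARKET_WORDS:
            price = "MARKET"
        elif token in EXCHANGES:
            exchange = token
        elif token in NOISE:
            continue
        else:
            # Anything unrecognised is assumed to be the instrument symbol.
            instrument = word.upper()
    if side is None or size is None:
        raise ValueError(f"clause needs a side and a size: {' '.join(words)}")
    return Order(side, size, instrument, price, exchange, repeats)


def parse(args: List[str]) -> List[Order]:
    """Parse a whole command line into one Order per clause."""
    return [parse_clause(clause) for clause in split_clauses(args)]
```

Called as parse(sys.argv[1:]), this would return one Order per comma-separated clause. Because each word is classified on its own, dropping "at" or "on" or reordering the qualifiers makes no difference, which is most of what "tolerant" needs here.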
I'd also take a bit of advice from the XP community: don't write the library first, wait until you've got at least three working examples so you know the services that the library really needs to support.
John Roth
Andrew
-- http://mail.python.org/mailman/listinfo/python-list