I'm stealing knowledge left and right (just ask me :-) to design me an
AIML pattern matcher. I've compiled a draft list of objects and
behaviors, which I would like to see reviewed for plausibility:

startup
        - opens configuration file (e.g. startup.xml)
        - passes configuration file to bot-loader object
bot-loader
        - loads general input substitutions (spelling, person)
        - loads sentence splitters (.;!?)
        - looks for enabled bot, takes first one it finds
        - reads bot-id; uses it as key (for saving/loading variables and chatlogs)
        - loads bot properties (global constants, e.g. name)
        - passes control to aiml-loader object
aiml-loader
        - loads list of AIML files to load, and for each file
                - opens file
                - reads AIML categories (XML) one by one as they appear in the file
                        - parses and stores the content of the match path (e.g. "BOTID * INPUTPATTERN * CONTEXT1 * CONTEXT2 *")
                        - when it reaches the end of the category - the template, or leaf of this branch of the tree
                                - calls a method to store the elements of the match path, together with the template, in the pattern-matcher-tree

; First thing to learn is XML parsing with Clojure.
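
; A minimal sketch of that first step, assuming the clojure.xml
namespace that ships with Clojure and a tiny inline AIML document made
up for illustration. The helper names (parse-string, child-text,
categories) are hypothetical, and real templates contain nested markup
that this text-only extraction would ignore.

```clojure
(require '[clojure.xml :as xml])

;; Made-up sample; a real loader would read this from an AIML file.
(def sample-aiml
  (str "<aiml>"
       "<category><pattern>HELLO *</pattern>"
       "<template>Hi there!</template></category>"
       "<category><pattern>BYE</pattern>"
       "<template>See you.</template></category>"
       "</aiml>"))

(defn parse-string
  "Parses an XML string into clojure.xml's nested
  {:tag ... :attrs ... :content ...} maps."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child-text
  "Returns the concatenated text of the first child element named tag."
  [element tag]
  (->> (:content element)
       (filter #(= tag (:tag %)))
       first
       :content
       (filter string?)
       (apply str)))

(defn categories
  "Returns {:pattern ... :template ...} maps in document order."
  [root]
  (for [c (:content root)
        :when (= :category (:tag c))]
    {:pattern  (child-text c :pattern)
     :template (child-text c :template)}))

(categories (parse-string sample-aiml))
;; => ({:pattern "HELLO *", :template "Hi there!"}
;;     {:pattern "BYE", :template "See you."})
```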

; Though it is probably the easiest thing to do, it is not necessary
for the templates to be stored along with the paths in the tree. They
might as well be left on disc or in a database.

; A function like parser/scan must advance the parse to the next part
of the document (element - element content - processing
instruction...) and tokenize it. I can then use case/switch/if (must
look at what Clojure offers) to make decisions/set variables/call
methods.
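
; For the decision-making part, Clojure's case macro is probably what I
am after. A rough sketch, assuming the scanner emits [kind value]
pairs; that token shape and the name handle-token are made up here and
do not come from any library.

```clojure
(defn handle-token
  "Folds one hypothetical [kind value] token into the loader state."
  [state [kind value]]
  (case kind
    :start-element (assoc state :current value)
    :text          (case (:current state)
                     :pattern  (assoc state :pattern value)
                     :template (assoc state :template value)
                     state)
    :end-element   (if (= value :category)
                     ;; end of category: store the finished pair
                     (-> state
                         (update :categories conj
                                 (select-keys state [:pattern :template]))
                         (dissoc :pattern :template :current))
                     (dissoc state :current))
    state))

(def tokens
  [[:start-element :category]
   [:start-element :pattern] [:text "HELLO"] [:end-element :pattern]
   [:start-element :template] [:text "Hi there!"] [:end-element :template]
   [:end-element :category]])

(:categories (reduce handle-token {:categories []} tokens))
;; => [{:pattern "HELLO", :template "Hi there!"}]
```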

; The whole path, with all components, gets created at load time. The
loader combines all elements of the path (e.g. INPUTPATTERN * CONTEXT1
* CONTEXT2 *) into one string, separating the components using special
context-id strings (e.g. <input>, <context1>, <context2>).
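
; As a sketch, assuming each context-id string is placed in front of
the component it labels (the function name build-path is made up):

```clojure
(require '[clojure.string :as str])

(defn build-path
  "Joins the path components into one string, each preceded by its
  context-id marker (an assumed layout)."
  [input context1 context2]
  (str/join " " ["<input>" input
                 "<context1>" context1
                 "<context2>" context2]))

(build-path "HELLO *" "*" "*")
;; => "<input> HELLO * <context1> * <context2> *"
```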

; The idea of the AIML graphmaster is: take this string, separate it
into words, then store these words as nodes in a tree.
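
; One way this could look in Clojure, assuming the tree is just nested
maps keyed by word, with the template stored under a :template key at
the leaf. The name add-path and the :template key are my own choices
for this sketch.

```clojure
(require '[clojure.string :as str])

(defn add-path
  "Splits the path string into words and stores the template at the
  end of the corresponding branch."
  [tree path template]
  (assoc-in tree (conj (str/split path #"\s+") :template) template))

(def tree
  (-> {}
      (add-path "<input> HELLO * <context1> * <context2> *" "Hi there!")
      (add-path "<input> HELLO WORLD <context1> * <context2> *"
                "Hello, world!")))

;; Both paths share the "<input>" and "HELLO" nodes, then branch.
(set (keys (get-in tree ["<input>" "HELLO"])))
;; => #{"*" "WORLD"}
```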

; A variation of this idea: instead of keying the nodes by their
values, key them first by context, then by value.
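
; One possible reading of this variation, sketched with nested maps:
every step down the tree takes two keys, first the context and then
the word. The keyword contexts (:input, :context1) and the function
names are assumptions made for illustration only.

```clojure
(defn context-keys
  "Turns [[context words] ...] into a flat key sequence
  context, word, context, word, ..."
  [components]
  (vec (for [[context words] components
             word words
             k [context word]]
         k)))

(defn add-path-by-context
  "Stores the template at the end of a branch keyed by context,
  then by word."
  [tree components template]
  (assoc-in tree (conj (context-keys components) :template) template))

(def tree
  (add-path-by-context {}
                       [[:input ["HELLO" "*"]] [:context1 ["*"]]]
                       "Hi there!"))

(get-in tree [:input "HELLO" :input "*" :context1 "*" :template])
;; => "Hi there!"
```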

; Now that the bot is up and running, the user types something into
the input box and hits Enter. The

pre-processor
        - protects sentences
        - blocks common attack vectors, e.g. code injection, flooding
        - eliminates common spelling mistakes
                - for each loaded substitution
                        - finds and replaces it in the input string
                - alternatively, uses a tree to search for them
        - removes redundant whitespace
        - splits input into sentences (everything that follows is for each sentence)
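
; A rough sketch of the substitution, whitespace and splitting steps,
using only clojure.string. The two substitutions are made-up examples,
and sentence protection and attack blocking are left out entirely.

```clojure
(require '[clojure.string :as str])

;; Hypothetical substitutions; the real list would come from the
;; bot configuration.
(def substitutions
  [[#"(?i)\bteh\b" "the"]
   [#"(?i)\bu\b"   "you"]])

(defn apply-substitutions
  "Applies every loaded substitution to the input string in turn."
  [s]
  (reduce (fn [acc [pattern replacement]]
            (str/replace acc pattern replacement))
          s
          substitutions))

(defn normalize-whitespace
  "Collapses runs of whitespace and trims both ends."
  [s]
  (str/trim (str/replace s #"\s+" " ")))

(defn split-sentences
  "Splits on the sentence splitters .;!? and drops empty pieces."
  [s]
  (->> (str/split s #"[.;!?]+")
       (map str/trim)
       (remove str/blank?)))

(defn pre-process [input]
  (-> input apply-substitutions normalize-whitespace split-sentences))

(pre-process "Hello   there!  Do u like teh  weather?")
;; => ("Hello there" "Do you like the weather")
```
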
pattern-matcher
        - combines INPUTPATTERN * CONTEXT1 * CONTEXT2 * into one string
        - tokenizes the "path to be matched" into the individual words (nodes)
        - traverses the tree from the root; first
                - tries matching underscore (_) wildcards
                        - matching of wildcards is recursive
                                - match one word of the current path component
                                - try remainder against child node
                                - if the whole remaining input matches
                                - and if the last node is a leaf
                                        - return the template
                                - else try 2 words, then 3, and so on
                                - if all words in the string are used up and the current node is a leaf
                                        - return the template
                                - else stop matching underscores, and
                - tries matching exact words in alphabetical order
                        - if there is a child node that equals the input word, recurse a level deeper
                                - if at the next level there is a leaf, return the template
                                - else
                                - else
                - tries matching the star (*) wildcard
        - when a complete path was matched, creates a
match-object
        - holds information about the match
                - the input (sentence)
                - the template
                - the strings matched to the wildcards
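
; The matching order above (underscore, then exact word, then star)
can be sketched as a pair of mutually recursive functions. This assumes
the nested-map tree with a :template key at each leaf, and that a
wildcard never swallows a context-id marker such as <context1>. All
names, the sample categories and the context values "HI" and "WEATHER"
are hypothetical.

```clojure
(require '[clojure.string :as str])

(def wildcards #{"_" "*"})

(defn marker? [word] (str/starts-with? word "<"))

(defn add-path [tree path template]
  (assoc-in tree (conj (str/split path #"\s+") :template) template))

(declare match-path)

(defn match-wildcard
  "Lets the wildcard child consume 1, 2, 3... words of the current
  path component and tries the remainder against that child."
  [node wildcard words stars]
  (when-let [child (get node wildcard)]
    (let [limit (count (take-while (complement marker?) words))]
      (some (fn [n]
              (match-path child
                          (drop n words)
                          (conj stars (str/join " " (take n words)))))
            (range 1 (inc limit))))))

(defn match-path
  "Returns {:template ... :stars [...]} for the first match, or nil."
  [node words stars]
  (if (empty? words)
    (when-let [template (:template node)]
      {:template template :stars stars})
    (let [word (first words)]
      (or (match-wildcard node "_" words stars)
          (when-let [child (and (not (wildcards word)) (get node word))]
            (match-path child (rest words) stars))
          (match-wildcard node "*" words stars)))))

(defn match
  "Builds the path to be matched and returns the match-object, or nil."
  [tree sentence context1 context2]
  (let [path  (str/join " " ["<input>" sentence
                             "<context1>" context1
                             "<context2>" context2])
        words (str/split path #"\s+")]
    (when-let [m (match-path tree words [])]
      (assoc m :input sentence))))

(def tree
  (-> {}
      (add-path "<input> HELLO * <context1> * <context2> *" "Hi there!")
      (add-path "<input> HELLO WORLD <context1> * <context2> *"
                "Hello, world!")
      (add-path "<input> _ PLEASE <context1> * <context2> *"
                "So polite.")))

(match tree "HELLO BIG WORLD" "HI" "WEATHER")
;; => {:template "Hi there!",
;;     :stars ["BIG WORLD" "HI" "WEATHER"],
;;     :input "HELLO BIG WORLD"}
```

; With map lookups the "alphabetical order" of exact words stops
mattering, since at most one child can equal the current word. In this
sketch (match tree "HELLO PLEASE" "HI" "WEATHER") returns the
"So polite." template, because the underscore branch is tried before
the exact word HELLO.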

This first project should end there, with the template just returning
the values in the match-object. From there, the non-AIML aspects - the
new stuff - of the concept would be foregrounded.

Does this make sense to the casual observer?

Which known Clojure libraries should I be learning first?

Other comments, tips, disses?

Dirk
