I'm stealing knowledge left and right (just ask me :-) to design me an AIML pattern matcher. I've compiled a draft list of objects and behaviors, which I would like to see reviewed for plausibility:
startup
- opens configuration file (e.g. startup.xml)
- passes configuration file to bot-loader object

bot-loader
- loads general input substitutions (spelling, person)
- loads sentence splitters (.;!?)
- looks for enabled bot, takes first one it finds
- reads bot-id; uses it as key (for saving/loading variables and chatlogs)
- loads bot properties (global constants, e.g. name)
- passes control to aiml-loader object

aiml-loader
- loads list of AIML files to load, and for each file
  - opens file
  - reads AIML categories (XML) one by one as they appear in the file
    - parses and stores the content of the match path (e.g. "BOTID * INPUTPATTERN * CONTEXT1 * CONTEXT2 *")
    - when it reaches the end of the category - the template, or leaf of this branch of the tree -
      calls a method to store the elements of the match path, together with the template, in the pattern-matcher-tree

; First thing to learn is XML parsing with Clojure.
; Though it is probably the easiest thing to do, it is not necessary for the templates to be stored along with the paths in the tree. They might as well be left on disc or in a database.
; A function like parser/scan must advance the parse to the next part of the document (element - element content - processing instruction...) and tokenize it. I can then use case/switch/if (must look at what Clojure offers) to make decisions/set variables/call methods.
; The whole path, with all components, gets created at load time. The loader combines all elements of the path (e.g. INPUTPATTERN * CONTEXT1 * CONTEXT2 *) into one string, separating the components using special context-id strings (e.g. <input>, <context1>, <context2>).
; The idea of the AIML graphmaster is: take this string, separate it into words, then store these words as nodes in a tree.
; A variation of this idea: instead of keying the nodes by their values, key them first by context, then by value.

Now that the bot is up and running, the user types something into the input box and hits Enter.
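
To make the load-time notes above concrete, here is a minimal sketch of storing categories in the tree. It assumes the tree is a plain nested Clojure map keyed by word, which is only one possible representation. The names match-path, path->tokens and add-category, and the :template leaf key, are made up for illustration; the <input>, <context1> and <context2> markers are the context-id strings mentioned above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: joins the path components into one string,
;; separated by the context-id markers, e.g.
;; "<input> HELLO * <context1> * <context2> *"
(defn match-path [input-pattern context1 context2]
  (str/join " " ["<input>" input-pattern
                 "<context1>" context1
                 "<context2>" context2]))

;; Splits the path string into words; each word becomes one node.
(defn path->tokens [path]
  (str/split (str/trim path) #"\s+"))

;; Stores the template at the leaf reached by following the words.
;; The :template keyword cannot collide with a word, because all
;; words are strings. (Assumption: templates live in the tree; they
;; could equally be an id pointing to disc or a database.)
(defn add-category [tree path template]
  (assoc-in tree (conj (path->tokens path) :template) template))

(def tree
  (-> {}
      (add-category (match-path "HELLO *" "*" "*") "Hi there!")
      (add-category (match-path "HELLO BOT" "*" "*") "Hello, human.")))
```

Both categories share the "<input>" and "HELLO" nodes and only branch at the third word, which is the point of storing paths word by word.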
The pre-processor
- protects sentences
- blocks common attack vectors, e.g. code injection, flooding
- eliminates common spelling mistakes
  - for each loaded substitution
    - finds and replaces it in the input string
  - alternatively, uses a tree to search for them
- removes redundant whitespace
- splits input into sentences (everything that follows is for each sentence)

pattern-matcher
- combines INPUTPATTERN * CONTEXT1 * CONTEXT2 * into one string
- tokenizes the "path to be matched" into the individual words (nodes)
- traverses the tree from the root; first
  - tries matching underscore (_) wildcards
    - matching of wildcards is recursive
      - match one word of the current path component
      - try remainder against child node
        - if the whole remaining input matches
          - and if the last node is a leaf
            - return the template
        - else try 2 words, then 3
      - if all words in the string are used up and the current node is a leaf
        - return the template
      - else stop matching underscores, and
  - tries matching exact words in alphabetical order
    - if there is a child node that equals the input word, recurse a level deeper
    - if at the next level there is a leaf, return the template
    - else
  - tries matching the star (*) wildcard
- when a complete path was matched, creates a match-object
  - holds information about the match
    - the input (sentence)
    - the template
    - the strings matched to the wildcards

This first project should end there, with the template just returning the values in the match-object. From there, the non-AIML aspects - the new stuff - of the concept would be foregrounded.

Does this make sense to the casual observer? Which known Clojure libraries should I be learning first? Other comments, tips, disses?

Dirk
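
P.S. A rough sketch of the matching order described above (underscore first, then the exact word, then star), assuming the tree is a nested map keyed by word with the template under a :template key. The names match, match-wildcard and match-sentence are made up for illustration, and the example tree leaves out the context components to stay short. It also assumes the pre-processor has already removed any literal "_" or "*" from the input.

```clojure
(require '[clojure.string :as str])

(declare match)

;; Lets the wildcard child `wc` of `node` swallow 1, 2, 3... words and
;; tries the remaining words against that child; first success wins.
(defn- match-wildcard [node wc words stars]
  (when-let [child (get node wc)]
    (some (fn [n]
            (match child
                   (drop n words)
                   (conj stars (str/join " " (take n words)))))
          (range 1 (inc (count words))))))

;; Returns {:template ... :stars [...]} or nil if nothing matches.
(defn match [node words stars]
  (if (empty? words)
    ;; All words used up: succeed only if this node is a leaf.
    (when-let [template (:template node)]
      {:template template :stars stars})
    (or (match-wildcard node "_" words stars)       ; 1. underscore
        (when-let [child (get node (first words))]  ; 2. exact word
          (match child (rest words) stars))
        (match-wildcard node "*" words stars))))    ; 3. star

;; Builds the match-object: input, template, wildcard strings.
(defn match-sentence [tree sentence]
  (let [words (str/split (str/upper-case (str/trim sentence)) #"\s+")]
    (when-let [m (match tree words [])]
      (assoc m :input sentence))))

;; Tiny hand-written tree, input patterns only.
(def tree
  {"HELLO" {"*"   {:template "Hi there!"}
            "BOT" {:template "Hello, human."}}
   "_"     {"PLEASE" {:template "So polite."}}})

(match-sentence tree "hello big world")
;; the "*" under HELLO swallows "BIG WORLD"
```

Because each node is a map, the exact-word step is a single lookup, so the alphabetical ordering in my outline would only matter with a different node representation.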