Writer turned programmer seeks string processing advice
Hi,

my name is Dirk Scheuring, I'm a writer based in Cologne, Germany, and I want to write something about AI that can only be expressed by using AI technology and technique. You could say that what I aim for is a chatbot, but it's a bot that works quite differently from the norm; at its core, there is a functional program, and on the fringe, there's lots of state information.

I have a working prototype, written in AIML, the language in which the famous Alicebot was written. Most bots written in AIML use a simple stimulus-response conception of dialogue, without using (sequences of) state, self-reflection, or other advanced concepts. So some people who have only cursory knowledge of this language think that it's too simple to do anything with it that might be computationally interesting. I know this to be false.

AIML is sort of a micro-version of a Lisp, with String being the only type, and recursion over everything (powerful and dangerous). You can write serious functions with it [1], but you have to abuse it. And I heavily abused the language to make it do what I want. I managed to build a prototype that does something interesting, but only does it for like ten conversational strokes, because then the interpreter's stack overflows, causing an infinite loop.

I need to implement my ideas in a different language - one that I don't have to abuse to do complex stuff. I've looked at various functional languages, and have now installed two, Scala and Clojure, doing a couple of tutorials at the respective REPLs. I have a hunch that Clojure might turn out to be closer to what I already have in AIML, but I'm not sure yet. Maybe you can help me decide.

AIML is an XML dialect; its most important construct is the <category>, which has a <pattern> and a <template>. This is a default <category>, which matches any string:

    <category>
      <pattern>*</pattern>
      <template><srai>SOMEFUNCTION</srai></template>
    </category>

The <srai> element in the <template> means that I invoke the pattern matcher recursively to match SOMEFUNCTION. How can I express this in Clojure?
The next primitive construction I need to translate is substitution:

    <category>
      <pattern>* RAN *</pattern>
      <template><srai><star/> RUNNING <star index="2"/></srai></template>
    </category>

This code matches the substring RAN in the middle of a set of other substrings, and substitutes RUNNING for it, leaving the values before and after it untouched for the next recursion. And I need to know how to do:

    * MADONNA * Madonna

The above category "extracts" a substring from the input and saves it to a (global) variable; the other substrings are up for another round of pattern matching. Another important low-level one is:

    * SOMEOTHERFUNCTION

This is again the catch-all <*> pattern, but in a context, represented by the substring MYTOPIC in between these other substrings. So in this case, the default <category> has a <template> that calls SOMEOTHERFUNCTION. Is this something I would do with Multimethods in Clojure?

There's lots more, but this is the primitive stuff that I need to comprehend first. My model does what I call "semantic compression" - it transforms and analytically stores (parts of) the input string - then finds a matching object which has methods to generate facts and rules, the core of an output, which might then be extended front and back, depending on what's in the input and what's in the current state (that's "semantic expansion"). Finally, the first letter of the output is cast to uppercase, a dot or bang or question mark is put at the end, and a special object writes the formatted output (say, to a webpage, or to an IRC channel).

I know how to do this in AIML, at least up to the point when the interpreter crashes, because I used the language in a way it wasn't designed to be used (but it's valid AIML - the spec allows some unmentioned things; plus it is functional, with continuation-passing to boot). I want to learn how to do this in Clojure, and appreciate any pointers and tips.
Best regards,
Dirk Scheuring

[1] Simple FP example in AIML: http://list.alicebot.org/pipermail/alicebot-style/2002-September/000231.html

You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
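The substitution, extraction, and catch-all categories described in this message might be sketched in Clojure roughly as follows, with regular expressions standing in for AIML patterns. This is only an illustration under assumptions: the names `rules`, `respond`, and `variables`, and the `:person` key, are invented here, and trying the rules in a fixed order is a simplification, not how an AIML interpreter ranks patterns.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for AIML's global variables.
(def variables (atom {}))

(declare respond)

;; Each rule is [regex handler]; the handler receives the wildcard captures.
;; Rule order stands in for match priority (an assumption of this sketch).
(def rules
  [;; "* RAN *": substitute RUNNING, then match the result again
   [#"(.*)\bRAN\b(.*)"
    (fn [before after] (respond (str before "RUNNING" after)))]
   ;; "* MADONNA *": store the name, then match the remaining words again
   [#"(.*)\bMADONNA\b(.*)"
    (fn [before after]
      (swap! variables assoc :person "Madonna")
      (respond (str before after)))]
   ;; "*": the catch-all default
   [#"(.*)"
    (fn [input] (str "DEFAULT: " input))]])

(defn respond
  "Normalizes whitespace, then applies the first rule whose regex matches."
  [input]
  (let [text (str/trim (str/replace input #"\s+" " "))]
    (some (fn [[re handler]]
            (when-let [[_ & groups] (re-matches re text)]
              (apply handler groups)))
          rules)))

(respond "HE RAN HOME")             ;; => "DEFAULT: HE RUNNING HOME"
(respond "I SAW MADONNA YESTERDAY") ;; => "DEFAULT: I SAW YESTERDAY"
```

A handler that calls `respond` again plays the role of the recursive match. These are ordinary nested calls, so a very long rewrite chain would still consume stack; `loop`/`recur` or `trampoline` are the usual Clojure tools for keeping that flat.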
Re: Writer turned programmer seeks string processing advice
Thanks for your suggestion, Stuart, and yes, that's one obvious choice: the AIML interpreter in question - Program D - is written in Java, so why not just learn enough Java to fix the stack, and be done? In fact, this was my first consideration.

However, that would be myopic. The reason I have this problem is that the original AIML pattern matching algorithm is not suited for what I want to do (conversations with lots of state, and lots of references to past states to compute present and future output), so I changed it, using the language itself. While it was fun and amazing to learn that I could do this - from what I've read, this seems like a quite "lispy" thing to do - I have reason to suspect that it's very inefficient (I do about 30-40 recursions and 10 topic/context switches on an average input, simply because recursion and one contextual extension of the match path is all I have to work with). So I know that I can't "fake it" by somewhat modifying an existing program, and I'm up for a lot of learning.

My current questions are:

1. What would be a good way to learn about how to do matching and string processing in Clojure?
2. Would it be better (or even possible) to learn about matching and string processing in general, independent of the programming language?

I know about regex, but that's not enough: I need to learn about "matching in context", where "context" means "more matching", or even something like "explicit non-matches" (hope you can divine my meaning here).

Dirk

On 6 May, 03:29, Stuart Sierra wrote:
> Hi Dirk, welcome to Clojure!
>
> I don't know much about Scala, but I know that Lisp-like languages have long been popular for this sort of language manipulation, so Clojure may be a good one to look at.
>
> Some caveats: Clojure does not have a direct equivalent to the pattern/template style of AIML. Clojure also does not support the structural pattern-matching style found in some other functional languages like Haskell and ML.
> Multimethods cannot dispatch on composite arguments like [* foo *], although this is an open research area. On the other hand, Clojure has good support for regular expressions, which might be an adequate alternative for text processing. Clojure does not support continuations natively, although Clojure code can still be written in a continuation-passing style.
>
> And don't forget Java! There are implementations of AIML (and doubtless other pattern-matching libraries) in Java that you could use from Clojure.
>
> -Stuart Sierra
Re: Writer turned programmer seeks string processing advice
Luke VanderHart wrote:
> It actually sounds very like the classic exercise of building a logic-based language similar to Prolog in Scheme or Lisp, only with an AI/pattern matching functionality instead of a logic resolution engine.

Exactly - I'm doing much of the logic directly in the pattern matching, using a subset of the English language where words and phrases are computational objects which have methods that enable them to be parts of sentences.

I realize now that there is no quick fix, and I'll have to learn a lot to do this properly. But are there already enough resources so that I can learn how to do it in Clojure? For example, would there be enough about string processing in "Programming Clojure" to learn it from there?

Dirk
Re: Writer turned programmer seeks string processing advice
Adrian Cuthbertson wrote:
> There are two excellent clojure tutorials on monads which would be good starting points;

Thanks, I bet that'll be useful, too. I already have a rough understanding of what monads do, so having them presented in the context of Clojure may help me.

Dirk
Re: Writer turned programmer seeks string processing advice
Daniel Lyons wrote:
> I hope I misunderstood the phrase "explicit non-matches", because I believe that problem is intractable, or at least leads to unpleasantries like negation of the expression "foo" being "[^f]|[^f] [^o]|[^f][^o][^o]|f[^o]|fo[^o]|f$|fo$|^$", which I'm not even sure would really work and I doubt looks more tractable or pleasant from context-free or context-sensitive languages. Imagine what that would be like for a complicated expression. Also there's a difference between the negation of a match and the match of a negated expression; negating "I have a match" doesn't seem to be the same as negating "I have no match" - what's the location, length and content of the negated non-match? (Am I high?)

"Explicit non-match" means that nil can mean well-defined things in particular contexts. No voodoo goin' on here. It means that, if the bot gets an input string that it can't even partially match, it looks at the current context - like, in which mode is the conversation currently? - and computes an output from that. So an "explicit non-match" for a particular nil might be (list OS B C), a memory address at which there is a function which computes the output for this nil in this context.

The problem of negation is different, and has to be dealt with explicitly (the "meaningful nil" is something the bot does, but doesn't necessarily discuss, so it's an implicit thing). And my solution to negation is that, in fact, for every "object" the bot can discuss, there must be an equally dimensioned "non-object", and every mention of "verbing" behavior necessitates the existence of an explicitly mentionable "non-verbing" behavior. The user must be able to negate everything (because they will!).
If you have two distinct objects, then double negation can simply self-eliminate, and even if you get input where negation is prefixed to the object 17 times, the worst that can happen is that you have to throw them off, pair-wise, recursively (of course, there'll be larger patterns for frequently called bullshit).

Gee, this seems like a lively bunch of people here.

Dirk
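The two ideas in this message, a nil match that is resolved through the current conversational mode and negations that are thrown off in pairs, could be sketched like this. Everything named here (`non-match-handlers`, `default-handler`, `strip-negation-pairs`, `respond`, the `:greeting` and `:question` modes, and the literal token NOT) is a hypothetical choice for illustration; the real bot's contexts and vocabulary are not given in the thread.

```clojure
(require '[clojure.string :as str])

;; Hypothetical fallbacks: what a failed match means in each conversational mode.
(def non-match-handlers
  {:greeting (fn [_] "Hello! What would you like to talk about?")
   :question (fn [input] (str "I can't answer \"" input "\" yet."))})

;; Used when the mode itself is unknown.
(defn default-handler [_] "I have nothing to say to that.")

(defn strip-negation-pairs
  "Removes leading NOT tokens two at a time; returns the remaining words."
  [words]
  (if (= ["NOT" "NOT"] (take 2 words))
    (recur (drop 2 words))
    words))

(defn respond
  "Looks the input up in `known`; a nil match is resolved through the mode."
  [known mode input]
  (let [words (strip-negation-pairs (str/split input #"\s+"))
        text  (str/join " " words)]
    (if-let [answer (get known text)]
      answer
      ((get non-match-handlers mode default-handler) text))))

(strip-negation-pairs ["NOT" "NOT" "NOT" "HAPPY"]) ;; => ("NOT" "HAPPY")
```

Because "NOT HAPPY" is simply another key in `known`, the "non-object" is a distinct entry with its own answer, and an odd number of negations always reduces to that entry while an even number reduces to the plain object.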
Re: Writer turned programmer seeks string processing advice
Thanks, everybody. The buzz at Hacker News is that the Clojure community is awesome, and the buzz is right.

Now, to me, it follows from the advice you gave that I should do two projects:

1. Learn Clojure by implementing (some of) AIML (about half of the language is of no interest to me)
2. Implement what I prototyped in AIML (context, objects, processes) in Clojure

Does this sound right?

Dirk

Luke VanderHart wrote:
> On May 6, 4:39 am, dhs827 wrote:
> > I realize now that there is no quick fix, and I'll have to learn a lot to do this properly. But are there already enough resources so that I can learn how to do it in Clojure? For example, would there be enough about string processing in "Programming Clojure" to learn it from there?
>
> Clojure itself is very well documented for a language that's been out for less than two years, and as you can see, there is an active community to turn to for help. With persistence, it's very possible to become a Clojure expert in a short amount of time. I don't imagine you'll have any problem with the language itself.
>
> You'll probably have to get some books or look elsewhere for help with the actual algorithms specific to this problem domain, though - I doubt there's any Clojure tutorials dedicated specifically to pattern matching or AI. But anything that can be implemented in another language can be implemented (probably better) in Clojure, so if you know Clojure and you can at least read the examples in the AI literature, you should be good to go.
>
> -Luke
Re: Writer turned programmer seeks string processing advice
Laurent PETIT wrote:
> For 2., you could even consider, rather than manually doing the conversion, write (in clojure of course, with the help of the xml parsing tools already available) a AIML to clojure-AIML converter :-)

Most of the work will be about figuring out how to map the functional structure from an implementation in a small and specialized language to one in a large and general language - how do I express this or that idea?

There are two parts to my prototype. There's a small one I call the "grammar", ~200 AIML categories which do all the algorithmic heavy lifting, making use of AIML specialties like conditional branching by recursively matching arbitrary variables, where you can just pass to make magic happen - how can I translate this into Clojure? That's not going to be much code, but it's likely to be very dense (there are eight years of learning in those 200 AIML categories). So I suspect it'll take me a while until I understand what I need, and I'll not have much code to show for it.

And then there's the "lexicon", where the content, the words, are encoded as pseudo-objects. These are just files of simple AIML categories, doing mostly substitution and value returns. But there are lots of them, because AIML was never meant to have "objects", so a lot of boilerplate goes into simulating them. It basically worked, but when the whole hootenanny was finally running with a 200-word lexicon, it became clear that the interpreter would blow the lid after 10-12 strokes.

Again, translating the ideas will be the challenge; there's just not much code relative to the amount of ideas crystallized there, because AIML is so specialized for this, and the code was developed and optimized over the course of eight years. Once I've "got" it, I expect the Clojure implementation to take a number of parameters from the programmer/writer and create the appropriate object automatically. But that's the vision - I've not even learned the basics yet.
Dirk
AIML pattern matcher design
I'm stealing knowledge left and right (just ask me :-) to design me an AIML pattern matcher. I've compiled a draft list of objects and behaviors, which I would like to see reviewed for plausibility:

startup
- opens configuration file (e.g. startup.xml)
- passes configuration file to bot-loader object

bot-loader
- loads general input substitutions (spelling, person)
- loads sentence splitters (.;!?)
- looks for enabled bot, takes first one it finds
- reads bot-id; uses it as key (for saving/loading variables and chatlogs)
- loads bot properties (global constants, e.g. name)
- passes control to aiml-loader object

aiml-loader
- loads list of AIML files to load, and for each file
  - opens file
  - reads AIML categories (XML) one by one as they appear in the file
  - parses and stores the content of the match path (e.g. "BOTID * INPUTPATTERN * CONTEXT1 * CONTEXT2 *")
  - when it reaches the end of the category - the template, or leaf of this branch of the tree - calls a method to store the elements of the match path, together with the template, in the pattern-matcher-tree

; First thing to learn is XML parsing with Clojure.
; Though it is probably the easiest thing to do, it is not necessary for the templates to be stored along with the paths in the tree. They might as well be left on disc or in a database.
; A function like parser/scan must advance the parse to the next part of the document (element - element content - processing instruction...) and tokenize it. I can then use case/switch/if (must look at what Clojure offers) to make decisions/set variables/call methods.
; The whole path, with all components, gets created at load time. The loader combines all elements of the path (e.g. INPUTPATTERN * CONTEXT1 * CONTEXT2 *) into one string, separating the components using special context-id strings (e.g. , , )
; The idea of the AIML graphmaster is: take this string, separate it into words, then store these words as nodes in a tree.
; A variation of this idea: instead of keying the nodes by their values, key them first by context, then by value.
; Now that the bot is up and running, the user types something into the input box and hits Enter.

The pre-processor
- protects sentences
- blocks common attack vectors, e.g. code injection, flooding
- eliminates common spelling mistakes
  - for each loaded substitution
    - finds and replaces it in the input string
  - alternatively, uses a tree to search for them
- removes redundant whitespace
- splits input into sentences (everything that follows is for each sentence)

pattern-matcher
- combines INPUTPATTERN * CONTEXT1 * CONTEXT2 * into one string
- tokenizes the "path to be matched" into the individual words (nodes)
- traverses the tree from the root; first
  - tries matching underscore (_) wildcards
    - matching of wildcards is recursive
      - match one word of the current path component
      - try remainder against child node
      - if the whole remaining input matches
        - and if the last node is a leaf
          - return the template
      - else try 2 words, then 3
      - if all words in the string are used up and the current node is a leaf
        - return the template
      - else stop matching underscores, and
  - tries matching exact words in alphabetical order
    - if there is a child node that equals the input word, recurse a level deeper
    - if at the next level there is a leaf, return the template
    - else
  - tries matching the star (*) wildcard
- when a complete path was matched, creates a match-object
  - holds information about the match
    - the input (sentence)
    - the template
    - the strings matched to the wildcards

This first project should end there, with the template just returning the values in the match-object. From there, the non-AIML aspects - the new stuff - of the concept would be foregrounded.

Does this make sense to the casual observer? Which known Clojure libraries should I be learning first? Other comments, tips, disses?
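A minimal sketch of the pre-processor steps listed above, covering only substitution, whitespace cleanup, and sentence splitting (sentence protection and attack blocking are left out). The `substitutions` table and the function names are invented for illustration, and upper-casing the result is an assumption based on the upper-case patterns used elsewhere in the thread; a real bot would load its substitutions from the configuration file.

```clojure
(require '[clojure.string :as str])

;; Hypothetical substitution table of [regex replacement] pairs.
(def substitutions
  [[#"(?i)\bi'm\b" "I am"]
   [#"(?i)\bdont\b" "do not"]])

(defn normalize
  "Applies every substitution in order, then collapses redundant whitespace."
  [input]
  (let [substituted (reduce (fn [s [re replacement]]
                              (str/replace s re replacement))
                            input
                            substitutions)]
    (str/trim (str/replace substituted #"\s+" " "))))

(defn split-sentences
  "Splits on the sentence splitters . ; ! ? and drops empty fragments."
  [input]
  (->> (str/split input #"[.;!?]+")
       (map str/trim)
       (remove str/blank?)))

(defn preprocess
  "Returns the upper-cased sentences that would be handed to the matcher."
  [input]
  (map str/upper-case (split-sentences (normalize input))))

(preprocess "I'm   here.  Dont go!") ;; => ("I AM HERE" "DO NOT GO")
```

Running the substitutions as a `reduce` over the input keeps the table as plain data, which makes it easy to swap the linear scan for a tree lookup later without touching the rest of the pipeline.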
Dirk
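The graphmaster idea from this design, words stored as nodes in a tree and matched with the priority underscore, then exact word, then star, might look roughly like this when the tree is a nested Clojure map. The names `add-path`, `match`, and `match-wildcard` and the `:template` key are assumptions made for this sketch, and context components are not modelled separately; they would simply be further words in the path.

```clojure
(require '[clojure.string :as str])

(defn add-path
  "Stores template at the end of the word path inside a nested-map tree."
  [tree path template]
  (assoc-in tree (conj (str/split path #"\s+") :template) template))

(declare match)

(defn- match-wildcard
  "Lets a wildcard node absorb 1..n words, shortest first, then matches the rest."
  [child words stars]
  (some (fn [n]
          (match child
                 (drop n words)
                 (conj stars (str/join " " (take n words)))))
        (range 1 (inc (count words)))))

(defn match
  "Returns {:template .. :stars [..]} for the first complete path, or nil.
  Priority at every node: underscore wildcard, exact word, star wildcard."
  ([tree words] (match tree words []))
  ([node words stars]
   (if (empty? words)
     (when-let [template (:template node)]
       {:template template :stars stars})
     (or (when-let [child (get node "_")]
           (match-wildcard child words stars))
         (when-let [child (get node (first words))]
           (match child (rest words) stars))
         (when-let [child (get node "*")]
           (match-wildcard child words stars))))))

(def tree
  (-> {}
      (add-path "HELLO" "Hi there!")
      (add-path "* RAN *" "substitute")
      (add-path "*" "default")))

(match tree ["HE" "RAN" "HOME"])
;; => {:template "substitute", :stars ["HE" "HOME"]}
```

The returned map plays the part of the match-object: it carries the template and the strings bound to the wildcards. Since child nodes are looked up by key, the "alphabetical order" step of the design has no counterpart here, and input words that are literally `_` or `*` would be mistaken for wildcards, so the pre-processor would have to rule them out.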
Re: AIML pattern matcher design
I'm completely engulfed in all this material, but I wanted to come back and say that I'm stunned by the enthusiasm with which you share your knowledge here. Many thanks, again.

Dirk

Parth Malwankar wrote:
> On Fri, 08 May 2009 22:20:13 +0530, dhs827 wrote:
>
> > ; First thing to learn is XML parsing with Clojure.
>
> > Other comments, tips, disses?
> >
> > Dirk
>
> In case you don't expect end users or other languages to access the configuration, one option you have is to save the configuration directly as Clojure data.
>
> As Clojure is a lisp, you have access to the reader and you could read the data (maps, vectors, etc.) directly from the file.
>
> E.g.:
>
> user=> (def x (read-string "{:a 1 :b 2}"))
> #'user/x
> user=> x
> {:a 1, :b 2}
> user=>
>
> See also: (doc read)
>
> If you decide to go ahead with xml, you can use the xml support in clojure core:
>
> http://clojure.org/api#toc673
>
> Regards,
> Parth
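For the XML route mentioned in the quoted reply, a small sketch of reading AIML categories with `clojure.xml/parse` could look like this. The sample document, `parse-string`, `child-text`, and `categories` are made up for the example, and the sketch assumes templates that contain plain text only; nested template elements would need their own handling.

```clojure
(require '[clojure.xml :as xml])

;; Hypothetical one-category AIML document, kept inline so the sketch is self-contained.
(def aiml-source
  "<aiml><category><pattern>* RAN *</pattern><template>RUNNING</template></category></aiml>")

(defn parse-string
  "Parses an XML string into clojure.xml's {:tag :attrs :content} maps."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child-text
  "Text content of the first child element with the given tag."
  [element tag]
  (->> (:content element)
       (filter #(= tag (:tag %)))
       first
       :content
       (apply str)))

(defn categories
  "Returns one {:pattern .. :template ..} map per <category> element."
  [root]
  (for [c (:content root)
        :when (= :category (:tag c))]
    {:pattern  (child-text c :pattern)
     :template (child-text c :template)}))

(categories (parse-string aiml-source))
;; => ({:pattern "* RAN *", :template "RUNNING"})
```

`xml/parse` also accepts a file or a URI string, so the same `categories` function would work on AIML files on disk; the resulting maps are plain data that a loader could feed straight into a pattern tree.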