Writer turned programmer seeks string processing advice

2009-05-05 Thread dhs827

Hi,

my name is Dirk Scheuring, I'm a writer based in Cologne, Germany, and
I want to write something about AI that can only be expressed by using
AI technology and technique. You could say that what I aim for is a
chatbot, but it's a bot that works quite differently from the norm: at
its core, there is a functional program, and on the fringe, there's
lots of state information.

I have a working prototype, written in AIML, the language in which the
famous Alicebot was written. Most bots written in AIML use a simple
stimulus-response conception of dialogue, without using (sequences of)
state, self-reflection, or other advanced concepts. So some people who
have only cursory knowledge of this language think that it's too
simple to do anything that might be computationally interesting. I
know this to be false.

AIML is sort of a micro-version of a Lisp, with String being the only
type, and recursion over everything (powerful and dangerous). You can
write serious functions with it [1], but you have to abuse it. And I
heavily abused the language to make it do what I want. I managed to
build a prototype that does something interesting, but only does it
for like ten conversational strokes, because then the interpreter's
stack overflows, causing an infinite loop.

I need to implement my ideas in a different language - one that I
don't have to abuse to do complex stuff. I've looked at various
functional languages, and have now installed two, Scala and Clojure,
doing a couple of tutorials at the respective REPLs. I have a hunch that
Clojure might turn out to be closer to what I already have in AIML,
but I'm not sure yet. Maybe you can help me decide.

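AIML is an XML dialect; its most important construct is the <category>,
which has a <pattern> and a <template>. This is a default <category>,
which matches any string:

<category>
  <pattern>*</pattern>
  <template>
    <srai>SOMEFUNCTION</srai>
  </template>
</category>

The <srai> element in the <template> means that I invoke the pattern
matcher recursively to match SOMEFUNCTION. How can I express this in
Clojure?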
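
One possible reading of this in Clojure, as a rough sketch rather than an established idiom: treat each rule as a regex plus a handler, and let a handler ask for another round of matching by returning a new input. The names `rules`, `match-rule` and `respond`, the `:srai`/`:say` keys and the rewrite limit of 50 are all assumptions made for illustration. The loop uses loop/recur, so repeated rewrites do not grow the JVM stack.

```clojure
;; Hypothetical rule table: [regex handler] pairs, tried in order.
;; A handler returns {:srai "NEW INPUT"} to request another round of
;; matching, or {:say "text"} to stop with an output.
(def rules
  [[#"SOMEFUNCTION" (fn [_] {:say "result of SOMEFUNCTION"})]
   ;; catch-all rule, standing in for the default category
   [#".*"           (fn [_] {:srai "SOMEFUNCTION"})]])

(defn match-rule
  "Returns the handler result of the first rule whose regex matches
  the whole input, or nil if no rule matches."
  [input]
  (some (fn [[re handler]]
          (when-let [m (re-matches re input)]
            (handler m)))
        rules))

(defn respond
  "Matches input and follows :srai results until a rule says something.
  Gives up with an exception after 50 rewrites (an arbitrary limit)."
  [input]
  (loop [in input, n 0]
    (let [{:keys [srai say]} (match-rule in)]
      (cond
        say         say
        (nil? srai) nil
        (>= n 50)   (throw (ex-info "rewrite limit reached" {:input in}))
        :else       (recur srai (inc n))))))
```

Calling (respond "hello there") first hits the catch-all rule, is rewritten to SOMEFUNCTION, and is then answered by the first rule.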

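The next primitive construction I need to translate is substitution:

<category>
  <pattern>* RAN *</pattern>
  <template>
    <srai><star index="1"/> RUNNING <star index="2"/></srai>
  </template>
</category>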

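This code matches the substring RAN in the middle of a set of other
substrings, and substitutes RUNNING for it, leaving the values before
and after it untouched for the next recursion. And I need to know how
to do:

<category>
  <pattern>* MADONNA *</pattern>
  <template>
    <set name="...">
      Madonna
    </set>
    <srai><star index="1"/> <star index="2"/></srai>
  </template>
</category>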

The above category "extracts" a substring from the input and saves it
to a (global) variable; the other substrings are up for another round
of pattern matching. Another important low-level one is:

<topic name="* MYTOPIC *">
  <category>
    <pattern>*</pattern>
    <template>
      <srai>SOMEOTHERFUNCTION</srai>
    </template>
  </category>
</topic>

This is again the catch-all <*> pattern, but in a context, represented
by the substring MYTOPIC in between these other substrings. So in this
case, the default <category> has a <template> that calls
SOMEOTHERFUNCTION. Is this something I would do with Multimethods in
Clojure?
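
The three constructions above (substitution, extraction into a variable, and a catch-all that only applies in a context) can be sketched with plain regular expressions and a map keyed by topic. This is only a sketch under assumptions: the atom `vars`, the key `:person`, the rule layout and the rewrite limit are invented for illustration, and looking up the topic's rules before the default ones is a simplification of how AIML orders its matches.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for AIML's global variables.
(def vars (atom {}))

;; Rules grouped by topic; the :default rules are tried after the rules
;; of the current topic. Each rule is [regex handler]. The handler gets
;; the captured groups and returns {:srai "..."} (match again) or
;; {:say "..."} (done).
(def rules
  {"MYTOPIC" [[#"(.*)" (fn [_] {:say "SOMEOTHERFUNCTION was called"})]]
   :default  [[#"(.*)\bRAN\b(.*)"
               (fn [[before after]]
                 {:srai (str before "RUNNING" after)})]
              [#"(.*)\bMADONNA\b(.*)"
               (fn [[before after]]
                 ;; :person is an assumed variable name
                 (swap! vars assoc :person "Madonna")
                 {:srai (str/trim (str/replace (str before after)
                                               #"\s+" " "))})]
              [#"(.*)" (fn [[all]] {:say all})]]})

(defn match-rule
  "Tries the current topic's rules, then the default rules."
  [topic input]
  (some (fn [[re handler]]
          (when-let [[_ & groups] (re-matches re input)]
            (handler groups)))
        (concat (get rules topic) (:default rules))))

(defn respond
  "Rewrites input until a rule produces an output."
  [topic input]
  (loop [in input, n 0]
    (let [{:keys [srai say]} (match-rule topic in)]
      (cond
        say         say
        (nil? srai) nil
        (>= n 50)   (throw (ex-info "rewrite limit reached" {:input in}))
        :else       (recur srai (inc n))))))
```

With no topic set, (respond nil "HE RAN HOME") is rewritten once and then echoed by the last default rule; with the topic "MYTOPIC", every input is answered by the topic's catch-all.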

There's lots more, but this is the primitive stuff that I need to
comprehend first. My model does what I call "semantic compression" -
it transforms and analytically stores (parts of) the input string -,
then finds a matching object which has methods to generate facts and
rules, the core of an output, which might then be extended front and
back, depending on what's in the input and what's in the current state
(that's "semantic expansion")... and finally, the first letter of the
output is cast to uppercase, a dot or bang or question mark is put at
the end, and a special object writes the formatted output (say, to a
webpage, or to an IRC channel). I know how to do this in AIML, at
least up to the point when the interpreter crashes, because I used the
language in a way it wasn't designed to be used (but it's valid AIML -
the spec allows some unmentioned things; plus it is functional, with
continuation-passing to boot). I want to learn how to do this in
Clojure, and appreciate any pointers and tips.
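
The last step of that pipeline (uppercase first letter, terminating punctuation) is small enough to sketch directly. The function name `finalize` and the default terminator are assumptions; the calls into clojure.string are standard.

```clojure
(require '[clojure.string :as str])

(defn finalize
  "Upper-cases the first letter of s and appends terminator, unless s
  already ends in a dot, bang or question mark."
  ([s] (finalize s "."))
  ([s terminator]
   (let [s (str/trim s)]
     (if (str/blank? s)
       s
       (let [capped (str (str/upper-case (subs s 0 1)) (subs s 1))]
         (if (re-find #"[.!?]$" capped)
           capped
           (str capped terminator)))))))
```

For example, (finalize "is that so" "?") yields "Is that so?", while an output that already ends in punctuation is only capitalized.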

Best regards,

Dirk Scheuring

[1] Simple FP example in AIML:
http://list.alicebot.org/pipermail/alicebot-style/2002-September/000231.html

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: Writer turned programmer seeks string processing advice

2009-05-06 Thread dhs827

Thanks for your suggestion, Stuart, and yes, that's one obvious choice:
the AIML interpreter in question - Program D - is written in Java, so
why not just learn enough Java to fix the stack, and be done? In fact,
this was my first consideration.

However, that would be myopic. The reason I have this problem is that
the original AIML pattern matching algorithm is not suited for what I
want to do (conversations with lots of state, and lots of references
to past states to compute present and future output), so I changed it,
using the language itself. While it was fun and amazing to learn that
I could do this - from what I've read, this seems like a quite "lispy"
thing to do -, I have reason to suspect that it's very inefficient (I
do about 30-40 recursions and 10 topic/context switches on an average
input, simply because recursion and one contextual extension of the
match path is all I have to work with). So I know that I can't "fake
it" by somewhat modifying an existing program, and I'm up for a lot of
learning. My current questions are:

1. What would be a good way to learn about how to do matching and
string processing in Clojure?
2. Would it be better (or even possible) to learn about matching and
string processing in general, independent of the programming language?

I know about regex, but that's not enough: I need to learn about
"matching in context", where "context" means "more matching", or even
something like "explicit non-matches" (hope you can divine my meaning
here).

Dirk

On 6 May, 03:29, Stuart Sierra wrote:
> Hi Dirk, welcome to Clojure!
>
> I don't know much about Scala, but I know that Lisp-like languages
> have long been popular for this sort of language manipulation, so
> Clojure may be a good one to look at.
>
> Some caveats: Clojure does not have a direct equivalent to the pattern/
> template style of AIML.  Clojure also does not support the structural
> pattern-matching style found in some other functional languages like
> Haskell and ML.  Multimethods cannot dispatch on composite arguments
> like [* foo *], although this is an open research area.  On the other
> hand, Clojure has good support for regular expressions, which might be
> an adequate alternative for text processing.  Clojure does not support
> continuations natively, although Clojure code can still be written in
> a continuation-passing style.
>
> And don't forget Java!  There are implementations of AIML (and
> doubtless other pattern-matching libraries) in Java that you could use
> from Clojure.
>
> -Stuart Sierra
>

Re: Writer turned programmer seeks string processing advice

2009-05-06 Thread dhs827

Luke VanderHart wrote:

> It actually sounds very like the classic exercise of building a logic-
> based language similar to Prolog in Scheme or Lisp, only with an AI/
> pattern matching functionality instead of a logic resolution engine.

Exactly - I'm doing much of the logic directly in the pattern
matching, using a subset of the English language where words and
phrases are computational objects which have methods that enable them
to be parts of sentences.

I realize now that there is no quick fix, and I'll have to learn a
lot to do this properly. But are there already enough resources so
that I can learn how to do it in Clojure? For example, would there be
enough about string processing in "Programming Clojure" to learn it
from there?

Dirk



Re: Writer turned programmer seeks string processing advice

2009-05-06 Thread dhs827

Adrian Cuthbertson wrote:

> There are two excellent clojure
> tutorials on monads which would be good starting points;

Thanks, I bet that'll be useful, too. I already have a rough
understanding of what monads do, so having them presented in the
context of Clojure may help me.

Dirk



Re: Writer turned programmer seeks string processing advice

2009-05-06 Thread dhs827

Daniel Lyons wrote:

> I hope I misunderstood the phrase "explicit non-matches", because I  
> believe that problem is intractable, or at least leads to  
> unpleasantries like negation of the expression "foo" being "[^f]|[^f]
> [^o]|[^f][^o][^o]|f[^o]|fo[^o]|f$|fo$|^$", which I'm not even sure  
> would really work and I doubt looks more tractable or pleasant from  
> context-free or context-sensitive languages. Imagine what that would  
> be like for a complicated expression. Also there's a difference  
> between the negation of a match and the match of a negated expression;  
> negating "I have a match" doesn't seem to be the same as negating "I  
> have no match"—what's the location, length and content of the negated  
> non-match? (Am I high?)

"Explicit non-match" means that nil can mean well-defined things in
particular contexts. No voodoo goin' on here. It means that, if the
bot gets an input string that it can't even partially match, it looks
at the current context (like, which mode is the conversation currently
in?) and computes an output from that. So an "explicit non-match" for
a particular nil might be (list OS B C), a memory address at which
there is a function which computes the output for this nil in this
context.
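
A multimethod cannot dispatch on a pattern like [* foo *], but it can dispatch on a simple value such as the current conversation mode, which is enough to give nil a meaning per context. The sketch below assumes hypothetical modes (:greeting, :quiz) and a hypothetical `match-fn` that returns a reply or nil.

```clojure
;; Dispatch on the conversation mode only; the input is passed through.
(defmulti no-match (fn [mode _input] mode))

(defmethod no-match :greeting [_ _]
  "Sorry, I did not catch your name.")

(defmethod no-match :quiz [_ input]
  (str "\"" input "\" is not one of the answers."))

;; Used for any mode that has no method of its own.
(defmethod no-match :default [_ _]
  "I do not follow.")

(defn respond
  "match-fn is a hypothetical matcher that returns a reply or nil.
  A nil result is resolved by the current mode."
  [match-fn mode input]
  (or (match-fn input)
      (no-match mode input)))
```

The matcher itself stays unaware of modes; only the fallback consults the context.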

The problem of negation is different, and has to be dealt with
explicitly (the "meaningful nil" is something the bot does, but
doesn't necessarily discuss, so it's an implicit thing). And my
solution to negation is that, in fact, for every "object" the bot can
discuss, there must be an equally dimensioned "non-object", and every
mention of "verbing" behavior necessitates the existence of an
explicitly mentionable "non-verbing" behavior. The user must be able
to negate everything (because they will!). If you have two distinct
objects, then double negation can simply self-eliminate, and even if
you get input where negation is prefixed to the object 17 times, the
worst that can happen is that you have to throw the negations off,
pair-wise, recursively (of course, there'll be larger patterns for
frequently called bullshit).
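
Throwing off negations pair-wise can be sketched as a tail-recursive function over a token sequence. The token "NOT" and the function name are assumptions; a real lexicon would presumably recognise more forms of negation.

```clojure
(defn strip-double-negation
  "Removes leading NOT tokens two at a time and returns the rest.
  An odd number of negations leaves exactly one NOT in front."
  [tokens]
  (if (= ["NOT" "NOT"] (take 2 tokens))
    (recur (drop 2 tokens))
    tokens))
```

Seventeen leading negations reduce to a single one after eight rounds, and recur keeps the stack flat however long the prefix is.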

Gee, this seems like a live bunch of people here.

Dirk






Re: Writer turned programmer seeks string processing advice

2009-05-07 Thread dhs827

Thanks, everybody. The buzz at Hacker News is that the Clojure
community is awesome, and the buzz is right.

Now, to me, it follows from the advice you gave that I should do two
projects:

1. Learn Clojure by implementing (some of) AIML (about half of the
language is of no interest to me)
2. Implement what I prototyped in AIML (context, objects, processes)
in Clojure

Does this sound right?

Dirk


Luke VanderHart wrote:
> On May 6, 4:39 am, dhs827 wrote:
> > I realize now that there is no quick fix, and I'll have to learn a
> > lot to do this properly. But are there already enough resources so
> > that I can learn how to do it in Clojure? For example, would there be
> > enough about string processing in "Programming Clojure" to learn it
> > from there?
>
> Clojure itself is very well documented for a language that's been out
> for less than two years, and as you can see, there is an active
> community to turn to for help. With persistence, it's very possible to
> become a Clojure expert in a short amount of time. I don't imagine
> you'll have any problem with the language itself.
>
> You'll probably have to get some books or look elsewhere for help with
> the actual algorithms specific to this problem domain, though - I
> doubt there are any Clojure tutorials dedicated specifically to pattern
> matching or AI. But anything that can be implemented in another
> language can be implemented (probably better) in Clojure, so if you
> know Clojure and you can at least read the examples in the AI
> literature, you should be good to go.
>
> -Luke



Re: Writer turned programmer seeks string processing advice

2009-05-07 Thread dhs827

Laurent PETIT wrote:

> For 2., you could even consider, rather than manually doing the
> conversion, writing (in Clojure of course, with the help of the XML
> parsing tools already available) an AIML to clojure-AIML converter :-)

Most of the work will be about figuring out how to map the functional
structure from an implementation in a small and specialized language
to one in a large and general language - how do I express this or that
idea? There are two parts to my prototype; there's a small one I call
the "grammar", ~200 AIML categories which do all the algorithmic heavy
lifting, making use of AIML specialties like conditional branching by
recursively matching arbitrary variables, where you can just pass
 to make magic happen - how can I
translate this into Clojure? That's not going to be much code, but it's
likely to be very dense (there are eight years of learning in those
200 AIML categories). So I suspect it'll take me a while until I
understand what I need, and I'll not have much code to show for it.

And then there's the "lexicon", where the content, the words, are
encoded as pseudo-objects. These are just files of simple AIML
categories, doing mostly substitution and value returns. But lots of
them, because AIML was never meant to have "objects", so a lot of
boilerplate goes into simulating them. However, it worked, basically,
but when the whole hootenanny was finally running with a 200-word
lexicon, it became clear that the interpreter would blow the lid after
10-12 strokes. Again, translating the ideas will be the challenge;
there's just not much code relative to the amount of ideas
crystallized there, because AIML is so specialized for this, and the
code was developed and optimized over the course of eight years. Once
I've "got" it, I expect the Clojure implementation to take a number of
parameters from the programmer/writer and create the appropriate
object automatically. But that's the vision - I've not even learned
the basics yet.

Dirk





AIML pattern matcher design

2009-05-08 Thread dhs827

I'm stealing knowledge left and right (just ask me :-) to design me an
AIML pattern matcher. I've compiled a draft list of objects and
behaviors, which I would like to see reviewed for plausibility:

startup
- opens configuration file (e.g. startup.xml)
- passes configuration file to bot-loader object
bot-loader
- loads general input substitutions (spelling, person)
- loads sentence splitters (.;!?)
- looks for enabled bot, takes first one it finds
- reads bot-id; uses it as key (for saving/loading variables and
  chatlogs)
- loads bot properties (global constants, e.g. name)
- passes control to aiml-loader object
aiml-loader
- loads list of AIML files to load, and for each file
    - opens file
    - reads AIML categories (XML) one by one as they appear in the file
        - parses and stores the content of the match path
          (e.g. "BOTID * INPUTPATTERN * CONTEXT1 * CONTEXT2 *")
        - when it reaches the end of the category - the template, or
          leaf of this branch of the tree
            - calls a method to store the elements of the match path,
              together with the template, in the pattern-matcher-tree

; First thing to learn is XML parsing with Clojure.
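
A minimal sketch of that first step, using clojure.xml from the core distribution. The sample document and the helper names (`parse-str`, `child`, `categories`) are made up for illustration; `clojure.xml/parse` itself accepts a File, an InputStream or a URI string and returns nested maps with :tag, :attrs and :content.

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; A made-up two-category AIML document, used only for this example.
(def sample-aiml
  "<aiml>
     <category>
       <pattern>* RAN *</pattern>
       <template>RUNNING</template>
     </category>
     <category>
       <pattern>*</pattern>
       <template>DEFAULT</template>
     </category>
   </aiml>")

(defn parse-str
  "Parses an XML string by wrapping it in an InputStream."
  [^String s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child
  "Returns the first child element of element with the given tag."
  [element tag]
  (first (filter #(= tag (:tag %)) (:content element))))

(defn categories
  "Returns one map per <category>: the pattern text and the raw
  template content (strings and nested elements)."
  [aiml]
  (for [c (:content aiml) :when (= :category (:tag c))]
    {:pattern  (first (:content (child c :pattern)))
     :template (:content (child c :template))}))
```

The template is kept as raw content here because real templates contain nested elements, which a later step would have to interpret.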

; Though it is probably the easiest thing to do, it is not necessary
for the templates to be stored along with the paths in the tree. They
might as well be left on disc or in a database.

; A function like parser/scan must advance the parse to the next part
of the document (element - element content - processing
instruction...) and tokenize it. I can then use case/switch/if (must
look at what Clojure offers) to make decisions/set variables/call
methods.

; The whole path, with all components, gets created at load time. The
loader combines all elements of the path (e.g. INPUTPATTERN * CONTEXT1
* CONTEXT2 *) into one string, separating the components using special
context-id strings (e.g. , , )

; The idea of the AIML graphmaster is: take this string, separate it
into words, then store these words as nodes in a tree.

; A variation of this idea: instead of keying the nodes by their
values, key them first by context, then by value.
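
Storing the words of a path as nodes in a tree maps naturally onto nested Clojure maps and assoc-in. This is a sketch, not the graphmaster itself: the function names are hypothetical, and the keyword :template is an assumed marker for a leaf (it cannot collide with a word, because words are strings).

```clojure
(require '[clojure.string :as str])

(defn path-words
  "Splits a match path into its words."
  [path]
  (str/split (str/trim path) #"\s+"))

(defn add-path
  "Stores template in tree under the words of path."
  [tree path template]
  (assoc-in tree (conj (path-words path) :template) template))

(defn add-path-in-context
  "Variation: key the nodes first by context, then by word."
  [tree context path template]
  (assoc-in tree (into [context] (conj (path-words path) :template))
            template))

(def tree
  (-> {}
      (add-path "HELLO *" "Hi there.")
      (add-path "HELLO WORLD" "Hello yourself.")
      (add-path "*" "Tell me more.")))
```

Because the maps are persistent, each add-path returns a new tree and shares structure with the old one, so building the tree at load time is just a reduce over the categories.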

; Now that the bot is up and running, the user types something into
the input box and hits Enter. The

pre-processor
- protects sentences
- blocks common attack vectors, e.g. code injection, flooding
- eliminates common spelling mistakes
    - for each loaded substitution
        - finds and replaces it in the input string
    - alternatively, uses a tree to search for them
- removes redundant whitespace
- splits input into sentences (everything that follows is for each
  sentence)
pattern-matcher
- combines INPUTPATTERN * CONTEXT1 * CONTEXT2 * into one string
- tokenizes the "path to be matched" into the individual words
  (nodes)
- traverses the tree from the root; first
    - tries matching underscore (_) wildcards
        - matching of wildcards is recursive
            - match one word of the current path component
            - try remainder against child node
                - if the whole remaining input matches
                    - and if the last node is a leaf
                        - return the template
                - else try 2 words, then 3
            - if all words in the string are used up and the current
              node is a leaf
                - return the template
        - else stop matching underscores, and
    - tries matching exact words in alphabetical order
        - if there is a child node that equals the input word, recurse
          a level deeper
            - if at the next level there is a leaf, return the template
        - else
    - tries matching the star (*) wildcard
- when a complete path was matched, creates a match-object
    - holds information about the match
        - the input (sentence)
        - the template
        - the strings matched to the wildcards
This first project should end there, with the template just returning
the values in the match-object. From there, the non-AIML aspects - the
new stuff - of the concept would be foregrounded.

Does this make sense to the casual observer?

Which known Clojure libraries should I be learning first?

Other comments, tips, disses?

Dirk

Re: AIML pattern matcher design

2009-05-09 Thread dhs827

I'm completely engulfed in all this material, but I wanted to come
back and say that I'm stunned by the enthusiasm with which you share
your knowledge here. Many thanks, again.

Dirk


Parth Malwankar wrote:
> On Fri, 08 May 2009 22:20:13 +0530, dhs827 wrote:
>
> >
>
> >
> > ; First thing to learn is XML parsing with Clojure.
> >
> 
> >
> > Other comments, tips, disses?
> >
> > Dirk
>
> In case you don't expect end users or other languages
> to access the configuration, one option you have is
> to save the configuration directly as Clojure data.
>
> As Clojure is a lisp, you have access to the reader and
> you could read the data (maps, vectors, etc.)
> directly from the file.
>
> E.g.:
>
> user=> (def x (read-string "{:a 1 :b 2}"))
> #'user/x
> user=> x
> {:a 1, :b 2}
> user=>
>
> See also: (doc read)
>
> If you decide to go ahead with xml, you can use
> the xml support in clojure core:
>
> http://clojure.org/api#toc673
>
> Regards,
> Parth