Jürgen, GNU APL Gurus,

More on my current AI-in-APL work. I have implemented the functions setup∆word2vec, distance and analogy in GNU APL. Run setup∆word2vec first, then distance (try 'dog' when prompted for input), and try analogy with 'paris france berlin' (which should, of course, yield germany). The file vector8 must be in the current directory when running the setup function.
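The core of distance and analogy is cosine similarity against the vector matrix. In rough outline it looks something like the following -- this is a sketch, not the code in the workspace, and V, nearest and analogy∆query are just illustrative names -- assuming setup∆word2vec has left the vectors in an N×D matrix V of unit-length rows:

    ∇R←N nearest Q;S
     ⍝ indices of the N rows of V most similar to query vector Q
     S←V+.×Q                        ⍝ cosine similarities (rows of V are unit length)
     R←N↑⍒S                         ⍝ grade down, keep the N best matches
    ∇

    ∇R←analogy∆query I;Q
     ⍝ I: row indices of (x y z) for "x is to y as z is to ?"
     Q←V[I[2];]+V[I[3];]-V[I[1];]   ⍝ y - x + z, e.g. france - paris + berlin
     R←Q÷(+/Q×Q)*0.5                ⍝ normalise so +.× again gives cosine similarity
    ∇

    ⍝ usage, given row indices w, p, f and b:
    ⍝   10 nearest V[w;]                 ⍝ 10 words closest to word w (w itself comes first)
    ⍝   10 nearest analogy∆query p,f,b   ⍝ the 'paris france berlin' style query

With the rows normalised once at setup time, each query reduces to a single +.× across the whole matrix.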
To use this, you will have to build mem.cc: put it into your GNU APL source tree under src/native, add lib_mem.la to pkglib_LTLIBRARIES, and add a line 'lib_mem_la_SOURCES = mem.cc'. You then need to run 'autoreconf', 'configure' and 'make' (a rough sketch of these steps is at the end of this message). Since this is still early development, none of that has been automated. Also, this has ONLY been run on 64-bit Linux; no other platform has been tried.

See describe∆word2vec for some details on data sizing. You can, of course, examine the functions in the workspace without having a lib_mem.so file, but those native functions are needed to run the sample.

Here are the files (gzip compressed):

https://www.dropbox.com/s/cfcaojjuzjxra7j/mem.cc.gz?dl=0
https://www.dropbox.com/s/97f5umkh3xd72cb/vector8.gz?dl=0
https://www.dropbox.com/s/pfheb6qic9wefqd/word2vec.xml.gz?dl=0

I am still using C code to generate vector8, but I would like to convert the training to APL as well; it is an embarrassingly parallel problem. I am also thinking about how to push access to the dataset lower into the APL to achieve more efficiency. Any comments/feedback/ideas are welcome.

This is a very simple AI application, using (at present) a very, very small model. I am looking to begin "scaling" this development soon. I need to be able to support both very dense datasets and sparse datasets (using additional transfer calls); the sparse datasets will be for tensor support. Again, feedback is welcome. I haven't implemented any of the tensor stuff yet -- right now I am concentrating on tooling issues (I like APL for this work).

Fred Weigel
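A rough sketch of the build steps above, for reference. I am assuming the native libraries are declared in src/native/Makefile.am; the exact file and the existing contents of pkglib_LTLIBRARIES may differ in your tree, so adjust to match:

    # in src/native/Makefile.am: add lib_mem.la to the existing list,
    # and give it its source file
    pkglib_LTLIBRARIES = ... lib_mem.la
    lib_mem_la_SOURCES = mem.cc

Then, from the top of the GNU APL source tree:

    autoreconf
    ./configure
    make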