Juergen,

This is useful -- I was looking at LApack.cc already. It is in line with what I need (as a template).

I am not worried about saving these things, but I have a 3,000,000 x 300 array of C float, and do a 300-element by 300-element vector multiply on each of the 3 million rows in a "typical" processing step. I don't want to convert to C double (that would increase memory from 3.6GB to 7.2GB). I don't really want to copy the data at all! I can generate a descriptor to the data (memory pointer, dimensions). I think I want to plant the data into a shared memory region (and, in future, pass it to a GPU).

I think I want to do some specific functions on the data -- right now I pass row sets in to GNU APL using the API, and execute APL code using the API. However, the control is exclusively from outside APL, meaning I cannot analyze experimentally using APL.

I can work from the model given by LApack.cc, and supply some functions which (basically) provide a "virtual memory/workspace". The main problem with arrays this size is saving and loading -- this array would be around 30GB in GNU APL (as far as I can tell), and if ever saved would take around 300GB. I could convert from float to double and create the Cell structures, but I would want to simply mmap() the thing into GNU APL (and, of course, never have it participate in memory management). Again, I was leaning towards partial mapping, because when I start with tensors, the arrays will be sparse.

So, two real problems: (1) how to deal with LARGE non-sparse matrices, and (2) how to deal with LARGE sparse matrices. I really like the expressiveness afforded by APL.

It may be possible to use the APL parser and provide new implementations of primitives -- thanks for that idea. LApack.cc seems to provide something I can start with -- the actual LARGE arrays won't change, so this gives a good demarcation point and a workable starting place.

Thanks!
Fred Weigel
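P.S. For concreteness, a minimal sketch of the no-copy processing step I mean, assuming the 3,000,000 x 300 floats already sit in a raw binary file (the file name "vectors.f32" is made up): mmap() the matrix read-only and take one dot product per row against a 300-element query vector, accumulating in double but never converting or copying the stored floats.

    #include <cstdio>      // perror
    #include <vector>
    #include <fcntl.h>     // open
    #include <unistd.h>    // close
    #include <sys/mman.h>  // mmap, munmap

    constexpr size_t ROWS = 3000000;   // rows in the model
    constexpr size_t COLS = 300;       // floats per row

    int main()
    {
        int fd = open("vectors.f32", O_RDONLY);   // hypothetical raw dump
        if (fd < 0) { perror("open"); return 1; }

        const size_t bytes = ROWS * COLS * sizeof(float);
        void * p = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        // the "descriptor": a pointer plus the known dimensions
        const float * m = static_cast<const float *>(p);

        std::vector<float> query(COLS, 0.0f);   // 300-element vector, filled elsewhere
        std::vector<float> out(ROWS);           // one score per row

        for (size_t r = 0; r < ROWS; ++r) {
            const float * row = m + r * COLS;
            double acc = 0.0;                   // accumulate in double, store float
            for (size_t c = 0; c < COLS; ++c)
                acc += double(row[c]) * query[c];
            out[r] = float(acc);
        }

        munmap(p, bytes);
        close(fd);
        return 0;
    }

The same descriptor (pointer plus dimensions) is what I would hand to a native function, or later to a GPU upload, without the data ever entering workspace memory management.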
On Sat, 2017-04-29 at 13:04 +0200, Juergen Sauermann wrote:
> Hi Fred,
>
> I have not fully understood what you want to do exactly, but it looks to me as if you want to go for
> native GNU APL functions. Native functions provide the means to bypass the GNU APL interpreter
> itself to the extent desired. For example you can use APL variables but not the APL parser, or the
> APL parser but not the implementation of primitives, or whatever else you are up to.
>
> As to plain double vectors, it is very difficult to introduce them as a new built-in data type because that
> change would affect: every APL primitive, every APL operator, )LOAD, )SAVE, )DUMP, and a lot more.
>
> However, you can have a look at (the top level of) the implementation of the matrix divide primitive, which
> is doing what you are maybe after. The implementation of matrix divide expects either a double vector or
> a complex<double> vector as argument(s) and returns such a vector as result. Before and after the computation
> of matrix divide, a conversion between APL values and the plain double or complex vector is performed.
> This conversion is very lightweight. If you have a homogeneous GNU APL value, say all ravel items being
> double, then that value is almost like a C double *. The difference is a space between adjacent ravel
> elements. In other words (expressed in APL):
>
> C_vector ←→ 1 0 1 0 ... / APL_vector
>
> I can provide you with more information if you want to go along this path.
>
> /// Jürgen
>
> On 04/29/2017 03:19 AM, Fred Weigel wrote:
> > Juergen, and other GNU APL experts.
> >
> > I am exploring neural nets, word2vec and some other AI-related areas.
> >
> > Right now, I want to tie in Google's word2vec trained models (the billion-word one,
> > GoogleNews-vectors-negative300.bin.gz).
> >
> > This is a binary file containing a lot of floating point data -- about 3.5GB of data. These are
> > words, followed by cosine distances. I could attempt to feed this in the slow way, and put it
> > into an APL workspace. But... I also intend on attempting to feed the data to a GPU. So, what I
> > am looking for is a modification to GNU APL (and yes, I am willing to do the work) -- to allow
> > for the complete suppression of normal C++ allocations, etc. and allow the introduction of
> > simple float/double vectors or matrices (helpful to allow "C"-ish or UTF-8-ish strings: the
> > data is (C string containing word name) (fixed number of floating point)... repeated LOTS of
> > times).
> >
> > The data set(s) may be compressed, so I don't want to read them directly -- possibly from a
> > shared memory region (64-bit system only, of course), or perhaps using shared variables... but
> > I don't think that would be fast enough.
> >
> > Anyway, this begins to allow the push into "big data" and AI applications. Just looking for
> > some input and ideas here.
> >
> > Many thanks
> > Fred Weigel
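For reference, a sketch of walking the .bin layout described in the quoted mail: a text header with the two counts, then, per entry, a space-terminated word followed by the raw 32-bit floats. The framing here is my assumption from the common word2vec format (and the file must be gunzip'ed first), so treat it as a starting point rather than a definitive reader.

    #include <cstdio>
    #include <string>
    #include <vector>

    int main()
    {
        FILE * f = fopen("GoogleNews-vectors-negative300.bin", "rb");
        if (!f) { perror("fopen"); return 1; }

        // header: "<words> <dims>\n"
        long words = 0, dims = 0;
        if (fscanf(f, "%ld %ld", &words, &dims) != 2) return 1;

        std::vector<std::string> vocab(words);
        std::vector<float> matrix((size_t)words * dims);   // flat, row-major

        for (long w = 0; w < words; ++w) {
            // word: bytes up to the separating space (skip stray newlines)
            int ch;
            while ((ch = fgetc(f)) != EOF && ch != ' ')
                if (ch != '\n') vocab[w] += (char)ch;

            // then dims raw 32-bit floats for that word
            if (fread(&matrix[(size_t)w * dims], sizeof(float), dims, f)
                    != (size_t)dims) return 1;
        }
        fclose(f);

        // matrix now holds the 3,000,000 x 300 floats, ready to be dropped
        // into a shared memory region or handed to a GPU.
        return 0;
    }

From there the flat float block could be placed once into a shared memory segment (shm_open()/mmap()) so that both the driver program and GNU APL native functions see the same copy.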