Leslie,

It is not so much "interpreter speed". The data is an array of 32-bit floats: 71,000 to 3,000,000 rows, each with 200 to 300 columns. Each row is subject to a vector multiplication per query (so 71,000 to millions of multiplications, depending on the number of rows). Yes, I am interested in parallel computation (one of the reasons I started looking at GNU APL).
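The per-row vector multiplication described above can be sketched as follows. This is a minimal NumPy illustration, not code from the thread: the sizes are scaled down from the 71,000 x 200 test model so it runs quickly, and the unit-normalization step (so a dot product gives cosine similarity directly) is my assumption about the intended query.

```python
import numpy as np

# Hypothetical scaled-down model: the real test model is 71,000 x 200.
rows, cols = 1000, 200

rng = np.random.default_rng(0)
model = rng.standard_normal((rows, cols)).astype(np.float32)
# Normalize each row so a single dot product yields cosine similarity.
model /= np.linalg.norm(model, axis=1, keepdims=True)

query = model[42]                        # any unit row vector works as a query
scores = model @ query                   # one multiply-add pass over every row
nearest = np.argsort(scores)[::-1][:5]   # indices of the 5 most similar rows
# A row is always most similar to itself, so nearest[0] should be 42.
```

The `model @ query` line is the operation that would be repeated per query word, and it is exactly the kind of data-parallel kernel a GPGPU back end (OpenCL, CUDA, Futhark) handles well.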
The data is completely clean -- no NaNs, etc. Each row corresponds to a word from a corpus. The word list is separate when computation begins (in the model data it is interleaved; I extract it and build the memory structures separately). My test model is 71,000 x 200 floats; the "standard" model is 3,000,000 x 300 floats (3.5GB of memory).

The use is for low-end AI (alternate word/concept selection, basic analogies) to begin the process of deriving "meaning" from documents. I figure around one billion operations per word in a document for this processing. I am looking at APL for specification and testing, and at deployment on GPGPU (OpenCL or CUDA) -- for example, Futhark or something like that.

FredW

On Sat, 2017-04-29 at 01:50 +0000, Leslie S Satenstein wrote:
> Hi Fred
>
> Following up on Xiao-Yong Jin's response.
>
> You did not mention if you need the data in real time or if you can
> work at the APL interpreter's speed. Do you have a structure for your
> data? You mentioned a format of [text][floats] without specifying the
> size of the text and the number of floats. Is your data clean, or
> does it need to be vetted (NaNs excluded)?
>
> I believe you should create a data dictionary constructed with
> sqlite. That data would be loaded into sqlite via some C, C++, or
> Python code and subsequently read via shared variables. APL is an
> interpreter. What would take hours with APL to do what you want to
> do could take a few minutes by externally loading the SQL database
> and then using APL for presentation.
>
> It's an interesting idea you have. Can you put out a more formal
> draft starter document? Something to fill in the topics below.
>
> Aim:
> Data Descriptions/Quantities:
> Vetting and Filtering:
> Processing speed:
> Frequency of use:
>
> Since you propose to do the work, who can estimate the cost?
>
> From: Xiao-Yong Jin <jinxiaoy...@gmail.com>
> To: fwei...@crisys.com
> Cc: GNU APL <bug-apl@gnu.org>
> Sent: Friday, April 28, 2017 9:32 PM
> Subject: Re: [Bug-apl] Use with word2vec
>
> If shared variables can go through SHMEM, you can probably interface
> CUDA that way without much of a bottleneck. But with the way GNU APL
> is implemented now, there are just too many other limitations on
> performance with arrays of such size.
>
> > On Apr 28, 2017, at 9:19 PM, Fred Weigel <fwei...@crisys.com> wrote:
> >
> > Juergen, and other GNU APL experts.
> >
> > I am exploring neural nets, word2vec, and some other AI-related
> > areas.
> >
> > Right now, I want to tie in Google's word2vec trained models (the
> > billion-word one, GoogleNews-vectors-negative300.bin.gz).
> >
> > This is a binary file containing a lot of floating point data --
> > about 3.5GB of data. These are words, followed by cosine distances.
> > I could attempt to feed this in the slow way and put it into an APL
> > workspace. But... I also intend on attempting to feed the data to a
> > GPU. So, what I am looking for is a modification to GNU APL (and
> > yes, I am willing to do the work) -- to allow for the complete
> > suppression of normal C++ allocations, etc., and allow the
> > introduction of simple float/double vectors or matrices (helpful to
> > allow "C"-ish or UTF-8-ish strings): the data is (C string
> > containing word name) (fixed number of floating point values)...
> > repeated LOTS of times.
> >
> > The data set(s) may be compressed, so I don't want to read them
> > directly -- possibly from a shared memory region (64-bit system
> > only, of course), or perhaps using shared variables... but I don't
> > think that would be fast enough.
> >
> > Anyway, this begins to allow the push into "big data" and AI
> > applications. Just looking for some input and ideas here.
> >
> > Many thanks
> > Fred Weigel
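The (C string)(fixed floats) record layout Fred describes can be parsed as in the sketch below. This is a hedged illustration, not code from the thread: the function name and the tiny two-word demo model are mine, and the header/record layout is assumed to follow the common word2vec .bin convention (an ASCII "rows cols" header line, then a space-terminated word followed by raw little-endian float32 values, repeated).

```python
import io
import struct
import numpy as np

def load_word2vec_bin(stream):
    """Parse the (word, float vector) records of a word2vec-style .bin file.

    The header is an ASCII line "rows cols"; each record is a
    space-terminated word followed by cols raw float32 values.
    """
    rows, cols = map(int, stream.readline().split())
    words = []
    vectors = np.empty((rows, cols), dtype=np.float32)
    for i in range(rows):
        # Read the word one byte at a time up to the separating space.
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b' ':
                break
            if ch != b'\n':          # skip the newline some writers emit
                chars.append(ch)
        words.append(b''.join(chars).decode('utf-8'))
        vectors[i] = np.frombuffer(stream.read(4 * cols), dtype=np.float32)
    return words, vectors

# Demo on an in-memory two-word model (words and values are hypothetical).
demo = io.BytesIO(b"2 3\n"
                  b"cat " + struct.pack("<3f", 1.0, 2.0, 3.0) +
                  b"dog " + struct.pack("<3f", 4.0, 5.0, 6.0))
words, vectors = load_word2vec_bin(demo)
```

Extracting the words into one list and the floats into one contiguous float32 matrix, as here, matches Fred's plan of building the two memory structures separately; the matrix could then be placed in a shared memory region or handed to a GPU without the interleaved strings.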