Hi Xiao-Yong,

IMHO the GNU APL core language should be kept to a reasonable minimum of non-redundant primitives.

If you take a handful of existing primitives and combine them in some way, then you get a large number of new functions that are useful in certain situations. However, giving such combinations their own distinguished name or even their own symbol makes the language as a whole unreadable, because nobody (except, of course, Dyalog APL users) would have a chance to make any sense of them. These symbols also limit the portability of the programs using them, and as a proponent of open software I want to be the last one to lure people into writing non-portable APL programs.

A better way of providing functions with specific functionality (like ⌸) is, IMHO, an APL library that defines a function for it, with a name that tells the unenlightened user what it does.

/// Jürgen

On 09/10/2016 05:58 AM, Xiao-Yong Jin wrote:
Seems like a good motivation to support quad equal: ⌸

See the key operator in Dyalog:
http://help.dyalog.com/15.0/Content/Language/Primitive%20Operators/Key.htm

On the other hand, pattern matching A[n]←x for in-place operation seems a good way to go. Not sure if it’s possible in GNU APL.

On Sep 9, 2016, at 10:27 PM, Christian Robert <christian.rob...@polymtl.ca> wrote:

I got to maybe 2% of the work with this:

      alpha_only←{(⍵∊'abcdefghijklmnopqrstuvwxyz ')/⍵←tolower ⍵}
      remove_blank_lines←{(∊0≠⍴¨⍵)/⍵}
      tolower←{('abcdefghijklmnopqrstuvwxyz',⎕av)[('ABCDEFGHIJKLMNOPQRSTUVWXYZ',⎕av)⍳⍵]}

      )sic
      )erase readfile_fast

      ∇z←readfile_fast name;fd;lines;⎕io
      ⎕io←1
      ⍝ Bring a file into a vector of strings, utf8 aware for both name and contents.
      →(0≠"r" ⎕fio[31] 18 ⎕cr name)/Error ⍝ Can not read file ? → Error
      z←⎕fio[26] 18 ⎕cr name ⍝ First pass, read the whole file
      lines←⍳+/((↑"\n")=z) ⍝ Compute the iota for each line
      z←(⍴lines)⍴⍬ ⍝ Preallocate "z" to the right size
      fd←⎕fio[3] 18 ⎕cr name ⍝ Open the file
      ⊣ {⊣z[⍵]←⊂19 ⎕cr ⎕ucs ¯1↓⎕fio[8] fd} ⍤0 lines ⍝ Put each line in the preallocated "z"
      ⊣ ⎕fio[4] fd ⋄ →0 ⍝ Close the file and return
      Error: ⎕ES ∊'Error on file "',name,'": ',⎕fio[2] | ⎕fio[1] ''
      ∇

      alpha_only←{(⍵∊'abcdefghijklmnopqrstuvwxyz ')/⍵←tolower ⍵}
      remove_blank_lines←{(∊0≠⍴¨⍵)/⍵}
      tolower←{('abcdefghijklmnopqrstuvwxyz',⎕av)[('ABCDEFGHIJKLMNOPQRSTUVWXYZ',⎕av)⍳⍵]}
      vertical←{,[⍳0]⍵}
      words_only←{(⍵∊'abcdefghijklmnopqrstuvwxyz ')/⍵←tolower ⍵}

      ⍝ then ...
      z←remove_blank_lines alpha_only ¨ tolower ¨ readfile_fast 'big.txt'
      ⍴ z
103561
      ⍝ here you have 103,561 lines, no empty ones, clean of special characters (but may have several blanks between each word).
      ⌊/⍴¨z ⍝ minimum line length, probably "I"
1
      ⌈/⍴¨z ⍝ maximum line length, may contain 400 to 600 words on each line of 2488 characters.
2488
      ⍝ at this point you have to iterate (rank operator?) over those 103,561 lines
      ⍝ to extract all the words in each line, saving them (unique) and counting the occurrences of
      ⍝ each word.
      ⍝ since APL can't do things like count['abc'] = 0 or count['abc'] += 1 (index with string on vectors)
      ⍝ it's a near no-end issue (i.e. very difficult to do, but not impossible)
      ⍝ you will NEVER win a race against a language like "awk", which has string-indexed arrays as *part* of the basic language.

my 2 cents,
Xtian.

On 2016-09-09 17:39, Ala'a Mohammad wrote:

Hi,

I'm trying to create a simple spell corrector (Norvig at http://norvig.com/spell-correct.html) in APL. I tried but stumbled upon the frequency/count stage and could not move further. The stopper was either WS FULL or the apl process being killed.

I'm assuming the main issue is 'lack of experience with APL', and thus the inefficient coding.

      ftxt ← { ⎕FIO[26] ⍵ }
      a ← 'abcdefghijklmnopqrstuvwxyz'
      A ← 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
      downcase ← { (a,⎕AV)[(A,⎕AV)⍳⍵] }
      nl ← ⎕UCS 13
      cr ← ⎕UCS 10
      tab ← ⎕UCS 9
      nonalpha ← nl, cr, tab, ' 0123456789()[]!?%$,.:;/+*=<>-_#"`~@&'
      alphamask ← { ~ ⍵ ∊ nonalpha }
      hist ← { (⍪∪⍵),+/∨/¨(∪⍵)∘.⍷⍵ }
      fhist ← { hist (alphamask txt) ⊂ downcase txt ← ftxt ⍵ }
      ⍝ file ← '/misc/small.txt' ~ 28K
      ⍝ file ← '/misc/xaa' ~ 1.3M
      file ← '/misc/big.txt' ⍝ ~ 6.2M
      ⍝ following 2 lines for debugging
      ⎕ ← ⍴w ← (alphamask txt) ⊂ downcase txt ← ftxt file
      ⎕ ← ⍴u ← ∪w
      fhist file

The errors happened inside the 'hist' function, and I presume mostly due to the jot dot find (if I understand correctly, it operates on a matrix of size unique-length * words-length).

Is there any way to fix the issue and then proceed to complete the solution?

Also, is this the way to create a simple spell corrector in APL (that is, one which capitalizes on APL's strength as an array language)?

I'm using
LinuxMint 17.1 (kernel 3.13.0-37-generic #64-Ubuntu)
Gnu APL 1.6 (794)
Zsh 5.0.2
Emacs 25.1.50.1

Best,
Ala'a

P.S: I hoped that I could create the solution in APL and then get some whacks on the head from fellow experienced APL programmers before submitting it as 'another solution in X language', but the hope stopped short before even getting to the probability stage.
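
The unique-length * words-length matrix built by the jot dot find in 'hist' can be avoided by looking each word up in the list of unique words and then counting runs of equal indices after sorting. What follows is only a minimal sketch under stated assumptions, not a tested replacement: the name wordfreq and its local variables are hypothetical, the argument is assumed to be a non-empty nested vector of words such as the w computed in the post, and the speed of dyadic ⍳ on nested vectors in GNU APL has not been measured here. Note also that ⍷ reports a word found inside a longer word (for example 'a' inside 'and'), so counts based on it can be too high; the lookup below matches whole words only.

```apl
∇z←wordfreq w;u;i;s;b;p;⎕IO
 ⍝ Hypothetical helper: w is a non-empty nested vector of words.
 ⍝ z is a 2-column matrix: unique word, number of occurrences.
 ⎕IO←1
 u←∪w                    ⍝ unique words, in order of first appearance
 i←u⍳w                   ⍝ position of every word in u (whole-word match)
 s←i[⍋i]                 ⍝ sorted positions: equal words are now adjacent
 b←1,(1↓s)≠¯1↓s          ⍝ 1 marks the first element of each run
 p←b/⍳⍴s                 ⍝ start position of each run
 z←u,[1.5](1↓p,1+⍴s)-p   ⍝ run lengths are the counts, aligned with u
∇

      ⍝ pairs 'the' with 3, 'cat' with 1 and 'dog' with 1
      wordfreq 'the' 'cat' 'the' 'dog' 'the'
```

Every intermediate result here is a vector no longer than w, so memory use grows with the number of words rather than with the product of unique words and all words.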