Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

Bakul Shah Sat, 13 Jan 2024 00:04:14 -0800

FYI Julia (on M1 MBP) seems much faster:

julia> @which factorial(big(100000000))
factorial(x::BigInt) in Base.GMP at gmp.jl:645


julia> @time begin; factorial(big(100000000)); 1; end
 27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time)

Probably they use Schönhage-Strassen multiplication algorithm for very large 
numbers as the 1E8! result will have over a 3/4 billion digits. I should try 
this in Gambit-Scheme (which has an excellent multiply implementation).

> On Jan 12, 2024, at 9:32 PM, Rob Pike <r...@golang.org> wrote:
> 
> Thanks for the tip. A fairly straightforward implementation of this algorithm 
> gives me about a factor of two speedup for pretty much any value. I went up 
> to 1e8!, which took about half an hour compared to nearly an hour for 
> MulRange.
> 
> I'll probably stick in ivy after a little more tuning. I may even try 
> parallelization.
> 
> -rob
> 
> 
> On Tue, Jan 9, 2024 at 4:54 PM Bakul Shah <ba...@iitbombay.org 
> <mailto:ba...@iitbombay.org>> wrote:
>> For that you may wish to explore Peter Luschny's "prime swing" factorial 
>> algorithm and variations!
>> https://oeis.org/A000142/a000142.pdf
>> 
>> And implementations in various languages including go: 
>> https://github.com/PeterLuschny/Fast-Factorial-Functions
>> 
>>> On Jan 8, 2024, at 9:22 PM, Rob Pike <r...@golang.org 
>>> <mailto:r...@golang.org>> wrote:
>>> 
>>> Here's an example where it's the bottleneck: ivy factorial
>>> 
>>> 
>>> !1e7
>>> 1.20242340052e+65657059
>>> 
>>> )cpu
>>> 1m10s (1m10s user, 167.330ms sys)
>>> 
>>> 
>>> -rob
>>> 
>>> 
>>> On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah <ba...@iitbombay.org 
>>> <mailto:ba...@iitbombay.org>> wrote:
>>>> Perhaps you were thinking of this?
>>>> 
>>>> At iteration number k, the value xk contains O(klog(k)) digits, thus the 
>>>> computation of xk+1 = kxk has cost O(klog(k)). Finally, the total cost 
>>>> with this basic approach is O(2log(2)+¼+n log(n)) = O(n2log(n)).
>>>> A better approach is the binary splitting : it just consists in 
>>>> recursively cutting the product of m consecutive integers in half. It 
>>>> leads to better results when products on large integers are performed with 
>>>> a fast method.
>>>> 
>>>> http://numbers.computation.free.fr/Constants/Algorithms/splitting.html
>>>> 
>>>> I think you can do recursive splitting without using function recursion by 
>>>> allocating N/2 array (where b = a+N-1) and iterating over it; each time 
>>>> the array "shrinks" by half. A "cleverer" algorithm would allocate an 
>>>> array of *words* of a bignum, as you know that the upper limit on size is 
>>>> N*64 (for 64 bit numbers) so you can just reuse the same space for each 
>>>> outer iteration (N/2 multiplie, N/4 ...) and apply Karatsuba 2nd outer 
>>>> iteration onwards. Not sure if this is easy in Go.
>>>> 
>>>>> On Jan 8, 2024, at 11:47 AM, Robert Griesemer <g...@golang.org 
>>>>> <mailto:g...@golang.org>> wrote:
>>>>> 
>>>>> Hello John;
>>>>> 
>>>>> Thanks for your interest in this code.
>>>>> 
>>>>> In a (long past) implementation of the factorial function, I noticed that 
>>>>> computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when computed 
>>>>> in a recursive fashion than when computed iteratively: the reason (I 
>>>>> believed) was that the iterative approach seemed to produce a lot more 
>>>>> "internal fragmentation", that is medium-size intermediate results where 
>>>>> the most significant word (or "limb" as is the term in other 
>>>>> implementations) is only marginally used, resulting in more work than 
>>>>> necessary if those words were fully used.
>>>>> 
>>>>> I never fully investigated, it was enough at the time that the recursive 
>>>>> approach was much faster. In retrospect, I don't quite believe my own 
>>>>> theory. Also, that implementation didn't have Karatsuba multiplication, 
>>>>> it just used grade-school multiplication.
>>>>> 
>>>>> Since a, b are uint64 values (words), this could probably be implemented 
>>>>> in terms of mulAddVWW directly, with a suitable initial allocation for 
>>>>> the result - ideally this should just need one allocation (not sure how 
>>>>> close we can get to the right size). That would cut down the allocations 
>>>>> massively.
>>>>> 
>>>>> In a next step, one should benchmark the implementation again.
>>>>> 
>>>>> But at the very least, the overflow bug should be fixed, thanks for 
>>>>> finding it! I will send out a CL to fix that today.
>>>>> 
>>>>> Thanks,
>>>>> - gri
>>>>> 
>>>>> 
>>>>> 
>>>>> On Sun, Jan 7, 2024 at 4:47 AM John Jannotti <janno...@gmail.com 
>>>>> <mailto:janno...@gmail.com>> wrote:
>>>>>> Actually, both implementations have bugs!
>>>>>> 
>>>>>> The recursive implementation ends with:
>>>>>> ```
>>>>>> m := (a + b) / 2
>>>>>> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b))
>>>>>> ```
>>>>>> 
>>>>>> That's a bug whenever `(a+b)` overflows, making `m` small. 
>>>>>> FIX: `m := a + (b-a)/2`
>>>>>> 
>>>>>> My iterative implementation went into an infinite loop here:
>>>>>> `for m := a + 1; m <= b; m++ {`
>>>>>> if b is `math.MaxUint64`
>>>>>> FIX: add `&& m > a` to the exit condition is an easy fix, but pays a 
>>>>>> small penalty for the vast majority of calls that don't have b=MaxUint64
>>>>>> 
>>>>>> I would add these to `mulRangesN` of the unit test:
>>>>>> ```
>>>>>>  {math.MaxUint64 - 3, math.MaxUint64 - 1, 
>>>>>> "6277101735386680760773248120919220245411599323494568951784"},
>>>>>> {math.MaxUint64 - 3, math.MaxUint64, 
>>>>>> "115792089237316195360799967654821100226821973275796746098729803619699194331160"}
>>>>>> ```
>>>>>> 
>>>>>> On Sun, Jan 7, 2024 at 6:34 AM John Jannotti <janno...@gmail.com 
>>>>>> <mailto:janno...@gmail.com>> wrote:
>>>>>>> I'm equally curious.
>>>>>>> 
>>>>>>> FWIW, I realized the loop should perhaps be
>>>>>>> ```
>>>>>>> mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, even 
>>>>>>> on 32-bit arch
>>>>>>> for m := a + 1; m <= b; m++ {
>>>>>>>   mb.setUint64(m)
>>>>>>>   z = z.mul(z, mb)
>>>>>>> }
>>>>>>> ```
>>>>>>> to avoid allocating repeatedly for `m`, which yields:
>>>>>>> BenchmarkIterativeMulRangeN-10      354685      3032 ns/op    2129 B/op 
>>>>>>>      48 allocs/op
>>>>>>> 
>>>>>>> On Sun, Jan 7, 2024 at 2:41 AM Rob Pike <r...@golang.org 
>>>>>>> <mailto:r...@golang.org>> wrote:
>>>>>>>> It seems reasonable but first I'd like to understand why the recursive 
>>>>>>>> method is used. I can't deduce why, but the CL that adds it, by gri, 
>>>>>>>> does Karatsuba multiplication, which implies something deep is going 
>>>>>>>> on. I'll add him to the conversation.
>>>>>>>> 
>>>>>>>> -rob
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Jan 7, 2024 at 5:46 PM John Jannotti <janno...@gmail.com 
>>>>>>>> <mailto:janno...@gmail.com>> wrote:
>>>>>>>>> I enjoy bignum implementations, so I was looking through nat.go and 
>>>>>>>>> saw that `mulRange` is implemented in a surprising, recursive way,.  
>>>>>>>>> In the non-base case, `mulRange(a, b)` returns `mulrange(a, (a+b)/2) 
>>>>>>>>> * mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided).
>>>>>>>>> 
>>>>>>>>> That's fine, but I didn't see any advantage over the straightforward 
>>>>>>>>> (and simpler?) for loop.
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> z = z.setUint64(a)
>>>>>>>>> for m := a + 1; m <= b; m++ {
>>>>>>>>>       z = z.mul(z, nat(nil).setUint64(m))
>>>>>>>>> }
>>>>>>>>> return z
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> In fact, I suspected the existing code was slower, and allocated a 
>>>>>>>>> lot more.  That seems true. A quick benchmark, using the existing 
>>>>>>>>> unit test as the benchmark, yields
>>>>>>>>> BenchmarkRecusiveMulRangeN-10           169417              6856 
>>>>>>>>> ns/op            9452 B/op        338 allocs/op
>>>>>>>>> BenchmarkIterativeMulRangeN-10          265354              4269 
>>>>>>>>> ns/op            2505 B/op        196 allocs/op
>>>>>>>>> 
>>>>>>>>> I doubt `mulRange` is a performance bottleneck in anyone's code! But 
>>>>>>>>> it is exported as `int.MulRange` so I guess it's viewed with some 
>>>>>>>>> value.  And seeing as how the for-loop seems even easier to 
>>>>>>>>> understand that the recursive version, maybe it's worth submitting a 
>>>>>>>>> PR? (If so, should I create an issue first?)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "golang-nuts" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to golang-nuts+unsubscr...@googlegroups.com 
>>>>>>>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>.
>>>>>>>>> To view this discussion on the web visit 
>>>>>>>>> https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com
>>>>>>>>>  
>>>>>>>>> <https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com?utm_medium=email&utm_source=footer>.
>>>>> 
>>>>> 
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "golang-nuts" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>>> email to golang-nuts+unsubscr...@googlegroups.com 
>>>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>.
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com
>>>>>  
>>>>> <https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>.
>>>> 
>> 

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/B219273B-D067-4D57-9280-EEAB297778D7%40iitbombay.org.

Re: [go-nuts] Any interest in nat.mulRange simplification/optimization?

Reply via email to