Certainly, if *all* your memory accesses take time log n, then you are in trouble. But if your algorithm is cache friendly, it should take O(n log n) time overall for its memory accesses.
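Here is a quick sketch of what I mean (Python, since this is sage-devel; the array size is a guess at "bigger than L2", and the slowdown factor will vary by machine). It performs exactly the same n additions twice; only the order of the memory accesses changes:

```python
import array
import random
import time

n = 1 << 20                        # 1M machine words; bigger than most L2 caches
data = array.array('l', [1]) * n   # array.array keeps the payload in one C buffer

seq_order = list(range(n))         # cache friendly: consecutive addresses
rnd_order = seq_order[:]
random.shuffle(rnd_order)          # cache hostile: a likely miss on most accesses

def total(order):
    """Same n additions either way; only the access pattern differs."""
    s = 0
    for i in order:
        s += data[i]
    return s

t0 = time.perf_counter()
s_seq = total(seq_order)
t1 = time.perf_counter()
s_rnd = total(rnd_order)
t2 = time.perf_counter()

print("sequential %.3fs  shuffled %.3fs  sums equal: %s"
      % (t1 - t0, t2 - t1, s_seq == s_rnd))
```

Interpreter overhead mutes the effect compared to C, but the shuffled pass should still be measurably slower on most machines, even though the arithmetic is identical.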
So I agree with what you say. Your implementation must be cache friendly.

Bill.

On Apr 28, 12:11 am, Tim Daly <d...@axiom-developer.org> wrote:
> Bill Hart wrote:
> > On Apr 27, 8:55 pm, rjf <fate...@gmail.com> wrote:
> >> Oh, just another note.
> >>
> >> There are people who have made their whole careers on devising
> >> asymptotically fast algorithms which have never been implemented,
> >> or (if they have been implemented) are not fast, because their
> >> overly-simplified analysis does not fit the currently available
> >> computer efficiency profile. Indeed, their analysis may not fit
> >> any past profile of computers, and their algorithms were never
> >> practical.
> >
> > Interesting observation. Furer's algorithm is not practical, and
> > nor are some matrix multiplication algorithms. Do you have any
> > other examples in mind?
> >
> >> Note, for example, that memory access is far more expensive than
> >> arithmetic, and that many computers now can multiply or add in
> >> comparable time; and they have several multipliers. Etc.
> >
> > I would say that memory access is far more expensive than addition,
> > but not arithmetic in general. FFT multiplication is n log n
> > (sometimes with a log log n, depending on what you are multiplying),
> > and I would say the memory access perhaps adds a hidden log n factor.
> >
> > Can you perhaps be more specific as to what you are hinting at?
>
> On current microprocessors a multiply can be done in a few clock ticks
> (probably one apparent clock tick if the in-processor pipelines are
> fully overlapped). An L1 cache access is about 3 clock ticks, an L2
> access can be dozens, and a memory access can be several hundred or
> more (depending on DDR/2/3/..., inter-processor MESI cache traffic,
> prefetch instructions, etc.).
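Tim's latency numbers are roughly what you can see from user land with pointer chasing, where every load depends on the previous one, so the latency cannot be hidden by the pipeline or prefetcher. A Python sketch (the chain length is my guess; in C the gap between the two passes is far more dramatic):

```python
import random
import time

n = 1 << 20

# Sequential chain: nxt[i] = i + 1 (mod n). The hardware prefetcher eats this.
seq = list(range(1, n)) + [0]

# Random chain: one big random cycle. Every hop is a dependent load at an
# unpredictable address, so each hop pays close to the full cache/memory latency.
perm = list(range(n))
random.shuffle(perm)
rnd = [0] * n
for a, b in zip(perm, perm[1:] + perm[:1]):
    rnd[a] = b

def chase(nxt, hops):
    """Follow the chain for `hops` steps; each load depends on the last."""
    i = 0
    for _ in range(hops):
        i = nxt[i]
    return i

t0 = time.perf_counter()
end_seq = chase(seq, n)
t1 = time.perf_counter()
end_rnd = chase(rnd, n)
t2 = time.perf_counter()

print("sequential %.3fs  random %.3fs" % (t1 - t0, t2 - t1))
```

Since both chains are single cycles of length n, both chases end back at element 0; the work is identical, and only the address predictability differs.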
> By analogy, it is more expensive to peel an egg than to slice a
> carrot, but the time is buried under the time it takes to get one
> out of the fridge (L1), the supermarket (L2), or the farm (main
> memory), or to grow a new one (disk).
>
> So if an FFT is optimized for, say, 128-byte cache lines with
> minimal cache collisions, then the multiply times will matter. But
> change the FFT to ignore cache line sizes, so that each fetch
> breaks a cache line, and the multiply times become completely
> irrelevant.
>
> >> RJF

--
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org