Certainly, if *all* your memory accesses take time log n, then you are in trouble. But if your algorithm is cache friendly, it should take O(n log n) time overall for its memory accesses.
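Here is a quick sketch of what I mean (Python, since this is sage-devel; the array size is a guess at "bigger than L2", and the slowdown factor will vary by machine). It performs exactly the same n additions twice; only the order of the memory accesses changes:

```python
import array
import random
import time

n = 1 << 20                        # 1M machine words; bigger than most L2 caches
data = array.array('l', [1]) * n   # array.array keeps the payload in one C buffer

seq_order = list(range(n))         # cache friendly: consecutive addresses
rnd_order = seq_order[:]
random.shuffle(rnd_order)          # cache hostile: a likely miss on most accesses

def total(order):
    """Same n additions either way; only the access pattern differs."""
    s = 0
    for i in order:
        s += data[i]
    return s

t0 = time.perf_counter()
s_seq = total(seq_order)
t1 = time.perf_counter()
s_rnd = total(rnd_order)
t2 = time.perf_counter()

print("sequential %.3fs  shuffled %.3fs  sums equal: %s"
      % (t1 - t0, t2 - t1, s_seq == s_rnd))
```

Interpreter overhead mutes the effect compared to C, but the shuffled pass should still be measurably slower on most machines, even though the arithmetic is identical.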
So I agree with what you say. Your implementation must be cache friendly.

Bill.

On Apr 28, 12:11 am, Tim Daly <d...@axiom-developer.org> wrote:
> Bill Hart wrote:
> > On Apr 27, 8:55 pm, rjf <fate...@gmail.com> wrote:
> >> Oh, just another note.
> >>
> >> There are people who have made their whole careers on devising
> >> asymptotically fast algorithms which have never been implemented,
> >> or (if they have been implemented) are not fast, because their
> >> overly-simplified analysis does not fit the currently available
> >> computer efficiency profile. Indeed, their analysis may not fit
> >> any past profile of computers, and their algorithms were never
> >> practical.
> >
> > Interesting observation. Furer's algorithm is not practical, and
> > nor are some matrix multiplication algorithms. Do you have any
> > other examples in mind?
> >
> >> Note, for example, that memory access is far more expensive than
> >> arithmetic, and that many computers now can multiply or add in
> >> comparable time; and they have several multipliers. Etc.
> >
> > I would say that memory access is far more expensive than addition,
> > but not arithmetic in general. FFT multiplication is n log n
> > (sometimes with a log log n, depending on what you are multiplying),
> > and I would say the memory access perhaps adds a hidden log n factor.
> >
> > Can you perhaps be more specific as to what you are hinting at?
>
> On current microprocessors a multiply can be done in a few clock ticks
> (probably one apparent clock tick if the in-processor pipelines are
> fully overlapped). An L1 cache access is about 3 clock ticks, an L2
> access can be dozens, and a memory access can be several hundred or
> more (depending on DDR/2/3/..., inter-processor MESI cache traffic,
> prefetch instructions, etc.).
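Tim's latency numbers are roughly what you can see from user land with pointer chasing, where every load depends on the previous one, so the latency cannot be hidden by the pipeline or prefetcher. A Python sketch (the chain length is my guess; in C the gap between the two passes is far more dramatic):

```python
import random
import time

n = 1 << 20

# Sequential chain: nxt[i] = i + 1 (mod n). The hardware prefetcher eats this.
seq = list(range(1, n)) + [0]

# Random chain: one big random cycle. Every hop is a dependent load at an
# unpredictable address, so each hop pays close to the full cache/memory latency.
perm = list(range(n))
random.shuffle(perm)
rnd = [0] * n
for a, b in zip(perm, perm[1:] + perm[:1]):
    rnd[a] = b

def chase(nxt, hops):
    """Follow the chain for `hops` steps; each load depends on the last."""
    i = 0
    for _ in range(hops):
        i = nxt[i]
    return i

t0 = time.perf_counter()
end_seq = chase(seq, n)
t1 = time.perf_counter()
end_rnd = chase(rnd, n)
t2 = time.perf_counter()

print("sequential %.3fs  random %.3fs" % (t1 - t0, t2 - t1))
```

Since both chains are single cycles of length n, both chases end back at element 0; the work is identical, and only the address predictability differs.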
> By analogy, it is more expensive to peel an egg than to slice a
> carrot, but the time is buried under the time it takes to get one
> out of the fridge (L1), the supermarket (L2), or the farm (main
> memory), or to grow a new one (disk).
>
> So if an FFT is optimized for, say, 128-byte cache lines with
> minimal cache collisions, then the multiply times will matter. But
> change the FFT to ignore cache line sizes, so that each fetch
> breaks a cache line, and the multiply times become completely
> irrelevant.
>
> >> RJF

--
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org