On Jul 29, 2011, at 14:46, York Sun wrote: > On Fri, 2011-07-29 at 11:35 -0700, York Sun wrote: >> On Fri, 2011-07-29 at 12:20 -0500, Moffett, Kyle D wrote: >>> On Jul 28, 2011, at 17:41, York Sun wrote: >>>> I found a problem with the round up. Please try >>>> >>>> picos_to_mclk(mclk_to_picos(3)) >>>> >>>> You will notice the result is round up. I am sure it is unexpected. >>> >>> Hmm... what does fsl_ddr_get_mem_data_rate() return on your system? >>> I cannot reproduce this here with my memory freq of 800MHz. >>> >>> The function has obviously always done rounding, even before I made >>> any changes. Have you retested what the math does with the patch >>> reverted to see if it makes any difference? >>> >>> Hmm, looking at it, the obvious problem case would be mclk_to_picos() >>> using a rounded result in a subsequent multiply. Can you retest by >>> replacing mclk_to_picos() with the following code; this should keep >>> the error down by doing the rounding-to-10ps *after* the multiply. >>> >>> unsigned int mclk_to_picos(unsigned int mclk) >>> { >>> unsigned int data_rate = get_ddr_freq(0); >>> unsigned int result; >>> >>> /* Round to nearest 10ps, being careful about 64-bit multiply/divide >>> */ >>> unsigned long long mclk_ps = ULL_2e12 * mclk; >>> >>> /* Add 5*data_rate, for rounding */ >>> mclk_ps += 5*(unsigned long long)data_rate; >>> >>> /* Now perform the big divide, the result fits in 32-bits */ >>> do_div(mclk_ps, data_rate); >>> result = mclk_ps; >>> >>> /* We still need to round to 10ps */ >>> return 10 * (result/10); >>> } >>> >>> Obviously you could also get rid of the rounding alltogether, but I >>> don't know why it was put in the code in the first place... >> >> I think the problem comes from the round up. But your calculation of >> remainder is not right. You explained to me with example of 2*10^^12-1. >> It seems to be right until I tried another number 264*10^^7. I think the >> following calculation of remainder is >> >> clks_rem = do_div(clks, UL_5POW12); >> clks >>= 13; >> clks_rem |= (clks & (UL_2POW13-1)) * UL_5POW12;
That code you pasted is definitely not what is in the tree. The in-tree code is: 57 /* First multiply the time by the data rate (32x32 => 64) */ 58 clks = picos * (unsigned long long)get_ddr_freq(0); 59 60 /* 61 * Now divide by 5^12 and track the 32-bit remainder, then divide 62 * by 2*(2^12) using shifts (and updating the remainder). 63 */ 64 clks_rem = do_div(clks, UL_5pow12); 65 clks_rem <<= 13; 66 clks_rem |= clks & (UL_2pow13-1); 67 clks >>= 13; 68 69 /* If we had a remainder, then round up */ 70 if (clks_rem) 71 clks++; Can you tell me what your value of get_ddr_freq() is? I may be able to reproduce the issue here. Also, can you try your test case with the replacement for mclk_to_picos() that I pasted above and let me know what happens? I think the *real* error is that get_memory_clk_period_ps() (which I did not change) rounds up or down to a multiple of 10ps, and then the *result* of get_memory_clk_period_ps() is multiplied by the number of clocks. The effect is that the 5ps of rounding error that can occur is multiplied by the number of clocks, and may result in a much-too-large result. When you convert that number of picoseconds *back* to memory clocks you get a result that is too big, but that's because it was too many picoseconds! Where did you get the number 264*10^7? That seems too big to be a number of picoseconds (that's 2ms!), but way too small to be the product of get_ddr_freq() and some number of picoseconds. EG: For my system here with a DDR frequency of 800MHz, each clock is 2500ps, so 3 clocks times 2500ps times 800MHz is 6*10^12. In fact if you give "picos_to_mclk()" any number of picoseconds larger than at least 1 clock, the initial value of the "clks" variable is guaranteed to be at least 2*10^12, right? Using my number, let's run through the math: So if you start with picos = 7499 (to put rounding to the test) and get_ddr_freq() is 800MHz: clks = 7499 * 800*1000*1000; Now we have "clks" = 5999200000000, and we divide by 5**12: clks_rem = do_div(clks, UL_5pow12); Since 5999200000000/(5**12) is 24572.7232, we get: clks = 24572 clks_rem = 176562500 (which is equal to 5^12 * 0.7232) Now left-shift clks_rem by 13: clks_rem = 1446400000000 Then bitwise-OR in the lower 13 bits of clks: clks_rem |= (24572 & 8191) clks_rem = 1446400008188; Finally, rightshift clks by 13: clks = 2 Since there is a remainder, we "round up" from 2 to 3. These numbers empirically make sense, because I gave the system 7499ps (which is 3 clocks minus 1ps) and it rounded up properly. If I give it a simple number like 7500ps, then the math is really easy and obvious: clks = 7500 * 800*1000*100 Then the do_div(): clks = 24576 clks_rem = 0 Shifting clks_rem does nothing, because it is zero... Oring with the lower 13 bits of clks has no effect (it has no low bits set) Finally, when right-shifting clks by 13: clks = 3 This is obviously the correct result. Cheers, Kyle Moffett _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot