Any comment or more suggestions on this patch?
thanks, Cong On Mon, Sep 9, 2013 at 7:28 PM, Cong Hou <co...@google.com> wrote: > On Mon, Sep 9, 2013 at 6:26 PM, Xinliang David Li <davi...@google.com> wrote: >> On Fri, Sep 6, 2013 at 3:24 PM, Cong Hou <co...@google.com> wrote: >>> First, thank you for your detailed comments again! Then I deeply >>> apologize for not explaining my patch properly and responding to your >>> previous comment. I didn't understand thoroughly the problem before >>> submitting the patch. >>> >>> Previously I only considered the following three conversions for sqrt(): >>> >>> >>> 1: (float) sqrt ((double) float_val) -> sqrtf (float_val) >>> 2: (float) sqrtl ((long double) float_val) -> sqrtf (float_val) >>> 3: (double) sqrtl ((long double) double_val) -> sqrt (double_val) >>> >>> >>> We have four types here: >>> >>> TYPE is the type to which the result of the function call is converted. >>> ITYPE is the type of the math call expression. >>> TREE_TYPE(arg0) is the type of the function argument (before type >>> conversion). >>> NEWTYPE is chosen from TYPE and TREE_TYPE(arg0) with higher precision. >>> It will be the type of the new math call expression after conversion. >>> >>> For all three cases above, TYPE is always the same as NEWTYPE. That is >>> why I only considered TYPE during the precision comparison. ITYPE can >>> only be double_type_node or long_double_type_node depending on the >>> type of the math function. That is why I explicitly used those two >>> types instead of ITYPE (no correctness issue). But you are right, >>> ITYPE is more elegant and better here. >>> >>> After further analysis, I found I missed two more cases. Note that we >>> have the following conditions according to the code in convert.c: >>> >>> TYPE_PRECISION(NEWTYPE) >= TYPE_PRECISION(TYPE) >>> TYPE_PRECISION(NEWTYPE) >= TYPE_PRECISION(TREE_TYPE(arg0)) >>> TYPE_PRECISION (NEWTYPE) < TYPE_PRECISION (ITYPE) >>> >>> the last condition comes from the fact that we only consider >>> converting a math function call into another one with narrower type. >>> Therefore we have >>> >>> TYPE_PRECISION(TYPE) < TYPE_PRECISION (ITYPE) >>> TYPE_PRECISION(TREE_TYPE(arg0)) < TYPE_PRECISION (ITYPE) >>> >>> So for sqrt(), TYPE and TREE_TYPE(arg0) can only be float, and for >>> sqrtl(), TYPE and TREE_TYPE(arg0) can be either float or double with >>> four possible combinations. Therefore we have two more conversions to >>> consider besides the three ones I mentioned above: >>> >>> >>> 4: (float) sqrtl ((long double) double_val) -> (float) sqrt (double_val) >>> 5: (double) sqrtl ((long double) float_val) -> sqrt ((double) float_val) >>> >>> >>> For the first conversion here, TYPE (float) is different from NEWTYPE >>> (double), and my previous patch doesn't handle this case.The correct >>> way is to compare precisions of ITYPE and NEWTYPE now. >>> >>> To sum up, we are converting the expression >>> >>> (TYPE) sqrtITYPE ((ITYPE) expr) >>> >>> to >>> >>> (TYPE) sqrtNEWTYPE ((NEWTYPE) expr) >>> >>> and we require >>> >>> PRECISION (ITYPE) >= PRECISION (NEWTYPE) * 2 + 2 >>> >>> to make it a safe conversion. >>> >>> >>> The new patch is pasted below. >>> >>> I appreciate your detailed comments and analysis, and next time when I >>> submit a patch I will be more carefully about the reviewer's comment. >>> >>> >>> Thank you! >>> >>> Cong >>> >>> >>> >>> Index: gcc/convert.c >>> =================================================================== >>> --- gcc/convert.c (revision 201891) >>> +++ gcc/convert.c (working copy) >>> @@ -135,16 +135,19 @@ convert_to_real (tree type, tree expr) >>> CASE_MATHFN (COS) >>> CASE_MATHFN (ERF) >>> CASE_MATHFN (ERFC) >>> - CASE_MATHFN (FABS) >>> CASE_MATHFN (LOG) >>> CASE_MATHFN (LOG10) >>> CASE_MATHFN (LOG2) >>> CASE_MATHFN (LOG1P) >>> - CASE_MATHFN (LOGB) >>> CASE_MATHFN (SIN) >>> - CASE_MATHFN (SQRT) >>> CASE_MATHFN (TAN) >>> CASE_MATHFN (TANH) >>> + /* The above functions are not safe to do this conversion. */ >>> + if (!flag_unsafe_math_optimizations) >>> + break; >>> + CASE_MATHFN (SQRT) >>> + CASE_MATHFN (FABS) >>> + CASE_MATHFN (LOGB) >>> #undef CASE_MATHFN >>> { >>> tree arg0 = strip_float_extensions (CALL_EXPR_ARG (expr, 0)); >>> @@ -155,6 +158,27 @@ convert_to_real (tree type, tree expr) >>> if (TYPE_PRECISION (TREE_TYPE (arg0)) > TYPE_PRECISION (type)) >>> newtype = TREE_TYPE (arg0); >>> >>> + /* We consider to convert >>> + >>> + (T1) sqrtT2 ((T2) exprT3) >>> + to >>> + (T1) sqrtT4 ((T4) exprT3) >> >> Should this be >> >> (T4) sqrtT4 ((T4) exprT3) > > T4 is not necessarily the same as T1. For the conversion: > > (float) sqrtl ((long double) double_val) -> (float) sqrt (double_val) > > T4 is double and T1 is float. > > >>> + >>> + , where T1 is TYPE, T2 is ITYPE, T3 is TREE_TYPE (ARG0), >>> + and T4 is NEWTYPE. >> >> NEWTYPE is also the wider one between T1 and T3. > > Right. Actually this is easy to catch from the context just before > this comment. > > tree newtype = type; > if (TYPE_PRECISION (TREE_TYPE (arg0)) > TYPE_PRECISION (type)) > newtype = TREE_TYPE (arg0); > > > > thanks, > Cong > > >> >> David >> >>> All those types are of floating point types. >>> + T4 (NEWTYPE) should be narrower than T2 (ITYPE). This conversion >>> + is safe only if P1 >= P2*2+2, where P1 and P2 are precisions of >>> + T2 and T4. See the following URL for a reference: >>> + >>> http://stackoverflow.com/questions/9235456/determining-floating-point-square-root >>> + */ >>> + if (fcode == BUILT_IN_SQRT || fcode == BUILT_IN_SQRTL) >>> + { >>> + int p1 = REAL_MODE_FORMAT (TYPE_MODE (itype))->p; >>> + int p2 = REAL_MODE_FORMAT (TYPE_MODE (newtype))->p; >>> + if (p1 < p2 * 2 + 2 && !flag_unsafe_math_optimizations) >>> + break; >>> + } >>> + >>> /* Be careful about integer to fp conversions. >>> These may overflow still. */ >>> if (FLOAT_TYPE_P (TREE_TYPE (arg0)) >>> Index: gcc/testsuite/gcc.c-torture/execute/20030125-1.c >>> =================================================================== >>> --- gcc/testsuite/gcc.c-torture/execute/20030125-1.c (revision 201891) >>> +++ gcc/testsuite/gcc.c-torture/execute/20030125-1.c (working copy) >>> @@ -44,11 +44,11 @@ __attribute__ ((noinline)) >>> double >>> sin(double a) >>> { >>> - abort (); >>> + return a; >>> } >>> __attribute__ ((noinline)) >>> float >>> sinf(float a) >>> { >>> - return a; >>> + abort (); >>> } >>> >>> On Wed, Sep 4, 2013 at 3:26 PM, Joseph S. Myers <jos...@codesourcery.com> >>> wrote: >>>> On Wed, 4 Sep 2013, Xinliang David Li wrote: >>>> >>>>> > This patch submission still fails to pay attention to various of my >>>>> > comments. >>>>> > >>>>> >>>>> If you can provide inlined comments in the patch, that will be more >>>>> useful and productive. >>>> >>>> I have explained things several times in this thread already. I see no >>>> point in repeating things when what I would say has already been said and >>>> ignored. As far as I can tell, this latest patch submission has taken one >>>> line from the message it is in response to, and largely ignored the >>>> following two paragraphs (including where I explicitly say that said line >>>> should not appear literally in the source code at all). But, repeating >>>> what I said before yet again: >>>> >>>> (but you should be referring to the relevant types >>>> >>>> The patch does not do properly that. It refers explicitly to >>>> long_double_type_node and double_type_node. >>>> >>>> - "type", the type >>>> being converted to, "itype", the type of the function being called in the >>>> source code, "TREE_TYPE (arg0)", the type of the argument after >>>> extensions >>>> have been removed, and "newtype", computed from those >>>> >>>> The patch only engages with "type". I suspect "newtype" is what it really >>>> means there when using "type". When it uses long_double_type_node and >>>> double_type_node, those should be "itype". >>>> >>>> - so you should have >>>> expressions like the above with two or more of those four types, but not >>>> with long_double_type_node directly). >>>> >>>> See above. The patch uses long_double_type_node directly, contrary to >>>> what I explicitly said. You are free to disagree with something I say in >>>> a review - but in that case you need to reply specifically to the review >>>> comment explaining your rationale for disagreeing with it. Just ignoring >>>> the comment and not mentioning the disagreement wastes the time of >>>> reviewers. >>>> >>>> The patch submission will need to include a proper analysis to justify to >>>> the reader why the particular inequality with particular types from those >>>> four is correct in all cases where the relevant code may be executed. >>>> >>>> The comments only refer to "T1" and "T2" without explaining how they >>>> relate to the original expression (three types) or the variables within >>>> GCC dealing with various types (four types, because newtype gets >>>> involved). I said back in >>>> <http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00161.html> and >>>> <http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01384.html> that three types >>>> are involved - when I say "the patch submission needs to include its own >>>> analysis for the full generality of three types", again, it's >>>> inappropriate for a patch to omit such an analysis without justification. >>>> The patch submission needs to include an analysis that properly explains >>>> the transformation involved and the conditions under which it is safe. >>>> Maybe starting along the lines of: >>>> >>>> We have an expression of the form (T1) sqrtT2 ((T2) exprT3), where exprT3 >>>> has type T3 (TREE_TYPE (ARG0)), T2 is the type of the floating-point >>>> square root function being used (ITYPE), T1 is TYPE and all these types >>>> are binary floating-point types. We wish to optimize if possible to an >>>> expression of the form (T1) sqrtT4 ((T4) exprT3), where T4 (NEWTYPE) is >>>> narrower than T2. (Then explain the choice of T4 and the conditions under >>>> which the transformation is safe, with appropriate references.) >>>> >>>> I suggest that for the next patch submission you (the patch submitter) >>>> make sure you spend at least an hour properly understanding the issues and >>>> all the previous messages in the thread and writing up the detailed, >>>> coherent explanation of the transformation and analysis of the issues that >>>> I asked for some time ago, as if for a reader who hasn't read any of this >>>> thread or looked at this transformation in GCC before. I've certainly >>>> spent longer than that on review in this thread. It's normal for a patch >>>> involving anything at all tricky to take hours to write up (and also >>>> normal for one to discover, in the course of writing the detailed coherent >>>> analysis for people who aren't familiar with the code and the issues >>>> involved, that there's in fact something wrong with the patch and it needs >>>> revisiting before submission). >>>> >>>> -- >>>> Joseph S. Myers >>>> jos...@codesourcery.com