Hi Yuao,
Yuao Ma wrote:
I addressed some review comments and included additional feedback.
Thanks for the update. I know that the patch is incomplete, but as I
started with bootstrapping, I start with the following. I bet the issues
are known to you:
First, your patch does not bootstrap* here. As the stage2 and stage3 are
build with the current GCC, some -Werror= messages are enabled. And this
build fails with your patch:
../../../repos/gcc/gcc/fortran/trans-expr.cc:5338:10: error: enumeration
value ‘EXPR_CONDITIONAL’ not handled in switch [-Werror=switch]
Likewise for 'module.cc'.
(*With --enable-bootstrap, GCC is build trice, once with the system
compiler, then the just build stage1 compiler builds GCC without
debugging symbols and then stage2 builds stage3 with debugging symbols –
and then stage2 and stage3 are checked to be identical after stripping
the debugging symbols.)
It seems as if the following functions need to handle it (might be
incomplete and not all might be required):
dependency.cc: gfc_dep_compare_expr, gfc_check_dependency,
contains_forall_index_p, gfc_contains_implied_index_p
expr.cc: gfc_traverse_expr, gfc_check_init_expr, check_restricted,
frontend-passes.cc: gfc_expr_walker and traverse_io_block
module.cc: not handled.
frontend-passes.cc:
openmp.cc: expr_references_sym
resolve.cc: gfc_resolve_expr
trans-expr.c: gfc_apply_interface_mapping_to_expr
* * *
… as said, you are aware of it:
Yes I am aware of that, which is why I'm a bit fear of implementing
it. Every time we switch an expression type, we have to consider
whether a conditional expression should be a no-op or require special
handling. This would, however, be a great opportunity to dive deep
into the gfortran codebase.
Yes, but usually, neither new EXPR_TYPES nor new switch statement pop up.
* * *
Regarding the diagnostic:
...
Talking about the location, I wonder whether
the expr->where should be not the beginning of the expr,
i.e. 'll' for the outer and the leftmost ':' for the inner
conditional, but at the (respective) '?'.
Additionally, I wonder whether in
+ gfc_error ("True and false expressions in conditional
expression "
+ "must have the same declared type at %L",
+ &expr->where);
the '&expr->where' should be replaced by:
gfc_get_location_range (NULL, 0, &true_expr->where,0, &false_expr-
>where, 0)
* * *
I am not very familiar with PDT in Fortran. I will learn about it and
add the relevant parts.
I think for now, it is the simplest to just reject expressions with
BT_DERIVED + BT_CLASS wih a sorry (and a note to handle the kind
arguments of PDT).
* * *
Thanks for pointing that out; that was my oversight. I obviously
assumed conditional expressions don't require surrounding parentheses,
unlike in C++. To parse this form, maybe we need to handle it in
`match_primary`?
I have another question regarding this. The standard states that any
expression in parentheses should be treated as a data entity. Does a
conditional expression satisfy this requirement?
Well, data entity seems to be extremely wide, covering the result
of function calls and variables and other expressions and … [→ 5.4.3].
I think one place where '(…)' appeared in current code is:
'A' is a variable, '(A)' isn't but an expression. Hence:
integer :: A(-1:2)
lbound(A,dim=1) → -1 // lower bound of variable 'A'
lbound( (A), dim=1) → 1 // default lower bound of an expression is 1.
* * *
call myfunc(A) → subroutine may modify 'A'
call myfunc( (A) ) → argument is not definable, i.e. A may not be modified
* * *
While function calls are handled explicitly in the spec, the variable
vs. expression topic matters for code like:
C = (cond ? A : B ) ! expression or variable ?
Reason:
implicit none
integer, allocatable :: A(:), B(:), C(:)
allocate(A(-1:1))
A(:) = [1,2,3]
B = A
C = (A)
print *, lbound(B)
print *, lbound(C)
end
The example relies on (re)allocate on assignment and
prints '-1' (bound taken from a variable) and
'1' (bounds taken from an expression).
My reading is that for
C = (cond ? A : B)
the lower bound should be '1' - while for function calls
like:
lbound(cond ? A : B, dim=1)
the lower bound should be the one of A or B, respectively.
* * *
I don't know whether the '(...)' is just syntactical sugar,
added because of reasons like my example above or to aid parsing.
In fixed-form Fortran, the following line is ambiguous
d o i = f(g) + 40 * N
& ...
This can either start an assignment to an (possibly implicitly typed)
variable named 'doi'.
Or it can start a 'do' loop with loop variable 'i' ('do i = start,end,step').
The compiler only knows when it either hits the end of the statement or
a comma.
It might be that the added '(...)' helps in some cases that would otherwise
be ambiguous. (Albeit I don't see where it would be.)
* * *
R1002 conditional-expr is ( scalar-logical-expr ? expr [ : scalar-
logical-expr ? expr ]... : expr )
But Fortran does not seem to require it for inner conditional
expressions
in the 'else' branch:
print *, ( i>j ? i : j > k ? k) ! OK
print *, ( i>j ? j > k ? j : k : m ) ! invalid
print *, ( i>j ? (j > k ? j : k) : m ) ! OK
Yes, you are right. I have fixed the issue, so now the above three
should be valid.
Currently, the compiler accepts all three variants above
and A to E below. — But if I read R1002 correctly,
the second example above and 'D' below are invalid
and should be rejected.
I think your previous patch was already correctly handling it,
but due to the missing enforcement of the '(…)' requirement,
I am not completely sure.
In any case, I think R1002 means the following:
---------------------------------------
implicit none
logical :: cond, cond2, cond3
integer :: expr, expr2, expr3, expr4
! Fortran permits:
! ( scalar-logical-expr ? expr [ : scalar-logical-expr ? expr ]... : expr )
!
! Namely:
! ( scalar-logical-expr ? expr : expr )
print *, 'A:', ( cond ? expr : expr2 )
! Namely:
! ( scalar-logical-expr ? expr : scalar-logical-expr ? expr : expr )
print *, 'B:', ( cond ? expr : cond2 ? expr2 : expr3 )
! Namely:
! ( scalar-logical-expr ? expr : scalar-logical-expr ? expr :
scalar-logical-expr ? expr : expr )
print *, 'C:', ( cond ? expr : cond2 ? expr2 : cond3 ? expr3 : expr4 )
! INVALID - error expected
! ( scalar-logical-expr ? scalar-logical-expr ? expr : expr : expr )
print *, 'D:', ( cond ? cond2 ? expr : expr2 : expr3 ) ! { dg-error "..." }
! FIXED: Adding '( ... )'
! ( scalar-logical-expr ? ( ... ) : expr )
print *, 'E:', ( cond ? ( cond2 ? expr : expr2 ) : expr3 )
end
---------------------------------------
For the cond-expr feature, I was initially focused on the backend code
generation, but it looks like the frontend part (parsing & resolving)
still needs some work. I'll take care of that ASAP.
Thanks. As said, having a first version with some 'sorry' which reject
the cases to be handled later would be fine.
* * *
I do have a question about the backend, though. I've noticed that most
of the assignment scalarization happens in gfc_trans_assignment_1. If
we continue with our current approach for cond-expr, are we going to
duplicate a lot of common logic? I'm curious if we can reuse some of
that, like setting the string length or scalarizing the array.
I think we need do so some duplication as it might appear in multiple places
and we only want to evaluate function calls as few times as possible.
I hope that the changes are either small – or can be factored out.
But I have not really thought about it in depth. For simple scalars,
the current code seems to be fine.
For instance:
call sub( (cond ? A : B) )
here, passing the address of A and B directly works.
While for
call sub ( (cond ? A : B ) + 1)
a scalarized loop with a temporary is required.
For '(cond() ? A(g())%comp : B(h())%comp2(5:))'
the condition and (conditionally) g() and h() and
possibly also the effective array size should be evaluated
only once and before the scalarized loop.
That's not too different from existing code, except that
evaluations need to happen conditionally - and we have
to be careful as some code walking differs. For instance,
for:
integer, target :: A(10), B(10), C(:), D(10)
integer, allocatable :: C(:)
integer, pointer :: ptr(:)
C = (cond ? A : B)
a simple (mixing C style and Fortran-style syntax):
tmp = cond ? &A : &B
allocate(C( (cond ? size(A) : size(B)) ))
C(i) = tmp(i)
will do.
But for
ptr => B(::2)
D = (cond ? A : ptr)
the values are only contiguous for 'A' but not for 'ptr'.
Thus, we either need:
D(i) = cond ? A(i) : ptr(i)
putting the condition and the offset calculation into the loop.
or:
stride = cond ? A.stride : ptr.stride
tmp = cond ? A.base_addr : ptr.base_addr
...
D(i) = *(tmp + i * stride);
where the second version is more efficient, but we have to think
harder how to implement it. At least, both variants need to have
the same array rank - or are assumed rank, but that implies that
it only appears as actual argument.
* * *
There should be already some code which handles early evaluation
(like for 'C = var(g())%array' - where g() is only evalulated
once), which we can also use when scalarizing. Some other code
requires more thinking, but could also be implemented later
like the last example.
Tobias