On Wed, Apr 3, 2013 at 6:16 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
> On 04/03/2013 09:53 AM, Richard Biener wrote:
>>
>> On Wed, Apr 3, 2013 at 2:05 PM, Kenneth Zadeck <zad...@naturalbridge.com> wrote:
>>>
>>> On 04/03/2013 05:17 AM, Richard Biener wrote:
>>>
>>>> In the end you will have variable-size storage in TREE_INT_CST, thus
>>>> you will at least have to emit _code_ copying over meta-data and
>>>> data from the tree representation to the wide-int (similar for RTX
>>>> CONST_DOUBLE/INT).  I'm objecting to the amount of code you emit,
>>>> and I agree that the runtime cost is copying the meta-data
>>>> (hopefully optimizable via CSE / SRA) and in most cases one (or two)
>>>> iterations of the loop copying the data (not optimizable).
>>>
>>> I did get rid of the bitsize in the wide-int patch, so at this point
>>> the meta-data is the precision and the len: not really a lot here.
>>> As usual we pay a high price in gcc for not pushing the tree rep
>>> down into the rtl level; had we done so, it would have been
>>> acceptable to have the tree type bleed into the wide-int code.
>>>
>>>>> 2) You present this as if the implementor actually should care
>>>>> about the implementation, and you give 3 alternatives: the
>>>>> double_int, the current one, and HWI.  We have tried to make it so
>>>>> that the client should not care.  Certainly in my experience here,
>>>>> I have not found a place to care.
>>>>
>>>> Well, similar to the copying overhead for tree, your approach
>>>> requires overloading operations for HOST_WIDE_INT operands to be
>>>> able to say wi + 1 (which is certainly desirable), or incurring the
>>>> overhead of using wide_int_one ().
>>>>
>>>>> In my opinion double_int needs to go away.  That is the main
>>>>> thrust of my patches.  There is no place in a compiler for an ABI
>>>>> that depends on constants fitting into two words whose size is
>>>>> defined by the host.
>>>>
>>>> That's true.  I'm not arguing to preserve "double-int" - I'm
>>>> arguing to preserve a way to ask for an integer type on the host
>>>> with (at least) N bits.  Almost all double-int users really ask for
>>>> an integer type on the host that has at least as many bits as the
>>>> pointer representation (or word_mode) on the target (we do have
>>>> HOST_WIDEST_INT == 32 bits for 64-bit pointer targets).  No
>>>> double-int user specifically wants 2 * HOST_WIDE_INT precision -
>>>> that is just what happens to be there.  Thus I am providing a way
>>>> to say get me a host integer with at least N bits (VRP asks for
>>>> this, for example).
>>>>
>>>> What I was asking for is that whatever can provide the above should
>>>> share the functional interface with wide-int (or the other way
>>>> around).  And I was claiming that wide-int is too fat, because
>>>> current users of double-int eventually store double-ints
>>>> permanently.
>>>
>>> The problem is that, in truth, double-int is too fat.  99.something%
>>> of all constants fit in 1 HWI and that is likely to be true forever
>>> (I understand that tree VRP may need some thought here).  The rtl
>>> level has, for as long as I have known it, had 2 reps for integer
>>> constants, so it was relatively easy to slide the CONST_WIDE_INT in.
>>> It seems like the right trickery here, rather than adding a storage
>>> model for wide-ints, might be to use C++ to invisibly support
>>> several (and by "several" I really mean 2) classes of TREE_CSTs.
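[Editorial aside, for readers following the thread: the shape being argued
about is roughly the following.  This is a minimal sketch, not code from
either patch; the typedef, the WIDE_INT_MAX_ELTS bound and the field names
are stand-ins.]

typedef long long HOST_WIDE_INT;  /* stand-in for gcc's host-wide type */
#define HOST_BITS_PER_WIDE_INT ((int) sizeof (HOST_WIDE_INT) * 8)
#define WIDE_INT_MAX_ELTS 4       /* assumed bound: widest mode / HWI bits */

class wide_int
{
public:
  /* The meta-data discussed above: just a precision and a len.  */
  unsigned int precision;         /* precision in bits of the value */
  unsigned short len;             /* number of significant HWIs in val[] */
  /* Fixed-size storage, so wide-ints can live on the stack.  */
  HOST_WIDE_INT val[WIDE_INT_MAX_ELTS];
};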
>>
>> The truth is that _now_ TREE_INT_CSTs use double-ints and we have
>> CONST_INT and CONST_DOUBLE.  What I (and you) propose would get us to
>> use variable-size storage for both, allowing us to use just a single
>> HOST_WIDE_INT in the majority of cases.  In my view the constant
>> length of the variable-size storage for TREE_INT_CSTs is determined
>> by its type (thus, it doesn't have "optimized" variable-size storage
>> but "unoptimized" fixed-size storage based on the maximum storage
>> requirement for the type).  Similarly for RTX CONST_INT, which would
>> have fixed-size storage based on the mode-size of the constant.
>> Using optimized space (thus using the encoding properties) requires
>> you to fit the 'short len' somewhere, which possibly will not pay off
>> in the end (for tree we do have that storage available, so we could
>> go with optimized storage for it; not sure about RTL, I don't see
>> available space there).
>
> There are two questions here: one is that you object to the fact that
> we represent small constants efficiently
Huh?  Where do I object to that?  I question whether the encoding trick
pays off for the storage in tree and RTX if you need another HWI-aligned
word to store the len.  But see below.

> and the second is that we take advantage of the fact that fixed-size
> stack allocation is effectively free for short-lived objects like
> wide-ints (as I use them).

I don't question that and I am not asking you to change that.  As part
of what I ask for, a more optimal (smaller) stack allocation would be
_possible_ (but not required).

> At the rtl level your idea does not work.  rtl constants do not have
> a mode or type.  So if you do not compress, how are you going to
> determine how many words you need for the constant 1?  I would love
> to have a rep that had the mode in it.  But it is a huge change that
> requires a lot of hacking to every port.

Quoting from your RTL parts patch:

+struct GTY((variable_size)) hwivec_def {
+  int num_elem;			/* number of elements */
+  HOST_WIDE_INT elem[1];
+};
+
+#define HWI_GET_NUM_ELEM(HWIVEC)	((HWIVEC)->num_elem)
+#define HWI_PUT_NUM_ELEM(HWIVEC, NUM)	((HWIVEC)->num_elem = (NUM))
+
 /* RTL expression ("rtx").  */

 struct GTY((chain_next ("RTX_NEXT (&%h)"),
@@ -343,6 +352,7 @@ struct GTY((chain_next ("RTX_NEXT (&%h)"),
     struct block_symbol block_sym;
     struct real_value rv;
     struct fixed_value fv;
+    struct hwivec_def hwiv;
   } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
 };

that 'wastes' one HOST_WIDE_INT, so the most efficient encoding of 0
will require two HOST_WIDE_INTs of storage - the same as double-int
currently.  99.9% of all constants will fit into two HOST_WIDE_INTs,
thus having the variable-size encoding for the storage in RTX (as
opposed to having num_elem == GET_MODE_BITSIZE / HOST_BITS_PER_WIDE_INT)
is a waste of time.  num_elem for a non-optimized encoding is available
from the mode stored in the integer constant RTX...

You say

> At the rtl level your idea does not work.  rtl constants do not have
> a mode or type.

which is not true, and it does not matter.  I'll tell you why.  Quote:

+#if TARGET_SUPPORTS_WIDE_INT
+
+/* Match CONST_*s that can represent compile-time constant integers.  */
+#define CASE_CONST_SCALAR_INT \
+   case CONST_INT: \
+   case CONST_WIDE_INT

which means you are only replacing CONST_DOUBLE with wide-int, and
_all_ CONST_DOUBLEs have a mode.  Otherwise you'd have no way of
creating the wide-int in the first place.  Which means any future
transition of CONST_INT to the wide-int code (which is desirable!) will
need to fix the fact that CONST_INTs do not have a mode.

Or even simpler - just retain CONST_INT behavior!  CONST_INTs need a
single HWI of storage (moving that to wide-int with a len field
regresses here) and CONST_INTs are always sign-extended.  Preserve
that!  Make VOIDmode integer constants have a single HWI as storage,
sign-extended.

> I understand that this makes me vulnerable to the argument that we
> should not let the rtl level ever dictate anything about the tree
> level, but the truth is that a variable-len rep is almost always used
> for big integers.  In our code, most constants of large types are
> small numbers.  (Remember I got into this because the tree constant
> prop thinks that left-shifting any number by anything greater than
> 128 is always 0, and discovered that that was just the tip of the
> iceberg.)  But mostly I support the decision to canonicalize numbers
> to the smallest number of HWIs because most of the algorithms to do
> the math can be short-circuited.
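[Editorial aside: a sketch of the canonicalization being defended here -
not code from either patch, and it assumes the illustrative layout from
the sketch above.  A high word is redundant exactly when it merely
repeats the sign of the word below it, so most values end up with
len == 1 and the arithmetic can short-circuit.]

/* Return the canonical (smallest) len for the LEN words in VAL,
   low-order word first.  */
static unsigned short
canonize_len (const HOST_WIDE_INT *val, unsigned short len)
{
  while (len > 1)
    {
      /* VAL[LEN - 2] implies a sign; if the top word is just that
	 sign repeated, it carries no information and can be dropped.  */
      HOST_WIDE_INT sign = val[len - 2] < 0 ? (HOST_WIDE_INT) -1 : 0;
      if (val[len - 1] != sign)
	break;
      len--;
    }
  return len;
}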
> I admit that if I had to effectively unpack most numbers to do the
> math, the canonicalization would be a waste.  However, this is not
> really relevant to this conversation.  Yes, you could get rid of the
> len, but this is such a small part of the picture.

Getting rid of 'len' in the RTX storage was only a question of whether
it is an efficient way to go forward.  And considering a unification of
CONST_INTs and CONST_WIDE_INTs, it is not.  Even for CONST_WIDE_INTs
alone (which most of the time would be 2-HWI storage, as otherwise
you'd use a CONST_INT) dropping it would be an improvement.

> Furthermore, I am constrained at the rtl level because it is just too
> dirty to share the storage.  We tried that, and the amount of
> whack-a-mole we were playing was killing us.

The above argument (not encoding len explicitly in the RTX) is
unrelated to "sharing the storage" (whatever you exactly refer to
here).

> I am comfortable making big changes at the portable level because I
> can easily test them, but changes to fundamental data structures
> inside every port are more than I am willing to do.  If you are going
> to do that, then you are going to have to start giving rtl constants
> modes, and that is a huge change (that, while desirable, I cannot do).

All CONST_DOUBLEs have modes.  You do not touch CONST_INTs.  VOIDmode
CONST_WIDE_INTs seem perfectly possible to me (see above).  So there is
nothing to fix, even if that is desirable as you say and as I agree.

> At the tree level you could share the storage; I admit that that is
> easy, I just do not think that it is desirable.  The stack allocation
> inside the wide-int is very cheap, and the fact that the number that
> comes out is almost always one HWI makes this very efficient.  Even
> if I was starting from scratch, this would be a strong contender to
> be the right representation.
>
> Fundamentally, I do not believe that the amount of copying that I am
> proposing at the tree level will be significant.  Yes, if you force
> me to use an uncompressed format, you can make what I propose
> expensive.

An uncompressed format at the tree / RTX level will in 99.9% of all
cases be one or two HWIs.

Realize that all my arguments and requests are really independent (even
though I make them together).  As we now have the power of C++ and
templates, I see no good reason not to do the optimization of eliding
the copying from the RTX / tree representation.  It can be done as a
followup, I guess (with the disadvantage of leaving open the
possibility to "optimize" existing converted client code).

Btw, is the current state of the patches accessible as a branch
somewhere?

>>>>> This is not a beauty contest argument; we have public ports
>>>>> beginning to use modes that are larger than two x86-64 HWIs, and
>>>>> I have a private port that has such modes.  It is my experience
>>>>> that any pass that uses this interface has one of three
>>>>> behaviors: it silently gets the wrong answer, it ICEs, or it
>>>>> fails to do the transformation.  If we leave double_int as an
>>>>> available option, then any use of it potentially will have one of
>>>>> these three behaviors.  And so one of my strong objections to
>>>>> this direction is that I do not want to fight this kind of bug
>>>>> for the rest of my life.  Having a single storage model that just
>>>>> always works is in my opinion a highly desirable option.  What
>>>>> you have never answered in a concrete manner is: if we decide to
>>>>> provide this generality, what would it be used for?
>>>>> There is no place in a portable compiler where the right answer
>>>>> for every target is two HOST wide integers.
>>>>>
>>>>> However, I will admit that the HWI option has some merits.  We
>>>>> try to address this in our implementation by dividing what is
>>>>> done inline in wide-int.h into the cases that fit in an HWI, and
>>>>> then only dropping into the heavy code in wide-int.c if the mode
>>>>> is larger (which it rarely will be).  However, a case could be
>>>>> made that for certain kinds of things, like string lengths and
>>>>> such, we could use another interface or, as you argue, a
>>>>> different storage model with the same interface.  I just do not
>>>>> see that the cost of the conversion code is really going to show
>>>>> up on anyone's radar.
>>>>
>>>> What's the issue with abstracting away the model so a fixed-size
>>>> 'len' is possible?  (leaving aside the argument that this would
>>>> easily allow an adaptor to tree)
>>>
>>> I have a particularly pessimistic perspective because I have
>>> already written most of this patch.  It is not that I do not want
>>> to change that code, it is that I have seen a certain set of
>>> mistakes that were made and I do not want to fix them more than
>>> once.  At the rtl level you can see the transition from only
>>> supporting 32-bit ints, to supporting 64-bit ints, to finally
>>> supporting two HWIs, and that transition code is not pretty.  My
>>> rtl patch fixes the places where an optimization was only made if
>>> the data type was 32 bits or smaller, as well as the places where
>>> the optimization was made only if the data type is smaller than 64
>>> bits (in addition to fixing all of the places where the code ICEs
>>> or simply gets the wrong answer if it is larger than TImode).  The
>>> tree level is only modestly better, I believe only because it is
>>> newer.  I have not seen any 32-bit-only code, but it is littered
>>> with transformations that only work for 64 bits.  What is that
>>> 64-bit-only code going to look like in 5 years?
>>
>> The user interface of wide-int does not depend on whether a storage
>> model is abstracted or not.  If you take advantage of the storage
>> model by making its interface leaner then it will.  But I guess
>> converting everything before settling on the wide-int interface may
>> not have been the wisest choice in the end (providing a wide-int
>> that can literally replace double-int would have got you testing
>> coverage without any change besides double-int.[ch] and
>> wide-int.[ch]).
>
> I think that it was exactly the correct decision.  We have made
> significant changes to the structure as we have gone along.  I have
> basically done most of what you have suggested (the interface is
> completely functional, I got rid of the bitsize, ...).  What you are
> running into is that Mike Stump, Richard Sandiford and myself
> actually believe that the storage model is a fundamentally bad idea.

Well, that must be because of C++ ignorance.  Let's keep that issue
aside for now.  Be sure I'll come back to it.

>>> I want to make it easier to write the general code than to write
>>> the code that only solves the problem for the size of port that the
>>> implementor is currently working on.  So I perceive the storage
>>> model as a way to keep having to fight this battle forever, because
>>> it will allow the implementor to make a decision that the
>>> optimization only needs to be done for a particular sized integer.
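[Editorial aside, making concrete the wide-int.h / wide-int.c split
described a few paragraphs up: a sketch, again using the illustrative
layout from the first sketch; add_large stands in for the out-of-line
worker and is not a function from the actual patch.]

/* Out-of-line worker for the rare multi-word case; in the split
   described above this would live in wide-int.c.  */
wide_int add_large (const wide_int &a, const wide_int &b);

/* Inline fast path, as it might appear in wide-int.h.  */
inline wide_int
add (const wide_int &a, const wide_int &b)
{
  if (a.precision <= HOST_BITS_PER_WIDE_INT)
    {
      /* The 99.something% case: one HWI holds the whole value.  A
	 real version would also re-extend val[0] to the precision.  */
      wide_int r;
      r.len = 1;
      r.precision = a.precision;
      r.val[0] = a.val[0] + b.val[0];
      return r;
    }
  return add_large (a, b);
}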
>>>
>>> However, I get the fact that from your perspective, what you really
>>> want is a solution to the data structure problem in tree-vrp.
>>
>> No, that's just a convenient example.  What I really want is a
>> wide-int that is less visibly a replacement for CONST_DOUBLE.
>
> I think that that is unfair.  Variable-length reps are the standard
> technique for doing wide math.  I am just proposing using data
> structures that are common best practice in the rest of the world and
> adapting them so that they match the gcc world better than just
> hacking in a gmp interface.

I don't object to variable-length reps.  I question the need for
'bitsize' (gone! yay!) and now 'precision' in class wide_int.  I
question the need for and the efficiency of encoding 'len' in RTX
CONST_WIDE_INT.

>>> My patch for tree vrp scans the entire function to find the largest
>>> type used in the function and then does all of the math at 2x that
>>> size.  But I have to admit that I thought it was weird that you use
>>> tree cst as your long-term storage.  If this were my pass, I would
>>> have allocated a 2d array of some type that was as large as the
>>> function in one dimension and twice as large as the largest int
>>> used in the other dimension, not overloaded tree-cst, and then had
>>> a set of friend functions in double-int to get in and out.  Of
>>> course you do not need friends in double-int because the rep is
>>> exposed, but in wide-int that is now hidden since it is now purely
>>> functional.
>>>
>>> I just have to believe that there is a better way to do tree-vrp
>>> than messing up wide-int for the rest of the compiler.
>>
>> It's not "messing up", it's making wide-int a generally useful thing
>> and not tying it so closely to RTL.
>
> Again, this is unfair.
>
>>>>> 3) your trick will work at the tree level, but not at the rtl
>>>>> level.  The wide-int code cannot share storage with the
>>>>> CONST_INTs.  We tried this, and there are a million bugs that
>>>>> would have to be fixed to make it work.  It could have worked if
>>>>> CONST_INTs had carried a mode around, but since they do not, you
>>>>> end up with the same CONST_INT sharing the rep for several
>>>>> different types, and that just did not work unless you were
>>>>> willing to do substantial cleanups.
>>>>
>>>> I don't believe you.  I am only asking for the adaptors to tree
>>>> and RTL to work in an RVALUE-ish way (no modification, as
>>>> obviously RTL and tree constants may be shared).  I think your
>>>> claim is because you have those precision and bitsize members in
>>>> your wide-int, which I believe is a design red herring.  I suppose
>>>> we should concentrate on addressing that one first.  Thus, let me
>>>> repeat a few questions on your design to eventually let you
>>>> understand my concerns:
>>>>
>>>> Given two wide-ints, a and b, what precision will the result of
>>>>      a + b
>>>> have?  Consider a having precision 32 and b having precision 64
>>>> on a 32-bit HWI host.
>>>>
>>>> You define wide-int to have no sign information:
>>>>
>>>> +   The representation does not contain any information about
>>>> +   signedness of the represented value, so it can be used to
>>>> +   represent both signed and unsigned numbers.  For operations
>>>> +   where the results depend on signedness (division,
>>>> +   comparisons), the signedness must be specified separately.
>>>> +   For operations where the signedness matters, one of the
>>>> +   operands to the operation specifies either wide_int::SIGNED
>>>> +   or wide_int::UNSIGNED.
>>>>
>>>> but the result of operations on mixed-precision operands _does_
>>>> depend on the sign; nevertheless most operations do not get a
>>>> signedness argument.  Nor would that help, because it depends on
>>>> the signedness of the individual operands!
>>>>
>>>> double-int gets around this by having a common "precision" to
>>>> which all smaller-precision constants have to be
>>>> sign-/zero-extended.  So do CONST_INT and CONST_DOUBLE.
>>>>
>>>> Note that even with the same precision you have introduced the
>>>> same problem with the variable len.
>>>>
>>>> My proposal is simple - all wide-ints are signed!  wide-int is
>>>> basically an arbitrary-precision signed integer format.  The sign
>>>> is encoded in the most significant bit of the last active HWI of
>>>> the representation
>>>> (val[len - 1] & (1 << (HOST_BITS_PER_WIDE_INT - 1))).  All values
>>>> with less precision than len * HOST_BITS_PER_WIDE_INT are properly
>>>> sign-/zero-extended to precision len * HOST_BITS_PER_WIDE_INT.
>>>>
>>>> This lets you define mixed-len operations by implicitly
>>>> sign-/zero-extending the operands to whatever len is required for
>>>> the operation.
>>>>
>>>> Furthermore it allows you to get rid of the precision member (and
>>>> bitsize anyway).  Conversion from tree / RTL requires information
>>>> on the signedness of the input (trivial for tree; for RTL all
>>>> constants are signed - well, sign-extended).  Whenever you want to
>>>> transfer the wide-int to tree / RTL you have to sign-/zero-extend
>>>> according to the desired precision.  If you need something other
>>>> than arbitrary-precision arithmetic you have to explicitly
>>>> truncate / extend at suitable places - with overflow checking
>>>> being trivial here.  For optimization purposes selected operations
>>>> may benefit from a combined implementation receiving a target
>>>> precision and signedness.  Whatever extra meta-data RTL requires
>>>> does not belong in wide-int but in the RTX.  Trivially a mode
>>>> comes to my mind (on tree we have a type), and trivially each RTX
>>>> has a mode.  And each mode has a precision and bitsize.  It lacks
>>>> a sign, so all RTL integer constants are sign-extended for
>>>> encoding-efficiency purposes.  Mixed-mode operations will not
>>>> occur (mixed-len operations will); mixed-mode ops are exclusively
>>>> sign-/zero-extensions and truncations.
>>>>
>>>> The representation of (unsigned HOST_WIDE_INT)-1 would necessarily
>>>> be { 0, (unsigned HOST_WIDE_INT)-1 }, while the representation of
>>>> -1 in any precision would be { -1 }.
>>>>
>>>> That was my proposal.  Now, can you please properly specify yours?
>>
>> And you chose to not answer that fundamental question of how your
>> wide-int is _supposed_ to work?  Ok, maybe I shouldn't have
>> distracted you with the bits before this.
>
> sorry, I missed this question by accident.
>
> we did not do infinite precision by design.  We looked at the set of
> programming languages that gcc either compiles or might compile, and
> we looked at the set of machines that gcc targets, and neither of
> these two sets of entities define their operations in terms of
> infinite-precision math.  They always do math within a particular
> precision.  Scripting languages do use infinite precision, but gcc
> does not compile any of them.  So unless you are going to restrict
> your set of operations to those that satisfy the properties of a
> ring, it is generally not strictly safe to do your math in infinite
> precision.
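[Editorial aside: the two encodings from the proposal above, spelled
out under the assumption of 64-bit HWIs and with val[0] holding the
least significant word; illustrative only.]

/* -1 in any precision: a single word whose MSB is set, so the value
   reads as negative.  len == 1.  */
static const HOST_WIDE_INT rep_minus_one[1] = { -1 };

/* (unsigned HOST_WIDE_INT)-1, i.e. 2^64 - 1: the all-ones low word
   needs an explicit zero word above it so the MSB of the last active
   HWI is clear and the value reads as positive.  len == 2.  */
static const HOST_WIDE_INT rep_umax[2] = { -1, 0 };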
>
> we do all of the math in the precision defined by the types or modes
> that are passed in.
>
> constants are defined to be the size of the precision unless they can
> be compressed.  The len field tells how many words are actually
> needed to exactly represent the constant, if
> (len - 1) * HOST_BITS_PER_WIDE_INT is less than the precision.  The
> decompression process is to add a number of words to the high-order
> side of the number.  These words must contain a zero if the highest
> represented bit is 0 and -1 if the highest represented bit is 1.  So
> the compressed representation is signed.

Please document that.

> i.e. this looks a lot like sign extension, but the numbers themselves
> are not inherently signed or unsigned as in your representation.  It
> is up to the operators to imply the signedness.  So the unsigned
> multiply operation is different from the signed multiply operation.
> But these are applied to the bits without an interpretation of their
> sign.

Which means that the inconsistency of ops getting a precision argument
vs. arbitrarily taking the precision of the first operand is a design
bug.

Note that with your encoding scheme, as far as I understand, the "sign"
of { -1 } depends on the precision.  If the precision is less than that
of an HWI then { -1 } is signed; if it is exactly the precision of an
HWI the sign is unspecified - it could be -1U or -1 (as far as I
understand, it is not an invalid encoding of -1U?); if it has bigger
precision than that of an HWI then the sign is unspecified as well (it
could be -1U or -1 in the larger precision)?

"Unspecified sign" means that you cannot combine two operands with
different 'len' / 'precision' without knowing the sign of the operands
you need to extend.  Note that knowing the sign of the operation is not
enough.

So I believe that for correctness you need to assert, at least with a
gcc_assert, that operands have the same precision - as you assume here,
for example:

+wide_int
+wide_int::operator + (const wide_int &op1) const
+{
+  wide_int result;
+
+  if (precision <= HOST_BITS_PER_WIDE_INT)
+    {
+      result.len = 1;
+      result.bitsize = bitsize;
+      result.precision = precision;
+      result.val[0] = val[0] + op1.val[0];
+      if (precision < HOST_BITS_PER_WIDE_INT)
+        result.val[0] = sext_hwi (result.val[0], precision);
+    }

which is clearly wrong if op1.precision != precision.  It also explains
why wide_int::one needs a precision ... (I hope you got rid of the
overloads.)

I'm not sure to what extent you already got rid of the
precision/bitsize-taking functions like

+  inline wide_int lshift (unsigned int y, ShiftOp z, unsigned int bitsize,
+                          unsigned int precision) const;
+  wide_int popcount (unsigned int bitsize, unsigned int prec) const;

They don't make sense, as *this already has a precision.

Btw, my proposal would be to get rid of 'precision', as 'precision' is
not needed to interpret the encoding as I would define it (the MSB
would _always_ be a sign bit) and, as you say, the operation has a
sign.  Truncating to a desired target precision can be done after the
fact or as part of the operation.  You already have to deal with
extension, as even with equal precision operands can have a different
len.

I'd like to see the current state of wide-int somewhere, btw, given
you've done significant changes since the last post.

Thanks,
Richard.
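[Editorial P.S.: the gcc_assert being asked for would be a one-line
addition to the quoted hunk.  A self-contained sketch follows, with the
assert and sext_hwi spelled out locally, the now-removed bitsize field
dropped, and the multi-word case elided; it assumes an operator+
declaration added to the wide_int class sketched at the top.]

#include <assert.h>

/* Sign-extend the low PREC bits of V (local stand-in for sext_hwi).  */
static HOST_WIDE_INT
sext_hwi (HOST_WIDE_INT v, unsigned int prec)
{
  int shift = HOST_BITS_PER_WIDE_INT - prec;
  return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) v << shift)) >> shift;
}

wide_int
wide_int::operator + (const wide_int &op1) const
{
  wide_int result;

  /* Mixed-precision addition is not well-defined in this scheme;
     catch it early instead of silently using *this's precision.  */
  assert (precision == op1.precision);

  if (precision <= HOST_BITS_PER_WIDE_INT)
    {
      result.len = 1;
      result.precision = precision;
      result.val[0] = val[0] + op1.val[0];
      if (precision < HOST_BITS_PER_WIDE_INT)
	result.val[0] = sext_hwi (result.val[0], precision);
    }
  /* ... multi-word case elided, as in the quoted hunk.  */
  return result;
}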
>>>>
>>>>> On 04/02/2013 11:04 AM, Richard Biener wrote:
>>>>>>
>>>>>> On Wed, Feb 27, 2013 at 2:59 AM, Kenneth Zadeck
>>>>>> <zad...@naturalbridge.com> wrote:
>>>>>>>
>>>>>>> This patch contains a large number of the changes requested by
>>>>>>> Richi.  It does not contain any of the changes that he requested
>>>>>>> to abstract the storage layer.  That suggestion appears to be
>>>>>>> quite unworkable.
>>>>>>
>>>>>> I of course took this claim as a challenge ... with the following
>>>>>> result.  It is of course quite workable ;)
>>>>>>
>>>>>> The attached patch implements the core wide-int class and three
>>>>>> storage models (fixed size for things like plain HWI and
>>>>>> double-int, variable size similar to how your wide-int works, and
>>>>>> an adaptor for the double-int as contained in trees).  With that
>>>>>> you can now do
>>>>>>
>>>>>> HOST_WIDE_INT
>>>>>> wi_test (tree x)
>>>>>> {
>>>>>>   // template argument deduction doesn't do the magic we want it
>>>>>>   // to do to make this kind of implicit conversion work
>>>>>>   // overload resolution considers this kind of conversion so we
>>>>>>   // need some magic that combines both ... but seeding the
>>>>>>   // overload set with some instantiations doesn't seem to be
>>>>>>   // possible :/
>>>>>>   // wide_int<> w = x + 1;
>>>>>>   wide_int<> w;
>>>>>>   w += x;
>>>>>>   w += 1;
>>>>>>   // template argument deduction doesn't deduce the return value
>>>>>>   // type, not considering the template default argument either ...
>>>>>>   // w = wi (x) + 1;
>>>>>>   // we could support this by providing rvalue-to-lvalue promotion
>>>>>>   // via a traits class?
>>>>>>   // otoh it would lead to sub-optimal code anyway so we should
>>>>>>   // make the result available as a reference parameter and only
>>>>>>   // support
>>>>>>   //   wide_int<> res; add (res, x, 1);  ?
>>>>>>   w = wi (x).operator+<wide_int<> > (1);
>>>>>>   wide_int<>::add (w, x, 1);
>>>>>>   return w.to_hwi ();
>>>>>> }
>>>>>>
>>>>>> we are somewhat limited with C++ unless we want to get really
>>>>>> fancy.  Eventually providing operator+ just doesn't make much
>>>>>> sense for generic wide-int combinations (though then the issue is
>>>>>> that its operands are no longer commutative, which I think is the
>>>>>> case with your wide-int or double-int as well - they don't
>>>>>> support "1 + wide_int" for obvious reasons).
>>>>>>
>>>>>> So there are implementation design choices left undecided.
>>>>>>
>>>>>> Oh, and the operation implementations are crap (they compute
>>>>>> nonsense).
>>>>>>
>>>>>> But you should get the idea.
>>>>>>
>>>>>> Richard.
>>>>>
>>>>>