On Fri, Aug 23, 2019 at 1:35 PM Ilya Leoshkevich <i...@linux.ibm.com> wrote:
>
> > On 23.08.2019 at 13:24, Richard Biener <richard.guent...@gmail.com> wrote:
> >
> > On Fri, Aug 23, 2019 at 12:43 PM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> >>
> >> Ilya Leoshkevich <i...@linux.ibm.com> writes:
> >>> @@ -3819,6 +3820,82 @@ can_compare_p (enum rtx_code code, machine_mode mode,
> >>>    return 0;
> >>>  }
> >>>
> >>> +/* can_vector_compare_p presents fake rtx binary operations to the back-end
> >>> +   in order to determine its capabilities. In order to avoid creating fake
> >>> +   operations on each call, values from previous calls are cached in a global
> >>> +   cached_binops hash_table. It contains rtxes, which can be looked up using
> >>> +   binop_keys. */
> >>> +
> >>> +struct binop_key {
> >>> +  enum rtx_code code;        /* Operation code. */
> >>> +  machine_mode value_mode;   /* Result mode. */
> >>> +  machine_mode cmp_op_mode;  /* Operand mode. */
> >>> +};
> >>> +
> >>> +struct binop_hasher : pointer_hash_mark<rtx>, ggc_cache_remove<rtx> {
> >>> +  typedef rtx value_type;
> >>> +  typedef binop_key compare_type;
> >>> +
> >>> +  static hashval_t
> >>> +  hash (enum rtx_code code, machine_mode value_mode, machine_mode cmp_op_mode)
> >>> +  {
> >>> +    inchash::hash hstate (0);
> >>> +    hstate.add_int (code);
> >>> +    hstate.add_int (value_mode);
> >>> +    hstate.add_int (cmp_op_mode);
> >>> +    return hstate.end ();
> >>> +  }
> >>> +
> >>> +  static hashval_t
> >>> +  hash (const rtx &ref)
> >>> +  {
> >>> +    return hash (GET_CODE (ref), GET_MODE (ref), GET_MODE (XEXP (ref, 0)));
> >>> +  }
> >>> +
> >>> +  static bool
> >>> +  equal (const rtx &ref1, const binop_key &ref2)
> >>> +  {
> >>> +    return (GET_CODE (ref1) == ref2.code)
> >>> +           && (GET_MODE (ref1) == ref2.value_mode)
> >>> +           && (GET_MODE (XEXP (ref1, 0)) == ref2.cmp_op_mode);
> >>> +  }
> >>> +};
> >>> +
> >>> +static GTY ((cache)) hash_table<binop_hasher> *cached_binops;
> >>> +
> >>> +static rtx
> >>> +get_cached_binop (enum rtx_code code, machine_mode value_mode,
> >>> +                  machine_mode cmp_op_mode)
> >>> +{
> >>> +  if (!cached_binops)
> >>> +    cached_binops = hash_table<binop_hasher>::create_ggc (1024);
> >>> +  binop_key key = { code, value_mode, cmp_op_mode };
> >>> +  hashval_t hash = binop_hasher::hash (code, value_mode, cmp_op_mode);
> >>> +  rtx *slot = cached_binops->find_slot_with_hash (key, hash, INSERT);
> >>> +  if (!*slot)
> >>> +    *slot = gen_rtx_fmt_ee (code, value_mode, gen_reg_rtx (cmp_op_mode),
> >>> +                            gen_reg_rtx (cmp_op_mode));
> >>> +  return *slot;
> >>> +}
> >>
> >> Sorry, I didn't mean anything this complicated. I just meant that
> >> we should have a single cached rtx that we can change via PUT_CODE and
> >> PUT_MODE_RAW for each new query, rather than allocating a new rtx each
> >> time.
> >>
> >> Something like:
> >>
> >> static GTY ((cache)) rtx cached_binop;
> >>
> >> rtx
> >> get_cached_binop (machine_mode mode, rtx_code code, machine_mode op_mode)
> >> {
> >>   if (cached_binop)
> >>     {
> >>       PUT_CODE (cached_binop, code);
> >>       PUT_MODE_RAW (cached_binop, mode);
> >>       PUT_MODE_RAW (XEXP (cached_binop, 0), op_mode);
> >>       PUT_MODE_RAW (XEXP (cached_binop, 1), op_mode);
> >>     }
> >>   else
> >>     {
> >>       rtx reg1 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 1);
> >>       rtx reg2 = gen_raw_REG (op_mode, LAST_VIRTUAL_REGISTER + 2);
> >>       cached_binop = gen_rtx_fmt_ee (code, mode, reg1, reg2);
> >>     }
> >>   return cached_binop;
> >> }
> >
> > Hmm, maybe we need auto_rtx (code) that constructs such
> > RTX on the stack instead of wasting a GC root (and causing
> > issues for future threading of GCC ;)).
>
> Do you mean something like this?
>
> union {
>   char raw[rtx_code_size[code]];
>   rtx rtx;
> } binop;
>
> Does this exist already (git grep auto.*rtx / rtx.*auto doesn't show
> anything useful), or should I implement this?

It doesn't exist AFAIK. I thought about using alloca, like

  rtx tem;
  rtx_alloca (tem, PLUS);

and because it uses alloca, rtx_alloca has to be a macro like

#define rtx_alloca(r, code) \
  r = (rtx) alloca (RTX_CODE_SIZE (code)); \
  memset (r, 0, RTX_HDR_SIZE); \
  PUT_CODE (r, code);

Maybe C++ can help make this prettier, but of course
since we use alloca we have to avoid opening new scopes.

I guess templates, as with auto_vec, don't work unless
we can make RTX_CODE_SIZE constant-evaluated.

Richard.
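
The macro idea above can be tried outside GCC with stand-in types. The following is only a sketch under stated assumptions: mock_rtx_hdr, mock_code, mock_num_ops, mock_operands and mock_rtx_alloca are hypothetical names invented for illustration and are not GCC's rtx_def, RTX_CODE_SIZE or RTX_HDR_SIZE, and <alloca.h> is assumed to be available (glibc/BSD; it is not ISO C++). The macro is written as one comma expression, so it opens no new scope and still behaves as a single statement after an unbraced `if'.

```cpp
#include <alloca.h>   // alloca; assumed available, not ISO C++
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for GCC's rtx machinery, not the real definitions.
enum mock_code { MOCK_REG, MOCK_PLUS, MOCK_NUM_CODES };

// Fixed-size header; operand pointers follow it in the same allocation.
struct alignas (void *) mock_rtx_hdr
{
  unsigned short code;   // plays the role of GET_CODE / PUT_CODE
  unsigned char mode;    // plays the role of GET_MODE / PUT_MODE_RAW
};
typedef mock_rtx_hdr *mock_rtx;

// Operand count per code, looked up at run time, so the object size is
// not a compile-time constant.
static const unsigned char mock_num_ops[MOCK_NUM_CODES] = { 0, 2 };

#define MOCK_HDR_SIZE sizeof (mock_rtx_hdr)
#define MOCK_CODE_SIZE(c) \
  (MOCK_HDR_SIZE + mock_num_ops[c] * sizeof (mock_rtx))

// Operands live directly after the header.
static inline mock_rtx *
mock_operands (mock_rtx r)
{
  return reinterpret_cast<mock_rtx *> (reinterpret_cast<char *> (r)
                                       + MOCK_HDR_SIZE);
}

// This has to be a macro: alloca storage is released when the function
// that called alloca returns, so a helper function would hand back a
// dangling pointer.
#define mock_rtx_alloca(r, c)                                     \
  ((r) = static_cast<mock_rtx> (alloca (MOCK_CODE_SIZE (c))),     \
   std::memset ((r), 0, MOCK_HDR_SIZE),                           \
   (r)->code = (c),                                               \
   (void) 0)

// Builds a throwaway (plus (reg) (reg)) on the stack and returns its
// operand count, or -1 if an operand was not wired up as expected.
int
build_temporary_plus ()
{
  mock_rtx op0, op1, tem;
  mock_rtx_alloca (op0, MOCK_REG);
  mock_rtx_alloca (op1, MOCK_REG);
  mock_rtx_alloca (tem, MOCK_PLUS);
  mock_operands (tem)[0] = op0;
  mock_operands (tem)[1] = op1;

  assert (tem->code == MOCK_PLUS && tem->mode == 0);
  if (mock_operands (tem)[0]->code != MOCK_REG
      || mock_operands (tem)[1]->code != MOCK_REG)
    return -1;
  return mock_num_ops[tem->code];
}
```

All three objects disappear when build_temporary_plus returns, so nothing is left for the garbage collector to track. The catch is the one already mentioned: the pointers must not escape the calling function.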