"juzhe.zh...@rivai.ai" <juzhe.zh...@rivai.ai> writes: >>> I don't think this is a property of decrementing IVs. IIUC it's really >>> a property of rgl->factor == 1 && factor == 1, where factor would need >>> to be passed in by the caller. Because of that, it should probably be >>> a separate patch. > Is it right that I just post this part code as a seperate patch then merge it?
No, not in its current form. Like I say, the test should be based on factors rather than TYPE_VECTOR_SUBPARTS. But a fix for this problem should come before the changes to IVs. >>> That is, current LOAD_LEN targets have two properties (IIRC): >>> (1) all vectors used in a given piece of vector code have the same byte size >>> (2) lengths are measured in bytes rather than elements >>> For all cases, including SVE, the number of controls needed for a scalar >>> statement is equal to the number of vectors needed for that scalar >>> statement. >>> Because of (1), on current LOADL_LEN targets, the number of controls >>> needed for a scalar statement is also proportional to the total number >>> of bytes occupied by the vectors generated for that scalar statement. >>> And because of (2), the total number of bytes is the only thing that >>> matters, so all users of a particular control can use the same control >>> value. >>> E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same >>> control (with no adjustment). 2xV16QI means 32 elements, while 2xV8HI >>> means 16 elements. V16QI's nscalars_per_iter would therefore be double >>> V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things >>> even out. >>> The code structurally supports targets that count in elements rather >>> than bytes, so that factor==1 for all element types. See the >>> "rgl->factor == 1 && factor == 1" case in: > >> if (rgl->max_nscalars_per_iter < nscalars_per_iter) >> { >> /* > For now, we only support cases in which all loads and stores fall back to > VnQI or none do. */ > >> gcc_assert (!rgl->max_nscalars_per_iter>> || > (rgl->factor == 1 && factor == 1) > || (rgl->max_nscalars_per_iter * rgl->factor >>> == nscalars_per_iter * factor)); > >> rgl->max_nscalars_per_iter = nscalars_per_iter; >> rgl->type = > vectype; >> rgl->factor = factor; >> }>> But it hasn't been tested, > since no current target uses it. >>> I think the above part of the patch shows that the current "factor is >>> always 1" path is in fact broken, and the patch is a correctness fix on >>> targets that measure in elements rather than bytes. >>> So I think the above part of the patch should go in ahead of the IV changes. >>> But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS. > Since the length control measured by bytes instead of bytes is not > appropriate for RVV.You mean I can't support RVV auto-vectorization in > upstream GCC middle-end and I can only support it in my downstream, is > that right? No. I haven't said in this or previous reviews that something cannot be supported in upstream GCC. I'm saying that the code in theory supports counting in bytes *or* counting in elements. But only the first one has actually been tested. And so, perhaps not surprisingly, the support for counting elements needs a fix. The fix in your patch looks like it's on the right lines, but it should be based on factor rather than TYPE_VECTOR_SUBPARTS. See get_len_load_store_mode for how this selection happens: (1) IFN_LOAD_LEN itself always counts in elements rather than bytes. (2) If a target has instructions that count in elements, it should define load_len patterns for all vector modes that it supports. (3) If a target has instructions that count in bytes, it should define load_len patterns only for byte modes. The vectoriser will then use byte loads for all vector types (even things like V8HI). For (2), the loop controls will always have a factor of 1. For (3), the loop controls will have a factor equal to the element size in bytes. See: machine_mode vmode; if (get_len_load_store_mode (vecmode, is_load).exists (&vmode)) { nvectors = group_memory_nvectors (group_size * vf, nunits); vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo); unsigned factor = (vecmode == vmode) ? 1 : GET_MODE_UNIT_SIZE (vecmode); vect_record_loop_len (loop_vinfo, lens, nvectors, vectype, factor); using_partial_vectors_p = true; } This part should work correctly for RVV and any future targets that measure in elements rather than bytes. The problem is here: tree final_len = vect_get_loop_len (loop_vinfo, loop_lens, vec_num * ncopies, vec_num * j + i); tree ptr = build_int_cst (ref_type, align * BITS_PER_UNIT); machine_mode vmode = TYPE_MODE (vectype); opt_machine_mode new_ovmode = get_len_load_store_mode (vmode, true); machine_mode new_vmode = new_ovmode.require (); tree qi_type = unsigned_intQI_type_node; This should be rearranged so that: - new_vmode is calculated before final_len - a "factor" is calculated in the same way as the above code - this factor is passed to vect_get_loop_len - vect_get_loop_len then uses this information to do a division. Thanks, Richard