Matt Turner <matts...@gmail.com> writes:

> Two shaders that appear in Unigine benchmarks (Heaven and Valley) unpack
> three bytes from an integer and convert each into a float:
>
>    float((val >> 16u) & 0xffu)
>    float((val >> 8u) & 0xffu)
>    float((val >> 0u) & 0xffu)
>
> Instead of shifting, masking, and type converting like this:
>
>    shr(8)  g15<1>UD  g25<8,8,1>UD  0x00000010UD
>    and(8)  g16<1>UD  g15<8,8,1>UD  0x000000ffUD
>    mov(8)  g17<1>F   g16<8,8,1>UD
>
>    shr(8)  g18<1>UD  g25<8,8,1>UD  0x00000008UD
>    and(8)  g19<1>UD  g18<8,8,1>UD  0x000000ffUD
>    mov(8)  g20<1>F   g19<8,8,1>UD
>
>    and(8)  g21<1>UD  g25<8,8,1>UD  0x000000ffUD
>    mov(8)  g22<1>F   g21<8,8,1>UD
>
> i965 can simply extract a byte and convert to float in a single
> instruction:
>
>    mov(8)  g17<1>F  g25.2<16,4,4>UB
>    mov(8)  g20<1>F  g25.1<16,4,4>UB
>    mov(8)  g22<1>F  g25.0<16,4,4>UB
>
> Decreases the number of instructions and cycles in the two programs by:
>
> #1706: 3728 -> 3363 instructions (-9.79%), 9594 -> 9180 cycles (-4.32%)
> #1721: 4027 -> 3662 instructions (-9.06%), 10264 -> 9572 cycles (-6.74%)

I've been handling my byte extracts by special cases in ibfe/ubfe.
Would it make sense to recognize these shifts and turn them into bfes,
then special case ibfe/ubfe in the driver to emit simpler things when
possible? extract_byte is one case of extraction I need to do all the
time, and the other is aligned *signed* 16-bit fields.

If we do want extract_byte, then I'll just need vc4_program.c's
translation updated at the same time as this lands. I think it should
be a pretty obvious variant from your i965 code, combined with
ntq_emit_ubfe.
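
To make the equivalence under discussion concrete, here is a rough sketch in plain C. It is not NIR, i965, or vc4 code: the helper names (ubfe, ibfe, is_byte_extract, is_aligned_s16_extract, unpack_byte_as_float) are hypothetical reference functions, and the sketch assumes 32-bit values with 1 <= bits <= 32 and offset + bits <= 32. It shows that the shift-and-mask pattern is an unsigned bitfield extract, and how a backend might test for the two "simple" cases mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference for an unsigned bitfield extract.
 * Assumes 1 <= bits <= 32 and offset + bits <= 32. */
uint32_t ubfe(uint32_t val, unsigned offset, unsigned bits)
{
    uint32_t mask = (bits == 32) ? 0xffffffffu : ((1u << bits) - 1u);
    return (val >> offset) & mask;
}

/* Hypothetical reference for a signed bitfield extract:
 * extract the field, then sign-extend from its top bit. */
int32_t ibfe(uint32_t val, unsigned offset, unsigned bits)
{
    uint32_t field = ubfe(val, offset, bits);
    uint32_t sign = 1u << (bits - 1);
    return (int32_t)((field ^ sign) - sign);
}

/* A bfe with these operands is a plain byte extract. */
bool is_byte_extract(unsigned offset, unsigned bits)
{
    return bits == 8 && offset % 8 == 0 && offset < 32;
}

/* A signed bfe with these operands is an aligned 16-bit extract. */
bool is_aligned_s16_extract(unsigned offset, unsigned bits)
{
    return bits == 16 && offset % 16 == 0 && offset < 32;
}

/* The shader pattern float((val >> (8 * n)) & 0xff), via ubfe. */
float unpack_byte_as_float(uint32_t val, unsigned n)
{
    return (float)ubfe(val, 8 * n, 8);
}

/* Small usage check of the helpers above. */
void check_extracts(void)
{
    uint32_t val = 0x00a1b2c3u;

    /* Shift-and-mask and ubfe agree for each byte. */
    assert(ubfe(val, 16, 8) == ((val >> 16) & 0xffu));
    assert(ubfe(val, 8, 8) == ((val >> 8) & 0xffu));
    assert(ubfe(val, 0, 8) == (val & 0xffu));

    /* Aligned signed 16-bit fields are sign-extended. */
    assert(ibfe(0x8000ffffu, 16, 16) == -32768);
    assert(ibfe(0x8000ffffu, 0, 16) == -1);
}
```

Whether the recognition is done once in shared code (turning shifts into bfes) or per driver is exactly the open question above; the sketch only illustrates that the operand checks needed for the special cases are cheap.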
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev