In the case of x86_64, this is the code:

src_1(bool, bool):
        cmp     dil, sil
        setb    al
        ret
tgt_1(bool, bool):
        xor     edi, 1
        mov     eax, edi
        and     eax, esi
        ret

Let's look at the latency of src_1:

cmp: latency 1 (page 663, Table C-17).
setb: latency 2. The Intel optimization manual does not report a latency for setb, but the closest listed instruction, setbe, has a latency of 2.

For tgt_1:

xor: latency 1.
mov: latency 1. (But it seems x86_64 does optimize this instruction, so it is effectively latency 0 in this case. The "Zero-Latency MOV Instructions" section explains this [1].)
and: latency 1.

So even if you count setb as latency 1, the two sequences are equal (2 cycles each, taking the mov as latency 0). But if setb has latency 2, tgt_1 should be a 1-cycle latency win (2 cycles versus 3).

1) https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Best wishes,
Navid.

________________________________________
From: Jeff Law <jeffreya...@gmail.com>
Sent: Tuesday, November 23, 2021 11:14
To: Navid Rahimi; Navid Rahimi via Gcc-patches
Subject: [EXTERNAL] Re: [PATCH][WIP] PR tree-optimization/101808 Boolean comparison simplification

On 11/23/2021 11:34 AM, Navid Rahimi via Gcc-patches wrote:
> Hi GCC community,
>
> I wanted you take a quick look at this patch to solve this bug [1]. This is
> the code example for the optimization [2] which does include a link to proof
> of each different optimization.
>
> I think it should be possible to use simpler approach than what Andrew has
> used here [3].
>
> P.S. Tested and verified on Linux x86_64.
>
> 1) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808
> 2) https://compiler-explorer.com/z/Gc448eE3z
> 3) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101808#c1

Don't those match.pd patterns make things worse? We're taking a single expression evaluation (the conditional) and turning it into two logicals AFAICT.

For the !x expression, obviously if x is a constant, then we can compute that at compile time and we're going from a single conditional to a single logical which is probably a win, but that's not the case with this patch AFAICT.

jeff
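
For reference, here is a minimal C sketch of the two forms being compared. It assumes src_1 is the comparison x < y on bools (which is what cmp + setb suggests) and tgt_1 is !x & y (which is what xor 1 + and suggests). The function bodies are inferred from the assembly, not copied from the patch, and count_mismatches is a hypothetical helper added only for illustration.

```c
#include <stdbool.h>

/* Assumed source form: "below" comparison, matching cmp + setb. */
bool src_1(bool x, bool y) { return x < y; }

/* Assumed target form: invert x, then and with y, matching xor + and. */
bool tgt_1(bool x, bool y) { return (!x) & y; }

/* Hypothetical helper: counts the inputs on which the two forms disagree,
   checked exhaustively over all four bool pairs. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++)
            if (src_1(x, y) != tgt_1(x, y))
                mismatches++;
    return mismatches;
}
```

A result of 0 from count_mismatches() only shows that the two forms agree on every input. It says nothing about which instruction sequence is faster, which is the question the latency numbers are meant to address.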