[PATCH] PR57518, RA generated redundant code

2013-06-12 Thread Wei Mi
Hi, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518 PR57518 happened because update_equiv_regs in IRA marked a reg as equivalent to a mem, lowered its mem_cost in scan_one_insn, and set its rclass to NO_REGS, but didn't consider that the reg was used in a paradoxical subreg, which prevented the reg from bei

Re: [PATCH] PR57518, RA generated redundant code

2013-06-12 Thread Wei Mi
The testcase is attached. Thanks, Wei. On Wed, Jun 12, 2013 at 5:03 PM, H.J. Lu wrote: > On Wed, Jun 12, 2013 at 2:44 PM, Wei Mi wrote: >> Hi, >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518 >> >> pr57518 happened because update_equiv_regs in IRA mark

Re: [PATCH] PR57518, RA generated redundant code

2013-06-18 Thread Wei Mi
Ping. On Wed, Jun 12, 2013 at 2:44 PM, Wei Mi wrote: > Hi, > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518 > > pr57518 happened because update_equiv_regs in IRA marked a reg > equivalent with a mem, lowered its mem_cost in scan_one_insn, set > NO_REGS to its rclass, bu

Re: [PATCH] PR57518, RA generated redundant code

2013-06-19 Thread Wei Mi
Yes, I think so. Regards, Wei. On Wed, Jun 19, 2013 at 2:00 PM, Xinliang David Li wrote: > Should the patch be ported to in 48 branch? > > thanks, > > David > > On Wed, Jun 19, 2013 at 11:46 AM, Vladimir Makarov > wrote: >> On 13-06-19 1:23 AM, Wei Mi wrote: >

[PATCH] PR57878, Incorrect code: live register clobbered in split2

2013-07-15 Thread Wei Mi
4.8.1. bootstrap and regression test are ok on x86_64-linux-gnu. Is it ok for trunk and 4.8 branch? Thanks, Wei. 2013-07-15 Wei Mi PR rtl-optimization/57878 * lra-assigns.c (reload_pseudo_compare_func): Switch the priority of bigger mode and tfreq. 2013-07-15 Wei Mi
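The ChangeLog only says that two sort keys in reload_pseudo_compare_func swap places. As a rough illustration (not LRA's actual code), here is a qsort comparator over a hypothetical `struct reload_pseudo` that ranks pseudos by how many hard registers their biggest mode needs before it looks at frequency, so wide pseudos get first pick of the register file. The struct, its fields, and the regno tie-break are assumptions of this sketch, and the other keys of the real function are omitted.

```c
#include <stdlib.h>

/* Hypothetical, simplified view of a reload pseudo; these are not
   LRA's data structures.  */
struct reload_pseudo
{
  int regno;  /* pseudo register number */
  int nregs;  /* hard regs needed by the pseudo's biggest mode */
  int freq;   /* execution frequency of its references */
};

/* qsort comparator: bigger pseudos first, then higher frequency,
   then lower regno so the order is deterministic.  */
int
reload_pseudo_cmp (const void *v1p, const void *v2p)
{
  const struct reload_pseudo *p1 = v1p, *p2 = v2p;

  /* Allocate bigger pseudos first to limit register file
     fragmentation.  */
  if (p1->nregs != p2->nregs)
    return p2->nregs - p1->nregs;
  /* Only then prefer the more frequently used pseudo.  */
  if (p1->freq != p2->freq)
    return p2->freq - p1->freq;
  return p1->regno - p2->regno;
}
```

With this ordering a two-register pseudo is assigned before a hotter one-register pseudo, which is the effect the swapped priority is after.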

Re: [PATCH] PR57878, Incorrect code: live register clobbered in split2

2013-07-19 Thread Wei Mi
Thank you! backported as r201068 in gcc-4_8-branch. Thanks, Wei. On Thu, Jul 18, 2013 at 10:05 AM, Vladimir Makarov wrote: > On 07/15/2013 02:26 PM, Wei Mi wrote: >> Hi, >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57878 >> >> The bug occurs because tf

[Google] X86_TUNE_USE_VECTOR_CONVERTS adjustment

2013-08-15 Thread Wei Mi
Turning off X86_TUNE_USE_VECTOR_CONVERTS makes GCC use cvtss2sd instead of unpcklps+cvtps2pd, which is better for some recent Intel microarchitectures such as Westmere and Sandy Bridge. So turn it off for m_GENERIC and m_CORE_ALL. Regression test and bootstrap are ok. Ok for 4.8 branch? Index: config/i386/i386.c ==
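A minimal example of the kind of source this tuning flag affects is a plain float-to-double widening. The function name is made up for illustration. Which instruction sequence GCC actually emits depends on the -mtune setting and the compiler version, so compiling with -O2 -S and reading the assembly is the way to check.

```c
/* Widen a float to double.  With X86_TUNE_USE_VECTOR_CONVERTS off,
   GCC is expected to use a scalar cvtss2sd here; with it on, it may
   use unpcklps + cvtps2pd instead.  The C semantics are identical
   either way.  */
double
widen (float x)
{
  return x;
}
```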

Re: extend fwprop optimization

2013-02-25 Thread Wei Mi
Hi, On Mon, Feb 25, 2013 at 4:08 PM, Steven Bosscher wrote: > On Tue, Feb 26, 2013 at 12:32 AM, Wei Mi wrote: >> We also take insn splitting and peephole into consideration, >> i.e., the cost of the change is the cost after insn splitting and >> peephole which may be applie

Re: extend fwprop optimization

2013-02-27 Thread Wei Mi
On Tue, Feb 26, 2013 at 2:59 AM, Steven Bosscher wrote: > On Tue, Feb 26, 2013 at 2:12 AM, Wei Mi wrote: >> But it is not a good transformation unless we know insn split will >> change a << (b & 63) to a << b; Here we want to see what the rtl looks >> lik
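The `a << (b & 63)` pattern discussed here looks like the function below (name chosen for illustration). In C the mask is mandatory, because shifting a 64-bit value by 64 or more is undefined. On x86-64, however, a 64-bit SHL already uses only the low 6 bits of the count, so the explicit AND duplicates what the hardware does; the thread is about letting the optimizer see that.

```c
#include <stdint.h>

/* Shift A left by B modulo 64.  The "& 63" makes this well defined
   in C for any B; on x86-64 it matches what SHL does to a 64-bit
   operand's count, so a separate AND instruction is redundant.  */
uint64_t
shl_masked (uint64_t a, unsigned int b)
{
  return a << (b & 63);
}
```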

Re: extend fwprop optimization

2013-02-27 Thread Wei Mi
cation support for our case in simplify_binary_operation. I will send out a more official patch about fwprop extension soon. Then it may be easier to talk about its rationality. Thanks, Wei. On Wed, Feb 27, 2013 at 1:21 PM, Steven Bosscher wrote: > On Wed, Feb 27, 2013 at 7:37 PM, Wei Mi wrote:

Re: extend fwprop optimization

2013-03-12 Thread Wei Mi
Thanks for the helpful comments! I have some replies inlined. Regards, Wei. On Mon, Mar 11, 2013 at 12:52 PM, Steven Bosscher wrote: > On Mon, Mar 11, 2013 at 6:52 AM, Wei Mi wrote: >> This is the fwprop extension patch which is put in order. Regression >> test and bootstrap pa

Re: extend fwprop optimization

2013-03-23 Thread Wei Mi
otstrap on x86_64-unknown-linux-gnu. Thanks, Wei. On Sun, Mar 17, 2013 at 12:15 AM, Wei Mi wrote: > Hi, > > On Sat, Mar 16, 2013 at 3:48 PM, Steven Bosscher > wrote: >> On Tue, Mar 12, 2013 at 8:18 AM, Wei Mi wrote: >>> For the motivational case, I need insn spli

Re: extend fwprop optimization

2013-03-25 Thread Wei Mi
On Mon, Mar 25, 2013 at 2:35 AM, Richard Biener wrote: > On Sun, Mar 24, 2013 at 5:18 AM, Wei Mi wrote: >> This is the patch to add the shift truncation in >> simplify_binary_operation_1. I add a new hook >> TARGET_SHIFT_COUNT_TRUNCATED which uses enum rtx_code to decide
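A rough sketch of the simplification being proposed: if the target reports, through a per-rtx-code hook such as the proposed TARGET_SHIFT_COUNT_TRUNCATED, that a shift reduces its count modulo the mode width, then an AND of the count whose mask keeps all of the low log2(bitsize) bits can be dropped. Everything below (the enum, both function names, and the "only 32- and 64-bit modes truncate" policy) is hypothetical and only loosely modeled on x86; it is not GCC's API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the shift rtx codes.  */
enum shift_code { SH_ASHIFT, SH_LSHIFTRT, SH_ASHIFTRT };

/* Hypothetical target hook: does CODE reduce its count modulo
   BITSIZE?  Loosely x86-like: the count is masked to 5 bits (6 for
   64-bit operands), so "modulo the mode size" only holds for 32- and
   64-bit modes.  CODE is unused in this toy policy.  */
bool
target_shift_count_truncated (enum shift_code code, unsigned int bitsize)
{
  (void) code;
  return bitsize == 32 || bitsize == 64;
}

/* Return true if (CODE x (and count MASK)) may be rewritten as
   (CODE x count) in a mode of BITSIZE bits (a power of two): the AND
   must keep every bit the hardware looks at.  */
bool
and_on_count_is_redundant (enum shift_code code, unsigned int bitsize,
                           unsigned long mask)
{
  unsigned long low = bitsize - 1;
  return target_shift_count_truncated (code, bitsize)
         && (mask & low) == low;
}
```

A mask of 31 on a 64-bit shift is not redundant: it clears bit 5 of the count, which the hardware would otherwise use.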

Re: extend fwprop optimization

2013-03-25 Thread Wei Mi
> I am trying to figure out a way not to lose the opportunity when shift > truncation is not combined in a bit test pattern. Can we keep the > explicit truncation in RTL, but generate truncation code in assembly? > Then only shift truncation which not combined in a bit test > pattershift truncation
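The bit-test case mentioned in the quoted text has this shape (function name for illustration only). On x86-64 it can map to a single `bt` instruction, and for a register operand `bt` itself takes the bit offset modulo the operand size, so here too the explicit `& 63` need not survive as a separate instruction.

```c
#include <stdint.h>

/* Return bit (B mod 64) of A.  The mask keeps the C well defined;
   on x86-64 a register-operand BT applies the same modulo-64
   reduction to the bit offset.  */
int
bit_test (uint64_t a, unsigned int b)
{
  return (int) ((a >> (b & 63)) & 1);
}
```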

Re: extend fwprop optimization

2013-03-27 Thread Wei Mi
I am not familiar with how to use define_subst, so I wrote a patch that changes define_insn_and_split to define_insn. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. A question is: after that change, is there any way I can make targetm.rtx_costs() aware of the truncation, i.e. the cos

Re: extend fwprop optimization

2013-04-01 Thread Wei Mi
1.c attached. On Mon, Apr 1, 2013 at 10:43 PM, Wei Mi wrote: > I attached the patch.4 based on r197308. r197308 changes shift-and > type truncation from define_insn_and_split to define_insn. patch.4 > changes ix86_rtx_costs for shift-and type rtx to get the correct cost > for the

Re: extend fwprop optimization

2013-04-03 Thread Wei Mi
Thanks for helping fix it. I will take care to verify regression and bootstrap before checking in to release branches next time. Regards, Wei. On Wed, Apr 3, 2013 at 11:08 AM, Jakub Jelinek wrote: > On Thu, Mar 28, 2013 at 04:49:47PM +0100, Uros Bizjak wrote: >> 2013-03-2

[PATCH, PR60738] More LRA split for regno conflicting with single reg class operand

2014-04-25 Thread Wei Mi
regression test are ok for x86_64-linux-gnu. Is it ok for trunk? Thanks, Wei. ChangeLog: 2014-04-25 Wei Mi PR rtl-optimization/60738 * params.h: New param. * params.def: Ditto. * lra-constraints.c (need_for_split_p): Let more cases to do lra-split

Re: [PATCH, PR60738] More LRA split for regno conflicting with single reg class operand

2014-04-28 Thread Wei Mi
pr 28, 2014 at 12:57 AM, Steven Bosscher wrote: > On Sat, Apr 26, 2014 at 5:35 AM, Wei Mi wrote: >> Index: ira-lives.c >> === >> --- ira-lives.c (revision 209253) >> +++ ira-lives.c (working copy

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-04-30 Thread Wei Mi
Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk? Thanks, Wei. >> I attached the patch which combined your two patches and the fix in >> legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64, >> the code looked fine. Do you think it is ok? >> >> Thanks, >> Wei. > > Either p

Re: [PATCH] Builtins handling in IVOPT

2014-04-30 Thread Wei Mi
Ping. Thanks, Wei. On Tue, Dec 17, 2013 at 11:34 AM, Wei Mi wrote: > Ping. > > Thanks, > Wei. > > On Mon, Dec 9, 2013 at 9:54 PM, Wei Mi wrote: >> Ping. >> >> Thanks, >> wei. >> >> On Sat, Nov 23, 2013 at 10:46 AM, Wei Mi wrote:

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-01 Thread Wei Mi
On Wed, Apr 30, 2014 at 11:44 PM, Uros Bizjak wrote: > On Thu, May 1, 2014 at 6:42 AM, Wei Mi wrote: >> Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk? > > None of these patches have correct ChangeLog entries. Please follow > the rules, outlined in http://gcc.gnu.o

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-07 Thread Wei Mi
TLS_GD and UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above, and now the optimization in tls_local_dynamic_32_once works. bootstrapped ok on x86_64-linux-gnu. regression is going on. Is it OK if regression passes? Thanks. Wei. ChangeLog: gcc/ 2014-05-07 Wei Mi * c

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-10 Thread Wei Mi
Here is a patch for the test. It contains two changes: 1. For emutls, there will be an explicit call generated at expand pass, and no stack adjustment is needed. So add /* { dg-require-effective-target tls_native } */ in the test. 2. Replace cfi_def_cfa_offset with insn sequence check. Is it ok?

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-12 Thread Wei Mi
>> Here is a patch for the test. It contains two changes: >> 1. For emutls, there will be an explicit call generated at expand >> pass, and no stack adjustment is needed. So add /* { >> dg-require-effective-target tls_native } */ in the test. >> 2. Replace cfi_def_cfa_offset with insn sequence chec

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-14 Thread Wei Mi
Can I checkin this testcase fix? Thanks, Wei. On Tue, May 13, 2014 at 1:39 AM, Rainer Orth wrote: > Wei Mi writes: > >> Thanks for trying the testcase. rtl scanning will be slightly better >> than assembly scanning. So how about this one? > > This one works

Re: [GCC RFC]A new and simple pass merging paired load store instructions

2014-05-20 Thread Wei Mi
On Tue, May 20, 2014 at 12:13 AM, Bin.Cheng wrote: > On Tue, May 20, 2014 at 1:30 AM, Jeff Law wrote: >> On 05/19/14 00:38, Bin.Cheng wrote: >>> >>> On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: On 05/16/14 04:07, Bin.Cheng wrote: But can't you go through movXX

[GOOGLE] Builtins handling in IVOPT

2014-01-22 Thread Wei Mi
This patch handles the mem access builtins in ivopt. The original problem described here: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02648.html Bootstrapped and passed regression test. Performance test ok for plain, fdo and lipo. Ok for google 4.8 branch? Thanks, Wei. --- /usr/local/google/hom

Re: [GOOGLE] Builtins handling in IVOPT

2014-01-22 Thread Wei Mi
Comments added. I created another patch to add the parameter for AVG_LOOP_ITER. Both patches are attached. Thanks, Wei. On Wed, Jan 22, 2014 at 4:42 PM, Xinliang David Li wrote: > On Wed, Jan 22, 2014 at 2:23 PM, Wei Mi wrote: >> This patch handles the mem access builtins in ivopt. The

[google gcc-4_8] Don't use gcov counter related ssa name as induction variables

2014-02-10 Thread Wei Mi
they will not be identified as induction variables. Testing is going on. Is it ok if tests pass? 2014-02-10 Wei Mi * tree-flow-inline.h (make_prof_ssa_name): New. (make_temp_prof_ssa_name): Ditto. * tree.h (struct tree_base): Add PROFILE_GENERATED flag for ssa name.

Re: [google gcc-4_8] Don't use gcov counter related ssa name as induction variables

2014-02-10 Thread Wei Mi
Here is the updated patch, which follows the UD chain to determine whether iv.base is defined by a __gcovx.xxx[] var. It is a lot simpler than adding a tree bit. The regression test and the previously failing benchmark in piii mode are ok. Other tests are going on. 2014-02-10 Wei Mi * tree-ssa-loop

Re: [google gcc-4_8] Don't use gcov counter related ssa name as induction variables

2014-02-11 Thread Wei Mi
> +return false; >> + >> + decl = TREE_OPERAND (rhs, 0); >> + if (TREE_CODE (decl) != VAR_DECL) >> +return false; > > > > Also check TREE_STATIC and DECL_ARTIFICIAL flag. > > > David > Check added. Add DECL_ARTIFICIAL setting in build_va
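The review asks that the VAR_DECL check also test TREE_STATIC and DECL_ARTIFICIAL, so that a user variable cannot be mistaken for a profiling counter. A minimal sketch of that predicate over a hypothetical, flattened declaration record follows; the struct, the function name, and the "__gcov" name-prefix test are assumptions of this sketch, not the patch's code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, flattened view of a variable declaration.  */
struct var_decl
{
  const char *name;
  bool is_static;      /* stands in for TREE_STATIC */
  bool is_artificial;  /* stands in for DECL_ARTIFICIAL */
};

/* Return true if DECL looks like a compiler-generated gcov counter
   array (named like "__gcov0.foo").  The name-prefix test is an
   assumption of this sketch; the artificial flag is what rules out a
   user variable that merely happens to use the same prefix.  */
bool
is_gcov_counter_decl (const struct var_decl *decl)
{
  return decl->is_static
         && decl->is_artificial
         && strncmp (decl->name, "__gcov", 6) == 0;
}
```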

[PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-21 Thread Wei Mi
regression test pass on x86_64-linux-gnu. ok for trunk and gcc-4_9? Thanks, Wei. ChangeLog: 2014-07-21 Wei Mi PR middle-end/61776 * tree-profile.c (tree_profiling): Fix cfg after the const/pure flags of some funcs are reset after instrumentation. 2014-07-21 Wei Mi

Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-21 Thread Wei Mi
By the way, the loop resetting the const/pure flags is also executed during profile-use, but if there is no instrumentation, the reset is unnecessary. The flags are kept until pass_ipa_pure_const fixes them. And because of non-instantaneous ssa update, the fixes are reflected on ssa only after ipa

Re: [PATCH, PR61776] verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-27 Thread Wei Mi
he noreturn part in because it has no direct impact on pr60449 and pr61776. I can help Martin to test and post that part as an independent patch later. bootstrap and regression pass on x86_64-linux-gnu. Is it ok? Thanks, Wei. ChangeLog: 2014-07-27 Martin Jambor Wei Mi

Fwd: [GOOGLE, AUTOFDO] Assign different discriminators to calls with the same lineno

2014-08-06 Thread Wei Mi
improvement. Ok for google-4_9 if regression pass? Thanks, Wei. ChangeLog: 2014-08-06 Wei Mi * tree-cfg.c (increase_discriminator_for_locus): It was next_discriminator_for_locus. Add a param "return_next". (next_discriminator_for_locus): Renamed. (assign_disc
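The motivation is that two calls on one source line, as in `x = foo () + foo ();`, share a line number, so AutoFDO cannot attribute samples to each call unless they carry different discriminators. A minimal sketch of per-line discriminator assignment follows; the table, the bound, and the function name are hypothetical and do not reproduce the semantics of GCC's increase_discriminator_for_locus.

```c
#define MAX_LINES 4096

/* Hypothetical per-line counters, indexed by source line number.  */
static int last_discriminator[MAX_LINES];

/* Return a fresh discriminator for a call on LINE: 1 for the first
   call seen on that line, 2 for the second, and so on, so samples
   for different calls on the same line can be told apart.  Lines
   outside the table get 0 (no discriminator).  */
int
next_call_discriminator (int line)
{
  if (line < 0 || line >= MAX_LINES)
    return 0;
  return ++last_discriminator[line];
}
```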

[PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
o the comments before ix86_current_function_calls_tls_descriptor, tls call may be optimized away. ix86_compute_frame_layout is the latest place to do the update. bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok for trunk if tests pass? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei M
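Why the boundary matters: the x86-64 psABI requires %rsp to be 16-byte aligned at a call, and the call pushes an 8-byte return address, so a function whose only call is the one hidden inside a TLS access must still adjust its frame. A minimal sketch of that arithmetic follows; the helper and its parameters are hypothetical and are not ix86_compute_frame_layout.

```c
/* Hypothetical helper: bytes to subtract from the stack pointer on
   entry so that it is aligned to BOUNDARY bytes at the next call,
   given ENTRY_OFFSET bytes already on the stack (the return address)
   and LOCALS bytes of locals.  */
unsigned long
stack_adjust (unsigned long locals, unsigned long entry_offset,
              unsigned long boundary)
{
  unsigned long total = entry_offset + locals;
  unsigned long rounded = (total + boundary - 1) / boundary * boundary;
  return rounded - entry_offset;
}
```

With the boundary left at 8 bytes the helper returns 0 for an empty frame, which is the misaligned-call situation; with 16 it returns 8, giving a CFA offset of 16 (8 for the return address plus 8 of adjustment), consistent with the `.cfi_def_cfa_offset 16` check used in the testcase.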

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
Regression test is ok. Thanks, Wei. On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi wrote: > Hi, > > This patch is to fix the problem described here: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066 > > I follow Ian's suggestion and set > ix86_tls_descriptor

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
Yes, x32 has the same problem. It should be tested. Fixed. Thanks, Wei. On Fri, Mar 7, 2014 at 2:06 PM, H.J. Lu wrote: > On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi wrote: >> Hi, >> >> This patch is to fix the problem described here: >> http://gcc.gnu.org/bugzilla/sh

[GOOGLE, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
k. ok for google-4_8 branch? Thanks, Wei. gcc/ChangeLog: 2014-03-07 Wei Mi * config/i386/i386.c (ix86_compute_frame_layout): update preferred_stack_boundary when there is tls expanded call. * config/i386/i386.md: set ix86_tls_descriptor_calls_expanded_in_cfun.

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
> There are several problems with this: > > 1. It doesn't work with C. Ok, I will change the testcase to use C. > 2. IA32 has the same issue and isn't fixed. I thought IA32 didn't have the same issue because the ABI only requires 32-bit alignment for the stack starting address. Oh, I found the old pat

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
sider the case that tls call is optimized away? Thanks, Wei. On Wed, Mar 12, 2014 at 2:07 PM, H.J. Lu wrote: > On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi wrote: >>> There are several problems with this: >>> >>> 1. It doesn't work with C. >> >> Ok, I w

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
Oh, I see. Thanks! Wei. On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu wrote: > On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi wrote: >> Hi H.J., >> >> Could you show me why you postpone the setting >> ix86_tls_descriptor_calls_expanded_in_cfun until

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
+} + +/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */ On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi wrote: > Oh, I see. Thanks! > > Wei. > > On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu wrote: >> On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi wrote: >>

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
On Wed, Mar 12, 2014 at 3:07 PM, H.J. Lu wrote: > On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi wrote: >> This is the updated testcase. > > Does my patch fix the original problem? Yes, it works. I am doing bootstrap and regression test for your patch. Thanks! >

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
>> Does my patch fix the original problem? > > Yes, it works. I am doing bootstrap and regression test for your patch. > Thanks! > The patch passes bootstrap and regression test on x86_64-linux-gnu. Thanks, Wei.

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
template for tls_local_dynamic_base_32/tls_global_dynamic_32, and set ix86_tls_descriptor_calls_expanded_in_cfun to true only after reload complete? Regards, Wei. On Wed, Mar 12, 2014 at 5:33 PM, H.J. Lu wrote: > On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi wrote: >>>> Does my patch f

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
pr58066-2.patch worked for pr58066.c on ia32/x32/x86_64, but it failed on bootstrap. /usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/xgcc -B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/ -B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
> > My ia32 change generates much worse code: > > [hjl@gnu-6 gcc]$ cat /tmp/c.i > static __thread char ccc, bbb; > > int __cxa_get_globals() > { > return &ccc - &bbb; > } > [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i > [hjl@gnu-6 gcc]$ cat c.s > .file "c.i" > .section .text.unlikely,"ax",@p

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
> I tried pr58066-3.patch on the above testcase, the code it generated > seems ok. I think after we change the 32bits pattern in i386.md to be > similar as 64bits pattern, we should change 32bit expand to be similar > as 64bit expand in legitimize_tls_address too? > > Thanks, > Wei. > Sorry, I pas

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
> Can we combine the last two patches, both adding call explicitly in > rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and > set ix86_tls_descriptor_calls_expanded_in_cfun to true only after > reload complete? > Hi H.J. I attached the patch which combined your two patches and t

[PATCH, x86] merge movsd/movhpd pair in peephole

2014-04-09 Thread Wei Mi
he patch is to add the merging in peephole. bootstrap and regression pass. Is it ok for stage1? Thanks, Wei. gcc/ChangeLog: 2014-04-09 Wei Mi * config/i386/i386.c (get_memref_parts): New function. (adjacent_mem_locations): Ditto. * config/i386/i386-protos.h: Add
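In the foo1 testcase quoted in this thread, `_mm_load_sd(&a[1])` reads bytes 8 to 15 of `a` and `_mm_loadh_pd(res, &a[2])` reads bytes 16 to 23, so the movsd/movhpd pair together read one contiguous 16-byte block and can become a single unaligned load. A minimal sketch of the adjacency test such a peephole needs follows; the struct and function names are hypothetical and are not the patch's get_memref_parts/adjacent_mem_locations.

```c
#include <stdbool.h>

/* Hypothetical decomposition of a memory reference into
   base + constant offset, with an access size in bytes.  */
struct memref_parts
{
  int base;     /* id of the base register or symbol */
  long offset;  /* constant displacement in bytes */
  long size;    /* access size in bytes */
};

/* Return true if M2 starts exactly where M1 ends, i.e. the two
   accesses cover one contiguous block of M1->size + M2->size bytes
   with M1 as the low half.  */
bool
memrefs_adjacent_p (const struct memref_parts *m1,
                    const struct memref_parts *m2)
{
  return m1->base == m2->base && m1->offset + m1->size == m2->offset;
}
```

Order matters: the check is asymmetric because the low-half load must come from the lower address for the merged load to produce the same vector.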

Re: [PATCH, x86] merge movsd/movhpd pair in peephole

2014-04-09 Thread Wei Mi
part. It is the same thing we want. Look forward to your patch. Thanks, Wei. On Wed, Apr 9, 2014 at 7:27 PM, Bin.Cheng wrote: > On Thu, Apr 10, 2014 at 8:18 AM, Wei Mi wrote: >> Hi, >> >> For the testcase 1.c >> >> #include >> >> double a[1

Re: [PATCH, x86] merge movsd/movhpd pair in peephole

2014-04-21 Thread Wei Mi
Ping. Thanks, Wei. On Wed, Apr 9, 2014 at 5:18 PM, Wei Mi wrote: > Hi, > > For the testcase 1.c > > #include > > double a[1000]; > > __m128d foo1() { > __m128d res; > res = _mm_load_sd(&a[1]); > res = _mm_loadh_pd(res, &a[2]); > retu
