Hi,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518
pr57518 happened because update_equiv_regs in IRA marked a reg
equivalent with a mem, lowered its mem_cost in scan_one_insn, set
NO_REGS to its rclass, but didn't consider the reg was used in
paradoxical subreg which prevented the reg from bei
The testcase is attached.
Thanks,
Wei.
On Wed, Jun 12, 2013 at 5:03 PM, H.J. Lu wrote:
> On Wed, Jun 12, 2013 at 2:44 PM, Wei Mi wrote:
>> Hi,
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518
>>
>> pr57518 happened because update_equiv_regs in IRA mark
Ping.
On Wed, Jun 12, 2013 at 2:44 PM, Wei Mi wrote:
> Hi,
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518
>
> pr57518 happened because update_equiv_regs in IRA marked a reg
> equivalent with a mem, lowered its mem_cost in scan_one_insn, set
> NO_REGS to its rclass, bu
Yes, I think so.
Regards,
Wei.
On Wed, Jun 19, 2013 at 2:00 PM, Xinliang David Li wrote:
> Should the patch be ported to the 4.8 branch?
>
> thanks,
>
> David
>
> On Wed, Jun 19, 2013 at 11:46 AM, Vladimir Makarov
> wrote:
>> On 13-06-19 1:23 AM, Wei Mi wrote:
>
4.8.1.
bootstrap and regression test are ok on x86_64-linux-gnu. Is it ok for
trunk and 4.8 branch?
Thanks,
Wei.
2013-07-15 Wei Mi
PR rtl-optimization/57878
* lra-assigns.c (reload_pseudo_compare_func): Switch the priority of
bigger mode and tfreq.
2013-07-15 Wei Mi
Thank you! backported as r201068 in gcc-4_8-branch.
Thanks,
Wei.
On Thu, Jul 18, 2013 at 10:05 AM, Vladimir Makarov wrote:
> On 07/15/2013 02:26 PM, Wei Mi wrote:
>> Hi,
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57878
>>
>> The bug occurs because tf
Turning off X86_TUNE_USE_VECTOR_CONVERTS uses cvtss2sd instead of
unpcklps+cvtps2pd, which is better for some recent intel micro arch
such as westmere and sandybridge. So turn it off for m_GENERIC and
m_CORE_ALL.
regression and bootstrap ok. ok for 4.8 branch?
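For illustration, here is a minimal sketch of the kind of source this tuning flag affects: a scalar float-to-double conversion. The function name is hypothetical and not part of the patch. Which instruction sequence the compiler actually emits depends on the GCC version, the -mtune setting and the optimization level, so it is best checked in the generated assembly.

```c
/* Hypothetical example: widen one float to double.
   With X86_TUNE_USE_VECTOR_CONVERTS off, the expectation described
   above is a single cvtss2sd; with it on, unpcklps + cvtps2pd.
   This is an assumption to verify with "gcc -O2 -S", not a guarantee.  */
double
widen (float x)
{
  return (double) x;
}
```

Compiling this with something like `gcc -O2 -mtune=generic -S` before and after the change should make the difference visible in the .s file.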
Index: config/i386/i386.c
==
Hi,
On Mon, Feb 25, 2013 at 4:08 PM, Steven Bosscher wrote:
> On Tue, Feb 26, 2013 at 12:32 AM, Wei Mi wrote:
>> We also take insn splitting and peephole into consideration,
>> i.e., the cost of the change is the cost after insn splitting and
>> peephole which may be applie
On Tue, Feb 26, 2013 at 2:59 AM, Steven Bosscher wrote:
> On Tue, Feb 26, 2013 at 2:12 AM, Wei Mi wrote:
>> But it is not a good transformation unless we know insn split will
>> change a << (b & 63) to a << b; Here we want to see what the rtl looks
>> lik
cation support for
our case in simplify_binary_operation.
I will send out a more official patch about fwprop extension soon.
Then it may be easier to talk about its rationality.
Thanks,
Wei.
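To make the shift truncation case from this thread concrete, here is a small C sketch of the a << (b & 63) versus a << b pattern. The function names are hypothetical. On x86-64 the 64-bit shift instructions only use the low 6 bits of the count, so the explicit mask is redundant at the instruction level; whether the compiler removes it, and in which pass, is exactly what is being discussed and is not shown here.

```c
#include <stdint.h>

/* Shift with an explicit truncation of the count to 0..63.
   Well defined in C for any value of b.  */
uint64_t
shl_masked (uint64_t a, unsigned int b)
{
  return a << (b & 63);
}

/* Shift without the mask.  Only defined in C when b < 64, so it can
   serve as a reference for in-range counts only.  */
uint64_t
shl_plain (uint64_t a, unsigned int b)
{
  return a << b;
}
```

For counts in the range 0..63 both functions return the same value, which is why the masked form can in principle be emitted as a plain shift on this target.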
On Wed, Feb 27, 2013 at 1:21 PM, Steven Bosscher wrote:
> On Wed, Feb 27, 2013 at 7:37 PM, Wei Mi wrote:
>
Thanks for the helpful comments! I have some replies inlined.
Regards,
Wei.
On Mon, Mar 11, 2013 at 12:52 PM, Steven Bosscher wrote:
> On Mon, Mar 11, 2013 at 6:52 AM, Wei Mi wrote:
>> This is the fwprop extension patch which is put in order. Regression
>> test and bootstrap pa
otstrap on x86_64-unknown-linux-gnu.
Thanks,
Wei.
On Sun, Mar 17, 2013 at 12:15 AM, Wei Mi wrote:
> Hi,
>
> On Sat, Mar 16, 2013 at 3:48 PM, Steven Bosscher
> wrote:
>> On Tue, Mar 12, 2013 at 8:18 AM, Wei Mi wrote:
>>> For the motivational case, I need insn spli
On Mon, Mar 25, 2013 at 2:35 AM, Richard Biener
wrote:
> On Sun, Mar 24, 2013 at 5:18 AM, Wei Mi wrote:
>> This is the patch to add the shift truncation in
>> simplify_binary_operation_1. I add a new hook
>> TARGET_SHIFT_COUNT_TRUNCATED which uses enum rtx_code to decide
> I am trying to figure out a way not to lose the opportunity when shift
> truncation is not combined in a bit test pattern. Can we keep the
> explicit truncation in RTL, but generate truncation code in assembly?
> Then only shift truncation which is not combined in a bit test
> pattershift truncation
I am not familiar with how to use define_subst, so I wrote a patch that
changes define_insn_and_split to define_insn. Bootstrapped and
regression tested on x86_64-unknown-linux-gnu.
A question is: after that change, is there any way I can make
targetm.rtx_costs() aware of the truncation, i.e. the cos
1.c attached.
On Mon, Apr 1, 2013 at 10:43 PM, Wei Mi wrote:
> I attached the patch.4 based on r197308. r197308 changes shift-and
> type truncation from define_insn_and_split to define_insn. patch.4
> changes ix86_rtx_costs for shift-and type rtx to get the correct cost
> for the
Thanks for helping to fix it. I will take care to verify regression
and bootstrap before checking in to release branches next time.
Regards,
Wei.
On Wed, Apr 3, 2013 at 11:08 AM, Jakub Jelinek wrote:
> On Thu, Mar 28, 2013 at 04:49:47PM +0100, Uros Bizjak wrote:
>> 2013-03-2
regression test are ok for x86_64-linux-gnu. Is it ok for trunk?
Thanks,
Wei.
ChangeLog:
2014-04-25 Wei Mi
PR rtl-optimization/60738
* params.h: New param.
* params.def: Ditto.
* lra-constraints.c (need_for_split_p): Let more
cases to do lra-split
pr 28, 2014 at 12:57 AM, Steven Bosscher wrote:
> On Sat, Apr 26, 2014 at 5:35 AM, Wei Mi wrote:
>> Index: ira-lives.c
>> ===
>> --- ira-lives.c (revision 209253)
>> +++ ira-lives.c (working copy
Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk?
Thanks,
Wei.
>> I attached the patch which combined your two patches and the fix in
>> legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64,
>> the code looked fine. Do you think it is ok?
>>
>> Thanks,
>> Wei.
>
> Either p
Ping.
Thanks,
Wei.
On Tue, Dec 17, 2013 at 11:34 AM, Wei Mi wrote:
> Ping.
>
> Thanks,
> Wei.
>
> On Mon, Dec 9, 2013 at 9:54 PM, Wei Mi wrote:
>> Ping.
>>
>> Thanks,
>> wei.
>>
>> On Sat, Nov 23, 2013 at 10:46 AM, Wei Mi wrote:
On Wed, Apr 30, 2014 at 11:44 PM, Uros Bizjak wrote:
> On Thu, May 1, 2014 at 6:42 AM, Wei Mi wrote:
>> Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk?
>
> None of these patches have correct ChangeLog entries. Please follow
> the rules, outlined in http://gcc.gnu.o
TLS_GD and
UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above,
and now the optimization in tls_local_dynamic_32_once works.
bootstrapped ok on x86_64-linux-gnu. regression is going on. Is it OK
if regression passes?
Thanks.
Wei.
ChangeLog:
gcc/
2014-05-07 Wei Mi
* c
Here is a patch for the test. It contains two changes:
1. For emutls, there will be an explicit call generated at expand
pass, and no stack adjustment is needed. So add /* {
dg-require-effective-target tls_native } */ in the test.
2. Replace cfi_def_cfa_offset with insn sequence check.
Is it ok?
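As a rough sketch of the first change, a testcase header using that directive could look like the following. The file contents, option string and function name are hypothetical and are not the actual pr58066 test; only the dg-require-effective-target line is taken from the description above.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fPIC" } */
/* Skip the test on targets that fall back to emulated TLS.  */
/* { dg-require-effective-target tls_native } */

/* Hypothetical thread-local variable and accessor, used only to force
   a TLS access in the compiled function.  */
static __thread int counter;

int
bump (void)
{
  return ++counter;
}
```

On an emutls target the directive marks the test unsupported instead of letting it fail.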
>> Here is a patch for the test. It contains two changes:
>> 1. For emutls, there will be an explicit call generated at expand
>> pass, and no stack adjustment is needed. So add /* {
>> dg-require-effective-target tls_native } */ in the test.
>> 2. Replace cfi_def_cfa_offset with insn sequence chec
Can I check in this testcase fix?
Thanks,
Wei.
On Tue, May 13, 2014 at 1:39 AM, Rainer Orth
wrote:
> Wei Mi writes:
>
>> Thanks for trying the testcase. rtl scanning will be slightly better
>> than assembly scanning. So how about this one?
>
> This one works
On Tue, May 20, 2014 at 12:13 AM, Bin.Cheng wrote:
> On Tue, May 20, 2014 at 1:30 AM, Jeff Law wrote:
>> On 05/19/14 00:38, Bin.Cheng wrote:
>>>
>>> On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote:
On 05/16/14 04:07, Bin.Cheng wrote:
But can't you go through movXX
This patch handles the mem access builtins in ivopt. The original
problem is described here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02648.html
Bootstrapped and passed regression test. Performance test ok for
plain, fdo and lipo. Ok for google 4.8 branch?
Thanks,
Wei.
--- /usr/local/google/hom
Comments added. I created another patch to add the parameter for AVG_LOOP_ITER.
Both patches are attached.
Thanks,
Wei.
On Wed, Jan 22, 2014 at 4:42 PM, Xinliang David Li wrote:
> On Wed, Jan 22, 2014 at 2:23 PM, Wei Mi wrote:
>> This patch handles the mem access builtins in ivopt. The
they will not be identified as induction variables.
Testing is going on. Is it ok if tests pass?
2014-02-10 Wei Mi
* tree-flow-inline.h (make_prof_ssa_name): New.
(make_temp_prof_ssa_name): Ditto.
* tree.h (struct tree_base): Add PROFILE_GENERATED flag for ssa name.
Here is the updated patch, which follows the UD chain to determine whether
iv.base is defined by a __gcovx.xxx[] var. It is a lot simpler than
adding a tree bit.
The regression test and the previously failed benchmark in piii mode are ok.
Other tests are ongoing.
2014-02-10 Wei Mi
* tree-ssa-loop
>> +return false;
>> +
>> + decl = TREE_OPERAND (rhs, 0);
>> + if (TREE_CODE (decl) != VAR_DECL)
>> +return false;
>
>
>
> Also check TREE_STATIC and DECL_ARTIFICIAL flag.
>
>
> David
>
Check added. Add DECL_ARTIFICIAL setting in build_va
regression test pass on x86_64-linux-gnu. ok for trunk
and gcc-4_9?
Thanks,
Wei.
ChangeLog:
2014-07-21 Wei Mi
PR middle-end/61776
* tree-profile.c (tree_profiling): Fix cfg after the const/pure
flags of some funcs are reset after instrumentation.
2014-07-21 Wei Mi
By the way, the resetting of const/pure flags loop is also executed
during profile-use, but if there is no instrumentation, the reset is
unnecessary. The flags are kept until pass_ipa_pure_const fixes them.
And because of non-instantaneous ssa update, the fixes are reflected
on ssa only after ipa
he noreturn part in
because it has no direct impact on pr60449 and pr61776. I can help
Martin to test and post that part as an independent patch later.
bootstrap and regression pass on x86_64-linux-gnu. Is it ok?
Thanks,
Wei.
ChangeLog:
2014-07-27 Martin Jambor
Wei Mi
improvement. Ok for google-4_9 if regression pass?
Thanks,
Wei.
ChangeLog:
2014-08-06 Wei Mi
* tree-cfg.c (increase_discriminator_for_locus): It was
next_discriminator_for_locus. Add a param "return_next".
(next_discriminator_for_locus): Renamed.
(assign_disc
o the comments before
ix86_current_function_calls_tls_descriptor, tls call may be optimized
away. ix86_compute_frame_layout is the latest place to do the update.
bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok
for trunk if tests pass?
Thanks,
Wei.
gcc/ChangeLog:
2014-03-07 Wei M
Regression test is ok.
Thanks,
Wei.
On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi wrote:
> Hi,
>
> This patch is to fix the problem described here:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
>
> I follow Ian's suggestion and set
> ix86_tls_descriptor
Yes, x32 has the same problem. It should be tested. Fixed.
Thanks,
Wei.
On Fri, Mar 7, 2014 at 2:06 PM, H.J. Lu wrote:
> On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi wrote:
>> Hi,
>>
>> This patch is to fix the problem described here:
>> http://gcc.gnu.org/bugzilla/sh
k. ok
for google-4_8 branch?
Thanks,
Wei.
gcc/ChangeLog:
2014-03-07 Wei Mi
* config/i386/i386.c (ix86_compute_frame_layout): Update
preferred_stack_boundary when there is a TLS expanded call.
* config/i386/i386.md: Set
ix86_tls_descriptor_calls_expanded_in_cfun.
> There are several problems with this:
>
> 1. It doesn't work with C.
Ok, I will change the testcase to use C.
> 2. IA32 has the same issue and isn't fixed.
I thought IA32 didn't have the same issue because the ABI only requires
32-bit alignment for the stack starting address.
oh, I found the old pat
sider the case
that tls call is optimized away?
Thanks,
Wei.
On Wed, Mar 12, 2014 at 2:07 PM, H.J. Lu wrote:
> On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi wrote:
>>> There are several problems with this:
>>>
>>> 1. It doesn't work with C.
>>
>> Ok, I w
Oh, I see. Thanks!
Wei.
On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu wrote:
> On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi wrote:
>> Hi H.J.,
>>
>> Could you show me why you postpone the setting
>> ix86_tls_descriptor_calls_expanded_in_cfun until
+}
+
+/* { dg-final { scan-assembler-times ".cfi_def_cfa_offset 16" 2 } } */
On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi wrote:
> Oh, I see. Thanks!
>
> Wei.
>
> On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu wrote:
>> On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi wrote:
>>>
On Wed, Mar 12, 2014 at 3:07 PM, H.J. Lu wrote:
> On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi wrote:
>> This is the updated testcase.
>
> Does my patch fix the original problem?
Yes, it works. I am doing bootstrap and regression test for your patch. Thanks!
>
>> Does my patch fix the original problem?
>
> Yes, it works. I am doing bootstrap and regression test for your patch.
> Thanks!
>
The patch passes bootstrap and regression test on x86_64-linux-gnu.
Thanks,
Wei.
template for tls_local_dynamic_base_32/tls_global_dynamic_32, and
set ix86_tls_descriptor_calls_expanded_in_cfun to true only after
reload complete?
Regards,
Wei.
On Wed, Mar 12, 2014 at 5:33 PM, H.J. Lu wrote:
> On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi wrote:
>>>> Does my patch f
pr58066-2.patch worked for pr58066.c on ia32/x32/x86_64, but it failed
on bootstrap.
/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/xgcc
-B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/
-B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-
>
> My ia32 change generates much worse code:
>
> [hjl@gnu-6 gcc]$ cat /tmp/c.i
> static __thread char ccc, bbb;
>
> int __cxa_get_globals()
> {
> return &ccc - &bbb;
> }
> [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i
> [hjl@gnu-6 gcc]$ cat c.s
> .file "c.i"
> .section .text.unlikely,"ax",@p
> I tried pr58066-3.patch on the above testcase, the code it generated
> seems ok. I think after we change the 32-bit pattern in i386.md to be
> similar to the 64-bit pattern, we should change the 32-bit expand to be
> similar to the 64-bit expand in legitimize_tls_address too?
>
> Thanks,
> Wei.
>
Sorry, I pas
> Can we combine the last two patches, both adding call explicitly in
> rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and
> set ix86_tls_descriptor_calls_expanded_in_cfun to true only after
> reload complete?
>
Hi H.J.
I attached the patch which combined your two patches and t
he patch is to add the merging in peephole.
bootstrap and regression pass. Is it ok for stage1?
Thanks,
Wei.
gcc/ChangeLog:
2014-04-09 Wei Mi
* config/i386/i386.c (get_memref_parts): New function.
(adjacent_mem_locations): Ditto.
* config/i386/i386-protos.h: Add
part. It is the same thing we want. Look forward to
your patch.
Thanks,
Wei.
On Wed, Apr 9, 2014 at 7:27 PM, Bin.Cheng wrote:
> On Thu, Apr 10, 2014 at 8:18 AM, Wei Mi wrote:
>> Hi,
>>
>> For the testcase 1.c
>>
>> #include <emmintrin.h>
>>
>> double a[1
Ping.
Thanks,
Wei.
On Wed, Apr 9, 2014 at 5:18 PM, Wei Mi wrote:
> Hi,
>
> For the testcase 1.c
>
> #include <emmintrin.h>
>
> double a[1000];
>
> __m128d foo1() {
> __m128d res;
> res = _mm_load_sd(&a[1]);
> res = _mm_loadh_pd(res, &a[2]);
> retu