------- Comment #13 from steven at gcc dot gnu dot org 2008-12-15 21:27 -------
OK, to elaborate: I'm playing with this test case on ia64-linux, and I reduced it by some 8000 lines to make it compilable at all. With those 8000 lines removed, it actually spends more time for me in "expand", in the function "find_temp_slot_from_address (rtx x)". It spends all of its time...

  for (i = max_slot_level (); i >= 0; i--)
    for (p = *temp_slots_at_level (i); p; p = p->next)
      {
        if (XEXP (p->slot, 0) == x
            || p->address == x
            || (GET_CODE (x) == PLUS
                && XEXP (x, 0) == virtual_stack_vars_rtx
                && GET_CODE (XEXP (x, 1)) == CONST_INT
                && INTVAL (XEXP (x, 1)) >= p->base_offset
                && INTVAL (XEXP (x, 1)) < p->base_offset + p->full_size))
          return p;

        else if (p->address != 0 && GET_CODE (p->address) == EXPR_LIST)
          for (next = p->address; next; next = XEXP (next, 1))
            if (XEXP (next, 0) == x) /* ...here in this loop... */
              return p;
      }

in the "for (next = p->address; ...)" loop. This list in p->address is actually several thousand items long and it is traversed many times:

  traversals ~ max_slot_level() * temp_slots_at_level(i) * list length of p->address

which is, at best, cubic behavior.


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2008-12-10 15:39:38         |2008-12-15 21:27:40
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38474
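
For illustration only, here is a rough standalone model of the traversal cost described in this comment. It is not GCC code: "struct addr_node", "struct slot", "find_slot_counting" and "lookup_cost" are hypothetical stand-ins for the EXPR_LIST in p->address, the temp slot, and the lookup loop. It assumes a single level with a single slot, so it only shows the cost of the innermost list walk when each of the n recorded addresses is looked up once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an EXPR_LIST node and a temp slot.  */
struct addr_node { const void *addr; struct addr_node *next; };
struct slot { struct addr_node *addresses; struct slot *next; };

/* Walk levels, then slots, then each slot's address list, counting
   every pointer comparison made before a match is found.  */
static struct slot *
find_slot_counting (struct slot **levels, int max_level, const void *x,
                    unsigned long *compares)
{
  for (int i = max_level; i >= 0; i--)
    for (struct slot *p = levels[i]; p; p = p->next)
      for (struct addr_node *n = p->addresses; n; n = n->next)
        {
          ++*compares;
          if (n->addr == x)
            return p;
        }
  return NULL;
}

/* Build one slot whose list holds N addresses, look each address up
   once, and return the total number of comparisons.  N must be > 0.  */
unsigned long
lookup_cost (size_t n)
{
  assert (n > 0);
  struct addr_node *nodes = malloc (n * sizeof *nodes);
  char *keys = malloc (n);
  assert (nodes != NULL && keys != NULL);

  for (size_t k = 0; k < n; k++)
    {
      nodes[k].addr = &keys[k];
      nodes[k].next = (k + 1 < n) ? &nodes[k + 1] : NULL;
    }

  struct slot s = { nodes, NULL };
  struct slot *levels[1] = { &s };
  unsigned long compares = 0;

  for (size_t k = 0; k < n; k++)
    {
      struct slot *found = find_slot_counting (levels, 0, &keys[k], &compares);
      assert (found == &s);
    }

  free (nodes);
  free (keys);
  return compares;
}
```

In this model the k-th address costs k comparisons to find, so n lookups cost n*(n+1)/2 comparisons in total. That is already quadratic in the list length before multiplying by the number of slots and levels, which is consistent with the estimate above.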