Hi, I'm sorry that this is not 100% specific to gcc, but this
mailing list is the last place where I think this knowledge may lie. I
have written some image processing routines in assembly language
making extensive use of MMX, and now I want to start optimizing them,
but I can't for the life of me find any documentation comparable to the
Intel/AMD optimization manuals for Pentium/Athlon/Opteron cores. I
can't even find much useful information on mailing lists such as this
one, where I was hoping to find it. Anyway, I'm no expert in the
matter, but I do understand the concepts of instruction pairing
and pipelining. I know that the Geode LX core, which is what we have
for RoboCup, is non-superscalar. From what I understand the core has
two pipelines, one to the integer unit and the other to the
FPU/MMX/3DNow! unit. Does this more or less mean that instruction
pairing has no effect? Is it still worth scheduling instructions in a
pattern, such as the 4-1-1 template the Intel optimization manual suggests
for its cores? I saw that gcc 4.3 added Geode support, and I'm hoping
someone will have some better knowledge of the subject. Can anyone
give me any pointers as to what I should be trying to optimize, or
better yet links to documentation or hard benchmarks? Thanks in
advance.
