carrot at google dot com
2014-10-23 23:08:54 UTC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63635
Bug ID: 63635
Summary: Reduce toc relative address computation for multiple
data access
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: carrot at google dot com
Target: powerpc64le
Currently GCC for PowerPC generates two instructions to compute the address of
each non-local data object. If the data layout is known to the compiler, one
instruction can be saved for the second and each later address computation.
The following is an example:
#include <stdio.h>
static int a,b,c;
void bar(int x)
{
  a = b = c = x;
}
int foo()
{
  return a+b+c;
}
int aa = 1;
int bb = 2;
int cc = 3;
int asdf()
{
  return aa + bb + cc;
}
int main()
{
  printf("Hello");
  printf(", ");
  printf("world.\n");
}
Compile it with options -O2 -m64 -mvsx -mcpu=power8
Function asdf is compiled to:
asdf:
0:      addis 2,12,.TOC.-***@ha
        addi 2,2,.TOC.-***@l
        .localentry asdf,.-asdf
        addis 3,2,***@toc@ha              // A
        addis 10,2,.LANCHOR1+***@toc@ha   // B
        addis 9,2,.LANCHOR1+***@toc@ha    // C
        lwz 3,***@toc@l(3)                // D
        lwz 10,.LANCHOR1+***@toc@l(10)    // E
        lwz 9,.LANCHOR1+***@toc@l(9)      // F
        add 3,3,10
        add 3,3,9
        extsw 3,3
        blr
...
        .globl cc
        .globl bb
        .globl aa
        .section ".data"
        .align 2
        .set .LANCHOR1,. + 0
        .type aa, @object
        .size aa, 4
aa:
        .long 1
        .type bb, @object
        .size bb, 4
bb:
        .long 2
        .type cc, @object
        .size cc, 4
cc:
        .long 3
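Each access in the listing above pairs an addis using @toc@ha with a load using @toc@l, which is why every variable costs two instructions. The split can be sketched in C as follows. The function names toc_ha16, toc_lo16 and toc_rebuild are hypothetical helpers for illustration only, not part of GCC or binutils.

```c
#include <stdint.h>

/* Sketch of how a 32-bit TOC-relative offset is split into the two
   16-bit immediates selected by @ha and @l.  Helper names are made up
   for this illustration. */

/* Sign-extend the low 16 bits of v. */
static int32_t sext16(uint32_t v)
{
    return (int32_t)((v & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* @l: the low 16 bits, which the load treats as a signed displacement. */
static int32_t toc_lo16(int32_t off)
{
    return sext16((uint32_t)off);
}

/* @ha: the high 16 bits, adjusted by 0x8000 so that adding the
   sign-extended low half later yields the original offset. */
static int32_t toc_ha16(int32_t off)
{
    return sext16(((uint32_t)off + 0x8000u) >> 16);
}

/* What "addis rt,2,off@ha" followed by "lwz rt,off@l(rt)" adds to r2.
   Assumes off is below 0x7fff8000 so the sum fits in 32 bits. */
static int32_t toc_rebuild(int32_t off)
{
    return toc_ha16(off) * 65536 + toc_lo16(off);
}
```

For any offset in the supported range, toc_rebuild(off) equals off. The 0x8000 adjustment matters when bit 15 of the offset is set: for 0x12348000 the high part becomes 0x1235 and the low part becomes -32768.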
Since the data layout of aa, bb and cc is known to the compiler and the distance
between them is less than 64k, the code sequence A-F can be optimized to:
        addis 3,2,***@toc@ha
        addi 3,3,***@toc@l
        lwz 10,4(3)
        lwz 9,8(3)
        lwz 3,0(3)
Other functions can be similarly optimized.
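At the source level, the proposed sequence amounts to forming one base address and reaching each variable through a small constant displacement. A minimal sketch in C, assuming 4-byte int as on the target; struct anchor_block, anchor1, load_word and asdf_model are hypothetical names standing in for the .LANCHOR1 data block and the generated code, not anything GCC emits:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the three adjacent objects that follow
   .LANCHOR1 in the .data section. */
struct anchor_block {
    int aa;
    int bb;
    int cc;
};

static struct anchor_block anchor1 = { 1, 2, 3 };

/* Load an int at a byte displacement from a base address,
   in the manner of "lwz rt,disp(ra)". */
static int load_word(const unsigned char *base, ptrdiff_t disp)
{
    int value;
    memcpy(&value, base + disp, sizeof value);
    return value;
}

/* Model of the optimized asdf: the base is computed once, then every
   access uses only a displacement from it. */
int asdf_model(void)
{
    const unsigned char *base = (const unsigned char *)&anchor1;   /* addis + addi */
    int r10 = load_word(base, offsetof(struct anchor_block, bb));  /* lwz 10,4(3) */
    int r9  = load_word(base, offsetof(struct anchor_block, cc));  /* lwz 9,8(3)  */
    int r3  = load_word(base, offsetof(struct anchor_block, aa));  /* lwz 3,0(3)  */
    return r3 + r10 + r9;
}
```

The model uses five steps (two for the base, three loads) where the current output uses six, and the saving grows by one instruction for each further variable reached from the same base.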