Reducing Cache Misses in Numerical Applications Using Data Relocation and Prefetching
Yoji Yamada, Teresa L. Johnson, Grant Haab, John C. Gyllenhaal, and Wen-mei W. Hwu
Center for Reliable and High-Performance Computing, University of Illinois, Urbana, IL, Technical Report CRHC-95-04, 1995.
Numerical applications frequently contain nested loops that
process large arrays of data. The execution of these loop
structures often produces memory reference patterns that utilize
data caches poorly. Indeed, poor reuse of the data, large
working set sizes, and frequent non-unit-stride accesses all
combine to cause many cache misses. To improve cache performance,
data copying has been proposed. However, this technique has
high overhead.
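
As a rough illustration of software data copying and where its overhead comes from, the following C sketch gathers one column of a row-major matrix into a contiguous buffer. The function names (`sum_column_strided`, `copy_column`, `sum_contiguous`) and the row-major layout are assumptions made for this example and are not taken from the report.

```c
#include <assert.h>
#include <stddef.h>

/* Direct access: consecutive references are `cols` elements apart
 * (non-unit stride), so for large `cols` each one may touch a
 * different cache line. */
double sum_column_strided(const double *a, size_t rows, size_t cols, size_t j)
{
    assert(j < cols);
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        sum += a[i * cols + j];
    return sum;
}

/* Data copying: gather column j into a contiguous buffer.
 * This loop is itself the copying overhead; it still makes one
 * strided pass over the original array. */
void copy_column(const double *a, size_t rows, size_t cols, size_t j,
                 double *buf)
{
    assert(j < cols);
    for (size_t i = 0; i < rows; i++)
        buf[i] = a[i * cols + j];
}

/* Later passes over the copied data run at unit stride. */
double sum_contiguous(const double *buf, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

In this sketch the copy only pays off when the buffer is reused across several passes, since the copy loop performs the same strided traversal once and executes extra load and store instructions on the processor.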
In this paper, instead, we propose a combined hardware and
software technique called data relocation and prefetching which
eliminates much of the overhead of data copying through the use
of special hardware. Furthermore, by relocating the data while
performing software prefetching, the overhead of copying the
data can be reduced further. This technique performs better
than prefetching alone because it prefetches multiple elements
at once. The hardware is designed to overlap relocation and
prefetching with normal execution and to make heavy use of the
available bus bandwidth. Simulation results show that this
technique greatly reduces data cache miss rates. As a result,
large applications, including PERFECT and SPEC benchmarks,
achieve speedups of up to 2.5 times. The hardware support required
by this technique has been greatly refined over that presented
in an earlier paper.