The rePLay Framework
The rePLay Framework is a general-purpose
microprocessor architecture for the billion-transistor timeframe that attains high
performance by optimizing a program as it executes. As a program executes on
the rePLay hardware, a special logic structure identifies regions of the
program as candidates for highly aggressive optimization. Another structure
optimizes these regions and then stores them locally on-chip in a trace cache.
Whenever one of these locally cached, highly optimized regions is executed,
performance is potentially higher than if the corresponding original code had
executed. These regions are called frames: dynamically identified regions of
the instruction stream that, with high probability, have a single entry point
and a single exit point. This atomic property of frames allows
high-yielding optimizations to be performed on them, in the same manner that an
optimizing compiler performs aggressive code optimization on extended basic
blocks. The performance potential of these optimizations increases with longer
atomic regions. By generating frames dynamically, using information about
branch biases and correlations, we can create frames that consist of many
instructions and have a very high probability of complete execution, i.e., of
running with no early exit from the frame. Experimentally, we have observed
average frame sizes of 60-80 instructions on commercial code, with completion
rates of over 98%. Furthermore, these frames are constructed from a single
flow of control; that is, all instructions within a frame are control
independent of each other.
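The idea of growing a frame across highly biased branches can be illustrated with a small software sketch. This is only an illustration under stated assumptions, not the rePLay hardware algorithm: the names `Instr`, `BranchStats`, `profile`, `build_frame`, and `completion_probability`, the threshold `BIAS_THRESHOLD`, and the cap `MAX_FRAME_LEN` are all hypothetical, and the completion estimate assumes branch outcomes are independent, which ignores the branch correlations mentioned above.

```python
from dataclasses import dataclass
from math import prod

BIAS_THRESHOLD = 0.98   # hypothetical: minimum bias for folding a branch into a frame
MAX_FRAME_LEN = 256     # hypothetical cap on frame length, in instructions


@dataclass(frozen=True)
class Instr:
    pc: int
    is_branch: bool = False
    taken: bool = False  # meaningful only when is_branch is True


@dataclass
class BranchStats:
    taken: int = 0
    total: int = 0

    def record(self, was_taken: bool) -> None:
        self.total += 1
        self.taken += int(was_taken)

    def bias(self) -> float:
        """Fraction of executions that went in the dominant direction."""
        if self.total == 0:
            return 0.0
        return max(self.taken, self.total - self.taken) / self.total

    def dominant_taken(self) -> bool:
        return 2 * self.taken >= self.total


def profile(trace):
    """Collect per-branch outcome counts from a dynamic instruction trace."""
    stats = {}
    for ins in trace:
        if ins.is_branch:
            stats.setdefault(ins.pc, BranchStats()).record(ins.taken)
    return stats


def build_frame(trace, stats, start=0):
    """Grow one frame from trace[start:].

    Highly biased branches that follow their dominant direction are kept
    inside the frame and recorded as assertions. The frame ends just before
    the first branch that is not biased enough or that goes the other way.
    Returns (body_pcs, asserted_branch_pcs, next_index).
    """
    body, asserts = [], []
    i = start
    while i < len(trace) and len(body) < MAX_FRAME_LEN:
        ins = trace[i]
        if ins.is_branch:
            s = stats.get(ins.pc)
            biased = s is not None and s.bias() >= BIAS_THRESHOLD
            if not biased or ins.taken != s.dominant_taken():
                break
            asserts.append(ins.pc)
        body.append(ins.pc)
        i += 1
    return body, asserts, i


def completion_probability(asserts, stats):
    """Estimate the chance that every assertion holds (independence assumed)."""
    return prod(stats[pc].bias() for pc in asserts)
```

In this sketch, a frame stays atomic because every branch inside it has been turned into an assertion: if any assertion would fail, the whole frame is treated as not having executed. Longer frames contain more assertions, so the per-branch bias threshold has to be high for the overall completion estimate to stay high.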