The rePLay Framework

The rePLay Framework is a general-purpose microprocessor architecture for the billion-transistor timeframe that attains high performance by optimizing a program as it executes. As a program runs on the rePLay hardware, a dedicated logic structure identifies regions of the program that are candidates for highly aggressive optimization. A second structure optimizes these regions and stores them on-chip in a trace cache.

Whenever one of these locally cached, highly optimized regions is executed, performance is potentially higher than if the corresponding original code had executed. These regions are called frames: dynamically identified regions of the instruction stream that, with high probability, have a single entry point and a single exit point. Because a frame can be treated as atomic, high-yielding optimizations can be performed on it, in the same way that an optimizing compiler performs aggressive code optimization on extended basic blocks. The performance potential of these optimizations increases with the length of the atomic region. By generating frames dynamically, using information about branch biases and correlations, we can create frames that consist of many instructions and have a very high probability of executing to completion, that is, with no early exit from the frame. Experimentally, we've observed average frame sizes of 60 to 80 instructions on commercial code, with completion rates of over 98%. Furthermore, each frame is constructed from a single flow of control, so all instructions within a frame are control independent of each other.
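To make the idea of building frames from branch bias concrete, here is a minimal software sketch. It is not the rePLay hardware algorithm: the trace format (one tuple per dynamic basic block), the names BranchHistory, Frame and build_frames, and the promote_after and max_instructions parameters are all assumptions made for illustration, and the sketch ignores branch correlation entirely. It treats a branch as biased once it has repeated the same outcome a given number of times in a row, and it lets a frame grow through biased branches while ending the frame at any branch that is not yet biased.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class BranchHistory:
    """Outcome streak for one static branch (hypothetical bookkeeping)."""
    last_taken: Optional[bool] = None
    streak: int = 0

    def record(self, taken: bool) -> None:
        # Extend the streak on a repeated outcome, otherwise restart it.
        if taken == self.last_taken:
            self.streak += 1
        else:
            self.last_taken = taken
            self.streak = 1


@dataclass
class Frame:
    """A candidate frame: a run of basic blocks from one flow of control."""
    blocks: List[Tuple[int, bool]] = field(default_factory=list)
    n_instructions: int = 0


def build_frames(
    trace: Iterable[Tuple[int, int, bool]],
    promote_after: int = 8,       # assumed threshold, not a rePLay parameter
    max_instructions: int = 256,  # assumed size cap, not a rePLay parameter
) -> List[Frame]:
    """Group a dynamic trace of (branch_pc, n_instructions, taken) into frames."""
    history: Dict[int, BranchHistory] = {}
    frames: List[Frame] = []
    current = Frame()
    for branch_pc, n_instr, taken in trace:
        h = history.setdefault(branch_pc, BranchHistory())
        # Judge bias from the outcomes seen before this one.
        biased = h.streak >= promote_after and h.last_taken == taken
        h.record(taken)
        current.blocks.append((branch_pc, taken))
        current.n_instructions += n_instr
        # A branch that is not yet biased is a likely early exit, so the
        # frame ends here; the size cap also closes the frame.
        if not biased or current.n_instructions >= max_instructions:
            frames.append(current)
            current = Frame()
    if current.blocks:
        frames.append(current)
    return frames


# A loop body of 5 instructions whose closing branch is always taken.
loop = [(0x10, 5, True)] * 20
frames = build_frames(loop, promote_after=4, max_instructions=1000)
print([f.n_instructions for f in frames])  # prints [5, 5, 5, 5, 80]
```

In this toy trace the first four iterations each form a one-block frame while the branch is still building history; once it counts as biased, the remaining sixteen iterations merge into a single 80-instruction frame. A higher promote_after value trades frame length for a higher expected completion rate.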

 

Presentations:

Publications:

Tools:

Students:

Funding Sources: