University of Illinois

Department of Electrical and Computer Engineering

ECE 512 (411) Spring 2005

Wen-mei W. Hwu, Instructor

 

ECE 512 (411) Course Reading List

 

Physical Influences on Architecture

 

J. Edmondson, “Impact of Physical Design on Architecture,” Chapter 1, The Design of High Performance Microprocessor Circuits, A. Chandrakasan, W. Bowhill, F. Fox, editors, IEEE Press.

 

R. Preston, “Register Files and Caches,” Chapter 14, The Design of High Performance Microprocessor Circuits, A. Chandrakasan, W. Bowhill, F. Fox, editors, IEEE Press.

 

R. Gonzalez and M. Horowitz, “Energy Dissipation in General Purpose Microprocessors,” IEEE Journal of Solid-State Circuits, Vol. 31, No. 9, Sept 1996.

 

S. Borkar, “Design Challenges of Technology Scaling,” IEEE Micro, Jul-Aug 1999.

 

Case Study 1: Pentium 4

 

G. Hinton et al, “The Microarchitecture of the Pentium 4 Processor,” Intel Technology Journal, Q1, 2001.

 

                Supplementary Reading: J. Huynh, “The AMD Athlon™ XP Processor with 512KB L2 Cache,” AMD White Paper, February 2003.

 

Case Study 2: 21264

 

D. Leibholz and R. Razdan, “The Alpha 21264: A 500MHz Out-of-Order Execution Microprocessor,” Compcon 97, February 1997.

 

                Supplementary Readings:

                Circuit issues with 21264

                Circuit issues with 21264, II

                Kessler’s paper on the 21264

                Kessler’s presentation on the 21264

                Power considerations on the 21264

 

Front End Issues

 

E. Rotenberg, S. Bennett, J.E. Smith, “Trace Cache: A Low Latency Approach to High-Bandwidth Instruction Fetching,” MICRO 29, Dec 1996.

 

R. F. Krick et al, "Trace Based Instruction Caching," U.S. Patent #6,018,786, January 25, 2000

 

A. Seznec, S. Felix, V. Krishnan, Y. Sazeides, “Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor,” ISCA 29, May 2002.

 

M. Evers, S. J. Patel, R. Chappell, and Y. Patt, “An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work,” ISCA 25, June 1998

 

Q. Jacobson, E. Rotenberg, and J. Smith, “Path-Based Next Trace Prediction,” MICRO 30, November 1997.

 

 

Register Renaming

 

S. Jourdan et al, “A Novel Renaming Scheme to Exploit Value Temporal Locality through Physical Register Reuse and Unification,” MICRO 31, Dec 1998.

 

T. Monreal et al, "Dynamic Register Renaming Through Virtual-Physical Registers," Journal of Instruction Level Parallelism, May 2000

 

Scheduling

 

S. Palacharla, N. Jouppi, J. E. Smith, “Complexity-effective Superscalar Processors”, ISCA 24, June 1997

 

M. Brown, J. Stark, and Y. Patt, “Select-Free Instruction Scheduling Logic”, MICRO 34, December 2001.

 

Execution

 

A. Baniasadi and A. Moshovos, “Instruction Distribution Heuristics for Quad-Clustered, Dynamically-Scheduled, Superscalar Processors”, MICRO 33, December 2000

 

Speculation/Recovery

 

J. E. Smith and A. R. Pleszkun, "Implementation of Precise Interrupts in Pipelined Processors," ISCA 12, June 1985

 

Memory Ordering

 

G. Chrysos and J. Emer, “Memory Dependence Prediction using Store Sets,” ISCA 25, June 1998.

 

A. Moshovos and G. Sohi, “Streamlining Inter-Operation Memory Communication via Data Dependence Prediction,” MICRO 30, December 1997.

 

Cache Access

 

T. Juan, J. Navarro, O. Temam, "Data Caches for Superscalar Processors," ICS 1997.

 

S. Cho, P. C. Yew, G. Lee, “A High-Bandwidth Memory Pipeline for Wide Issue Processors,” IEEE Transactions on Computers, July 2001.

 

Cache/Memory Interface

 

T. F. Chen and J. L. Baer, “Effective Hardware-Based Prefetching for High-Performance Microprocessors”, IEEE Transactions on Computers, May 1995.

 

O. Mutlu, J. Stark, C. Wilkerson, and Y. Patt, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors", HPCA 9, February 2003

 

B. Jacob and D. Wang, “DRAM: Architectures, Interfaces, and Systems: A Tutorial”, Tutorial Presentation at ISCA 29, June 2002.

 

Case Study 3: Itanium

 

H. Sharangpani and K. Arora, “Itanium Processor Microarchitecture,” IEEE Micro, Sept-Oct 2000

 

                Supplementary Papers

                Various Authors, The Itanium 2 Papers and Presentations, ISSCC 2002.

                Overview Presentation,                       Overview Paper

                Register File and Integer Datapath Presentation

                L1 Data Cache Presentation,              L1 Paper

                L2 Cache Presentation

                L3 Cache Presentation,                       L3 Paper

               

The Software Approach to Scheduling

 

J. A. Fisher. "Trace Scheduling: A Technique for Global Microcode Compaction." IEEE Transactions on Computers, July 1981.

 

B.R. Rau, D.W.L. Yen, W. Yen, and R.A. Towle, “The Cydra 5 Departmental Supercomputer,” IEEE Computer, January 1989.

 

R.P. Colwell et al, "A VLIW Architecture for a Trace Scheduling Compiler," ASPLOS 2, April 1987

 

B. R. Rau, M. S. Schlansker, and P. P. Tirumalai. "Code Generation Schema for Modulo Scheduled Loops", MICRO 25, December 1992

 

W. W. Hwu et al, "The Superblock: An Effective Technique for VLIW and Superscalar Compilation", The Journal of Supercomputing, 1993

 

D.I. August et al, “Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture”, ISCA 25, June 1998

 

                Supplementary Reading: S. Mahlke et al, “Effective Compiler Support for Predicated Execution Using the Hyperblock,” MICRO 24, December 1992.

 

Throughput Optimizations

 

D. Tullsen et al, “Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor”, ISCA 23, June 1996

                Supplementary Reading:  J. Emer, “EV8: The Post-Ultimate Alpha,” presentation at PACT 9, 2001.

 

C.K. Luk, “Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors,” ISCA 28, June 2001

 

K. Sundaramoorthy, Z. Purser, E. Rotenberg, “Slipstream Processors: Improving Both Performance and Fault Tolerance,” ASPLOS 9, November 2000.

 

R. Rajwar and J. Goodman, “Speculative Lock Elision,” MICRO 34, December 2001.

 

Beyond Instruction Scheduling-Based Microarchitecture

 

M. Lam and R. Wilson, “Limits of Control Flow on Parallelism”, ISCA 19, June 1992.

 

E. Sprangle and D. Carmean, “Increasing Processor Performance by Implementing Deeper Pipelines,” ISCA 29, May 2002.

 

A. Glew, “MLP yes! ILP no! Memory Level Parallelism, or, why I no longer worry about IPC”, ASPLOS 98 Wild and Crazy Ideas Session, San Jose, October 1998.

 

G. Sohi, S. Breach, and T. N. Vijaykumar, “Multiscalar Processors”, ISCA 22, June 1995.


M. Taylor et al, “The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs”, IEEE Micro, March-April 2002

 

                Supplementary Reading: M. Taylor et al, “The Raw Processor”, Presentation from HotChips.

 

S. Rixner et al, "A Bandwidth-Efficient Architecture for Media Processing", MICRO 31, November 1998

 

Beyond General-Purpose Performance Optimizations

 

If there is time