Department of Electrical and Computer Engineering
ECE 512 (411) Spring 2005
Wen-mei W. Hwu, Instructor
ECE 512 (411) Course
Physical Influences on Architecture
J. Edmondson, "Impact of Physical Design on Architecture," Chapter 1, The Design of High Performance Microprocessor Circuits, A. Chandrakasan, W. Bowhill, F. Fox, editors, IEEE Press.
R. Preston, "Register Files and Caches," Chapter 14, The Design of High Performance Microprocessor Circuits, A. Chandrakasan, W. Bowhill, F. Fox, editors, IEEE Press.
R. Gonzalez and M. Horowitz, "Energy Dissipation in General Purpose Microprocessors," IEEE Journal of Solid-State Circuits, Vol. 31, No. 9, Sept 1996.
S. Borkar, "Design Challenges of Technology Scaling," IEEE Micro, Jul-Aug 1999.
Case Study 1: Pentium 4
G. Hinton et al, "The Microarchitecture of the Pentium® 4 Processor," Intel Technology Journal, Q1, 2001.
Supplementary
Case Study 2: 21264
D. Leibholz and R. Razdan, "The Alpha 21264: A 500MHz Out-of-Order Execution Microprocessor," Compcon 97, February 1997.
Supplementary
Kessler's presentation on the 21264
Power considerations on the 21264
Front End Issues
E. Rotenberg, S. Bennett, J.E. Smith, "Trace Cache: A Low Latency Approach to High-Bandwidth Instruction Fetching," MICRO 29, Dec 1996.
R. F. Krick et al, "Trace Based Instruction Caching," U.S. Patent #6,018,786.
A. Seznec, S. Felix, V. Krishnan, Y. Sazeides, "Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor," ISCA 29, May 2002.
M. Evers, S. J. Patel, R. Chappell, and Y. Patt, "An Analysis of Correlation and Predictability: What Makes Two-Level Branch Predictors Work," ISCA 25, June 1998.
Q. Jacobson, E. Rotenberg, and J. Smith, "Path-Based Next Trace Prediction," MICRO 30, November 1997.
Register Renaming
S. Jourdan et al, "A Novel Renaming Scheme to Exploit Value Temporal Locality through Physical Register Reuse and Unification," MICRO 31, Dec 1998.
T. Monreal et al, "Dynamic Register Renaming Through Virtual-Physical Registers," Journal of Instruction Level Parallelism, May 2000.
Scheduling
S. Palacharla, N. Jouppi, J. E. Smith, "Complexity-effective Superscalar Processors," ISCA 24, June 1997.
M. Brown, J. Stark, and Y. Patt, "Select-Free Instruction Scheduling Logic," MICRO 34, December 2001.
Execution
A. Baniasadi and A. Moshovos, "Instruction Distribution Heuristics for Quad-Clustered, Dynamically-Scheduled, Superscalar Processors," MICRO 33, December 2000.
Speculation/Recovery
J. E. Smith and A. R. Pleszkun, "Implementation of Precise Interrupts in Pipelined Processors," ISCA 12, June 1985.
Memory Ordering
G. Chrysos and J. Emer, "Memory Dependence Prediction using Store Sets," ISCA 25, June 1998.
A. Moshovos and G. Sohi, "Streamlining Inter-Operation Memory Communication via Data Dependence Prediction," MICRO 30, December 1997.
Cache Access
T. Juan, J. Navarro, O. Temam, "Data Caches for Superscalar Processors," ICS 1997.
S. Cho, P. C. Yew, G. Lee, "A High-Bandwidth Memory Pipeline for Wide Issue Processors," IEEE Transactions on Computers, July 2001.
Cache/Memory Interface
T. F. Chen and J. L. Baer, "Effective Hardware-Based Prefetching for High-Performance Microprocessors," IEEE Transactions on Computers, May 1995.
O. Mutlu, J. Stark, C. Wilkerson, and Y. Patt, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors", HPCA 9, February 2003.
B. Jacob and D. Wang, "DRAM: Architectures, Interfaces, and Systems: A Tutorial," Tutorial Presentation at ISCA 29, June 2002.
Case Study 3: Itanium
H. Sharangpani and K. Arora, "Itanium Processor Microarchitecture," IEEE Micro, Sept-Oct 2000.
Supplementary Papers
Various Authors, The Itanium 2 Papers and Presentations, ISSCC 2002.
Overview Presentation, Overview Paper
Register File and Integer Datapath Presentation
L1 Data Cache Presentation, L1 Paper
L3 Cache Presentation, L3 Paper
The Software Approach to Scheduling
J. A. Fisher. "Trace Scheduling: A Technique for Global Microcode Compaction." IEEE Transactions on Computers, July 1981.
B.R. Rau, D.W.L. Yen, W. Yen, and R.A. Towle, "The Cydra 5 Departmental Supercomputer," IEEE Computer, January 1989.
R.P. Colwell et al, "A VLIW Architecture for a Trace Scheduling Compiler," ASPLOS 2, April 1987.
B. R. Rau, M. S. Schlansker, and P. P. Tirumalai. "Code Generation Schema for Modulo Scheduled Loops", MICRO 25, December 1992.
W. W. Hwu et al, "The Superblock: An Effective Technique for VLIW and Superscalar Compilation", The Journal of Supercomputing, 1993.
D.I. August et al, "Integrated Predicated and Speculative Execution in the IMPACT EPIC Architecture," ISCA 25, June 1998.
Supplementary
Throughput Optimizations
D. Tullsen et al, "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," ISCA 23, June 1996.
Supplementary
C.K. Luk, "Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors," ISCA 28, June 2001.
K. Sundaramoorthy, Z. Purser,
R. Rajwar and J. Goodman, "Speculative Lock Elision," MICRO 34, December 2001.
Beyond Instruction Scheduling-Based Microarchitecture
M. Lam and R. Wilson, "Limits of Control Flow on Parallelism," ISCA 19, June 1992.
E. Sprangle and D. Carmean, "Increasing Processor Performance by Implementing Deeper Pipelines," ISCA 29, May 2002.
A. Glew, "MLP yes! ILP no! Memory Level Parallelism, or, why I no longer worry about IPC," ASPLOS 98 Wild and Crazy Ideas Session.
G. Sohi, S. Breach, and T. N. Vijaykumar, "Multiscalar Processors," ISCA 22, June 1995.
M. Taylor et al, "The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs," IEEE Micro, March-April 2002.
Supplementary
S. Rixner et al, "A Bandwidth-Efficient Architecture for Media Processing", MICRO 31, November 1998.
Beyond General-Purpose Performance Optimizations
If there is time