Power dissipation is a major issue in processor design. In particular, CMOS technology scaling
has significantly increased the leakage power dissipation so that it accounts for an increasingly large
fraction of processor power dissipation. One of the main issues is how to achieve power savings without loss
of performance.
Much of our work in this area has focused on cache power dissipation. We addressed both dynamic
and static power consumption in L1 I- and D-caches. This included way caching to save
static and dynamic power in high-associativity caches (as an alternative to way prediction),
a cached load-store queue as a low-cost alternative to an L0 cache, and the use of branch prediction information
to save power in instruction caches. We also addressed L2 power consumption, in particular leakage power
in L2 peripheral circuits. The results of this research are applicable to both embedded and
high-performance processors.
Another aspect of this research is low-power instruction queue design for out-of-order processors.
CAM-based instruction queues are not scalable and consume a significant amount of power due to wide
issue and a CAM search on each cycle. One approach we proposed used a banked queue, dividing the
CAM into smaller banks with faster search. A pointer table indicates which bank an instruction belongs to.
A more complex approach dispensed with the CAM-based queue altogether and used instruction dependence pointers
and a RAM-based queue for "direct" wakeup. It also solved the problem of achieving fast branch
misprediction recovery when dependence pointers are used.
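The banked-queue idea can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the hardware design itself: the class name BankedQueue, the organization of the pointer table (indexed by source tag, listing the banks that hold a waiting consumer), and the use of a comparison counter as a rough proxy for search energy are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class BankedQueue:
    """Toy model of a banked issue queue (illustrative, not the actual design)."""

    def __init__(self, num_banks, bank_size):
        self.bank_size = bank_size
        # Each bank holds entries of the form [name, set_of_pending_source_tags].
        self.banks = [[] for _ in range(num_banks)]
        # Hypothetical pointer table: source tag -> banks holding a waiting consumer.
        self.pointer_table = defaultdict(set)
        # Tag comparisons performed so far, a rough proxy for search energy.
        self.comparisons = 0

    def insert(self, name, source_tags):
        """Place an instruction in the first bank that has a free slot."""
        for bank_id, bank in enumerate(self.banks):
            if len(bank) < self.bank_size:
                bank.append([name, set(source_tags)])
                for tag in source_tags:
                    self.pointer_table[tag].add(bank_id)
                return bank_id
        raise RuntimeError("queue full")

    def wakeup(self, tag):
        """Search only the banks that the pointer table names for this tag."""
        ready = []
        for bank_id in sorted(self.pointer_table.pop(tag, set())):
            for entry in self.banks[bank_id]:
                self.comparisons += 1
                name, pending = entry
                if tag in pending:
                    pending.discard(tag)
                    if not pending:
                        ready.append(name)
        return ready


# Four banks of two entries each; tags such as "p1" stand for physical registers.
queue = BankedQueue(num_banks=4, bank_size=2)
queue.insert("add", ["p1"])
queue.insert("mul", ["p1", "p2"])
queue.insert("sub", ["p3"])
queue.insert("ld", ["p2"])
woken_by_p1 = queue.wakeup("p1")  # searches bank 0 only
woken_by_p2 = queue.wakeup("p2")  # searches banks 0 and 1
```

In this model a wakeup touches only the banks that actually hold a consumer of the broadcast tag, whereas a monolithic CAM would compare against every entry on every broadcast.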
Finally, we investigated the problem of power consumption in the register file. A content-aware register file
used knowledge of instruction operand and effective address widths to reduce the number
of bits read from the RF and to speed up TLB access using an "L0 TLB". This type of register file was also
shown to enable a new type of clustered processor with improved performance and reduced power.
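The potential saving from operand-width awareness can be estimated with a small calculation. The sketch below is an illustration under assumptions, not the register file organization itself: the 16-bit narrow slice, the 64-bit full width, and the function names significant_bits and bits_read are hypothetical, and the model simply charges a narrow read for any value whose two's-complement representation fits in the narrow slice.

```python
def significant_bits(value):
    """Minimal two's-complement width needed to represent a signed integer."""
    if value >= 0:
        return value.bit_length() + 1  # one extra bit for the sign
    return (~value).bit_length() + 1


def bits_read(values, narrow_width=16, full_width=64):
    """Bits read if narrow values are served from a narrow_width-bit slice only.

    The widths are assumed parameters chosen for illustration.
    """
    total = 0
    for v in values:
        if significant_bits(v) <= narrow_width:
            total += narrow_width
        else:
            total += full_width
    return total


# Three narrow operands and one wide (address-like) operand.
operands = [5, -3, 1000, 2**40]
content_aware = bits_read(operands)       # 3 narrow reads + 1 full read
baseline = 64 * len(operands)             # every read is full width
```

With these example operands the model reads 112 bits instead of 256; the actual benefit depends on the operand-width distribution of the workload.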