# **Reducing Power Consumption of CMOS VLSI's through VDD and VTH Control**

Takayasu Sakurai

Center for Collaborative Research, University of Tokyo E-mail: tsakurai@iis.u-tokyo.ac.jp

#### Abstract

Lowering the operating voltage, $V_{DD}$, is key to low-power CMOS digital VLSI's. To complete a given task within the required time while keeping leakage current at a tolerable level in low-$V_{DD}$ designs, control of $V_{DD}$ and $V_{TH}$ is obligatory. This talk covers several such schemes, including multi-$V_{TH}$, variable $V_{TH}$, multi-$V_{DD}$ and variable $V_{DD}$, for achieving low-power systems. The topics described range from circuit-level ideas to software-related research.

#### **1. Introduction**

The power consumption of VLSI's has been increasing steadily (Fig.1), and a VLSI processor dissipating more than 100W has been introduced. A roadmap suggests that power will increase even further in the future, with the supply voltage falling below 0.5V (Fig.2). Thus, low-power and low-voltage designs are, and will continue to be, important for the further progress of VLSI's.

#### **2. Power consumption of CMOS VLSI's**

The expression for power consumption is shown in Fig.3. The crowbar current component (or short-circuit current component) is at present less than 10% of the total active power and will decrease in the future as $V_{TH}/V_{DD}$ increases (Fig.4-7). Consequently, the charging and discharging current component dominates in the active mode, while the leakage current component dominates in the standby mode. Among the leakage components, subthreshold current is dominant now, but gate tunneling current and gate-induced drain leakage should also be considered in the future. In calculating the dynamic current component, the voltage dependence of the gate capacitance must be taken into account (Fig. 8, 9).
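For orientation, the three components named above are often written in the following textbook form. This is a sketch under assumed notation and is not necessarily the exact expression of Fig.3: $a$ (switching activity), $f$ (clock frequency), $C_L$ (load capacitance), $I_{SC}$ (mean short-circuit current) and $I_{LEAK}$ (leakage current) are symbols introduced here, and full-swing switching is assumed.

$$
P = \underbrace{a\,f\,C_L\,V_{DD}^{2}}_{\text{charging and discharging}} + \underbrace{I_{SC}\,V_{DD}}_{\text{crowbar (short-circuit)}} + \underbrace{I_{LEAK}\,V_{DD}}_{\text{leakage}}
$$

In the active mode the first term dominates; in the standby mode ($f = 0$) only the leakage term remains.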

Using typical values, power and delay are calculated for various $V_{DD}$ and $V_{TH}$ in Figs. 10 and 11. To reduce power, it is preferable to decrease $V_{DD}$, but decreasing $V_{DD}$ degrades performance. If $V_{TH}$ is reduced together with $V_{DD}$, the speed of the circuits can be maintained. The issue then becomes the increase of subthreshold leakage in the low-$V_{TH}$ region. This is why some form of $V_{DD}$-$V_{TH}$ control is needed to achieve low-power yet high-speed circuits.
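The trade-off described above can be illustrated with a minimal sketch. It assumes an alpha-power-law delay model and an exponential subthreshold leakage model; the function names and all parameter values (`alpha`, `swing`, the unit prefactors) are illustrative assumptions, not the values used for Figs. 10 and 11.

```python
def gate_delay(vdd, vth, alpha=1.3, k=1.0):
    """Relative gate delay, alpha-power-law form: k * VDD / (VDD - VTH)**alpha."""
    if vdd <= vth:
        raise ValueError("model assumes VDD > VTH")
    return k * vdd / (vdd - vth) ** alpha


def dynamic_power(vdd, freq=1.0, cap=1.0, activity=1.0):
    """Relative charging/discharging power: activity * f * C * VDD**2."""
    return activity * freq * cap * vdd ** 2


def leakage_power(vdd, vth, i0=1.0, swing=0.1):
    """Relative subthreshold leakage power: I0 * 10**(-VTH / S) * VDD.

    swing is the subthreshold swing S in volts per decade
    (100 mV/decade is an assumed, illustrative value).
    """
    return i0 * 10 ** (-vth / swing) * vdd


if __name__ == "__main__":
    # Three hypothetical operating points: nominal, low VDD, low VDD with low VTH.
    for vdd, vth in [(2.5, 0.5), (1.0, 0.5), (1.0, 0.2)]:
        print(f"VDD={vdd} V, VTH={vth} V: "
              f"delay={gate_delay(vdd, vth):.2f}, "
              f"P_dyn={dynamic_power(vdd):.2f}, "
              f"P_leak={leakage_power(vdd, vth):.1e}")
```

Under these assumptions, lowering $V_{DD}$ alone cuts dynamic power but lengthens the delay, lowering $V_{TH}$ as well recovers part of the speed, and the leakage term grows by one decade for every `swing` volts of $V_{TH}$ reduction.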

#### **3. Multi-$V_{TH}$, Variable $V_{TH}$, Multi-$V_{DD}$ and Variable $V_{DD}$**

Using two $V_{TH}$'s (MTCMOS) is one way to trade off speed in the active mode against leakage in the standby mode (Fig.12). Another approach is to vary $V_{TH}$ dynamically using the substrate bias effect, namely VTCMOS, which has also been pursued and brought to product [8, 9] (Fig.13-15). VTCMOS and MTCMOS are compared in Fig. 16. MTCMOS is conceptually simple and easy to implement, while VTCMOS is better in terms of performance. MTCMOS, however, does not operate properly when $V_{DD}$ decreases below 0.5V. To overcome this shortcoming, Super Cut-off CMOS (SCCMOS) has been proposed (Fig.17-20). By over-driving the MOS gate in the standby mode, it is possible to completely cut off the leakage current of low-$V_{TH}$ MOSFET's. The original MTCMOS and VTCMOS are applicable to the logic part of a design but not to low-voltage SRAM's. If MTCMOS is applied to an SRAM, the stored information is lost in standby. On the other hand, if VTCMOS is applied to an SRAM with low-$V_{TH}$ memory cells for high speed, the leakage current in the active mode is enormous. A possible solution to this problem is row-wise selective biasing, as shown in Fig.21.
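To give a feel for how a substrate bias raises $V_{TH}$ and suppresses standby leakage, the sketch below uses the textbook long-channel body-effect expression together with an exponential subthreshold model. It is not the circuit of Refs. 8 and 9; the function names and the values of `gamma`, `two_phi_f` and `swing` are assumptions chosen only for illustration.

```python
import math


def vth_with_body_bias(vth0, vsb, gamma=0.4, two_phi_f=0.7):
    """Threshold voltage under a source-to-substrate reverse bias vsb (volts).

    Textbook long-channel body effect:
        VTH = VTH0 + gamma * (sqrt(2*phi_F + VSB) - sqrt(2*phi_F))
    gamma (V**0.5) and two_phi_f (V) are illustrative assumptions.
    """
    if vsb < 0:
        raise ValueError("this sketch covers reverse bias only (vsb >= 0)")
    return vth0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))


def leakage_reduction(vth_active, vth_standby, swing=0.1):
    """Factor by which subthreshold leakage falls when VTH is raised.

    Assumes leakage proportional to 10**(-VTH / S), with S = swing in V/decade.
    """
    return 10 ** ((vth_standby - vth_active) / swing)


if __name__ == "__main__":
    vth_active = 0.2  # hypothetical low VTH used in the active mode
    for vsb in (0.0, 1.0, 2.0):
        vth_standby = vth_with_body_bias(vth_active, vsb)
        print(f"VSB={vsb:.1f} V: VTH={vth_standby:.3f} V, "
              f"leakage reduced by x{leakage_reduction(vth_active, vth_standby):.0f}")
```

Because the leakage depends exponentially on $V_{TH}$, even a few hundred millivolts of threshold shift in standby changes the leakage by orders of magnitude in this model.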

A multiple-voltage scheme known as the Dual-VS scheme is shown in Figs. 22 and 24, where critical paths are driven with a higher $V_{DD}$ while non-critical gates are operated at a low $V_{DD}$. An example of a variable-$V_{DD}$ approach, called the software feedback loop, is shown in Figs.25-26. By making use of data dependency, the scheme can reduce power by an order of magnitude. This is a hardware-software cooperative approach to low power.
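The general idea behind a variable-$V_{DD}$ approach can be sketched as follows. This is not the scheme of Ref.10; it is a minimal illustration that assumes a small table of hypothetical (frequency, voltage) operating points and dynamic energy proportional to $V_{DD}^2$ per cycle. For each task, the slowest point that still meets the deadline is selected, so data-dependent light workloads run at a lower voltage.

```python
from typing import NamedTuple, Sequence


class OperatingPoint(NamedTuple):
    freq_mhz: float
    vdd: float


# Hypothetical (frequency, supply voltage) pairs, for illustration only.
POINTS = (
    OperatingPoint(25.0, 0.8),
    OperatingPoint(50.0, 1.2),
    OperatingPoint(100.0, 2.0),
)


def pick_point(cycles: float, deadline_us: float,
               points: Sequence[OperatingPoint] = POINTS) -> OperatingPoint:
    """Return the slowest operating point that finishes `cycles` in time.

    cycles / freq_mhz is the execution time in microseconds. If no point
    meets the deadline, the fastest point is returned.
    """
    for point in sorted(points, key=lambda p: p.freq_mhz):
        if cycles / point.freq_mhz <= deadline_us:
            return point
    return max(points, key=lambda p: p.freq_mhz)


def relative_energy(point: OperatingPoint,
                    ref: OperatingPoint = POINTS[-1]) -> float:
    """Dynamic energy per cycle relative to `ref`, assuming E ~ VDD**2."""
    return (point.vdd / ref.vdd) ** 2


if __name__ == "__main__":
    deadline_us = 40_000.0  # hypothetical 40 ms frame period
    for cycles in (1_000_000, 3_000_000):  # light and heavy frames
        point = pick_point(cycles, deadline_us)
        print(f"{cycles} cycles -> {point.freq_mhz} MHz at {point.vdd} V, "
              f"relative energy {relative_energy(point):.2f}")
```

In this sketch the saving comes entirely from the workload varying from task to task: a light frame is stretched to fill the deadline at a lower voltage instead of finishing early at full voltage.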

When using a sub-0.5V $V_{DD}$ and low $V_{TH}$, attention must be paid to the positive temperature dependence of speed (Fig.27-29). Thermal instability may occur if an improper package is used (Fig.30). Introducing a new circuit concept sometimes requires modifying the layout of the standard cell library. It has been shown, however, that a small number of cells in a library is sufficient to achieve high performance (Fig.31-33, Table I, II).

#### **4. Other low-power approaches**

The power consumption of the clock system in a digital VLSI is comparable to the power consumed in the other logic gates (Fig.34). To reduce the clocking power, a reduced-swing clock scheme with special flip-flops has been proposed (Fig.35). At the architectural level, a system LSI approach achieves lower power than a general-purpose processor approach at the sacrifice of generality. Historically, low power has set the technology trend (Fig.36-38).

#### References

- [1] K.Nose and T.Sakurai, "Closed-Form Expressions for Short-Circuit Power of Short-Channel CMOS Gates and Its Scaling Characteristics," ITC-CSCC (Korea), July 1998.
- [2] K.Nose and T.Sakurai, "Optimization of V<sub>DD</sub> and V<sub>TH</sub> for Low-Power and High-Speed Applications", ASPDAC'00, A6.1, Jan. 2000.
- [3] S. Mutoh, et al., "1V High-Speed Digital Circuit Technology with 0.5um Multi-Threshold CMOS," in Proc. IEEE 1993 ASIC Conf., 1993, pp. 186-189.

- [4] H.Kawaguchi, K.Nose, and T.Sakurai, "A CMOS Scheme for 0.5V Supply Voltage with pico-Ampere Standby Current," 1998 ISSCC Digest of Tech. Papers, pp.192-193, Feb. 1998.
- [5] H.Kawaguchi, Y.Itaka and T.Sakurai, "Dynamic Leakage Cut-off Scheme for Low-Voltage SRAM's," Symp. on VLSI Circuits, pp.140-141, June, 1998.
- [6] M.Takahashi et al., "A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme," 1998 ISSCC Digest of Tech. Papers, pp.36-37, Feb.1998.
- [7] K.Nose, S.Chae, and T.Sakurai, "Voltage Dependent Gate Capacitance and its Impact in Estimating Power and Delay of CMOS Digital Circuits," submitted, CICC'00.
- [8] T.Kuroda, T.Fujita, S.Mita, T.Nagamatsu, S.Yoshioka, F.Sano, M.Norishima, M.Murota, M.Kato, M.Kinugawa, M.Kakumu, and T.Sakurai, "A 0.9V 150MHz 10mW 4mm<sup>2</sup> 2-D Discrete Cosine Transform Core Processor with Variable-Threshold-Voltage Scheme," in ISSCC, pp. 166-167, Feb. 1996.
- [9] H.Mizuno, K.Ishibashi, T.Shimura, T.Hattori, S.Narita, K.Shiozawa, S.Ikeda and K.Uchiyama, "A 18uA-Standby-Current 1.8V 200MHz Microprocessor with Self Substrate-Biased Data-Retention Mode," 1999 ISSCC Digest of Tech. Papers, pp.280-281, Feb.1999.
- [10] Seongsoo Lee and T.Sakurai, "Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Real-time Applications," ASPDAC'00, A5.2, Jan. 2000.
- [11] K.Kanda, K.Nose, H.Kawaguchi, and T.Sakurai, "Design Impact of Positive Temperature Dependence of Drain Current in Sub 1V CMOS VLSI's", CICC99, pp.563-566, May 1999.
- [12] N.Duc, and T.Sakurai, "Compact yet High-Performance (CyHP) Library for Short Time-to-Market with New Technologies," ASPDAC'00, A6.2, Jan. 2000.
- [13] H.Kawaguchi and T.Sakurai, "A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% Power Reduction," IEEE J. of Solid-State Circuits, pp.807-811, May 1998.



Fig.1 Trend in processor power (from ISSCC)



Fig.2 Trend in voltage and power (from SIA)



Fig.3 Expression for CMOS power











Fig.6 Ratio of short-circuit power  $(P_S)$  vs total active power  $(P_S + P_D)$ 



Fig. 7 Optimum  $V_{DD}$  and  $V_{TH}$  (Ref.2)



Fig.8 Voltage dependent gate capacitance (Ref.7)



Fig.9 Effect of voltage dependent gate capacitance

Fig.10 Power dependence on $V_{DD}$ & $V_{TH}$

Fig.11 Delay dependence on $V_{DD}$ & $V_{TH}$



In active mode, low-$V_{TH}$ MOSFET's achieve high speed.
In standby mode, when the St'by signal is high, high-$V_{TH}$ MOSFET's in series with the normal logic circuits cut off the leakage current.

Fig.12 Multi-Threshold CMOS (MTCMOS) (Ref.3)

Fig.13 Standby Power Reduction circuit (SPR). Part of Variable Threshold CMOS (VTCMOS)











Fig.16 Comparison between VTCMOS & MTCMOS

Fig.17 Concept of Super Cut-off CMOS (SCCMOS) (Ref.4)

Fig.18 Super Cut-off CMOS Scheme (SCCMOS)

Fig.19 Maintaining information in standby







Fig.21 Dynamic Leakage Cut-off SRAM (Ref.5)

Fig.22 Clustered Voltage Scaling for multiple $V_{DD}$'s. Dual voltage supply scheme (Dual-VS) (Ref.6)

The optimum $V_L/V_H$ is between **0.6 and 0.7** for any kind of path-delay distribution function.

Fig.23 Power reduction vs. $V_L/V_H$





Fig.25 Software feedback loop for low-power (Ref.10)







Fig.27 Positive temperature effects on $I_{DS}$-$V_{GS}$ in sub-1V region (Ref.11)

| $I_{DS} \propto \mu(T) (V_{DD} - V_{TH}(T))^{\alpha}$ | $T$ ↗ | $T$ ↘ |
|-------------------------------------------------------|-------|-------|
| $\mu(T) = \mu(T_0)(T / T_0)^{-m}$                     | ↘     | ↗     |
| $V_{TH}(T) = V_{TH}(T_0) - \kappa(T - T_0)$           | ↘     | ↗     |

Typical values: $\alpha$=1.5, $m$=1.5, $\kappa$=2.5 mV/K

Effects of $V_{TH}$ and $\mu$ on $I_{DS}$ when the temperature goes up:

| $\Delta T$ = 100 K           | $V_{TH}$ effect | $\mu$ effect |
|------------------------------|-----------------|--------------|
| $V_{DD}$=2.5V, $V_{TH}$=0.5V | 10% ↗           | 35% ↘        |
| $V_{DD}$=1.0V, $V_{TH}$=0.2V | 55% ↗           | 35% ↘        |

Fig.28 Temperature dependence of $\mu$ and $V_{TH}$
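The sign of the temperature dependence can be explored with a small sketch built on the drain-current expressions of Fig.28. It evaluates the ratio $I_{DS}(T)/I_{DS}(T_0)$ using the typical values of $\alpha$, $m$ and $\kappa$ quoted there. The function name, the reference temperature $T_0$ = 300 K, the reading of $\kappa$ as 2.5 mV per kelvin, and the two operating points in the usage lines are assumptions made here for illustration.

```python
def ids_ratio(vdd, vth0, temp_k, t0_k=300.0, alpha=1.5, m=1.5, kappa=2.5e-3):
    """Return I_DS(T) / I_DS(T0) for I_DS ~ mu(T) * (VDD - VTH(T))**alpha.

    mu(T)  = mu(T0) * (T / T0)**(-m)
    VTH(T) = VTH(T0) - kappa * (T - T0)    (kappa in V/K)
    t0_k = 300 K is an assumed reference temperature.
    """
    mobility_factor = (temp_k / t0_k) ** (-m)
    vth = vth0 - kappa * (temp_k - t0_k)
    overdrive_factor = ((vdd - vth) / (vdd - vth0)) ** alpha
    return mobility_factor * overdrive_factor


if __name__ == "__main__":
    # Hypothetical operating points: a conventional one and a sub-1V one.
    for vdd, vth0 in [(2.5, 0.5), (0.5, 0.2)]:
        ratio = ids_ratio(vdd, vth0, temp_k=350.0)
        trend = "rises" if ratio > 1.0 else "falls"
        print(f"VDD={vdd} V, VTH={vth0} V: drain current {trend} "
              f"with temperature (ratio {ratio:.2f})")
```

A ratio below 1 means the mobility degradation wins and the circuit slows down when hot, as in conventional designs; a ratio above 1 means the $V_{TH}$ reduction wins, which is the positive temperature dependence that appears at low $V_{DD}$.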









Fig.31 Average of relative delay vs. # of cells (Ref.12)

Fig.32 Average of relative area vs. # of cells

Fig.33 Relative synthesis time vs. # of cells

#### TABLE I: Contents of 11-cell CyHP library

| Flip-flops      | D-FF x1, D-FF x2                   |
|-----------------|------------------------------------|
| Inverters       | INV x1, INV x2, INV x4             |
| Primitive gates | 2-NAND x2<br>2-NOR x2<br>2-XNOR x1 |
| Compound gates  | 2-InvNAND x2<br>2-InvNOR x2        |
| Multiplexer     | 2-MUXInv x1                        |

#### TABLE II: Contents of 20-cell CyHP library

| Flip-flops      | D-FFN x1                                       |
|-----------------|------------------------------------------------|
| Inverters       | INV x8, INV x16                                |
| Primitive gates | 2-NAND x1<br>2-NOR x1<br>3-NAND x1<br>3-NOR x1 |
| Compound gates  | 3-AND-NOR x1<br>3-OR-NAND x1                   |

(Only cells that are not in Table I are listed.)



Fig.34 Power distribution in CMOS LSI's

Fig.35 Reduced Clock Swing Flip-Flop (Ref.13)

Example of MPEG2 decoding







Fig.37 System LSI approach is inherently low-power, reducing waste





Fig.38 What sets the technology trend? Low-power does.