# Low Power Design of Digital Circuits

Takayasu Sakurai

Center for Collaborative Research, University of Tokyo E-mail: tsakurai@iis.u-tokyo.ac.jp

### **Abstract**

This paper covers several schemes for achieving low-power systems, including multi-$V_{TH}$, variable $V_{TH}$, multi-$V_{DD}$ and variable $V_{DD}$. The ideas described range from the circuit level to software-related research.

### 1. Introduction

The power consumption of VLSI's has been increasing steadily (Fig. 1), and a VLSI processor dissipating more than 100W has already been introduced. A roadmap suggests even larger power increases in the future, with supply voltages below 0.5V (Fig. 2). Low-power and low-voltage designs are therefore, and will continue to be, important for further progress of VLSI's.

### 2. Power consumption of CMOS VLSI's

The expression for power consumption is shown in Fig. 3. The crowbar (short-circuit) current component is less than 10% of the total active power at present, and it will decrease further as $V_{TH}/V_{DD}$ increases (Figs. 4-7). Consequently, the charging and discharging current component dominates in active mode, while the leakage current component dominates in standby mode. Among the leakage components, subthreshold current is dominant at present, but gate tunneling current and gate-induced drain leakage will also need to be considered in the future. When calculating the dynamic current component, the voltage dependence of the gate capacitance must be taken into account (Figs. 8, 9).
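Fig. 3 itself is not reproduced in this text. As a minimal sketch, assuming the conventional textbook decomposition of CMOS power into switching, short-circuit (crowbar) and leakage terms, the components can be computed as below. The function `cmos_power`, the class `PowerBreakdown` and every numeric value are hypothetical, chosen only so that the crowbar term stays under 10% of active power as stated above.

```python
from dataclasses import dataclass

@dataclass
class PowerBreakdown:
    switching: float      # charging/discharging of load capacitance [W]
    short_circuit: float  # crowbar current during input transitions [W]
    leakage: float        # static (mainly subthreshold) leakage [W]

    @property
    def active(self) -> float:
        """Active-mode power: switching plus crowbar components."""
        return self.switching + self.short_circuit

def cmos_power(activity: float, c_load: float, vdd: float, f_clk: float,
               i_sc_avg: float, i_leak: float) -> PowerBreakdown:
    """Assumed textbook form: P = a*C*VDD^2*f + I_sc*VDD + I_leak*VDD."""
    return PowerBreakdown(
        switching=activity * c_load * vdd ** 2 * f_clk,
        short_circuit=i_sc_avg * vdd,
        leakage=i_leak * vdd,
    )

# Hypothetical chip-level numbers: 10 nF total capacitance, 1.5 V, 200 MHz
p = cmos_power(activity=0.1, c_load=10e-9, vdd=1.5, f_clk=200e6,
               i_sc_avg=0.02, i_leak=1e-3)
print(f"switching={p.switching:.2f} W, crowbar={p.short_circuit:.3f} W, "
      f"leakage={p.leakage:.4f} W, crowbar share={p.short_circuit / p.active:.1%}")
```

In standby, only the `leakage` term remains, which is why it dominates there even though it is small next to `switching` in active mode.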

Using typical values, power and delay are calculated for various $V_{DD}$ and $V_{TH}$ in Figs. 10 and 11. Decreasing $V_{DD}$ is the preferred way to reduce power, but it also degrades performance. If $V_{TH}$ is reduced together with $V_{DD}$, circuit speed can be maintained. The problem then becomes the increase in subthreshold leakage in the low-$V_{TH}$ region. This is why some form of $V_{DD}$-$V_{TH}$ control is needed to achieve circuits that are low-power yet high-speed.
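The trade-off can be illustrated with a minimal sketch, assuming the alpha-power-law delay model (delay proportional to $V_{DD}/(V_{DD}-V_{TH})^{\alpha}$) and exponential subthreshold leakage (proportional to $10^{-V_{TH}/S}$). The values $\alpha$ = 1.3 and $S$ = 80 mV/decade, the voltages, and all function names are assumptions for illustration, not the parameters behind Figs. 10 and 11.

```python
ALPHA = 1.3     # assumed velocity-saturation index of the alpha-power law
S_SWING = 0.08  # assumed subthreshold swing [V/decade]

def rel_delay(vdd: float, vth: float) -> float:
    """Gate delay up to a constant: C*VDD / I_on with I_on ~ (VDD - VTH)^alpha."""
    return vdd / (vdd - vth) ** ALPHA

def rel_leakage(vth: float) -> float:
    """Subthreshold leakage up to a constant: one decade per S_SWING of VTH."""
    return 10 ** (-vth / S_SWING)

def vth_for_same_delay(vdd: float, target: float) -> float:
    """Solve rel_delay(vdd, vth) == target for vth."""
    return vdd - (vdd / target) ** (1 / ALPHA)

old_vdd, old_vth, new_vdd = 1.5, 0.5, 1.0  # hypothetical operating points [V]
target = rel_delay(old_vdd, old_vth)
new_vth = vth_for_same_delay(new_vdd, target)

print(f"VTH must drop from {old_vth} V to {new_vth:.3f} V to keep the same delay")
print(f"dynamic power ratio (~VDD^2): {(new_vdd / old_vdd) ** 2:.2f}")
print(f"leakage ratio: {rel_leakage(new_vth) / rel_leakage(old_vth):.0f}x")
```

Under these assumed numbers, lowering $V_{DD}$ from 1.5V to 1.0V cuts dynamic power to about 44% at equal speed, while leakage rises by nearly three orders of magnitude.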

### 3. Multi-$V_{TH}$, Variable $V_{TH}$, Multi-$V_{DD}$ and Variable $V_{DD}$

Using two $V_{TH}$'s (MTCMOS) is one way to handle the trade-off between speed in active mode and leakage in standby mode. Another approach is to vary $V_{TH}$ dynamically through the substrate-bias effect; this scheme, VTCMOS, has also been pursued and productized [8, 9]. MTCMOS is certainly a viable option, but it does not operate properly when $V_{DD}$ falls below 0.5V.

To overcome this shortcoming, Super Cut-off CMOS (SCCMOS) has been proposed (Figs. 12-15). By overdriving the MOS gate in standby mode, the leakage current of low-$V_{TH}$ MOSFET's can be cut off completely. The original MTCMOS and VTCMOS schemes are applicable to the logic part of a design but not to low-voltage SRAM's. If MTCMOS is applied to an SRAM, the stored information is lost in standby. On the other hand, if VTCMOS is applied to an SRAM with low-$V_{TH}$ memory cells for high speed, the leakage current in active mode is enormous. A possible solution to this problem is row-wise selective biasing, as shown in Fig. 16.
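The contrast between MTCMOS and SCCMOS at a 0.5V supply can be sketched with two assumed models: on-current proportional to $(V_{GS}-V_{TH})^{\alpha}$ above threshold, and off-current proportional to $10^{(V_{GS}-V_{TH})/S}$ below it. The threshold voltages, the 0.4V overdrive and $S$ = 100 mV/decade are hypothetical values. Voltages are in NMOS-equivalent terms, so "overdrive" means biasing the switch gate 0.4V beyond its normal off level.

```python
ALPHA = 1.3    # assumed alpha-power-law index
S_SWING = 0.1  # assumed subthreshold swing [V/decade]
VDD = 0.5      # supply voltage [V]

def on_current(vgs: float, vth: float) -> float:
    """Relative drive current of the power switch when turned on."""
    return max(vgs - vth, 0.0) ** ALPHA

def off_current(vgs: float, vth: float) -> float:
    """Relative subthreshold leakage of the switch when turned off (vgs <= 0)."""
    return 10 ** ((vgs - vth) / S_SWING)

# name: (switch VTH [V], gate-source voltage of the switch in standby [V]); all hypothetical
schemes = {
    "low-VTH switch": (0.1, 0.0),
    "MTCMOS (high-VTH switch)": (0.5, 0.0),
    "SCCMOS (low-VTH, 0.4 V overdrive)": (0.1, -0.4),
}

for name, (vth, vgs_standby) in schemes.items():
    print(f"{name:36s} I_on={on_current(VDD, vth):.3f}  "
          f"I_off={off_current(vgs_standby, vth):.1e}")
```

In this toy model the high-$V_{TH}$ MTCMOS switch has no drive left at $V_{DD}$ = 0.5V, while the overdriven low-$V_{TH}$ switch reaches the same standby leakage and keeps its drive in active mode.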

In a multiple-voltage scheme known as the Dual-VS scheme, critical paths are driven with a higher $V_{DD}$, while non-critical gates operate at a lower $V_{DD}$. An example of a variable-$V_{DD}$ approach, called the software feedback loop, is shown in Figs. 17-18. By exploiting data dependency, this scheme can reduce power by an order of magnitude. It is a hardware-software cooperative approach to low power.
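The idea behind variable $V_{DD}$ can be sketched as follows: for each frame, pick the lowest supply level that still finishes the data-dependent workload before the deadline. This is a minimal illustration, not the software feedback loop of Ref. 10 itself. It assumes an alpha-power-law frequency model, a hypothetical set of supply levels, made-up per-frame workloads, and energy proportional to cycles times $V_{DD}^2$ (leakage ignored). The toy numbers are not expected to reproduce the savings in Fig. 18.

```python
ALPHA = 1.3                           # assumed alpha-power-law index
V_TH = 0.3                            # assumed threshold voltage [V]
V_LEVELS = [0.6, 0.8, 1.0, 1.2, 1.5]  # hypothetical selectable supply voltages [V]

def max_freq(vdd: float) -> float:
    """Relative maximum clock frequency, ~ (VDD - VTH)^alpha / VDD."""
    return (vdd - V_TH) ** ALPHA / vdd

F_REF = max_freq(max(V_LEVELS))  # highest supply level runs at 1 cycle per time unit

def pick_vdd(cycles: float, deadline: float) -> float:
    """Lowest supply level that still finishes `cycles` of work before `deadline`."""
    for vdd in sorted(V_LEVELS):
        if cycles / (max_freq(vdd) / F_REF) <= deadline:
            return vdd
    return max(V_LEVELS)  # overloaded frame: run at full speed

def frame_energy(cycles: float, vdd: float) -> float:
    """Switching energy per frame, ~ cycles * VDD^2 (capacitance and activity normalized)."""
    return cycles * vdd ** 2

# Hypothetical per-frame workloads (cycles) that vary with the input data
workloads = [30, 55, 90, 40]
DEADLINE = 100.0  # frame period, in the same time units

chosen = [pick_vdd(w, DEADLINE) for w in workloads]
fixed = sum(frame_energy(w, max(V_LEVELS)) for w in workloads)
adaptive = sum(frame_energy(w, v) for w, v in zip(workloads, chosen))
print("per-frame VDD:", chosen)
print(f"energy relative to fixed VDD: {adaptive / fixed:.2f}")
```

Light frames run at low $V_{DD}$ and heavy frames at full supply, so every deadline is still met while the average energy drops.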

When using a sub-0.5V $V_{DD}$ and a low $V_{TH}$, watch out for the positive temperature dependence of speed (Figs. 19-21): circuits can become faster, rather than slower, as temperature rises. Thermal instability may occur when an improper package is used (Fig. 22). Introducing a new circuit concept sometimes requires modifying the layout of the standard-cell library. It has been shown, however, that a small number of cells is sufficient for a library to achieve high performance (Fig. 23, Tables I and II).
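The sign change can be sketched by combining the two temperature effects of Fig. 20: $V_{TH}$ falls as temperature rises (raising $I_{DS}$), while mobility $\mu$ also falls (lowering $I_{DS}$). This minimal sketch assumes an alpha-power-law drain current, a $V_{TH}$ coefficient of -1 mV/K and $\mu \propto T^{-1.5}$; these coefficients, the operating points and the function names are assumptions, not values taken from Ref. 11.

```python
ALPHA = 1.3    # assumed alpha-power-law index
K_VTH = 1e-3   # assumed VTH temperature coefficient [V/K]; VTH falls as T rises
MU_EXP = -1.5  # assumed mobility temperature exponent, mu ~ T^MU_EXP
T0 = 300.0     # reference temperature [K]

def drain_current(vgs: float, vth0: float, temp: float) -> float:
    """Relative saturation current ~ mu(T) * (VGS - VTH(T))^alpha."""
    mobility = (temp / T0) ** MU_EXP
    vth = vth0 - K_VTH * (temp - T0)
    return mobility * (vgs - vth) ** ALPHA

def current_ratio(vdd: float, vth0: float, delta_t: float = 100.0) -> float:
    """I_DS(T0 + delta_t) / I_DS(T0) with VGS = VDD; > 1 means gates speed up when hot."""
    return drain_current(vdd, vth0, T0 + delta_t) / drain_current(vdd, vth0, T0)

for vdd, vth0 in [(2.5, 0.5), (0.5, 0.3)]:  # hypothetical operating points [V]
    print(f"VDD={vdd} V, VTH={vth0} V: I_DS ratio after +100 K = "
          f"{current_ratio(vdd, vth0):.2f}")
```

At the high supply the mobility loss dominates and the ratio is below 1. At the low supply the same 0.1V drop in $V_{TH}$ is a large fraction of the gate overdrive, so the ratio exceeds 1.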

### 4. Other low-power approaches

The power consumed by the clock system of a digital VLSI is comparable to that consumed by the other logic gates (Fig. 24). To reduce clocking power, a reduced-swing clock scheme with special flip-flops has been proposed (Fig. 25). When buffers (repeaters) are inserted to optimize an interconnect system, using delay as the objective function increases power by up to 60%, whereas using the power-delay (PD) product as the objective function limits the increase to 26% (Fig. 26). Verifying the standby power is another important issue: with a special current-sensing device, the standby current can be measured (Figs. 27-28). At the architectural level, a system-LSI approach consumes less power than a general-purpose processor approach at the expense of generality, and, historically, low power has set the technology trend (Figs. 29-31).
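The repeater trade-off of Fig. 26 can be explored with a minimal sketch, assuming an Elmore-delay model of a uniformly segmented wire and dynamic power proportional to total switched capacitance. All parameters are normalized and hypothetical, so the overheads printed are not expected to reproduce the 60% and 26% figures; the sketch only shows why optimizing the PD product yields fewer or smaller repeaters than optimizing delay alone.

```python
from itertools import product

# Hypothetical normalized parameters (not from the paper)
R0, C0 = 1.0, 1.0     # output resistance / input capacitance of a unit-size repeater
R_W, C_W = 0.01, 0.1  # wire resistance / capacitance per unit length
LENGTH = 1000.0       # wire length

def delay(k: int, h: float) -> float:
    """Elmore delay of a line split into k segments, each driven by a size-h repeater."""
    seg_r, seg_c = R_W * LENGTH / k, C_W * LENGTH / k
    per_seg = (R0 / h) * (seg_c + C0 * h) + seg_r * (seg_c / 2 + C0 * h)
    return k * per_seg

def power(k: int, h: float) -> float:
    """Switched capacitance (proportional to dynamic power): wire plus repeater inputs."""
    return C_W * LENGTH + k * h * C0

# Exhaustive search over repeater count k and repeater size h
candidates = list(product(range(1, 61), [i / 10 for i in range(5, 201)]))
best_delay = min(candidates, key=lambda kh: delay(*kh))
best_pd = min(candidates, key=lambda kh: power(*kh) * delay(*kh))

for name, (k, h) in [("delay-optimal", best_delay), ("PD-optimal", best_pd)]:
    overhead = k * h * C0 / (C_W * LENGTH)  # repeater power relative to the bare wire
    print(f"{name}: k={k}, h={h:.1f}, delay={delay(k, h):.1f}, "
          f"power overhead={overhead:.0%}")
```

Because the delay-optimal point minimizes delay and the PD-optimal point minimizes the product, the PD-optimal design can never consume more power than the delay-optimal one; it trades a small delay penalty for a smaller repeater overhead.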

### References

- [1] K. Nose and T. Sakurai, "Closed-Form Expressions for Short-Circuit Power of Short-Channel CMOS Gates and Its Scaling Characteristics," ITC-CSCC (Korea), July 1998.
- [2] K. Nose and T. Sakurai, "Optimization of V<sub>DD</sub> and V<sub>TH</sub> for Low-Power and High-Speed Applications," ASPDAC'00, A6.1, Jan. 2000.
- [3] S. Mutoh et al., "1V High-Speed Digital Circuit Technology with 0.5µm Multi-Threshold CMOS," in Proc. IEEE 1993 ASIC Conf., 1993, pp. 186-189.
- [4] H. Kawaguchi, K. Nose, and T. Sakurai, "A CMOS Scheme for 0.5V Supply Voltage with Pico-Ampere Standby Current," 1998 ISSCC Digest of Tech. Papers, pp. 192-193, Feb. 1998.
- [5] H. Kawaguchi, Y. Itaka, and T. Sakurai, "Dynamic Leakage Cutoff Scheme for Low-Voltage SRAM's," Symp. on VLSI Circuits, pp. 140-141, June 1998.
- [6] M. Takahashi et al., "A 60mW MPEG4 Video Codec Using Clustered Voltage Scaling with Variable Supply-Voltage Scheme," 1998 ISSCC Digest of Tech. Papers, pp. 36-37, Feb. 1998.
- [7] K. Nose, S. Chae, and T. Sakurai, "Voltage Dependent Gate Capacitance and Its Impact in Estimating Power and Delay of CMOS Digital Circuits," submitted, CICC'00.
- [8] T. Kuroda, T. Fujita, S. Mita, T. Nagamatsu, S. Yoshioka, F. Sano, M. Norishima, M. Murota, M. Kato, M. Kinugawa, M. Kakumu, and T. Sakurai, "A 0.9V 150MHz 10mW 4mm² 2-D Discrete Cosine Transform Core Processor with Variable-Threshold-Voltage Scheme," 1996 ISSCC Digest of Tech. Papers, pp. 166-167, Feb. 1996.
- [9] H. Mizuno, K. Ishibashi, T. Shimura, T. Hattori, S. Narita, K. Shiozawa, S. Ikeda, and K. Uchiyama, "A 18µA-Standby-Current 1.8V 200MHz Microprocessor with Self Substrate-Biased Data-Retention Mode," 1999 ISSCC Digest of Tech. Papers, pp. 280-281, Feb. 1999.
- [10] S. Lee and T. Sakurai, "Run-time Power Control Scheme Using Software Feedback Loop for Low-Power Realtime Applications," ASPDAC'00, A5.2, Jan. 2000.
- [11] K. Kanda, K. Nose, H. Kawaguchi, and T. Sakurai, "Design Impact of Positive Temperature Dependence of Drain Current in Sub-1V CMOS VLSI's," CICC'99, pp. 563-566, May 1999.
- [12] N. Duc and T. Sakurai, "Compact yet High-Performance (CyHP) Library for Short Time-to-Market with New Technologies," ASPDAC'00, A6.2, Jan. 2000.
- [13] H. Kawaguchi and T. Sakurai, "A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% Power Reduction," IEEE J. of Solid-State Circuits, pp. 807-811, May 1998.
- [14] K. Nose and T. Sakurai, "Micro IDDQ Test Using Lorentz Force MOSFET's," Symp. on VLSI Circuits, June 1999.

Fig.1 Trend in processor power (from ISSCC)

Fig.2 Trend in voltage and power (from SIA)

Fig.3 Expression for CMOS power



Fig.4 Short-circuit current (crowbar current)

$$P_{S} = \frac{k(v_{D0P}) f o_{r}^{2} C_{IN} V_{DD}^{2}}{\frac{v_{D0P} g(v_{T}, \alpha)}{2k(v_{D0P})} F O \beta_{r} + h(v_{T}, \alpha) f o_{p}}$$

$$k(v_{D0P}) = \frac{0.9}{0.8} + \frac{v_{D0P}}{0.8} \ln \frac{10\, v_{D0P}}{e}, \qquad fo_{p} = \frac{I_{D0P}}{I_{D0PN}}, \qquad \beta_{r} = \frac{I_{D0P}}{I_{D0N}}$$

Here $g(v_T, \alpha)$ and $h(v_T, \alpha)$ are closed-form functions of the normalized threshold voltages $v_{TN}$, $v_{TP}$ and the velocity-saturation indices $\alpha_N$, $\alpha_P$ (Ref.1).

Fig.5 Expression for short-circuit power (Ref.1)



Fig. 6 Ratio of short-circuit power  $(P_s)$  vs total active power  $(P_s + P_n)$ 




Fig. 7 Optimum  $V_{DD}$  and  $V_{TH}$  (Ref.2)



Fig.8 Voltage dependent gate capacitance (Ref.7)



Fig.9 Effect of voltage-dependent gate capacitance



Fig.10 Power dependence on $V_{DD}$ & $V_{TH}$



Fig.11 Delay dependence on $V_{DD}$ & $V_{TH}$



Fig.12 Concept of Super Cut-off CMOS (SCCMOS) (Ref.4)



Fig.13 Super Cut-off CMOS Scheme (SCCMOS)



Fig.14 Maintaining information in standby



Fig.15 Delay characteristics (inverter & NAND) of SCCMOS. SCCMOS can push the limit of low-voltage operation down to 0.5V.



Fig.16 Dynamic Leakage Cut-off SRAM (Ref.5)



Fig.17 Software feedback loop for low power (Ref.10)



Fig.18 Power saving by the software feedback loop. More than an order of magnitude reduction of power is possible with the scheme.



Fig.19 Positive temperature effects on $I_{DS}$-$V_{GS}$ in the sub-1V region (Ref.11)



Effects of $V_{TH}$ and $\mu$ on $I_{DS}$ when the temperature rises by 100 K

| Condition | $V_{TH}$ effect | $\mu$ effect |
|---|---|---|
| $V_{DD}$ = 2.5V, $V_{TH}$ = 0.5V | 10% | 35% |

Fig.20 Temperature dependence of $\mu$ and $V_{TH}$



Fig.21 Measurement of 32bit full adder



Fig.22 Transient response of chip temperature



Fig.23 Average of relative delay vs. # of cells (Ref.12)

TABLE I: Contents of 11-cell CyHP library

| Cell type       | Cells                              |
|-----------------|------------------------------------|
| Flip-flops      | D-FF x1, D-FF x2                   |
| Inverters       | INV x1, INV x2, INV x4             |
| Primitive gates | 2-NAND x2<br>2-NOR x2<br>2-XNOR x1 |
| Compound gates  | 2-InvNAND x2<br>2-InvNOR x2        |
| Multiplexer     | 2-MUXInv x1                        |

TABLE II: Contents of 20-cell CyHP library

| Cell type       | Cells                                          |
|-----------------|------------------------------------------------|
| Flip-flops      | D-FFN x1                                       |
| Inverters       | INV x8, INV x16                                |
| Primitive gates | 2-NAND x1<br>2-NOR x1<br>3-NAND x1<br>3-NOR x1 |
| Compound gates  | 3-AND-NOR x1<br>3-OR-NAND x1                   |

(Only cells not included in Table I are listed.)



Fig.24 Power distribution in CMOS LSI's



Fig.25 Reduced Clock Swing Flip-Flop (Ref.13)



Fig.26 Delay and power optimization for repeaters



Electrons are deflected by $B_y$, producing a voltage difference between $V_{O1}$ and $V_{O2}$.

Fig.27 Lorentz Force MOS (LMOS) (Ref.14)



Fig.28 Measured $\Delta V_{\rm D}$ dependence on $I_{\rm P}$

Example of MPEG2 decoding

Fig.29 Architectural approach to low-power LSI's



Fig.30 System-LSI approach is inherently low-power, reducing waste



- Neither cost nor speed, but power, sets the technology trend.
- Integration can achieve low cost and high speed as a system.

Fig.31 What sets the technology trend? Low power does.