# Variable Supply-Voltage Scheme for Low-Power High-Speed CMOS Digital Design

Tadahiro Kuroda, Member, IEEE, Kojiro Suzuki, Shinji Mita, Tetsuya Fujita, Fumiyuki Yamane, Fumihiko Sano, Akihiko Chiba, Yoshinori Watanabe, Koji Matsuda, Takeo Maeda, Takayasu Sakurai, Member, IEEE, and Tohru Furuyama, Member, IEEE

Abstract—This paper describes a variable supply-voltage (VS) scheme. From an external supply, the VS scheme automatically generates minimum internal supply voltages by feedback control of a buck converter, a speed detector, and a timing controller so that they meet the demand on its operation frequency. A 32-b RISC core processor is developed in a 0.4- $\mu$ m CMOS technology which optimally controls the internal supply voltages with the VS scheme and the threshold voltages through substrate bias control. Performance in MIPS/W is improved by a factor of more than two compared with its conventional CMOS design.

*Index Terms*—Buck converter, low power CMOS circuits, low threshold voltage, low voltage.

## I. INTRODUCTION

LOWERING both the supply voltage $V_{\rm DD}$ and the threshold voltage $V_{\rm TH}$ enables high-speed, low-power operation [1]-[4]. Fig. 1 depicts equispeed lines (broken lines) and equipower lines (solid lines) on a $V_{\rm DD}$-$V_{\rm TH}$ plane calculated from their theoretical models [5], [6]. Typically, circuits are designed at $V_{\rm DD}$ = 3.3 V $\pm$ 10% and $V_{\rm TH}$ = 0.55 V $\pm$ 0.1 V, as shown by a rectangle in Fig. 1. This rectangle is a design window because all the circuit specifications should be satisfied within the rectangle for yield-conscious design. In the design window, circuit speed is slowest at corner A, while power dissipation is highest at corner B. Therefore, better tradeoffs between speed and power can be found by reducing fluctuations of $V_{\rm DD}$ and $V_{\rm TH}$, especially at low $V_{\rm DD}$ [7], [8]. The equispeed lines and the equipower lines are normalized at corners A and B, as designated by the normalization factors $\kappa_s$ and $\kappa_p$, respectively, so that one can read off how much speed and power dissipation are improved or degraded relative to the typical condition by sliding and sizing the design window on the $V_{\rm DD}$-$V_{\rm TH}$ plane. For example, at $V_{\rm DD}$ = 2.1 V $\pm$ 5% and $V_{\rm TH}$ = 0.18 V $\pm$ 0.05 V, power dissipation can be reduced to about 40% while maintaining the circuit speed. In this way, optimizing $V_{\rm DD}$ and $V_{\rm TH}$ is essential in low-power high-speed CMOS design

Manuscript received July 18, 1997; revised October 22, 1997.

T. Maeda is with the Semiconductor Group, Toshiba Corp., Kawasaki, Japan.

T. Sakurai is with the Institute of Industrial Science, University of Tokyo, Tokyo, Japan.

Publisher Item Identifier S 0018-9200(98)01022-1.

Fig. 1 plot: equipower lines (solid) and equispeed lines (broken); horizontal axis $V_{\rm DD}$ (V), vertical axis $V_{\rm TH}$ (V).

Fig. 1. Exploring low- $V_{\rm DD}$ , low- $V_{\rm TH}$  design space.

while they are given as constant and common parameters in the conventional CMOS design.

$V_{\rm TH}$ can be controlled through substrate bias. A variable threshold-voltage CMOS (VTCMOS) technology has been developed [6], [8]–[12]. It dynamically varies $V_{\rm TH}$ through the substrate bias $V_{\rm BB}$. $V_{\rm BB}$ is controlled so as to compensate for $V_{\rm TH}$ fluctuations in an active mode, while in a standby mode and in $I_{\rm DDQ}$ testing, a deep $V_{\rm BB}$ is applied to increase $V_{\rm TH}$ and cut off subthreshold leakage current. It is reported in [9] that $V_{\rm TH}$ fluctuations can be reduced to $\pm 0.05$ V under $\pm 0.15$ V process fluctuations.

A self-adjusting voltage reduction circuit has been developed [13] using a phase-locked loop. However, there has been no report on a digital control scheme. This paper presents a digital circuit scheme to control $V_{\rm DD}$ on a chip, namely the variable supply-voltage scheme (VS scheme). In the VS scheme, a dc-dc converter [14] generates an internal supply voltage $V_{\rm DDL}$ very efficiently from an external power supply. $V_{\rm DDL}$ is controlled by monitoring the propagation delay of a critical path in a chip such that it is set to the minimum voltage at which the chip can operate at a given clock frequency $f_{\rm ext}$. This control also reduces $V_{\rm DDL}$ fluctuations, which is essential in low-voltage design. A 32-b RISC core processor is designed with the VS scheme in the VTCMOS

T. Kuroda, K. Suzuki, S. Mita, T. Fujita, F. Yamane, and T. Furuyama are with the System ULSI Engineering Laboratory, Toshiba Corporation, Kawasaki 210, Japan.

F. Sano, A. Chiba, Y. Watanabe, and K. Matsuda are with the System LSI Development Division, Toshiba Micro Electronics Corp., Kawasaki, Japan.



Fig. 2. Variable supply-voltage (VS) scheme.

[15] and achieves more than double the MIPS/W performance compared with the previous CMOS design [16] in the same technology.

In Section II, the VS scheme is described. Circuit implementations are presented in Section III together with a discussion of low-power circuit design. The RISC core with the VS scheme is fabricated in a 0.4- $\mu$ m CMOS technology and compared with the previous design. The experimental results are reported in Section IV. Section V is dedicated to conclusions.

## II. VARIABLE SUPPLY-VOLTAGE (VS) SCHEME

The VS scheme is illustrated in Fig. 2. It consists of three parts: 1) a buck converter, 2) a timing controller, and 3) a speed detector. The buck converter generates  $(N/64) \cdot V_{DD}$  for the internal supply voltage  $V_{DDL}$ . N is an integer from 0 to 63 which is provided from the timing controller. Therefore, the resolution of  $V_{DDL}$  is about 50 mV for  $V_{DD} = 3.3$  V. A duty control circuit generates rectangular waveforms with duty cycle of N/64 whose average voltage is produced by the second-order low-pass filter configured by external inductance L and capacitance C. The lower limit of  $V_{DDL}$  can be set in the duty control circuit to assure the minimum operating voltage of a chip. The upper limit can also be set to prevent N from transiting spuriously from 63 to 0 as a result of noise.
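The quantization described above can be sketched in a few lines of Python. This is only an illustration of the relation $V_{\rm DDL} = (N/64) \cdot V_{\rm DD}$ and of clamping $N$ between programmed limits; the function names and the limit values used in the example are assumptions, not taken from the design.

```python
V_DD = 3.3      # external supply voltage (V), from the text
N_STEPS = 64    # duty-cycle resolution, from the text


def clamp_code(n, n_min, n_max):
    """Clamp the duty code N to programmed limits (hypothetical helper)."""
    return max(n_min, min(n_max, n))


def ideal_vddl(n, v_dd=V_DD):
    """Ideal filtered output for a duty cycle of n/64 (losses ignored)."""
    if not 0 <= n < N_STEPS:
        raise ValueError("N must be an integer from 0 to 63")
    return n / N_STEPS * v_dd


step_mv = V_DD / N_STEPS * 1e3
print(f"resolution: {step_mv:.1f} mV")          # about 50 mV
print(f"V_DDL at N = 32: {ideal_vddl(32):.3f} V")
# Example limits 16 and 56 are illustrative only.
print(f"clamped code: {clamp_code(70, 16, 56)}")
```

Clamping at an upper limit below 63 is one simple way to keep the code from wrapping around to 0, which is the failure the text attributes to noise.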

The timing controller calculates N by accumulating numbers provided from the speed detector, +1 to raise  $V_{\text{DDL}}$  and -1 to lower  $V_{\text{DDL}}$ . The accumulation is carried out by a clock whose frequency is controlled by a 10-b programmable counter.

The speed detector monitors critical path delay in the chip by its replicas under  $V_{\text{DDL}}$ . When  $V_{\text{DDL}}$  is too low for the circuit operation in  $f_{\text{ext}}$ , the speed detector outputs +1 to raise  $V_{\text{DDL}}$ . On the other hand, when  $V_{\text{DDL}}$  is too high, the speed detector outputs -1 to lower  $V_{\text{DDL}}$ . By this feedback control, the VS scheme can automatically generate the minimum  $V_{\text{DDL}}$  which meets the demand on its operation frequency. For failsafe control, a small delay is added to the critical path replicas.

Since the speed detection cycle based on $f_{\text{ext}}$ (e.g., 25 ns) is much shorter than the time constant of the low-pass filter (e.g., 16 $\mu$s), the feedback control may fall into oscillation if $N$ is updated at every detection. The programmable counter in the timing controller adjusts the accumulation frequency $f_{\text{N}}$ to assure fast and stable response of the feedback control.

There is no interference between the VS scheme and the VTCMOS. The VTCMOS controls $V_{\rm TH}$ by referring to the leakage current of a chip, while the VS scheme controls $V_{\rm DDL}$ by referring to $f_{\rm ext}$. $V_{\rm DDL}$ is also affected by $V_{\rm TH}$ because circuit speed depends on $V_{\rm TH}$. Therefore, $V_{\rm TH}$ is determined first by the VTCMOS, and under that condition $V_{\rm DDL}$ is determined by the VS scheme. Since the VTCMOS is immune to $V_{\rm DDL}$ noise [6], there is no feedback from the VS scheme to the VTCMOS, and hence no oscillation problem between them.

## III. CIRCUIT IMPLEMENTATIONS

### A. Buck Converter

Fig. 3 depicts a circuit schematic of the buck converter. When the output of a 6-b counter, $n$, is between 0 and $N$, the pMOS of the output inverter is turned on. When $n$ is between $N + 1$ and 63, the nMOS of the output inverter is turned on. When $n$ is between $N$ and $N + 1$, and between 63 and 0, neither the pMOS nor the nMOS is turned on, which prevents short-circuit current from flowing in the large output inverter. The output voltage of the buck converter $V_{\text{DDL}}$ is therefore controlled with 64-step resolution. This resolution causes a +50 mV error in $V_{\text{DDL}}$ for $V_{\text{DD}} = 3.3$ V, which corresponds to a +3.3% error at $V_{\text{DDL}} = 1.5$ V. Note that the error is always positive because the speed detector cannot accept a $V_{\text{DDL}}$ lower than the target voltage.



Fig. 3. Buck converter: (a) circuit schematic and (b) timing chart.

The external low-pass filter elements $L$ and $C$, the effective resistance $R$ of the output inverter, and its switching period $\Delta T$ (or switching frequency $f$) should be designed considering the dc-dc conversion efficiency $\eta$, the output voltage ripple when the output current is constant, $\Delta V_{\text{out}}/V_{\text{out}}$, the output voltage drop when the output current changes, $\delta V_{\text{out}}/V_{\text{out}}$, the time constant of the filter $T_0$ as an index of the response, and the pattern area $S$.

The efficiency  $\eta$  can be expressed as follows:

$$\eta = \frac{V_{\text{out}}I_{\text{out}}}{V_{\text{out}}I_{\text{out}} + I_{\text{out}}^2 R + P_{\text{VX}} + P_{\text{control}}} \tag{1}$$

where $P_{\rm VX}$ is the power dissipation at the output inverter caused by overshoot and undershoot at $V_X$ from $V_{\rm DD}$ and ground potential due to the inductor current, and $P_{\rm control}$ is the power dissipation of the control circuits. Fig. 4 shows simulated waveforms at $V_X$. As shown in the figure, an inappropriate $L$ increases $P_{\rm VX}$. Its analytical model can be derived from the equivalent LCR circuit in Fig. 5 with the following two assumptions.

- 1) Duty ratio D is assumed to be 0.5 for calculation simplicity.
- 2) Damping factor of the low-pass filter is assumed to be one for fast and stable response

$$\xi = \frac{R}{2}\sqrt{\frac{C}{L}} = 1. \tag{2}$$

After the conventional manipulation of differential equations of the equivalent circuit,  $P_{VX}$  is given as follows (see Appendix A for the detailed derivation):

$$P_{\rm VX} \approx \frac{V_{\rm DD}^2}{R} \cdot \frac{\beta^2}{24} \tag{3}$$

where

$$\beta \equiv \frac{\Delta T}{T_0}. \tag{4}$$

$T_0$ is the time constant of the filter, which is related to the settling time and given by

$$T_0 = \sqrt{LC}. \tag{5}$$



Fig. 4. Simulated waveforms at  $V_X$ .



Fig. 5. Equivalent LCR circuit.



Fig. 6. Power dissipation dependence on scale-up factor in cascaded inverters.

The output voltage ripple when the output current is constant,  $\Delta V_{\text{out}}/V_{\text{out}}$ , can also be derived from the differential equations and expressed as follows (see Appendix A for the detailed derivation):

$$\frac{\Delta V_{\text{out}}}{V_{\text{out}}} \approx \frac{\beta^2}{16}. \tag{6}$$

Sudden change in output current causes the output voltage drop  $\delta V_{\text{out}}$ . Suppose all the circuits under  $V_{\text{out}}$  start to operate at once. The output current changes from zero to  $I_{\text{out}}$ . As a result, the filter C discharges, and  $V_{\text{out}}$  drops. Current is then supplied through the filter L to recover the voltage drop. This recovery time is considered to be an order of  $T_0$ . So the amount of charge of  $I_{\text{out}} \cdot T_0$  is derived from C to yield the voltage drop of  $I_{\text{out}} \cdot T_0/C$ . The output voltage drop when the output current changes  $\delta V_{\text{out}}/V_{\text{out}}$  is therefore approximately given by

$$\frac{\delta V_{\text{out}}}{V_{\text{out}}} \approx \frac{I_{\text{out}} \cdot T_0}{C \cdot V_{\text{out}}}. \tag{7}$$

 $P_{\rm control}$ , on the other hand, is written as

$$P_{\text{control}} = \alpha_c N_{\text{max}} f C_c V_{\text{DD}}^2 + f C_{\text{buffer}} V_{\text{DD}}^2 + \alpha_{\text{replica}} f_{\text{ext}} C_{\text{replica}} V_{\text{out}}^2 \tag{8}$$

where

$$f = \frac{1}{\Delta T}. \tag{9}$$

The first term is power dissipation of the duty control circuits where operating frequency is  $N_{\text{max}} \cdot f$ .  $N_{\text{max}}$  is the output voltage resolution which is 64 in this design. The second term is power dissipation of the buffer circuit in the buck converter, and the third term is power dissipation of the replica circuits in the speed detector. In each term,  $\alpha$  is switching probability and C is capacitance.

Since most of the layout pattern is occupied by the large inverter and the buffer circuits, pattern area can be expressed as

$$S = \frac{S_1}{R} + S_2 \tag{10}$$

where  $S_1$  and  $S_2$  are constants.

From these equations, the smaller $\beta$ is, the smaller $P_{\rm VX}$ and the output voltage ripple are. On the other hand, a smaller $T_0$ is preferable for a shorter settling time. To keep $\beta = \Delta T/T_0$ small with a small $T_0$, $\Delta T$ must be reduced, which raises the switching frequency $f$ and in turn increases $P_{\text{control}}$. In this way there are tradeoffs among these parameters.

For example, under the following constraints:

Output voltage: $V_{\text{out}} = 2.1$ V;

Output current: $I_{\text{out}} = 67$ mA ($P_{\text{out}} = 140$ mW);

Output voltage ripple when output current is constant: $\Delta V_{\rm out}/V_{\rm out} < 0.1\%$;

Output voltage drop when output current changes: $\delta V_{\text{out}}/V_{\text{out}} < 2\%$;

Filter time constant (related to settling time): $T_0 < 100\ \mu$s;

Pattern area: $S < 500\ \mu$m-square;

dc-dc efficiency: $\eta$ = maximum.

L, C, R, and f can be numerically solved as follows.

Low-pass filter inductance:  $L = 8 \ \mu \text{H}$ ;

Low-pass filter capacitance:  $C = 32 \ \mu\text{F}$ ;

Output inverter effective resistance:  $R = 1 \Omega$ ;

Output inverter switching frequency: f = 1 MHz.
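The quoted solution can be checked against the constraints by plugging it into (2)-(7). The sketch below does only that; it is not the numerical optimization itself. Note that (3) and (6) were derived for $D = 0.5$, so at $V_{\text{out}} = 2.1$ V they should be read as estimates, and the efficiency bound neglects $P_{\text{control}}$, whose parameters are not given in the text.

```python
import math

# Values quoted in the text
L = 8e-6        # filter inductance (H)
C = 32e-6       # filter capacitance (F)
R = 1.0         # effective resistance of the output inverter (ohm)
f = 1e6         # switching frequency (Hz)
V_DD = 3.3      # external supply (V)
V_out = 2.1     # output voltage (V)
I_out = 67e-3   # output current (A)

T0 = math.sqrt(L * C)                  # filter time constant, (5)
dT = 1.0 / f                           # switching period, (9)
beta = dT / T0                         # (4)
xi = (R / 2.0) * math.sqrt(C / L)      # damping factor, (2)
ripple = beta ** 2 / 16                # relative ripple, (6)
drop = I_out * T0 / (C * V_out)        # relative drop on a load step, (7)
P_vx = V_DD ** 2 / R * beta ** 2 / 24  # overshoot/undershoot loss, (3)
P_r = I_out ** 2 * R                   # conduction loss term in (1)

# Upper bound on (1): P_control is set to zero here (an assumption).
eta_bound = V_out * I_out / (V_out * I_out + P_r + P_vx)

print(f"T0 = {T0 * 1e6:.1f} us, beta = {beta:.4f}, xi = {xi:.2f}")
print(f"ripple = {ripple * 100:.3f} %  (limit 0.1 %)")
print(f"drop   = {drop * 100:.2f} %  (limit 2 %)")
print(f"P_VX = {P_vx * 1e3:.2f} mW, I^2 R = {P_r * 1e3:.2f} mW")
print(f"efficiency bound without P_control = {eta_bound:.3f}")
```

With these values $T_0$ comes out at 16 $\mu$s and the damping factor at one, and both the ripple and the load-step drop fall inside their limits.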

For the equivalent  $R = 1 \Omega$  in the output inverter, transistor size of the pMOS and the nMOS is as large as 7.6 mm and 3.8 mm, respectively. Cascaded inverters are necessary to drive the output inverter with a typical inverter whose pMOS and nMOS transistor size is about 8  $\mu$ m and 4  $\mu$ m, respectively. When transistor size ratio of the final stage  $(W_n)$  to the first stage  $(W_0)$  of the cascaded inverters is given, the optimum scale-up factor x and the optimum number of stages n to minimize the power dissipation are given by (see Appendix B for the detailed derivation)

$$x = 1 + \sqrt{1 + K} \tag{11}$$

$$n = \frac{\log\left(\frac{W_n}{W_0}\right)}{\log x} \tag{12}$$

where $K$ is the ratio of the power dissipation due to capacitance charging and discharging to the power dissipation due to crowbar current when $x = 1$. From the simulation study depicted in Fig. 6, the above equations hold very accurately with $K = 8$. The optimum scale-up factor $x$ becomes four, and the optimum number of stages $n$ becomes five in this design.
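The numbers in this paragraph follow directly from (11) and (12). A minimal sketch, taking the width ratio from the pMOS sizes quoted above (7.6 mm for the output stage, 8 $\mu$m for the first stage); the function names are illustrative only.

```python
import math


def optimum_scale_up(K):
    """Optimum scale-up factor x from (11)."""
    return 1.0 + math.sqrt(1.0 + K)


def optimum_stages(width_ratio, x):
    """Optimum (real-valued) number of stages n from (12)."""
    return math.log(width_ratio) / math.log(x)


K = 8                       # fitted value quoted in the text
width_ratio = 7600.0 / 8.0  # W_n / W_0 from the pMOS widths, in micrometers

x = optimum_scale_up(K)
n = optimum_stages(width_ratio, x)
print(f"x = {x:.1f}, n = {n:.2f}, rounded to {round(n)} stages")
```

The real-valued $n$ is slightly below five, so rounding to the nearest integer gives the five stages stated in the text.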

### B. Speed Detector

A circuit schematic of the speed detector is shown in Fig. 7(a). It has three paths under $V_{\text{DDL}}$: 1) a critical path replica of the chip, "CPR," 2) the same critical path replica with inverter gates equivalent to 3% additional delay, "CPR+," and 3) a direct connection between flip-flops, "REF." Since the direct connection can always transmit the test data correctly within the cycle time of $f_{\text{ext}}$ even at low $V_{\text{DDL}}$, its output serves as the correct reference data. The other paths may output wrong data when their delay becomes longer than the cycle time of the given $f_{\text{ext}}$ at the given $V_{\text{DDL}}$. By comparing the outputs of these paths with that of the direct connection, it can be deduced whether or not the chip operates correctly at $f_{\text{ext}}$ and $V_{\text{DDL}}$.

When $V_{\text{DDL}}$ is not high enough, the outputs of the two paths "CPR" and "CPR+" are both wrong, and the speed detector outputs +1 to raise $V_{\text{DDL}}$. When $V_{\text{DDL}}$ is high enough to leave more than 3% delay margin in the critical path at the given $f_{\text{ext}}$, the outputs of the two paths are both correct, and the speed detector outputs -1 to lower $V_{\text{DDL}}$. When $V_{\text{DDL}}$ is in between, the output of the critical path "CPR" is correct and that of the longer path "CPR+" is wrong, and the speed detector outputs 0 to maintain $V_{\text{DDL}}$.

This nondetecting voltage gap is necessary to stabilize $V_{\text{DDL}}$ but yields an offset error. The gap should be minimized but must be no smaller than the minimum resolution of $V_{\text{DDL}}$: if the gap were smaller than the resolution, no $V_{\text{DDL}}$ level might exist inside the gap, which could cause output voltage ripple as large as the resolution. The 3% additional delay corresponds to 80 mV in $V_{\text{DDL}}$, which is larger than the resolution of 50 mV. In total, $V_{\text{DDL}}$ may have a 130 mV offset error.
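The three-way decision can be illustrated with a toy model. The sketch below is not the circuit: the pass/fail path model, the threshold `v_min`, and all function names are assumptions for illustration, and the low-pass filter dynamics and the eight-cycle test cadence are ignored. The case where "CPR" fails but "CPR+" passes is not described in the text and is treated here as "raise."

```python
V_DD = 3.3          # external supply (V), from the text
STEP = V_DD / 64    # V_DDL resolution, about 50 mV
GAP = 0.080         # voltage equivalent of the 3% extra delay, from the text


def speed_detector(ref, cpr, cpr_plus):
    """Map the three sampled path outputs to +1 (raise), 0 (hold), -1 (lower)."""
    cpr_ok = cpr == ref
    cpr_plus_ok = cpr_plus == ref
    if cpr_ok and cpr_plus_ok:
        return -1   # margin exceeds the added delay: lower V_DDL
    if cpr_ok:
        return 0    # inside the nondetecting gap: hold V_DDL
    return +1       # CPR fails (CPR+ ignored here, an assumption): raise V_DDL


def sample_paths(v_ddl, v_min):
    """Hypothetical path model: a path passes the test bit if V_DDL is high enough."""
    ref = 1
    cpr = 1 if v_ddl >= v_min else 0
    cpr_plus = 1 if v_ddl >= v_min + GAP else 0
    return ref, cpr, cpr_plus


def settle(n_start, v_min, iterations=64):
    """Accumulate detector outputs into the 6-bit code N (filter dynamics ignored)."""
    n = n_start
    for _ in range(iterations):
        n += speed_detector(*sample_paths(n * STEP, v_min))
        n = max(0, min(63, n))
    return n


for start in (0, 63):
    n = settle(start, v_min=1.9)   # 1.9 V is an illustrative threshold
    print(f"start N = {start:2d} -> settled N = {n}, V_DDL = {n * STEP:.3f} V")
```

Because the 80 mV gap is wider than one step, the code settles on a level inside the gap from either direction instead of toggling between two adjacent levels.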

A timing chart of the speed detector is illustrated in Fig. 7(b). The test data in this figure is an example for the case in which the critical path is critical in propagating a low-to-high signal. The test is performed every eight clock cycles. The other seven clock cycles are necessary at low $V_{\rm DDL}$ so that test data provided earlier are not evaluated. Even so, $V_{\rm DDL}$ could settle at a very low voltage at which the propagation delay becomes eight times the cycle time of $f_{\rm ext}$. This mislocking, however, can be avoided by setting the lower limit of $V_{\rm DDL}$ in the timing controller. The compared results are registered by flip-flops, which are held by a hold signal as shown in Fig. 7(a) until the next evaluation.

Since the critical path replicas operate at  $V_{\text{DDL}}$ , the signals need to be level-shifted to  $V_{\text{DD}}$ . A sense-amplifier flip-flop [17] is employed to perform level-shifting and registering simultaneously.



Fig. 7. Speed detector: (a) circuit schematic and (b) timing chart.

### C. Timing Controller

The timing controller adjusts the control frequency of $N$, $f_{\rm N}$, to realize fast and stable response of the feedback control. The higher $f_{\rm N}$ is, the faster the response but the lower the stability. Conventional stability analysis and compensation techniques, however, are rather difficult to apply for several reasons. In the speed detector, circuit speed is a nonlinear function of $V_{\text{DDL}}$. Its output is +1 or -1 regardless of the magnitude of the error in $V_{\text{DDL}}$. Most of the control is performed digitally, while the low-pass filter is analog. Given these difficulties, a programmable counter is introduced as a practical way to control $f_{\rm N}$. Based upon experimental evaluation, the optimum $f_{\rm N}$ can be found and set in the programmable counter.

Fig. 8 depicts simulation results of $V_{\text{DDL}}$ after power-on. When $f_{\rm N}$ is 1 MHz, much faster than the roll-off frequency of the low-pass filter, 10 kHz, oscillation appears in $V_{\text{DDL}}$. When $f_{\rm N}$ is 62.5 kHz, on the other hand, the response of $V_{\rm DDL}$ is fast and stable. $V_{\rm DDL}$ can reach the target voltage in 100 $\mu$s after power-on.
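As a quick consistency check, suppose the roll-off frequency is taken as $1/(2\pi T_0)$ with the filter values of Section III-A ($L = 8\ \mu$H, $C = 32\ \mu$F). This definition is an assumption; the text does not state how the roll-off frequency is defined.

$$f_{\text{roll-off}} \approx \frac{1}{2\pi\sqrt{LC}} = \frac{1}{2\pi \cdot 16\ \mu\text{s}} \approx 9.9\ \text{kHz}$$

This agrees with the quoted 10 kHz. The stable setting of 62.5 kHz equals 1 MHz divided by 16, which is roughly six times this roll-off frequency, whereas the oscillating setting of 1 MHz is about one hundred times higher.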

## IV. EXPERIMENTAL RESULTS

A 32-b RISC core processor, the R3900, is implemented with about 440 k transistors, including a 32-b multiply/accumulate (MAC) unit, a 4-kB direct-mapped instruction cache, and a 1-kB two-way set-associative data cache [16]. The layout is slightly modified for the VS scheme and the VTCMOS. A VS macro and a VT macro are added at the corners of the chip. Many of the substrate contacts are removed [12] and the rest are connected to the VT macro. The chip is fabricated in a 0.4-$\mu$m CMOS n-well/p-sub double-metal technology. A chip micrograph appears in Fig. 9. Main features are summarized in Table I. The VS and the VT macros occupy $0.45 \times 0.59 \text{ mm}^2$



Fig. 8. Simulated  $V_{DDL}$  response after power-on.



Fig. 9. Chip micrograph.

TABLE I FEATURES

| Item | Value |
|------|-------|
| Technology | 0.4 $\mu$m CMOS, n-well/p-sub, double-metal, $V_{\rm TH} = 0.05 \text{ V} \pm 0.1 \text{ V}$ |
| $V_{\rm TH}$ (substrate-bias controlled) | $0.2 \text{ V} \pm 0.05 \text{ V}$ |
| External supply voltage $V_{\rm DD}$ | $3.3 \text{ V} \pm 10\%$ |
| Internal supply voltage $V_{\rm DDL}$ | $0.8~\mathrm{V}\sim2.9~\mathrm{V}\pm5\%$ |
| Power dissipation | 140 mW @ 40 MHz |
| Chip size | $8.0 \times 8.0 \text{ mm}^2$ |
| VS macro size | $0.45 \times 0.59 \text{ mm}^2$ |
| VT macro size | $0.49 \times 0.72 \text{ mm}^2$ |

and  $0.49 \times 0.72 \text{ mm}^2$ , respectively. The total area penalty of the two macros is less than 1% of the chip size.
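The stated penalty can be checked by simple arithmetic, assuming the $8.0 \times 8.0 \text{ mm}^2$ entry of Table I is the chip size:

$$\frac{0.45 \times 0.59 + 0.49 \times 0.72}{8.0 \times 8.0} = \frac{0.2655 + 0.3528}{64} \approx 0.0097$$

That is about 0.97% of the chip area, consistent with the figure of less than 1%.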

Fig. 10 is a shmoo plot of the RISC processor. The RISC core operates at 40 MHz at 1.9 V, and at 10 MHz at 1.3 V. Measured $V_{\text{DDL}}$ versus $f_{\text{ext}}$ is also plotted in this figure. The VS scheme generates the minimum $V_{\text{DDL}}$ at which the circuit can operate at $f_{\text{ext}}$. In practice, fail-free operation should be guaranteed, so the VS scheme should be designed such that $V_{\text{DDL}}$ sits sufficiently inside the pass region of the shmoo plot by adding supplementary gates to the critical path replicas.



Fig. 10. Shmoo plot of R3900 and measured  $V_{\rm DDL}$ .



Fig. 11. Measured power dissipation versus operating frequency.

Fig. 11 shows the measured power dissipation of the RISC core without I/O. White circles and black squares in this figure represent power dissipation at 3.3 V and at the $V_{\text{DDL}}$ determined by the VS scheme, respectively. The VS scheme can reduce power by an amount larger than that which can be achieved by reducing clock frequency alone. The power dissipation at $f_{\text{ext}} = 0$ in the VS scheme is about 20 mW, which comes from the dc-dc converter. This power loss is mainly due to circuits for experimental purposes and can be reduced to lower than 10 mW. The dc-dc efficiency $\eta$ is measured and plotted in Fig. 12. The left side of the peak is degraded by the power dissipation in the dc-dc converter itself, while the right side of the peak is degraded by parasitic resistance. Due to the power dissipation of the experimental circuits and the high contact resistance of about 6 $\Omega$ in a probe card, the maximum efficiency is lower than anticipated. If the experimental circuits are removed and the chip is bond-wired in a package, the maximum efficiency is estimated to be higher than 85%.

Measured performance is 320 MIPS/W at 33 MHz and 480 MIPS/W at 20 MHz, an improvement by a factor of more than two over the 150 MIPS/W of the previous design [16].

Fig. 13 shows measured  $V_{\text{DDL}}$  voltage regulated by the VS scheme when  $V_{\text{DD}}$  is varied by about 50%. The robustness



Fig. 12. Measured dc-dc efficiency.



Fig. 13. Measured  $V_{\rm DDL}$  versus  $V_{\rm DD}$ .

to the supply-voltage change is clearly demonstrated.  $V_{\rm DDL}$  is regulated at a target voltage as long as  $V_{\rm DD}$  is higher than the target.

## V. CONCLUSION

The VS scheme is presented and examined. The VS scheme can minimize internal supply voltages automatically according to its operating frequency and reduces voltage fluctuations. The 300 MIPS/W RISC core processor with the VS scheme in the VTCMOS has been fabricated. Performance in MIPS/W has been improved by a factor of more than two compared with the previous design just by adding the VS macro and the VT macro. Area penalty is smaller than 1%.

## APPENDIX A: DERIVATION OF POWER EQUATIONS FROM EQUIVALENT LCR CIRCUIT

Differential equations of the equivalent circuit in Fig. 5 are given by

$$\left(LC\frac{d^2}{dt^2} + CR\frac{d}{dt} + 1\right) \cdot V_{\text{out}}^{(1)}(t) = V_{\text{DD}} \tag{A1}$$

$$I_L^{(1)}(t) = C \frac{dV_{\text{out}}^{(1)}}{dt} \tag{A2}$$

when  $0 \le t \le D\Delta T$  (pMOS is on), and

$$\left(LC\frac{d^2}{dt^2} + CR\frac{d}{dt} + 1\right) \cdot V_{\text{out}}^{(2)}(t) = 0 \tag{A3}$$
$$I_L^{(2)}(t) = C\frac{dV_{\text{out}}^{(2)}}{dt} \tag{A4}$$

when  $D\Delta T \leq t \leq \Delta T$  (nMOS is on).

With the following boundary conditions:

$$V_{\text{out}}^{(1)}(t = D\Delta T) = V_{\text{out}}^{(2)}(t = D\Delta T) \tag{A5}$$

$$I_L^{(1)}(t = D\Delta T) = I_L^{(2)}(t = D\Delta T) \tag{A6}$$

$$V_{\text{out}}^{(2)}(t = \Delta T) = V_{\text{out}}^{(1)}(t = 0) \tag{A7}$$

$$I_L^{(2)}(t = \Delta T) = I_L^{(1)}(t = 0) \tag{A8}$$

and the following two assumptions discussed in Section III-A:

$$\xi = \frac{R}{2}\sqrt{\frac{C}{L}} = 1 \tag{A9}$$
$$D = 0.5 \tag{A10}$$

the differential equations can be solved as follows:

$$V_{\text{out}}^{(1)}(t) = V_{\text{DD}} \left\{ 1 - \left( P + Q \cdot \frac{t}{T_0} \right) e^{-\frac{t}{T_0}} \right\} \tag{A11}$$

$$V_{\text{out}}^{(2)}(t) = V_{\text{DD}} \left\{ P + Q \left( \frac{t}{T_0} - \frac{\beta}{2} \right) \right\} e^{-\left( \frac{t}{T_0} - \frac{\beta}{2} \right)} \tag{A12}$$

where

$$P \equiv \frac{\left(1 - \frac{\beta}{2}\right)e^{-\frac{\beta}{2}} + 1}{\left(1 + e^{-\frac{\beta}{2}}\right)^2} \tag{A13}$$

$$Q \equiv \frac{1}{1 + e^{-\frac{\beta}{2}}} \tag{A14}$$

$$T_0 \equiv \sqrt{LC} \tag{A15}$$

$$\beta \equiv \frac{\Delta T}{T_0}. \tag{A16}$$

 $V_{\rm X}$  is therefore given by

$$V_X^{(1)}(t) = V_{\text{DD}} - R \cdot I_L^{(1)}(t) = V_{\text{DD}} \left[ 1 - 2 \left\{ P - \left( 1 - \frac{t}{T_0} \right) Q \right\} e^{-\frac{t}{T_0}} \right] \tag{A17}$$

$$V_X^{(2)}(t) = -R \cdot I_L^{(2)}(t) = 2V_{\rm DD}\left[P-\left\{1-\left(\frac{t}{T_0}-\frac{\beta}{2}\right)\right\}Q\right]e^{-\left(\frac{t}{T_0}-\frac{\beta}{2}\right)}. \tag{A18}$$

Since  $P_{VX}$  is power dissipation at the output inverter caused by overshoot and undershoot at  $V_X$  from  $V_{DD}$  and ground potential, it can be calculated by

$$P_{\rm VX} = \frac{1}{\Delta T} \cdot \left[ \int_{0}^{D\Delta T} \frac{1}{R} \left\{ V_{\text{DD}} - V_{X}^{(1)}(t) \right\}^{2} dt + \int_{D\Delta T}^{\Delta T} \frac{1}{R} \left\{ -V_{X}^{(2)}(t) \right\}^{2} dt \right]$$

$$= \frac{V_{\text{DD}}^{2}}{R} \cdot \frac{2}{\beta} \cdot \frac{1 - \beta e^{-\frac{\beta}{2}} - e^{-\beta}}{1 + e^{-\frac{\beta}{2}}} \tag{A19}$$

$$\approx \frac{V_{\text{DD}}^{2}}{R} \cdot \frac{\beta^{2}}{24} \quad (\beta \ll 1). \tag{A20}$$

Since we assume $D = 0.5$, $V_{\text{out}}$ is $V_{\text{DD}}/2$. The maximum output ripple can be expressed by

$$\frac{\Delta V_{\text{out}}}{2} = \left| V_{\text{out}}(t=t_0) - \frac{V_{\text{DD}}}{2} \right| \tag{A21}$$

where $t_0$ is the time at which $V_{\text{out}}$ reaches its extremum,

$$\left. \frac{dV_{\text{out}}}{dt} \right|_{t=t_0} = 0. \tag{A22}$$

Solving these equations we obtain

$$\frac{\Delta V_{\text{out}}}{V_{\text{out}}} = 2 \left[ \frac{2 \exp\left\{-\frac{\beta e^{-\frac{\beta}{2}}}{2\left(1+e^{-\frac{\beta}{2}}\right)}\right\}}{1+e^{-\frac{\beta}{2}}} - 1 \right] \tag{A23}$$

$$\approx \frac{\beta^2}{16}. \tag{A24}$$
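The quality of the small-$\beta$ approximation can be checked numerically. The sketch below evaluates the closed form (A23) and compares it with $\beta^2/16$ from (A24) for a few values of $\beta$; the function names are illustrative, and $\beta = 1/16$ corresponds to $\Delta T = 1\ \mu$s and $T_0 = 16\ \mu$s.

```python
import math


def ripple_exact(beta):
    """Relative output ripple from the closed form (A23)."""
    e = math.exp(-beta / 2)
    inner = 2 * math.exp(-beta * e / (2 * (1 + e))) / (1 + e)
    return 2 * (inner - 1)


def ripple_approx(beta):
    """Small-beta approximation (A24)."""
    return beta ** 2 / 16


for beta in (0.5, 0.1, 1 / 16):
    exact = ripple_exact(beta)
    approx = ripple_approx(beta)
    print(f"beta = {beta:.4f}: exact = {exact:.3e}, "
          f"approx = {approx:.3e}, ratio = {exact / approx:.4f}")
```

The ratio approaches one as $\beta$ decreases, as expected for a leading-order expansion in $\beta$.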

## APPENDIX B: POWER OPTIMIZATION OF CASCADED INVERTER STAGES

In this appendix, optimum scale-up of a chain of CMOS inverters for minimum power dissipation is discussed when transistor size ratio of the final stage to the first stage is given. This problem can be seen not only in the dc-dc converter but also in an output pad where the output transistor size is given from specifications such as drive capability.

Power dissipation in the *i*th stage can be expressed as follows:

$$P_i = Pc_i + Ps_i \tag{B1}$$

where $Pc_i$ is the power dissipation due to charging and discharging and $Ps_i$ is the power dissipation due to crowbar current. Let us assume that $Pc_i$ is proportional to the transistor width $W_i$ and that $Ps_i$ is proportional to $W_i$ and inversely proportional to the signal slope. Let us also assume that the signal slope is proportional to $W_{i-1}/W_i$ because the driving current is proportional to $W_{i-1}$ and the loading capacitance is proportional to $W_i$. Then $Pc_i$ and $Ps_i$ are given by

$$Pc_i = uW_i \tag{B2}$$

$$Ps_i = vW_i \cdot \frac{W_i}{W_{i-1}} \tag{B3}$$

where u and v are constants. Total power dissipation is therefore given by

$$P_{\text{total}} = \sum P_i = v(M-1) \cdot \frac{x(K+x)}{x-1} \cdot W_0 \tag{B4}$$

where

$$x = \frac{W_i}{W_{i-1}} \qquad \text{(for all } i\text{)} \tag{B5}$$

$$M = \frac{W_n}{W_0} = x^n \tag{B6}$$

$$K = \frac{u}{v}. \tag{B7}$$

When  $\partial P_{\text{total}}/\partial x = 0$  the total power dissipation becomes minimum. Then

$$x = 1 + \sqrt{1 + K} \tag{B8}$$

$$n = \frac{\log\left(\frac{W_n}{W_0}\right)}{\log x}. \tag{B9}$$
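The intermediate steps can be filled in as follows, assuming the sum in (B4) runs over stages $i = 1, \ldots, n$ with $W_i = W_0 x^i$ (the first stage $W_0$ being given). From (B2), (B3), and (B5), each stage dissipates $(u + vx)W_i$, so

$$P_{\text{total}} = \sum_{i=1}^{n} (u + vx)\, W_0\, x^{i} = v(K + x)\, W_0 \cdot \frac{x\,(x^{n} - 1)}{x - 1}$$

which is (B4) with $M = x^n$. Since $M$ is fixed by the given size ratio, only the factor $x(K+x)/(x-1)$ depends on $x$:

$$\frac{\partial}{\partial x}\left[\frac{x(K+x)}{x-1}\right] = \frac{(K + 2x)(x-1) - x(K+x)}{(x-1)^2} = \frac{x^2 - 2x - K}{(x-1)^2} = 0$$

The root with $x > 1$ is $x = 1 + \sqrt{1+K}$, which is (B8); (B9) then follows from $M = x^n$.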

## ACKNOWLEDGMENT

The encouragement throughout this work by M. Saitoh and Y. Unno of Toshiba Corp. is appreciated.

## REFERENCES

- [1] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," *IEEE J. Solid-State Circuits*, vol. 27, pp. 473–484, Apr. 1992.
- [2] D. Liu and C. Svensson, "Trading speed for low power by choice of supply and threshold voltages," *IEEE J. Solid-State Circuits*, vol. 28, pp. 10–17, Jan. 1993.
- [3] T. Kuroda and T. Sakurai, "Overview of low-power ULSI circuit techniques," *IEICE Trans. Electron.*, vol. E78-C, no. 4, pp. 334–344, Apr. 1995.
- [4] V. R. von Kaenel, M. D. Pardoen, E. Dijkstra, and E. A. Vittoz, "Automatic adjustment of threshold & supply voltages for minimum power consumption in CMOS digital circuits," in *Proc. ISLPE'94*, Oct. 1994, pp. 78–79.
- [5] T. Sakurai and A. R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," *IEEE J. Solid-State Circuits*, vol. 25, pp. 584–594, Apr. 1990.
- [6] T. Kuroda, T. Fujita, S. Mita, T. Nagamatu, S. Yoshioka, K. Suzuki, F. Sano, M. Norishima, M. Murota, M. Kako, M. Kinugawa, M. Kakumu, and T. Sakurai, "A 0.9 V 150 MHz 10 mW 4 mm<sup>2</sup> 2-D discrete cosine transform core processor with variable-threshold-voltage scheme," *IEEE J. Solid-State Circuits*, vol. 31, pp. 1770–1779, Nov. 1996.
- [7] S.-W. Sun and P. G. Y. Tsui, "Limitation of CMOS supply-voltage scaling by MOSFET threshold-voltage variation," *IEEE J. Solid-State Circuits*, vol. 30, pp. 947–949, Aug. 1995.
- [8] T. Kuroda and T. Sakurai, "Threshold-voltage control schemes through substrate-bias for low-power high-speed CMOS LSI design," J. VLSI Signal Processing Syst., vol. 13, no. 2/3, pp. 191–201, Aug./Sept. 1996.
- [9] T. Kobayashi and T. Sakurai, "Self-adjusting threshold-voltage scheme (SATS) for low-voltage high-speed operation," in *Proc. CICC'94*, May 1994, pp. 271–274.
- [10] K. Seta, H. Hara, T. Kuroda, M. Kakumu, and T. Sakurai, "50% active-power saving without speed degradation using standby power reduction (SPR) circuit," in *ISSCC Dig. Tech. Papers*, Feb. 1995, pp. 318–319.
- [11] T. Kuroda, T. Fujita, T. Nagamatu, S. Yoshioka, T. Sei, K. Matsuo, Y. Hamura, T. Mori, M. Murota, M. Kakumu, and T. Sakurai, "A high-speed low-power 0.3 μm CMOS gate array with variable threshold voltage (VT) scheme," in *Proc. CICC'96*, May 1996, pp. 53–56.
- [12] T. Kuroda, T. Fujita, S. Mita, T. Mori, K. Matsuo, M. Kakumu, and T. Sakurai, "Substrate noise influence on circuit performance in variable threshold-voltage scheme," in *Proc. ISLPED'96*, Aug. 1996, pp. 309–312.
- [13] P. Macken, M. Degrauwe, M. van Paemel, and H. Oguey, "A voltage reduction technique for digital systems," in *ISSCC Dig. Tech. Papers*, Feb. 1990, pp. 238–239.
- [14] A. J. Stratakos, S. R. Sanders, and R. W. Brodersen, "A low-voltage CMOS dc-dc converter for a portable battery-operated system," in *Proc. IEEE Power Electronics Specialists Conf.*, June 1994, vol. 1, pp. 619–626.
- [15] K. Suzuki, S. Mita, T. Fujita, F. Yamane, F. Sano, T. Sakurai, A. Chiba, Y. Watanabe, K. Matsuda, T. Maeda, and T. Kuroda, "A 300 MIPS/W RISC core processor with variable supply-voltage scheme in variable threshold-voltage CMOS," in *Proc. CICC'97*, May 1997, pp. 587–590.
- [16] M. Nagamatsu, H. Tago, T. Miyamori, M. Kamata, H. Murakami, Y. Ootaguro *et al.*, "A 150 MIPS/W CMOS RISC processor for PDA applications," in *ISSCC Dig. Tech. Papers*, Feb. 1995, pp. 114–115.
- [17] M. Matsui, H. Hara, K. Seta, Y. Uetani, L.-S. Kim, T. Nagamatsu, T. Shimazawa, S. Mita, G. Otomo, T. Ohto, Y. Watanabe, F. Sano, A. Chiba, K. Matsuda, and T. Sakurai, "200 MHz video compression macrocells using low-swing differential logic," in *ISSCC Dig. Tech. Papers*, Feb. 1994, pp. 76–77.



**Tadahiro Kuroda** (M'88) received the B.S. degree in electronic engineering from the University of Tokyo, Tokyo, Japan, in 1982.

In 1982, he joined Toshiba Corporation, Japan, where he was engaged in the development of CMOS design rules, CMOS gate arrays, and CMOS standard cells. From 1988 to 1990, he was a visiting scholar at the University of California, Berkeley, doing research in the field of VLSI CAD. In 1990, he returned to Toshiba, where he was involved in the development of BiCMOS ASIC's and ECL gate arrays.

In 1993, he joined the Semiconductor Device Engineering Laboratory in Toshiba where he was engaged in the research and development of high-speed circuits for telecommunication. Since 1996, he has been responsible for the research and development of multimedia LSI's including media processors and video compression/decompression LSI's in the System ULSI Engineering Laboratory in Toshiba. His research interests include high-speed, low-power, low-voltage circuit design techniques. He is a Visiting Lecturer at the University of Tokyo.

Mr. Kuroda is serving as a program committee member for the CICC, the ISLPED, and the Symposium on VLSI Circuits. He is a member of the IEICEJ.



Kojiro Suzuki received the B.S. and M.S. degrees in electronic engineering and the Ph.D. degree in superconductivity from the University of Tokyo, Tokyo, Japan, in 1990, 1992, and 1995, respectively. His Ph.D. work was on the design and fabrication of a high-sensitivity SQUID with Nb/AlO$_x$/Nb Josephson junctions.

In 1995 he joined Toshiba Corporation, Kawasaki, Japan, where he was engaged in the research and development of high-speed and low-power CMOS digital circuits. His current interests include low-voltage CMOS circuit design and supply-voltage regulators.



Shinji Mita was born in Aichi, Japan, on March 18, 1970. He received the B.S. degree in electrical engineering from the University of Kyushu, Fukuoka, Japan, in 1992.

In 1992 he joined Toshiba Corporation, Kawasaki, Japan. Since 1992 he has been with Semiconductor Device Engineering Laboratory at Toshiba, where he has been involved in the research and development of multimedia LSI's. His current interests include high-speed, low-power, low-voltage techniques in CMOS.



**Tetsuya Fujita** was born in Tokyo, Japan, on August 30, 1963. He received the B.S. degree in electronic engineering from Hosei University, Tokyo, Japan, in 1986.

In 1986 he joined Toshiba Corporation, Kawasaki, Japan, where he was engaged in the establishment of CMOS and ECL gate array libraries. Since 1996 he has been with System ULSI Engineering Laboratory at Toshiba, where he has been involved in the research and development of communication LSI's. His current interests include low-power low-voltage techniques in CMOS.



Fumiyuki Yamane was born in Hiroshima, Japan, in 1972. He received the B.S. degree from Chuo University, Tokyo, Japan, in 1995.

In 1995, he joined the System ULSI Engineering Laboratory, Toshiba Corporation, Kawasaki, Japan. He has been engaged in research and development of circuit technology. He is currently engaged in circuit design, research, and development of high-performance CMOS cache SRAM's for a microprocessor.



Takeo Maeda was born in Tokyo, Japan, on December 26, 1957. He received the B.S. and M.S. degrees in electronic engineering from Waseda University, Tokyo, Japan, in 1981 and 1983, respectively.

He joined the Semiconductor Device Engineering Laboratory, Toshiba Corporation, Kawasaki, Japan, in 1983, where he was engaged in the research and development of process and device technologies for CMOS and BiCMOS static memories. Since 1992, he has been with the Micro and Custom LSI Division, where he is now the Group Manager of the Custom LSI Product Engineering Department.

Mr. Maeda is a member of the Japan Society of Applied Physics and the Institute of Electronics and Communication Engineers of Japan.



Fumihiko Sano was born in Shiga, Japan, on March 18, 1967. He received the B.S. degree in electrical engineering from Fukui National College of Technology, Japan, in 1988.

He joined Toshiba Microelectronics Corporation, Kawasaki, Japan. He then joined Toshiba's Microelectronics Engineering Laboratory, Kawasaki, Japan, where he has been engaged in the research and development of BiCMOS macrocells for high-performance ASIC's. He has also been engaged in the research and development of VLD macrocells implemented in MPEG-2-decoder LSI, MPEG-4-codec LSI, and VT-CMOS circuit technology.



Akihiko Chiba was born in Hokkaidou, Japan, on May 19, 1967. He received the B.S. degree in mechanics from Hachinohe Institute of Technology, Japan, in 1990.

He joined Toshiba Microelectronics Corporation, Kawasaki, Japan. He then joined Toshiba's Semiconductor Device Engineering Laboratory, Kawasaki, Japan, where he has been engaged in the research and development of BiCMOS ASIC and DCT macrocells for video compression/decompression LSI's and LSI testing technology.



Yoshinori Watanabe was born in Mie, Japan, on May 11, 1961. He received the B.S. degree in electronic control engineering from Tokai University, Japan, in 1984.

He joined Toshiba Microelectronics Corporation, Kawasaki, Japan. He has been engaged in the research and development of BiCMOS ASIC and BiCMOS memory macrocells and DCT macrocells for video compression/decompression LSI's. He is currently developing an MPEG-4 codec LSI.



Koji Matsuda was born in Tokyo, Japan, on December 20, 1959. He received the B.S. degree in computer science from Shonan Institute of Technology, Kanagawa, Japan, in 1982.

In 1982 he joined the Toshiba Microelectronics Corporation, Kawasaki, Japan, where he has been engaged in VLSI testing research and development of BiCMOS ASIC and DCT macrocells for video compression/decompression LSI's. He is currently developing an MPEG-4 codec LSI.



Takayasu Sakurai (S'77-M'78) received the B.S., M.S., and Ph.D. degrees in electronic engineering from the University of Tokyo, Tokyo, Japan, in 1976, 1978, and 1981, respectively. His Ph.D. work was on electronic structures of an Si-SiO$_2$ interface.

In 1981 he joined the Semiconductor Device Engineering Laboratory, Toshiba Corporation, Japan, where he was engaged in the research and development of CMOS dynamic RAM and 64-Kb, 256-Kb SRAM, 1-Mb virtual SRAM, cache memories, and BiCMOS ASIC's. During the development, he also worked on the modeling of interconnect capacitance and delay, new memory architectures, hot-carrier resistant circuits, arbiter optimization, gate-level delay modeling, alpha/nth power MOS model, and transistor network synthesis. From 1988 through 1990, he was a Visiting Scholar at the University of California, Berkeley, doing research in the field of VLSI CAD. Back at Toshiba in 1990 he managed multimedia LSI development including media processors and video compression/decompression LSI's. Since 1996, he has been a Professor at the Institute of Industrial Science, University of Tokyo, working on low-power and high-performance LSI designs.

Dr. Sakurai has been serving as a program committee member for CICC, DAC, ICCAD, ICVC, ISLPED, ASP-DAC, TAU, CSW, VLSI, and FPGA Workshops. He is a technical committee chairperson for the VLSI Circuits Symposium. He is a member of the IEICEJ and the Japan Society of Applied Physics.



Tohru Furuyama (S'83-M'84) received the B.S. degree in physics from the University of Tokyo, Tokyo, Japan, in 1975, the M.S. degree in electrical engineering from Cornell University, Ithaca, NY, in 1984, and the Ph.D. degree in information science from the University of Tokyo in 1988.

In 1975, he joined the Research and Development Center, Toshiba Corporation, Kawasaki, Japan. His research interests included MOS memory design and MOS device physics. In 1979, he joined the Semiconductor Device Engineering Laboratory (SDEL), Toshiba Corporation, where he was engaged in advanced DRAM circuit design. From January 1983 to July 1984, he was at Cornell University, where he carried out research on a novel DRAM cell structure and the limitation of device scaling. After returning to SDEL in 1984, he was responsible for 4-Mb commodity DRAM development, the Rambus DRAM project, and the merged DRAM/logic project for graphics application. While at SDEL, he also led the research on new circuit technologies including sense amplifiers, and the studies on reliability issues such as $\alpha$-particle induced soft errors, hot carrier related problems, and wafer-level burn-in technologies. In August 1994, he moved to Burlington, VT, to join the Toshiba/IBM/Siemens trilateral joint development project as a 64 M DRAM Design Manager. Since July 1996 he has been a Senior Manager at the System LSI Engineering Laboratory, Toshiba Corporation, where he has been in charge of the development of advanced CMOS circuit technologies, for example, low-power technologies, as well as various multimedia-oriented LSI's.