# ISSCC95 / SESSION 19 / TECHNOLOGY DIRECTIONS: Quantum Computing & Low-Power Digital

# FP 19.4: 50% Active-Power Saving without Speed Degradation using Standby Power Reduction (SPR) Circuit

Katsuhiro Seta, Hiroyuki Hara, Tadahiro Kuroda, Masakazu Kakumu, Takayasu Sakurai

### Toshiba Corporation, Kanagawa, Japan

High speed and low power are both required for multimedia LSIs: portability with battery operation is sometimes the key factor for multimedia equipment, while digital video demands giga-operations-per-second (GOPS) processing power [1]. To understand how circuit delay and power dissipation depend on the power supply voltage ($V_{DD}$) and the MOSFET threshold voltage ($V_{TH}$), the typical logic circuit shown in Figure 1 is investigated. The fanout is chosen to be 5, which corresponds to the statistical average gate load in ASICs. Figure 2 shows the simulated delay dependence on $V_{DD}$ and $V_{TH}$. When $V_{TH}$ is reduced to 0.3V, $V_{DD}$ can be decreased to 2V while maintaining the speed obtained at $V_{TH} = 0.7V$ and $V_{DD} = 3V$, which is the typical operating condition for high-speed LSIs. The active power dissipation, in this case, is reduced by more than 50%.
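The scaling argument above can be checked with a first-order sketch. It assumes that active power scales as $V_{DD}^2$ at fixed capacitance and frequency, and that gate delay follows an alpha-power-law form $V_{DD}/(V_{DD}-V_{TH})^{\alpha}$. The value of `ALPHA` and both function names are illustrative assumptions; the paper's own results come from the SPICE simulation of Figure 2, not from this model.

```python
# Sketch: first-order scaling of active power and gate delay.
# ALPHA is an assumed velocity-saturation index, not a value from the paper.
ALPHA = 1.3


def active_power_ratio(vdd_new, vdd_ref):
    """Dynamic power scales as C * VDD^2 * f, with C and f held fixed."""
    return (vdd_new / vdd_ref) ** 2


def relative_delay(vdd, vth, alpha=ALPHA):
    """Alpha-power-law delay up to a constant factor: VDD / (VDD - VTH)^alpha."""
    return vdd / (vdd - vth) ** alpha


# Compare the low-voltage point (2V, 0.3V) with the reference point (3V, 0.7V).
power_ratio = active_power_ratio(2.0, 3.0)
delay_ratio = relative_delay(2.0, 0.3) / relative_delay(3.0, 0.7)

print(f"active power ratio: {power_ratio:.3f}")
print(f"delay ratio:        {delay_ratio:.3f}")
```

The power ratio is $(2/3)^2 = 4/9$, a saving of about 56%, consistent with the "more than 50%" figure. With the assumed $\alpha$, the delay ratio stays close to 1.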

The energy-delay (ED) product is plotted as a function of  $V_{DD}$  and  $V_{TH}$  in Figure 3. Minimizing the ED product is a good approach for optimizing LSIs for portable use, since the ED product reflects the battery consumption (E) for completing a job in a certain time [2]. The ED product is also minimized at about 0.3V  $V_{TH}$  when  $V_{DD}$  is 2V.

The only drawback of choosing a $V_{TH}$ of 0.3V is the increase in standby power dissipation. If this standby power problem can be solved, high-speed and low-power operation is achieved simply by lowering $V_{DD}$ and $V_{TH}$ at the same time. This paper presents a solution to the problem of increased standby power in the low-$V_{TH}$ region. The main idea of the standby power reduction (SPR) scheme is that a substrate bias is applied in standby mode to raise the threshold voltage and lower the subthreshold leakage current. In active mode the substrate bias is not applied, which assures high-speed operation. According to measured $I_{DS}$-$V_{GS}$ characteristics for a 0.3µm nMOS transistor, the threshold voltage can be increased by 0.4V by applying a substrate voltage of -2V. This means that if a substrate bias of -2V is applied in standby mode, the threshold voltage rises from 0.3V to 0.7V, giving the same standby current as an LSI with a $V_{TH}$ of 0.7V.
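As a rough illustration of why a 0.4V threshold shift matters, the sketch below assumes the subthreshold current falls by one decade for every subthreshold swing $S$ of threshold increase. The swing values and the function name are assumptions chosen for illustration; the paper does not state $S$ for its process.

```python
import math


def leakage_reduction_factor(delta_vth, swing_v_per_decade):
    """Factor by which subthreshold leakage drops when VTH rises by delta_vth.

    Assumes an ideal exponential subthreshold characteristic:
    one decade of current per `swing_v_per_decade` volts.
    """
    return 10.0 ** (delta_vth / swing_v_per_decade)


DELTA_VTH = 0.7 - 0.3  # threshold shift from the -2V substrate bias (from the text)

# Assumed subthreshold swings in V/decade; not taken from the paper.
for swing in (0.08, 0.10, 0.13):
    factor = leakage_reduction_factor(DELTA_VTH, swing)
    print(f"S = {swing * 1e3:.0f} mV/dec -> leakage reduced by about 10^{math.log10(factor):.1f}")
```

Under these assumptions the leakage reduction spans roughly three to five orders of magnitude, so the estimate depends strongly on the actual swing of the device.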

Figures 4 and 5 show a circuit diagram of the proposed SPR circuit and its simulated waveforms. The circuit consists of a level-shifting part and a voltage-switch part. When CE (Chip Enable) is asserted in active mode, the n-well bias, $V_{NWELL}$, becomes equal to $V_{DD}$, which is set at 2V in the test chip design, and the p-well bias, $V_{PWELL}$, becomes $V_{SS}$. When CE is disabled in standby mode, $V_{NWELL}$ becomes $V_{NBB}$ and $V_{PWELL}$ becomes $V_{PBB}$. The mode transition takes about 50ns. The current drawn by the SPR circuit in standby mode is 0.1µA, the dominant component of which is the current through M4 and M5. This current can be reduced by a further order of magnitude if a slower transition is acceptable. $V_{NBB}$, $V_{PBB}$, $V_{DD}$ and $V_{SS}$ are applied from external sources, but the power supplies connected to $V_{NBB}$ and $V_{PBB}$ only need to supply 0.1µA or less. Although the SPR scheme can be realized together with a self sub-bias circuit, the response time then becomes on the order of microseconds or more [3]. The diodes in the circuit are built using a junction-well structure, through which current flows only in active mode.
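The switching behavior can be summarized as a small behavioral model. This is a sketch of the bias selection only, not of the transistor-level circuit in Figure 4. It assumes that in standby the wells are switched to the external rails $V_{NBB}$ and $V_{PBB}$. The -2V default for `v_pbb` follows the substrate bias quoted earlier, while the 4V default for `v_nbb` is an assumed symmetric value that the text does not give.

```python
from typing import NamedTuple


class WellBias(NamedTuple):
    v_nwell: float  # bias applied to the n-well (pMOS bodies), in volts
    v_pwell: float  # bias applied to the p-well (nMOS bodies), in volts


def spr_well_bias(chip_enable, vdd=2.0, vss=0.0, v_nbb=4.0, v_pbb=-2.0):
    """Behavioral model of the SPR voltage switch.

    Active mode (CE asserted): wells tied to the supply rails, low VTH.
    Standby mode (CE disabled): wells tied to the back-bias rails, high VTH.
    v_nbb = 4.0 is an assumption; the text only implies v_pbb = -2.0.
    """
    if chip_enable:
        return WellBias(v_nwell=vdd, v_pwell=vss)
    return WellBias(v_nwell=v_nbb, v_pwell=v_pbb)


for ce in (True, False):
    bias = spr_well_bias(ce)
    mode = "active" if ce else "standby"
    print(f"{mode:8s} V_NWELL = {bias.v_nwell:+.1f}V, V_PWELL = {bias.v_pwell:+.1f}V")
```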

In designing the circuit, care is taken so that no transistor sees high-voltage stress on its gate oxide or junctions. The $V_{GB}$-$V_{GD}$ trajectories of the MOSFETs used in the SPR circuit do not go beyond $\pm(V_{DD}+\alpha)$, which assures sufficient reliability of the gate oxide. On the other hand, the $V_{SB}$ (source-bulk voltage) - $V_{DB}$ (drain-bulk voltage) trajectories of the MOSFETs in the SPR circuit do not go beyond $\pm(V_{DD}+V_{BIAS})$, where $V_{BIAS}$ signifies the larger of $|V_{NBB} - V_{DD}|$ and $|V_{SS} - V_{PBB}|$. This voltage is applied to the junctions, but the junction breakdown voltage of the 0.3µm MOSFETs is more than 9V, and hence junction breakdown does not occur for any MOSFET.
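The junction-stress bound can be evaluated numerically. The sketch below takes $V_{BIAS}$ as the larger of $|V_{NBB} - V_{DD}|$ and $|V_{SS} - V_{PBB}|$ and compares $V_{DD} + V_{BIAS}$ with the 9V breakdown figure. The 4V value for $V_{NBB}$ is an assumption for illustration, and the function name is hypothetical.

```python
JUNCTION_BREAKDOWN_V = 9.0  # lower bound on junction breakdown quoted in the text


def junction_stress_bound(vdd, vss, v_nbb, v_pbb):
    """Worst-case junction voltage magnitude, VDD + V_BIAS.

    V_BIAS is the larger of |V_NBB - VDD| and |VSS - V_PBB|.
    """
    v_bias = max(abs(v_nbb - vdd), abs(vss - v_pbb))
    return vdd + v_bias


# V_NBB = 4.0 is an assumed value; V_PBB = -2.0 follows the quoted substrate bias.
bound = junction_stress_bound(vdd=2.0, vss=0.0, v_nbb=4.0, v_pbb=-2.0)
margin = JUNCTION_BREAKDOWN_V - bound

print(f"worst-case junction voltage: {bound:.1f}V")
print(f"margin to breakdown:         {margin:.1f}V")
```

With these values the bound is 4V, well below the 9V breakdown voltage.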

Figure 6 shows a micrograph of the test chip. A ring oscillator constructed with 49 stages of 2-input NAND gates and the SPR circuit are implemented using a 0.3µm process technology. The SPR circuit occupies 2500µm<sup>2</sup> for either the n-well or the p-well bias circuit. In cases where the nMOS circuit determines the speed, as in nMOS pass-transistor logic, only the nMOS $V_{TH}$ needs to be lowered and hence only the p-well bias circuit is needed. If both the n-well and p-well bias circuits are required, as in Figure 4, 5000µm<sup>2</sup> of Si area is occupied and a triple-well technology must be used. A standby current of less than 0.1µA is measured on the test chip when chip enable (CE) is disabled. When CE is asserted in active mode, the measured current is larger by three orders of magnitude. A 2-input NAND gate delay of 300ps is achieved at $V_{DD} = 2.0V$. The settling time is less than 100ns. The proposed SPR scheme is fully compatible with existing CAD tools, including automatic placement and routing tools. As for the standard cell library, the cells should be modified to separate the substrate bias lines from the power supply lines. The area overhead for the total chip, however, is estimated to be less than 5%. The substrate bias lines can be made as narrow as possible and can be scaled.
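For orientation, the quoted gate delay can be related to a ring-oscillator frequency through the standard relation $T = 2 N t_{pd}$. The sketch below applies it to the 49-stage oscillator. The resulting frequency is an illustrative consequence of that relation, not a figure reported in the paper, and it assumes the 300ps delay applies to each stage.

```python
STAGES = 49                 # 2-input NAND stages in the ring oscillator (from the text)
GATE_DELAY_S = 300e-12      # per-gate delay at VDD = 2.0V (from the text)
BIAS_CIRCUIT_AREA_UM2 = 2500  # area of one well-bias circuit (from the text)

# A signal must travel around the ring twice per oscillation period.
period_s = 2 * STAGES * GATE_DELAY_S
freq_hz = 1.0 / period_s

# Both n-well and p-well bias circuits together.
total_bias_area_um2 = 2 * BIAS_CIRCUIT_AREA_UM2

print(f"oscillation period:    {period_s * 1e9:.1f} ns")
print(f"oscillation frequency: {freq_hz / 1e6:.1f} MHz")
print(f"area for both bias circuits: {total_bias_area_um2} um^2")
```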

#### Acknowledgments

Valuable discussions and constant encouragement by H. Shibata, K. Maeguchi, and Y. Unno are appreciated.

#### References

[1] Matsui, M., et al., "200MHz Video Compression / Decompression Macrocells Using Low-Swing Differential Logic," ISSCC Digest of Technical Papers, pp. 76-77, Feb., 1994.

[2] Burr, J.B., et al., "A 200mV Self-Testing Encoder / Decoder Using Stanford Ultra Low Power CMOS," ISSCC Digest of Technical Papers, pp. 84-85, Feb., 1994.

[3] Kobayashi, T., et al., "Self-Adjusting Threshold-Voltage Scheme (SATS) for Low-Voltage High-Speed Operation," Proc. IEEE CICC'94, pp. 271-271, May, 1994.

## 1995 IEEE International Solid-State Circuits Conference


0-7803-2495-1 / 95 / \$4.00 / © 1995 IEEE



Figure 1: Typical logic circuit used to calculate delay and power vs. supply voltage (VDD) and threshold voltage (VTH).

Figure 2: Simulated delay vs. VDD and VTH by SPICE.

Figure 3: Simulated energy-delay product vs. VDD and VTH.

Figure 5: Simulated waveforms of the SPR circuit.

Figure 6: Micrograph of test chip.