Challenges in VLSI Design
Toward the New Millennium

Takayasu Sakurai
Prof. at Center for Collaborative Research, and
Institute of Industrial Science,
University of Tokyo
E-mail:tsakurai@iis.u-tokyo.ac.jp

1. Scaling and three crises
2. Power crisis
3. Interconnection crisis
4. Complexity crisis
Silicon Age

- Info processing by LSI (Si)
- Info transfer by fibers (SiO₂)

Memory
Processors
Sensors
Communicators

By Yoshio Nishimura
World-wide Semiconductor Market

[Figure: market size in billion $ (log scale, 10 to 10,000) vs. year (1980 to 2010) for semiconductor, steel, GNP, electric, and automobile.]

by Starc

T. Sakurai
World semiconductor market

Data: World semiconductor market statistics
System LSI for Next Generation Games

- Clock freq. 300 MHz
- 10M transistors
- Graphics synthesizer integrated
  - 40M transistors with embedded DRAM
- Memory bandwidth 3.2 GB/s
- Floating-point performance 6.2 GFLOPS
- 3D CG 6.6M polygons/sec
- MPEG2 decode
Applications of System LSI’s

PC & peripherals

PC
printer
game
PDA
hard disk • CDROM
display

Digital consumer
digital TV
digital camera
digital movie
car navigation
DVD • CD • MD

Communication
LAN/WAN
mobile phone
wireless network
Fax • modem

Communication / network

Conventional I-V curve at 0.04µm (Even down to 0.014µm)


Scaling law

<table>
<thead>
<tr>
<th>Transistor</th>
<th>Numbers are exponent to k ($k^n$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Voltage</td>
<td>[V]</td>
</tr>
<tr>
<td>Tr. size</td>
<td>[x]</td>
</tr>
<tr>
<td>Oxide thickness</td>
<td>[t]</td>
</tr>
<tr>
<td>Current</td>
<td>[I~$V^{1.3}$/t]</td>
</tr>
<tr>
<td>Tr. capacitance</td>
<td>[Cg~$x^2$/t]</td>
</tr>
<tr>
<td>Tr. delay</td>
<td>[Tg~$CgV/I$]</td>
</tr>
<tr>
<td>Tr. power</td>
<td>[Pg~$CgV^2/Tg$]</td>
</tr>
<tr>
<td>Tr. power density</td>
<td>[p~$Pg/x^2$]</td>
</tr>
<tr>
<td>Tr. density</td>
<td>[n~ $1/x^2$]</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Interconnection</th>
<th>Local</th>
<th>Middle</th>
<th>Global</th>
<th>VDD/VSS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Length</td>
<td>[L]</td>
<td>-1</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Width</td>
<td>[W]</td>
<td>-1</td>
<td>0.5</td>
<td>0</td>
</tr>
<tr>
<td>Thickness</td>
<td>[T]</td>
<td>-1</td>
<td>0.5</td>
<td>0</td>
</tr>
<tr>
<td>Height</td>
<td>[H]</td>
<td>-1</td>
<td>0.5</td>
<td>0</td>
</tr>
<tr>
<td>Resistance</td>
<td>[Rm~L/W/T]</td>
<td>1</td>
<td>0.5</td>
<td>0.5</td>
</tr>
<tr>
<td>Capacitance</td>
<td>[Cm~LW/H]</td>
<td>-1</td>
<td>0.5</td>
<td>0.5</td>
</tr>
<tr>
<td>RC delay/Tr. delay</td>
<td>[Tm~$RmCm/Tg$]</td>
<td>1.7</td>
<td>1.7</td>
<td>1.7</td>
</tr>
<tr>
<td>Current density</td>
<td>[J~$pLW/V/W/T$]</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>DC noise</td>
<td>[SNdc~$JWLRm/V$]</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</tbody>
</table>

\[ I_{DS} = \frac{\mu \varepsilon}{t_{ox}} \left( \frac{W}{L} \right) \frac{(V_{GS}-V_T)^{\alpha}}{2} \sim \left[ \frac{V^{\alpha}}{t} \right] \]

\[ \alpha = 1.3 \]
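The exponents of k follow mechanically from the proportionalities listed in the transistor table. The sketch below derives them, assuming that voltage, transistor size and oxide thickness each scale as k^-1 (constant-field scaling); the function name and the choice k = 2 are illustrative, not taken from the slides.

```python
# Exponents n such that a quantity scales as k**n.
ALPHA = 1.3  # exponent in I ~ V**alpha / t, as quoted on the slide


def transistor_exponents(v=-1.0, x=-1.0, t=-1.0, alpha=ALPHA):
    """Derive scaling exponents from the listed proportionalities.

    v, x, t are the assumed exponents of voltage, transistor size and
    oxide thickness (all k**-1 under constant-field scaling).
    """
    i = alpha * v - t      # I  ~ V**alpha / t
    cg = 2 * x - t         # Cg ~ x**2 / t
    tg = cg + v - i        # Tg ~ Cg * V / I
    pg = cg + 2 * v - tg   # Pg ~ Cg * V**2 / Tg
    p = pg - 2 * x         # p  ~ Pg / x**2
    n = -2 * x             # n  ~ 1 / x**2
    return {"V": v, "x": x, "t": t, "I": i, "Cg": cg,
            "Tg": tg, "Pg": pg, "p": p, "n": n}


exponents = transistor_exponents()
k = 2.0  # one scaling step, used only as an illustration
factors = {name: k ** e for name, e in exponents.items()}

for name, e in exponents.items():
    print(f"{name:>2}: k^{e:+.1f}  ->  x{factors[name]:.2f} at k = 2")
```

Under these assumptions the transistor delay scales as k^-1.7 and the power density as k^0.7, so the power density grows by roughly 1.6 times when k = 2.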

### Scaling Law

#### Favorable effects

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Effect</th>
</tr>
</thead>
<tbody>
<tr>
<td>Size</td>
<td>x1/2</td>
</tr>
<tr>
<td>Voltage</td>
<td>x1/2</td>
</tr>
<tr>
<td>Electric Field</td>
<td>x1</td>
</tr>
<tr>
<td>Speed</td>
<td>x2</td>
</tr>
<tr>
<td>Cost</td>
<td>x1/4</td>
</tr>
</tbody>
</table>

#### Unfavorable effects

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Effect</th>
</tr>
</thead>
<tbody>
<tr>
<td>Power</td>
<td>x1.6</td>
</tr>
<tr>
<td>RC delay/Tr. delay</td>
<td>x3.2</td>
</tr>
<tr>
<td>Current density</td>
<td>x1.6</td>
</tr>
<tr>
<td>Voltage noise</td>
<td>x3.2</td>
</tr>
<tr>
<td>Design complexity</td>
<td>x4</td>
</tr>
</tbody>
</table>

---

Three crises in VLSI designs

- Power crisis
- Interconnection crisis
- Complexity crisis
Ever Increasing VLSI Power

(Power consumption of processors published in ISSCC)

[Figure: power (W) vs. year; power grows about x4 every 3 years.]
VDD, Power and Current Trend

[Figure: voltage [V], power per chip [W], current [A], and VDD current [A] vs. year.]
# Necessity for Low-Power Design

<table>
<thead>
<tr>
<th>Power range</th>
<th>Concerns</th>
<th>Typical applications (all need high performance)</th>
</tr>
</thead>
<tbody>
<tr>
<td>&lt; 0.1W</td>
<td>Battery life</td>
<td>Portable: PDA, communications</td>
</tr>
<tr>
<td>~ 1W</td>
<td>Inexpensive package limit; system heat (10W / box)</td>
<td>Consumer: set-top box, audio-visual</td>
</tr>
<tr>
<td>&gt; 10W</td>
<td>Ceramic package limit; IR drop of power lines</td>
<td>Processor: high-end MPU's, multimedia DSP's</td>
</tr>
</tbody>
</table>
Trend in Computer

[Figure: price of main-stream computer ($1K to $1M) vs. year. Device generations: vacuum tube, transistor, IC, LSI, VLSI, system LSI. Computer classes: historical ($3M), large scale ($3M), WS ($30K), PC ($3000), ??? ($300).]

DOWNSIZING is the keyword
Computer-Communication-Consumer

Cell phone
Ticketing
Reservation

PDA
Schedule
Address book

E-cashing
E-trading
E-banking

Internet
Web browse
Web TV
E-mail

Home automation
Game on net
Entertainment

Computer centric
Communication centric,
Display centric

Performance Requirements for Multimedia

[Figure: required performance (MOPS, log scale) by application, with present and future levels marked: FAX/modem, sound, speech recognition, 2D/3D graphics, TV conf. (H.324...), MPEG1 decoding, MPEG1 encoding, MPEG2 decoding, MPEG2 encoding, HDTV decoding, HDTV encoding.]
What sets the technology trend?

- NMOS → CMOS: cost up
- Bipolar → CMOS: speed down
- Neither cost nor speed but power sets the technology trend.
- Integration can achieve low cost and high speed as a system.
Expression for CMOS Power

\[ \begin{aligned}
P ={} & \alpha \cdot C_L \cdot V_S \cdot V_{DD} \cdot f_{CLK} && \text{(charging \& discharging)} \\
& + \alpha \cdot I_{SC} \cdot \Delta t_{SC} \cdot V_{DD} \cdot f_{CLK} && \text{(crowbar current)} \\
& + I_{DC} \cdot V_{DD} && \text{(static current)} \\
& + I_{LEAK} \cdot V_{DD} && \text{(subthreshold leak current)}
\end{aligned} \]

- \( \alpha \): Switching probability
- \( C_L \): Load capacitance
- \( V_S \): Signal swing
- \( V_{DD} \): Supply voltage
- \( I_{SC} \): Mean crowbar current
- \( \Delta t_{SC} \): Crowbar current duration
- \( f_{CLK} \): Clock frequency
- \( I_{DC} \): DC current
- \( I_{LEAK} \): Subthreshold leak current

\[ Q = C_L \cdot V_S \]

In each charge and discharge cycle, a charge of \( C_L \cdot V_S \) loses \( V_{DD} \) of potential, so the energy consumption per cycle is \( C_L \cdot V_{DD} \cdot V_S \).
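The four-term expression can be evaluated directly. The following is a minimal sketch: the function name and the operating point are hypothetical and chosen only to exercise the formula, not to represent any real chip.

```python
def cmos_power(alpha, c_load, v_swing, v_dd, f_clk,
               i_sc=0.0, dt_sc=0.0, i_dc=0.0, i_leak=0.0):
    """Return (total, terms) in watts for the four-term CMOS power expression."""
    terms = {
        "charging": alpha * c_load * v_swing * v_dd * f_clk,
        "crowbar": alpha * i_sc * dt_sc * v_dd * f_clk,
        "static": i_dc * v_dd,
        "leak": i_leak * v_dd,
    }
    return sum(terms.values()), terms


# Hypothetical operating point (illustrative numbers only).
total, terms = cmos_power(alpha=0.3, c_load=1e-9, v_swing=3.3, v_dd=3.3,
                          f_clk=100e6, i_sc=1e-3, dt_sc=0.2e-9, i_leak=1e-6)

for name, watts in terms.items():
    print(f"{name:>8}: {watts * 1e3:.4f} mW")
print(f"   total: {total * 1e3:.4f} mW")

# Halving the signal swing halves only the charging term.
_, low_swing = cmos_power(alpha=0.3, c_load=1e-9, v_swing=1.65, v_dd=3.3,
                          f_clk=100e6)
print(f"charging term at half swing: {low_swing['charging'] * 1e3:.4f} mW")
```

Keeping the terms separate makes it easy to see which knob (swing, supply, capacitance, activity or leakage) dominates at a given operating point.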
Voltage waveform of CMOS inverter

\[ C_{OUT} \frac{dV_{OUT}}{dt} = -I_{DN} = -I_{D0N} \left( \frac{V_{GSN} - V_{THN}}{V_{DD} - V_{THN}} \right)^{\alpha_N} \]

\( C_{IN} = 10\,[\text{pF}] \)

\( C_{OUT} = F \times C_{IN} \)
Short-circuit power dissipation formula

\[ P_S = \frac{k(v_{D0P}) f_{oP}^2 C_{IN} V_{DD}^2}{v_{D0P} g(v_T, \alpha) \frac{2k(v_{D0P})}{FO \beta_r + h(v_T, \alpha) f_{oP}}} \]

\[ g(v_T, \alpha) = \frac{\alpha_N + 1}{f(\alpha)} \frac{(1-v_{TN})^{\alpha_N} (1-v_{TP})^{\alpha_p/2}}{(1-v_{TN}-v_{TP})^{\alpha_p/2+\alpha_N+2}} \]

\[ h(v_T, \alpha) = 2^{\alpha_p} (\alpha_p + 1) \frac{(1-v_{TP})^{\alpha_p}}{(1-v_{TN}-v_{TP})^{\alpha_p+1}} \]

\[ k(v_{D0P}) = \frac{0.9}{0.8} + \frac{v_{D0P}}{0.8} \ln \frac{10v_{D0P}}{e} \]

\[ FO = \frac{C_{OUT}}{C_{IN}} \quad \text{(Fanout)} \]

\[ f_{oP} = \frac{I_{D0P}}{I_{D0PIN}} \quad \beta_r = \frac{I_{D0P}}{I_{D0N}} \]

Comparison between proposed formula and other formula

- Vemuru et al.'s formula deviates from SPICE simulation when
  - fanout > 3
  - fanout is small (diverges to infinity)

[Figure: short-circuit power [pW] vs. fanout FO for the Vemuru formula, this work, and SPICE simulation. Tech. A, \( f=1[\text{Hz}] \), \( C_{IN}=10[\text{pF}] \).]
The change of the short-circuit power dissipation with scaling

\[ \eta_p = \frac{P_S}{(P_D + P_S)} \]

Fanout = 1

For different values of \( V_{TH}/V_{DD} \):
- \( V_{TH}/V_{DD} = 0 \)
- \( V_{TH}/V_{DD} = 0.1 \)
- \( V_{TH}/V_{DD} = 0.2 \)
- \( V_{TH}/V_{DD} = 0.3 \)
Voltage dependent gate cap. effect

$v_{TH} = 0.3\,\text{V}$

- $V_{DS} = 0\,\text{V}$ (linear)
- $V_{DS} = 1\,\text{V}$ (saturation)
- $I(C_{OX})$

$W/L = 10\,\mu\text{m}/0.4\,\mu\text{m}$
Voltage dependent gate cap. effect

Average gate current (Average $C_{gate}$)

$V_{DD}=1V$
$V_{DD}=0.5V$
$I(C_{OX})$

Delay

$V_{TH}/V_{DD}$

$V_{THout}/V_{DD}$

$FO=5$
$V_{TH}=0.2$
$V_{TH}=V_{THOUT}$

Delay

Large $C$

Power & Delay Dependence on $V_{DD}$ & $V_{TH}$

**Power:**

$$P = p_t \cdot f_{CLK} \cdot C_L \cdot V_{DD}^2 + I_0 \cdot 10^{-V_{TH}/s} \cdot V_{DD}$$

($s$: subthreshold swing)

**Delay:**

$$\text{Delay} = \frac{k \cdot Q}{I} = \frac{k \cdot C_L \cdot V_{DD}}{(V_{DD} - V_{TH})^{\alpha}} \quad (\alpha = 1.3)$$
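The trade-off between the two expressions is easiest to see numerically. The sketch below uses normalized units and assumes that the leakage term falls by one decade per subthreshold swing `s`; the value `s = 0.1` V/decade and all other defaults are hypothetical illustration values.

```python
ALPHA = 1.3  # exponent of the alpha-power law quoted on the slide


def gate_delay(v_dd, v_th, c_load=1.0, k=1.0, alpha=ALPHA):
    """Delay = k * C_L * V_DD / (V_DD - V_TH)**alpha, in normalized units."""
    if v_dd <= v_th:
        raise ValueError("model is only meaningful for V_DD > V_TH")
    return k * c_load * v_dd / (v_dd - v_th) ** alpha


def power(v_dd, v_th, p_t=0.5, f_clk=1.0, c_load=1.0, i0=1.0, s=0.1):
    """Dynamic power plus a leakage term that drops one decade per s volts.

    The leakage form and s = 0.1 V/decade are assumptions of this sketch.
    """
    dynamic = p_t * f_clk * c_load * v_dd ** 2
    leak = i0 * 10 ** (-v_th / s) * v_dd
    return dynamic + leak


print(" VDD   VTH   delay   power")
for v_dd, v_th in [(3.0, 0.5), (2.0, 0.5), (1.0, 0.5), (1.0, 0.3), (1.0, 0.1)]:
    print(f"{v_dd:4.1f}  {v_th:4.1f}  {gate_delay(v_dd, v_th):6.3f}  "
          f"{power(v_dd, v_th):6.3f}")
```

Lowering V_DD cuts the dynamic power quadratically but lengthens the delay; lowering V_TH recovers speed at the cost of exponentially more leakage.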
Lowering Only Internal VDD (Example)

[Figure: a chip with a 3V supply, 3V input and 3V output. A DC-DC converter generates the internal VDD of 1.5V. Level conv. 1 (swing conv. 1) and level conv. 2 (swing conv. 2) translate between 0~3V and 0~1.5V signal swings.]

Switching DC-DC Converter

\[ \frac{V_{\text{DDINT}}}{V_{\text{DDEXT}}} \leq 50\% \]

\[ \text{Efficiency} > 95\% \]
are added to ensure reliability

- In standby mode and in IDDQ test, substrate bias is applied to increase VTH, which reduces leakage.
- In active mode, substrate bias is not applied to lower VTH, which ensures high speed.
Self-Adjusting Threshold-voltage Scheme (SATS)

- low Vth → large leakage → SSB ON → deep VBB → high Vth
- high Vth → little leakage → SSB OFF → shallow VBB → low Vth

- control Vth to adjust leakage current
- compensate Vth fluctuation
In active mode, low-$V_{TH}$ MOSFETs achieve high speed.
In standby mode, when the St'by signal is high, high-$V_{TH}$ MOSFETs in series with the normal logic circuits cut off the leakage current.
## VTCMOS / MTCMOS

<table>
<thead>
<tr>
<th></th>
<th>VTCMOS</th>
<th>MTCMOS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Principle</td>
<td>Threshold control with substrate bias</td>
<td>On-off control of internal VDD/VSS</td>
</tr>
</tbody>
</table>

### Merit/ Demerit

**VTCMOS**
- Low leakage in standby
- Needs circuit development
- Compensate Vth fluctuation
- IDDQ test
- No serial MOSFET
- Conventional design tools
- Reuse of existing design
- Triple well is desirable

**MTCMOS**
- Low leakage in standby
- Conceptually easier
- Compensate Vth fluctuation
- IDDQ test
- Large serial MOSFET
- Conventional design tools
- Special F/F's
- Two $V_{TH}$'s

---

Concept of Super Cut-off CMOS (SCCMOS)

- St'by: $V_{DD} + 0.4V$
- Active: $V_{SS}$

Virtual $V_{DD}$

Low-$V_{TH}$ cut-off MOSFET

Low-$V_{TH}$ logic circuit

pMOS insertion case

Super Cut-off CMOS Scheme (SCCMOS)

0.3µm, triple-metal CMOS process
$V_{TH}=0.2V$

100x100µm$^2$
pumping
freq=10kHz
0.1µA (@$V_{DD}=0.5V$)

Delay characteristics (inverter & NAND)

SCCMOS
0.2V $V_{TH}$ circuit with 0.2V $V_{TH}$ cut-off MOSFET

MTCMOS
0.2V $V_{TH}$ circuit with 0.6V $V_{TH}$ cut-off MOSFET

Conventional
All 0.6V circuit
No cut-off MOSFET

Dynamic Leakage Cut-off

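[Figure: SRAM array with $V_{NWELL}$ and $V_{PWELL}$ drivers, word lines $V_{WL}$ and $V_{WL+1}$, and bit lines $V_{BL0}$ to $V_{BLm-1}$ (# of selected bits at a time). The timing diagram shows $V_{NWELL}$, $V_{WL}$ and $V_{PWELL}$ switching between the select and deselect states, with levels $2V_D$, $V_D$, $V_{DD}$, $V_{SS}$ and $-V_D$.]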
Leakage Reduction of DLC SRAM

Total subthreshold leak of 1Mbit SRAM. At 1V VDD, VTH of the dormant cell is 0.25V while that of the active cell is 0V, keeping the total leakage power at 0.9mW.
Dynamic Leakage Cut-off (DLC) SRAM

Area Overhead of DLC SRAM

Memory capacity: 1MBit

Area Overhead vs. # of selected bit at a time

Clustered Voltage Scaling for Multiple $V_{DD}$’s

Conventional Design

Critical Path

CVS Structure

FF Level-Shifting F/F

Critical Path

Lower $V_{DD}$ portion is shown as shaded

Once $V_L$ is applied to a logic gate, $V_L$ is applied to subsequent logic gates until F/F’s to eliminate DC current paths. F/F’s restore $V_H$.

Slave-Latch Level-Conversion F/F
Dual-VS Scheme

The optimum VL/VH is between 0.6 and 0.7 for any kind of path-delay distribution function.
Path-delay Distribution in Dual-VS

- RISC (5645 cells)
- DMA (1493 cells)
- MEC (2912 cells)
- MCB (1366 cells)
- DCT (5466 cells)
- VLD (3812 cells)
- IDCT (6227 cells)
- VLC (3462 cells)

$p(t)$ before and after
Dynamic Voltage Scaling Loop

Temperature effects on $I_{DS} - V_{GS}$

- Zero Temperature Coefficient (ZTC) point around $V_{GS} = 1.0\,V$
  - $V_{ZTC} \approx 1\,V$

- Temperature increases $V_{ZTC}$
  - $V_{DD} > V_{ZTC}$: $\text{Temp. coeff} < 0$
  - $V_{DD} < V_{ZTC}$: $\text{Temp. coeff} > 0$

Cause of positive temp. dependence of $I_{DS}$

- $\alpha$-power law model ($T$ = temperature, $\mu$ = mobility)

$$I_{DS} \propto \mu(T) \left( V_{DD} - V_{TH}(T) \right)^\alpha$$

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Model</th>
<th>Change as $T$ rises</th>
<th>Effect on $I_{DS}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\mu(T)$</td>
<td>$\mu(T_0)(T / T_0)^{-m}$</td>
<td>$\downarrow$</td>
<td>$\downarrow$</td>
</tr>
<tr>
<td>$V_{TH}(T)$</td>
<td>$V_{TH}(T_0) - \kappa(T - T_0)$</td>
<td>$\downarrow$</td>
<td>$\uparrow$</td>
</tr>
</tbody>
</table>

Typical values: $\alpha = 1.5$, $m = 1.5$, $\kappa = 2.5$ [mV/K]

Effects of $V_{TH}$ and $\mu$ on $I_{DS}$ when temp. goes up 100[K]

<table>
<thead>
<tr>
<th>$V_{DD}$</th>
<th>$V_{TH}$ effect</th>
<th>$\mu$ effect</th>
</tr>
</thead>
<tbody>
<tr>
<td>2.5V</td>
<td>10%</td>
<td>35%</td>
</tr>
<tr>
<td>1.0V</td>
<td>55%</td>
<td>35%</td>
</tr>
</tbody>
</table>
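The two competing effects can be separated with the $\alpha$-power law model and the typical values quoted above. In the sketch below, $T_0 = 300$ K and $V_{TH}(T_0) = 0.5$ V are assumptions made only for illustration, so the percentages it prints are not expected to reproduce the table exactly; the point is the sign change between high and low $V_{DD}$.

```python
def ids_temperature_factors(v_dd, delta_t, v_th0=0.5, t0=300.0,
                            alpha=1.5, m=1.5, kappa=2.5e-3):
    """Return (mobility_factor, threshold_factor, total) for I_DS at T0 + delta_t.

    v_th0 (threshold at T0, in volts) and t0 are assumed values;
    alpha, m and kappa (V/K) are the typical values from the slide.
    """
    mobility = ((t0 + delta_t) / t0) ** (-m)          # mu(T) / mu(T0)
    v_th = v_th0 - kappa * delta_t                    # V_TH(T)
    threshold = ((v_dd - v_th) / (v_dd - v_th0)) ** alpha
    return mobility, threshold, mobility * threshold


for v_dd in (2.5, 1.0):
    mob, thr, total = ids_temperature_factors(v_dd, delta_t=100.0)
    print(f"VDD = {v_dd} V: mobility x{mob:.2f}, V_TH x{thr:.2f}, "
          f"I_DS x{total:.2f}")
```

With these assumptions the mobility term removes about 35% of the current regardless of $V_{DD}$, while the threshold term adds much more at 1.0 V than at 2.5 V, so the net temperature coefficient of $I_{DS}$ turns positive at low voltage.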
Measurement of 32bit full adder

Photograph of 32bit FA
0.3\(\mu\)m CMOS

Normalized \(t_{pd}\) vs. \(V_{DD}\) [V] for different temperatures (20\(^\circ\)C, 50\(^\circ\)C, 90\(^\circ\)C).
Transient response of chip temperature

Better package is needed to avoid thermal runaway in low voltage.

Careful temperature design for low-voltage

$I_{DS}$ and gate speed show a positive temperature dependence in the $V_{DD} < 1V$ region. This will change the design validation process for worst-case conditions.

In low-$V_{DD}$, low-$V_{TH}$ designs, temperature goes up much more than in high-$V_{DD}$, high-$V_{TH}$ designs, even if the power consumption at room temperature and the package are the same.
D-type CMOS

\[ t_{ LH } = K \frac{C_L V_{DD}}{(I_{PON} - I_{NOFF})} = K \frac{C_L V_{DD}}{(I_{P0} - I_{NLEAK})} \]

\[ t_{ HL } = K \frac{C_L V_{DD}}{(I_{NON} - I_{POFF})} = K \frac{C_L V_{DD}}{(I_{N0} - I_{PLEAK})} \]

\[ t_d = \frac{t_{ LH } + t_{ HL }}{2} \]

\( K \sim 1 \quad (K=0.91 \text{ in this case}) \)

D-type leakage cannot be neglected in the range \( V_{TH} < -0.2 \text{V} \).
Power Distribution in CMOS LSI's

[Figure: share of power consumed by clock, logic, memory, and I/O in four chips: MPU1, ASSP1, MPU2, and ASSP2.]
Synthesis for low-power is not so effective.

Clock system is the key. In this respect, gated clock is one of the most efficient ways to reduce the power in current processors.

Gated clock is useful in reducing average power but not that effective in reducing peak power.

Circuit / device level is important.
Reduced Clock Swing Flip-Flop

(a) RCSFF
Voltage swing of CLK is reduced to Vclk down to 1V.

(b) Conventional F/F

H. Kawaguchi and T. Sakurai, "A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% Clock Power Reduction," in Symp. on VLSI Circuits '97, June, 1997.

Layout Example

(a) RCSFF

(b) Conventional F/F
Delay and power comparison

Clock-to-Q Delay [ns] vs. Vclk [V]
- Wclk=6.5µm
- Wclk=10µm
- Wclk=20µm

Power per F/F [µW] vs. Vclk [V]
- Type A driver
- Type B driver
- VWELL=3.3V
- VWELL=6V

Modified Sense Amplifier-Based F/F

This can be used with RCSFF scheme.


Ultra Low-Voltage Operation


<table>
<thead>
<tr>
<th>Process</th>
<th>0.050</th>
<th>0.200</th>
<th>0.330</th>
<th>0.6</th>
<th>1.0</th>
<th>1.5</th>
<th>2.5</th>
<th>5.0</th>
<th>V</th>
</tr>
</thead>
<tbody>
<tr>
<td>std 2.0um 300K</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>0.6</td>
<td>12</td>
<td>35</td>
<td>80</td>
<td>150</td>
<td>MHz</td>
</tr>
<tr>
<td>ULP 2.0um 300K</td>
<td>-</td>
<td>20</td>
<td>33</td>
<td>60</td>
<td>100</td>
<td>150</td>
<td>200</td>
<td>280</td>
<td>MHz</td>
</tr>
<tr>
<td>ULP 1.5um 300K</td>
<td>-</td>
<td>27</td>
<td>50</td>
<td>96</td>
<td>160</td>
<td>219</td>
<td>306</td>
<td>434</td>
<td>MHz</td>
</tr>
<tr>
<td>ULP 1.5um 130K</td>
<td>18</td>
<td>92</td>
<td>140</td>
<td>240</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>MHz</td>
</tr>
<tr>
<td>ULP 1.5um 77K</td>
<td>-</td>
<td>82</td>
<td>150</td>
<td>260</td>
<td>358</td>
<td>436</td>
<td>531</td>
<td>640</td>
<td>MHz</td>
</tr>
</tbody>
</table>

Vth, Leff, tox Optimized Low-Power MOS

M. Kakumu et al., "Low-Voltage and Power CMOS Technology", SSDM, 1995, pp.213-
# SOI Processors in ISSCC’99

<table>
<thead>
<tr>
<th>Paper#</th>
<th>WP25.1</th>
<th>WP25.3</th>
<th>WP25.7</th>
<th>WP25.4</th>
</tr>
</thead>
<tbody>
<tr>
<td>Company</td>
<td>IBM (East Fishkill)</td>
<td>IBM (Essex &amp; Austin)</td>
<td>IBM (Rochester)</td>
<td>Samsung</td>
</tr>
<tr>
<td>Target</td>
<td>PowerPC 604e for Apple</td>
<td>PowerPC 750</td>
<td>PowerPC</td>
<td>Alpha</td>
</tr>
<tr>
<td></td>
<td>32b</td>
<td>64b</td>
<td>64b</td>
<td></td>
</tr>
<tr>
<td>PD/FD</td>
<td>PD</td>
<td>PD (SIMOX)</td>
<td>PD (SIMOX)</td>
<td>FD (SIMOX/Unibond no dep.)</td>
</tr>
<tr>
<td>Rule</td>
<td>0.25um</td>
<td>0.2um (Leff=0.12um)</td>
<td>0.25um</td>
<td></td>
</tr>
<tr>
<td>Interconnect</td>
<td>5 Al + W local</td>
<td>Cu</td>
<td>6 Cu</td>
<td>4 Al</td>
</tr>
<tr>
<td>Area</td>
<td>49mm²</td>
<td>139mm²</td>
<td>209mm²</td>
<td></td>
</tr>
<tr>
<td># of Tr's</td>
<td>6.5M</td>
<td>34M</td>
<td>9.7M</td>
<td></td>
</tr>
<tr>
<td>Freq.</td>
<td>500MHz</td>
<td>580MHz@85C, fast proc.</td>
<td>550MHz</td>
<td>600MHz</td>
</tr>
<tr>
<td>VDD</td>
<td>1.7V</td>
<td>2V</td>
<td>1.8V</td>
<td>1.5V (2V I/O)</td>
</tr>
<tr>
<td>Power</td>
<td>5.1W @2V,400MHz</td>
<td>24W</td>
<td>40W</td>
<td></td>
</tr>
<tr>
<td>Speed gain of</td>
<td>25-30%</td>
<td>20%</td>
<td>20%</td>
<td>30%@1.2V, 20%@1.5V SRAM</td>
</tr>
<tr>
<td></td>
<td>22% Ctotal reduction</td>
<td>12% by Cj</td>
<td>15-20% simple gates</td>
<td></td>
</tr>
<tr>
<td></td>
<td>10-15% more Ids</td>
<td>15-25% by less body-bias</td>
<td>25-40% complex gates</td>
<td></td>
</tr>
</tbody>
</table>
Hi-Speed is Low-Power

From URL: www.erniefernandez.com/html/soi.html

Advantage of SOI over Bulk CMOS

- Lower $C_J$ and $C_{GROUND}$ achieves 20% lower $C_{TOTAL}$. Good for hi-speed & low-power. (For interconnection limiting cases, less effective)

- 10-15% higher $I_{DS}$ due to lower $V_{TH}$ in turning-on and parasitic bipolar current (Effects reduced in $V_{DD}=0.6V$)

- Lower negative body-bias effect in pass-gates and series-connected MOS’s as in NAND’s achieves higher $I_{DS}$ and hence hi-speed.

- $s$ of 60mV/dec is achievable in FD and DTMOS. Lower $V_{TH}$ is possible with the same off-leak. (Less effective in lower $V_{TH}$ like 0.1V)

- Lower SER (Normal dynamic gates)

- 25-30% higher speed in total for 0.25um generation
Design Issues of PD-SOI

- History dependent delay (3-8% fluctuation)
- Pass-gate leakage by parasitic bipolar current (pull-down internal nodes)
- Lowered noise immunity in dynamic circuits (several techniques)
- Self-heating (only for circuits with DC current path)
- ESD protection (process/device & circuits remedies)
- Redesign efforts (higher for PD, lower for FD)
- Higher wafer cost
Dynamic Threshold MOSFET (DTMOS)


Pass Transistor Logic with SOI

For NMOS with VDD=0.5V
Gate is 0.5V → Body bias=0.5V → Vth= -0.05V
 Gate is 0V → Body bias=0V → Vth= 0.15V

DTMOS vs. Normal SOI

- Suppose DTMOS ≈ front gate + back gate
- $I_{DS}/C_G$ of back gate device < $I_{DS}/C_G$ of front gate device.
- DTMOS needs body contact area. FD SOI can use larger W.
- Both can achieve $s=60\text{mV/dec}$.
- With the same leakage and area, which is really faster?
- DTMOS is good in driving large $C_{\text{LOAD}}$.
- Pass transistor will show better performance with DTMOS.

Reduced number of transistors leads to low-power, high-speed and reduced area.
## History of Pass-Transistor Logic

<table>
<thead>
<tr>
<th>Logic</th>
<th>NMOS logic</th>
<th>Pass Transistor Logic</th>
</tr>
</thead>
<tbody>
<tr>
<td>Load</td>
<td>PMOS Cross</td>
<td>CVSL (IBM, 1984)</td>
</tr>
<tr>
<td></td>
<td></td>
<td>DSL (Philips, 1985)</td>
</tr>
<tr>
<td>CMOS inverter</td>
<td></td>
<td>CPL (Hitachi, 1993)</td>
</tr>
<tr>
<td>None</td>
<td></td>
<td>DPL (Hitachi, 1993)</td>
</tr>
<tr>
<td>CMOS latch</td>
<td></td>
<td>SRPL (Toshiba, 1994)</td>
</tr>
<tr>
<td>Sense-Amp.</td>
<td></td>
<td>SAPL (Toshiba, 1994)</td>
</tr>
</tbody>
</table>

Complementary output

Complementary input (gate input)

Pass variable or VDD/VSS (drain input)

Various Pass-Transistor Logic Circuits

0.4µm device (full adder)

<table>
<thead>
<tr>
<th>Circuit</th>
<th>Items</th>
<th>Tr. Count</th>
<th>Delay (ns)</th>
<th>Power (mW/100MHz)</th>
<th>P•D (Normalized)</th>
<th>E•D (Normalized)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CMOS static</td>
<td></td>
<td>40</td>
<td>0.82</td>
<td>0.52</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>CPL</td>
<td></td>
<td>28</td>
<td>0.44</td>
<td>0.42</td>
<td>0.43</td>
<td>0.23</td>
</tr>
<tr>
<td>DCVSPG</td>
<td></td>
<td>24</td>
<td>0.53</td>
<td>0.30</td>
<td>0.37</td>
<td>0.24</td>
</tr>
<tr>
<td>SRPL</td>
<td></td>
<td>28</td>
<td>0.48</td>
<td>0.19</td>
<td>0.21</td>
<td>0.13</td>
</tr>
</tbody>
</table>
Pass-Tr. Logic Synthesis with BDD

BDD: Binary Decision Diagram

BDD for function f

BDD for function f

Truth table for f & \bar{f}

Rule 1
Collapse two nodes A1 and A2 whose right and left branch each point to the same node.

Rule 2
Eliminate a node A whose right and left branch point to the same node.
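The two reduction rules can be applied while the diagram is being built from a truth table. The sketch below is one possible way to do this, not necessarily the procedure used in the synthesis flow on these slides; the function names, the node numbering (0 and 1 are terminals, internal nodes start at 2) and the variable order (variable 0 is the most significant truth-table bit) are all assumptions of the example.

```python
def build_reduced_bdd(truth_table):
    """Build a reduced BDD from a truth table whose length is a power of two.

    Returns (root, nodes), where nodes maps node id -> (var, low, high).
    """
    size = len(truth_table)
    if size < 1 or size & (size - 1):
        raise ValueError("truth table length must be a power of two")

    unique = {}  # (var, low, high) -> node id, used for Rule 1
    nodes = {}   # node id -> (var, low, high)

    def make(var, low, high):
        if low == high:          # Rule 2: both branches reach the same node
            return low
        key = (var, low, high)
        if key not in unique:    # Rule 1: share nodes with identical branches
            node_id = len(nodes) + 2
            unique[key] = node_id
            nodes[node_id] = key
        return unique[key]

    def build(var, offset, span):
        if span == 1:
            return truth_table[offset]  # terminal 0 or 1
        half = span // 2
        low = build(var + 1, offset, half)           # branch for var = 0
        high = build(var + 1, offset + half, half)   # branch for var = 1
        return make(var, low, high)

    return build(0, 0, size), nodes


def evaluate(root, nodes, assignment):
    """Follow the branches selected by the assignment down to a terminal."""
    node = root
    while node > 1:
        var, low, high = nodes[node]
        node = high if assignment[var] else low
    return node


# f(a, b) = a XOR b needs three nodes; f(a, b) = b needs only one.
xor_root, xor_nodes = build_reduced_bdd([0, 1, 1, 0])
b_root, b_nodes = build_reduced_bdd([0, 1, 0, 1])
print(len(xor_nodes), len(b_nodes))
print(evaluate(xor_root, xor_nodes, (1, 0)))
```

Each remaining node selects one of its two branches according to its variable, which is the role a pair of pass transistors plays when the diagram is mapped to a circuit.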
BDD Reduction Example

Reducing ◯ & ◯ by Rule 1

Reducing ◯ & ◯ by Rule 1

Mapping BDD to MOS Circuit

Mapping to MOS circuit

Introducing pass variables

- Terminal node 1 → \( V_{DD} \)
- Terminal node 0 → \( V_{SS} \)
- \( x \) branch to \( V_{DD} \) and \( \bar{x} \) branch to \( V_{SS} \) → pass variable \( x \)
- \( x \) branch to \( V_{SS} \) and \( \bar{x} \) branch to \( V_{DD} \) → pass variable \( \bar{x} \)

Approach to low-power LSI

Example of MPEG2 decoding

- Processor (software)
  \(\sim 25W\)

- DSP
  \(\sim 4W\)

- Dedicated system LSI (SW/HW)
  \(\sim 0.7W\)

High flexibility
Low-power

Power * Area vs. Performance

- µP + Multimedia extension
- Mediaprocessor for PC
- Mediaprocessor for AV

16bit performance (GOPS)

Power * Area (W mm²)

Homogeneous vs. Heterogeneous

Homogeneous Architecture
(High flexibility)

MPU, MPU

Memory

I/F, Analog

Heterogeneous Architecture
(System LSI)
(Low-power, more efficient)

DSP

Memory

I/F, Analog

Special Engine

DRAM Embedding


- Two orders of magnitude improvement in bandwidth and power
Neural chip

3 orders of magnitude smaller power consumption for recognition compared to software implementation

**Energy of various operation**

Integration (system LSI) is the key to low-power operation.

<table>
<thead>
<tr>
<th>Operation</th>
<th>Energy/Op (pJ)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Add</td>
<td>7</td>
</tr>
<tr>
<td>3-2 Add</td>
<td>2</td>
</tr>
<tr>
<td>Multiply</td>
<td>40</td>
</tr>
<tr>
<td>Latch</td>
<td>1.8</td>
</tr>
<tr>
<td>Internal read</td>
<td>36</td>
</tr>
<tr>
<td>Internal write</td>
<td>71</td>
</tr>
<tr>
<td>I/O</td>
<td>80</td>
</tr>
<tr>
<td>External memory</td>
<td>16000</td>
</tr>
</tbody>
</table>
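The table makes it possible to compare where a computation spends its energy. The sketch below totals the energy of a small workload using the values from the table; the workload itself (one multiply and one add with two operand fetches) is a hypothetical example.

```python
# Energy per operation in pJ, copied from the table.
ENERGY_PJ = {
    "add": 7,
    "3-2 add": 2,
    "multiply": 40,
    "latch": 1.8,
    "internal read": 36,
    "internal write": 71,
    "I/O": 80,
    "external memory": 16000,
}


def workload_energy_pj(op_counts):
    """Sum energy over a mapping of operation name -> number of operations."""
    return sum(ENERGY_PJ[op] * count for op, count in op_counts.items())


# Hypothetical workload: multiply-accumulate with two operand fetches.
on_chip = workload_energy_pj({"multiply": 1, "add": 1, "internal read": 2})
off_chip = workload_energy_pj({"multiply": 1, "add": 1, "external memory": 2})

print(f"operands from internal memory: {on_chip} pJ")
print(f"operands from external memory: {off_chip} pJ")
print(f"ratio: {off_chip / on_chip:.0f}x")
```

In this example the arithmetic is a small part of the total once operands come from external memory, which is the argument for integrating memory on the same chip.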


Software-Hardware cooperation

StrongArm-1100

(Clock frequency control instruction equipped, an encryption algorithm)

- Code optimization for power -> factor of 5 power reduction
- Adaptive $V_{DD}$ control together with frequency control -> factor of 3 further power reduction

Important technologies for low-power

\[ P = \alpha f C V_s V_{DD} + \text{leak power} \]

**Low-voltage**
- \( V_{TH} \) control, multi-\( V_{TH} \), SOI, leakage control
- \( V_{DD} \) control, multi-\( V_{DD} \), DC-DC conv.
- Ultra low voltage circuit (PLL, analog)
- Software control

**Low-swing**
- Bus, clock

**Low-C**
- Less # of Tr’s, fused digital-analog, pass-transistor
- Low-k (air isolation)
- System on a chip, memory embedding

**Low- \( \alpha f \)**
- Locally synch.-globally asynch., gated clock
- Low transition coding
Lorentz Force MOS (LMOS)

- Electrons deflected by $B_y$.
- Voltage difference between Vo1 and Vo2

Microphotograph of LMOS

$W_p$: 10µm, 8µm, 5µm, 2µm

10 parallel connection
Measured $\Delta V_D$ dependence on $I_P$

- $\Delta V_D$ is proportional to $I_P$.

- $W_p = 8 \mu m$
- $V_{DDT} = V_{GT} = 2V$

Graph shows $\Delta V_D$ in $\mu V$ vs. Power supply current in mA.
It is possible to measure the current of thousands of LMOS's.

Shift registers are used to control the gate of LMOS.
<table>
<thead>
<tr>
<th></th>
<th>$p_t$</th>
<th>$C_L$</th>
<th>$V_S$</th>
<th>$V_{DD}$</th>
<th>$f_{CLK}$</th>
<th>$I_{SC}$</th>
<th>$I_{DC}$</th>
<th>$I_{LEAK}$</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>General</strong></td>
<td></td>
<td>• device scaling</td>
<td>Small Signals</td>
<td>Low $V_{DD}$<br>• DC-DC conv. 1)<br>• 0.25V Q-Rail 2)</td>
<td></td>
<td>Careful Design</td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Clock</strong></td>
<td>• gated clock</td>
<td>• floorplan to reduce wire length<br>• F/F sizing<br>• Charge Recycling<br>• C stacking 4)</td>
<td></td>
<td>• 1/2 swing 4)<br>• 1/4 swing 6)</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Bus</strong></td>
<td></td>
<td>• 3-state-buffer activated after data fix 5)<br>• exclusive bus</td>
<td>Tr. Reduction</td>
<td>• pass-tr. (SAPL) 8)</td>
<td>• parallelism</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Data Path</strong></td>
<td>• latch insertion to deskew data in 7)</td>
<td></td>
<td></td>
<td>• current switch logic (MCML) 13)<br>• tr. sizing 11,12)</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Random Logic</strong></td>
<td></td>
<td>• permutation of series-connected tr. order 9)</td>
<td>library &amp; CAD for pass-tr. logic 10)</td>
<td>• tr. sizing 11,12)</td>
<td>Cut Current<br>Current S/A 15)</td>
<td>• switched source impedance 16)</td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>Memory</strong></td>
<td></td>
<td>• memory hierarchy</td>
<td></td>
<td>• reduced swing WL, BL</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>I/O</strong></td>
<td></td>
<td>• MCM 17)<br>• area pad 17)</td>
<td></td>
<td>• reduced swing I/O 18) (GTL, LVDS)</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
Reference for low-power design & System LSI

Low-power high-speed LSI design & technology

「低消費電力、高速LSI技術」

Realize publishing company, ¥56,000
Phone: +81-3-3815-8511, Fax: +81-3-3815-8529

System LSI – Applications and Technology

「システムLSI アプリケーションと技術」

Science Forum publishing, ¥48,000
Phone: +81-3-5689-5611, Fax: +81-3-5689-5622

Three crises in VLSI designs

- Power crisis
- Interconnection crisis
- Complexity crisis
Complex interconnect
Advances in interconnection technology

Interconnection in 1985

Interconnection in 1998
Interconnect determines cost & perf.

P: Power, D: Delay, A: Area, T: Turn-around

[Figure: four panels plotted against year ('95, 2000, '05, '10), based on SIA '97 data: # of interconnect layers, power [%], delay [%], and process steps [%]. Curve labels: C_int, C_transistor, C_gate, RC delay, Tr's, Int.]
Interconnect parameters trend

Semiconductor Industry Association roadmap
http://notes.sematech.org/1997pub.htm
RC delay and gate delay

[Figure: delay (sec) vs. year. Gate delay and the RC delay of 3mm, 1mm, 100µm and 50µm interconnect are compared with the clock period.]
Receivers

[Figure: interconnect delay (ns) vs. interconnect length (cm) for a) without repeaters and b) with repeaters.]
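The benefit of repeaters comes from turning a delay that grows with the square of the wire length into one that grows roughly linearly. The sketch below uses a deliberately simplified model: a distributed RC line with delay of about 0.4·r·c·L², plus a fixed delay per repeater, ignoring repeater output resistance and input capacitance. The wire and repeater values are hypothetical.

```python
import math


def wire_delay(length_mm, r_per_mm, c_per_mm):
    """Approximate delay of a distributed RC line (0.4 * R * C)."""
    return 0.4 * r_per_mm * c_per_mm * length_mm ** 2


def repeated_delay(length_mm, r_per_mm, c_per_mm, t_repeater, n_segments):
    """Delay when the line is split into n equal segments, each with a repeater."""
    segment = wire_delay(length_mm / n_segments, r_per_mm, c_per_mm)
    return n_segments * (segment + t_repeater)


def best_segment_count(length_mm, r_per_mm, c_per_mm, t_repeater):
    """Integer segment count that minimizes the repeated delay."""
    ideal = length_mm * math.sqrt(0.4 * r_per_mm * c_per_mm / t_repeater)
    candidates = {max(1, math.floor(ideal)), max(1, math.ceil(ideal))}
    return min(candidates,
               key=lambda n: repeated_delay(length_mm, r_per_mm, c_per_mm,
                                            t_repeater, n))


# Hypothetical wire: 100 ohm/mm, 0.2 pF/mm, 50 ps per repeater.
R, C, T_REP = 100.0, 0.2e-12, 50e-12

for length in (5.0, 10.0, 20.0):
    n = best_segment_count(length, R, C, T_REP)
    plain = wire_delay(length, R, C)
    repeated = repeated_delay(length, R, C, T_REP, n)
    print(f"{length:5.1f} mm: no repeaters {plain * 1e9:.2f} ns, "
          f"{n} segments {repeated * 1e9:.2f} ns")
```

In this model, doubling the length quadruples the unrepeated delay but only doubles the repeated one, at the cost of the extra power drawn by the repeaters.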
Tradeoff between power and delay

**Delay optimization**

P: Power consumed by repeaters ($P_{\text{repeater}}$) is 0.6 times the power consumed by interconnect ($P_{\text{line}}$)

**Power • Delay optimization**

D: 9% increase from opt.
P: $P_{\text{repeater}} / P_{\text{line}}$ is 26%
PD: 24% decrease from delay opt. case
The further, the less

LSI

unit

block

World

company

group

Local memories

Hierarchy

Locality in space & time

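Use of local memories

CPU → 1st cache → 2nd cache → Main mem. → Ext. mem.

<table>
<thead>
<tr>
<th>Level</th>
<th>Latency</th>
<th>Throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>1st cache</td>
<td>3ns</td>
<td>3ns</td>
</tr>
<tr>
<td>2nd cache</td>
<td>20ns</td>
<td>10ns</td>
</tr>
<tr>
<td>Main mem.</td>
<td>100ns</td>
<td>30ns</td>
</tr>
<tr>
<td>Ext. mem.</td>
<td>10ms</td>
<td>100ns</td>
</tr>
</tbody>
</table>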
Capacitive Coupling Noise

[Figure: $C_{12}/C_{20}$ ratio and peak coupling noise / signal voltage vs. year (1996 to 2012).]
Coupling noise in RC bus

\[ V_p \approx \frac{2Cc/C}{1+ 2Cc/C} \quad \text{(Bus)} \]

\[ V_p \approx \frac{2Cc/C}{2+ 3Cc/C} \quad \text{(Three lines)} \]
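The two expressions can be tabulated against the coupling ratio Cc/C. The sketch below evaluates them exactly as written above and reads the result as a fraction of the signal swing, which is an assumption of this example; the function names and the sample ratios are illustrative.

```python
def peak_noise_bus(cc_over_c):
    """Peak coupling noise for the bus case, as a fraction of the swing."""
    x = cc_over_c
    return 2 * x / (1 + 2 * x)


def peak_noise_three_lines(cc_over_c):
    """Peak coupling noise for the three-line case, as a fraction of the swing."""
    x = cc_over_c
    return 2 * x / (2 + 3 * x)


print("Cc/C   bus    three lines")
for ratio in (0.1, 0.5, 1.0, 2.0):
    print(f"{ratio:4.1f}   {peak_noise_bus(ratio):.2f}   "
          f"{peak_noise_three_lines(ratio):.2f}")
```

Both expressions grow with Cc/C and saturate, so the noise fraction rises quickly as wires become taller and more closely spaced.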

Noise on power supply lines

'0' is higher

• Smaller margin in single-ended circuits
• Erroneous discharge in dynamic circuits
• In-phase noise in differential circuits (no change in margin)

Noise on signal lines

Single-ended

Differential

Air Isolation

Before ashing

Spt. SiO$_2$ (50 nm)  Interconnect
Carbon

After 450C, 2H furnace ashing

Spt. SiO$_2$ (50 nm)  Interconnect
Gas

Air Isolation

**Isolation material**

- SiOF (k=3.7)
- Parylene-N (k=2.7)
- HSQ (k=2.2)
- Gas (Wire-wire)
- Gas (All)

**Delay (ps/stage)**

- SiOF: 41.7%
- Parylene-N: 49.5%
- HSQ: 26.7%
- Gas (Wire-wire): 20%
- Gas (All): 41.7%
Coupling among Interconnection

Difficulty in checking setup and hold time.
SA-F/F (Sense-Amplifying Flip-Flop) circuits

NMOS Dynamic Differential Logic

SA-F/F

Skin Effects for Signal Lines

- Skin depth
- Skin depth, interconnect width [m]
- Frequency (Hz)
- Hi-end clock freq.
- Low-end clock freq.
- Cu wire
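The skin depth in this comparison follows from the standard formula δ = sqrt(ρ / (π f μ)). The sketch below evaluates it for copper, assuming the bulk room-temperature resistivity of 1.68e-8 Ω·m; on-chip copper is usually somewhat more resistive, so these numbers are only indicative, and the list of frequencies is an arbitrary choice.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space [H/m]
RHO_CU = 1.68e-8        # bulk copper resistivity [ohm*m], assumed value


def skin_depth(freq_hz, resistivity=RHO_CU, mu=MU_0):
    """Skin depth in metres: sqrt(rho / (pi * f * mu))."""
    return math.sqrt(resistivity / (math.pi * freq_hz * mu))


for freq in (100e6, 1e9, 10e9):
    depth_um = skin_depth(freq) * 1e6
    print(f"{freq / 1e9:5.1f} GHz: skin depth {depth_um:.2f} um")
```

The skin effect matters once the skin depth becomes comparable to the wire width or thickness, which with these assumptions happens first for wide, thick lines at clock frequencies in the GHz range.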

Skin Depth and $R$ Increase

$R / R_0$: Increased $R$ by skin effect

D: skin depth

$\frac{a}{D}$
At present, RC effects dominate LC effects because $R > |j\omega L|$.

In the future, both $R$ and $\omega L$ will increase ($R$ increases more rapidly?).

Exception in low-$R$ lines

Inductive effects in wide clock lines in a fast processor are claimed to be observed in simulation.

Clock lines are placed on power plane to reduce inductive effects.
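The criterion $R > |j\omega L|$ can be illustrated with a small calculation of $\omega L / R$ per unit length. Everything numeric here is an assumption for illustration: a Cu resistivity of 1.7e-8 ohm·m, a 1 µm wire thickness, and a width-independent inductance of 1 nH/mm, which is only a rough order of magnitude.

```python
import math

RHO_CU = 1.7e-8      # assumed Cu resistivity [ohm*m]
THICKNESS_M = 1e-6   # assumed wire thickness [m]
L_PER_M = 1e-6       # assumed inductance per length [H/m] (about 1 nH/mm)


def omega_l_over_r(width_m: float, freq_hz: float,
                   thickness_m: float = THICKNESS_M,
                   l_per_m: float = L_PER_M,
                   rho: float = RHO_CU) -> float:
    """Ratio of inductive reactance to resistance, both per unit length."""
    r_per_m = rho / (width_m * thickness_m)   # wire resistance per metre
    return 2.0 * math.pi * freq_hz * l_per_m / r_per_m


if __name__ == "__main__":
    for width_um in (1, 10, 100):
        ratio = omega_l_over_r(width_um * 1e-6, 1e9)
        regime = "RC-dominated" if ratio < 1.0 else "inductance matters"
        print(f"W = {width_um:3d} um at 1 GHz: wL/R = {ratio:6.2f} ({regime})")
```

With these assumed numbers, a 1 µm line stays RC-dominated at 1 GHz while a 100 µm line does not, which is in line with the exception for low-$R$ lines noted above.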

Inductive Effects

[Figure: $\omega L / R$ versus year (1996 to 2012) for $W = 1\ \mu m$, $W = 10\ \mu m$, $W = 100\ \mu m$, and minimum (scaled) width]
Inductive Effects in Clock Lines

Board design practice is imported into LSI design.

Interconnect Cross-Section and Noise

Unscaled / anti-scaled
- Clock
- Long bus
- Power supply

Scaled interconnect
- Signal

1V 15W -> 15A current
5% noise -> 0.05V noise -> 3mΩ sheet R -> 10µm thick Al
Area pads + package, or a thick layer on the board, are needed.
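The arithmetic chain above (supply current, allowed noise voltage, allowed resistance, required metal thickness) can be reproduced as follows. This is a sketch: the Al resistivity of 3e-8 ohm·m is an assumed value, and the allowed resistance is treated as a sheet resistance, as on the slide.

```python
RHO_AL = 3.0e-8  # assumed Al resistivity [ohm*m]


def supply_budget(vdd_v: float, power_w: float, noise_fraction: float,
                  resistivity_ohm_m: float = RHO_AL):
    """Return (current [A], noise [V], allowed sheet R [ohm/sq], thickness [m])."""
    current_a = power_w / vdd_v               # I = P / V
    noise_v = noise_fraction * vdd_v          # tolerated IR drop
    sheet_r = noise_v / current_a             # R = V / I, read as ohm per square
    thickness_m = resistivity_ohm_m / sheet_r # t = rho / R_sheet
    return current_a, noise_v, sheet_r, thickness_m


if __name__ == "__main__":
    i, v, r, t = supply_budget(vdd_v=1.0, power_w=15.0, noise_fraction=0.05)
    print(f"current = {i:.0f} A, noise = {v:.2f} V, "
          f"sheet R = {r * 1e3:.1f} mOhm/sq, thickness = {t * 1e6:.0f} um")
```

With the assumed resistivity the required thickness comes out near 9 µm, consistent with the roughly 10 µm figure quoted on the slide.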

Possible solutions for interconnect issues

Architecture
- Hierarchical architecture, local memories (10~)

Circuit
- Repeater (5)
- Line width sizing (10)
- Sense amplifier (5)
- Interconnection pipelining (10)
- Differential circuit (10)

Device / Process
- Low-$r$ (Cu 1.3 (10 for EM)), Low-$\varepsilon$ (F 1.1, polymer 2, air 4)
- Multi-layer interconnection (un/anti-scaled layers 100)
- Area pads + thick package / board layers (10)

CAD
- $R$, $C$ extraction, fast simulation (1000)
- Optimization (repeater insertion...)
Three crises in VLSI designs

- Power crisis
- Interconnection crisis
- Complexity crisis
Designing a map of 10m wide roads for a world atlas
System LSI design complexity increases faster than productivity. (http://notes.sematech.org/97melec.htm)
Coping with complexity crisis

- Re-use and sharing of IP’s
- Design at high abstraction

IP; CPU, DSP, memories, analog, I/O, logic...
HW/FW/SW
Hot design topics initiate CAD tools

Total system design

S/W, H/W Co-design

Behavioral

RTL

Logic

Circuit

Physical (deep submicron)

New dimensions
- LSI/package/board
- Power
- RC delay
- Signal integrity
- Interconnect reliability
- Noise
- IR drop
- Distribution of parameters
- Memory embedding
- Analog-digital mix

...
<table>
<thead>
<tr>
<th>Year</th>
<th>Unit</th>
<th>1999</th>
<th>2014</th>
<th>Factor</th>
</tr>
</thead>
<tbody>
<tr>
<td>Design rule</td>
<td>µm</td>
<td>0.18</td>
<td>0.035</td>
<td>0.2</td>
</tr>
<tr>
<td>Tr. Density</td>
<td>/cm²</td>
<td>6.2M</td>
<td>390M</td>
<td>63</td>
</tr>
<tr>
<td>Chip size</td>
<td>mm²</td>
<td>340</td>
<td>900</td>
<td>2.6</td>
</tr>
<tr>
<td>Tr. Count per chip (µP)</td>
<td></td>
<td>21M</td>
<td>3.6G</td>
<td>170</td>
</tr>
<tr>
<td>DRAM capacity</td>
<td></td>
<td>1G</td>
<td>1T</td>
<td>256</td>
</tr>
<tr>
<td>Local clock on a chip</td>
<td>Hz</td>
<td>1.2G</td>
<td>17G</td>
<td>14</td>
</tr>
<tr>
<td>Global clock on a chip</td>
<td>Hz</td>
<td>1.2G</td>
<td>3.7G</td>
<td>3.1</td>
</tr>
<tr>
<td>Power</td>
<td>W</td>
<td>90</td>
<td>183</td>
<td>2.0</td>
</tr>
<tr>
<td>Supply voltage</td>
<td>V</td>
<td>1.5</td>
<td>0.37</td>
<td>0.2</td>
</tr>
<tr>
<td>Current</td>
<td>A</td>
<td>60</td>
<td>494.6</td>
<td>8</td>
</tr>
<tr>
<td>Interconnection levels</td>
<td></td>
<td>6</td>
<td>10</td>
<td>1.7</td>
</tr>
<tr>
<td>Mask count</td>
<td></td>
<td>22</td>
<td>28</td>
<td>1.3</td>
</tr>
<tr>
<td>Cost / tr. (packaged)</td>
<td>µcents</td>
<td>1735</td>
<td>22</td>
<td>0.01</td>
</tr>
<tr>
<td>Chip to board clock</td>
<td>Hz</td>
<td>500M</td>
<td>1.5G</td>
<td>3.0</td>
</tr>
<tr>
<td># of package pins</td>
<td></td>
<td>810</td>
<td>2700</td>
<td>3.3</td>
</tr>
<tr>
<td>Package cost</td>
<td>cents/pin</td>
<td>1.61</td>
<td>0.75</td>
<td>0.5</td>
</tr>
</tbody>
</table>
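Two rows of the table follow from the others: transistor count per chip is roughly density times chip area, and supply current is power divided by supply voltage. The sketch below recomputes both from the table values; the dictionary keys and helper names are hypothetical and chosen only for this example.

```python
# Values copied from the table above (1999 and 2014 columns).
ROADMAP = {
    1999: {"density_per_cm2": 6.2e6, "chip_mm2": 340.0, "power_w": 90.0, "vdd_v": 1.5},
    2014: {"density_per_cm2": 390e6, "chip_mm2": 900.0, "power_w": 183.0, "vdd_v": 0.37},
}


def transistor_count(row: dict) -> float:
    """Transistors per chip = density [/cm^2] * chip area [cm^2]."""
    return row["density_per_cm2"] * row["chip_mm2"] / 100.0  # 100 mm^2 per cm^2


def supply_current(row: dict) -> float:
    """Supply current [A] = power [W] / supply voltage [V]."""
    return row["power_w"] / row["vdd_v"]


if __name__ == "__main__":
    for year, row in ROADMAP.items():
        print(f"{year}: {transistor_count(row):.3g} transistors, "
              f"{supply_current(row):.1f} A")
```

The recomputed currents match the table's Current row, and the transistor counts land within a few percent of the Tr. Count row.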
Chip in 2014

- Sensors/actuators on chip
- 0.035\(\mu\)m 3.6G Si FET’s with VTH & VDD control
- Locally synchronous 17GHz clock, globally asynchronous
- Chip / Package / Board system co-design

- Sensors
- Lots of IP’s (µP, mem, analog, ...)
- Programmable Array of Macros
- Micro-actuators (for display)
Summary

- Scaling law indicates power, interconnection and complexity crises.
- Low-voltage + threshold control and less-waste design for low-power
- Process, design guidelines and local memory for interconnection issues
- Design reuse and sharing + software programmability for complexity