# Two schemes to reduce interconnect delay in bi-directional and uni-directional buses

Koichi Nose and Takayasu Sakurai

Institute of Industrial Science, University of Tokyo, 7-22-1 Roppongi, Minato-ku, Tokyo 106-8558, Japan

# Introduction

As device dimensions are scaled down, interconnect RC delay becomes the dominant performance limiter in high-performance VLSIs [1]. Another issue in submicron interconnects is a drastic increase in coupling capacitance, caused by the higher wire aspect ratio adopted to reduce interconnect resistance. The increased coupling capacitance degrades signal integrity, inducing noise and delay fluctuation problems.

Buffer insertion (repeater insertion) is one of the most effective ways to decrease interconnect delay. Conventional buffer insertion, however, cannot be applied to bi-directional buses because a buffer is uni-directional in nature. Some circuit configurations that can be applied to bi-directional buses have been proposed [2][3]. These circuits turn out to be prone to malfunction when noise is injected from adjacent lines in scaled-down interconnect systems, where capacitive coupling is large. A quantitative discussion is given in the following sections. This paper proposes and measures a new buffer insertion scheme for bi-directional buses, namely the dual-rail bus (DRB) scheme, which does not suffer from these noise problems.

The second proposal is a high-speed buffer insertion scheme for uni-directional buses that makes use of staggered firing. This staggered firing bus (SFB) scheme is also proposed and measured.

# Dual-rail bus (DRB) for bi-directional buses

Figure 1 shows the schematic of the proposed dual-rail bus (DRB) scheme. The dual-rail bus consists of two buffered interconnects per bit, one right-oriented and the other left-oriented. When a certain I/O (I/O1) is output-enabled for the bus, the buffers B1 and B2 are forced to Hi-Z to eliminate the driving conflict between D1 and B1 and between D2 and B2. Note that all nodes before B1 and B2 are kept at '0'. This describes the behavior at the driving node. At all other (receiving) nodes, the valid signal is reconstructed by ORing the signals of the right-oriented line and the left-oriented line. This works because, at every location, one of the two lines carries the valid signal and the other carries '0'. If the bus is to be branched, the circuit shown in Fig. 2 should be used. By using three OR gates, it preserves the principle that a valid signal flows in one line and '0' flows in the other at all locations.
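The OR-based reconstruction can be illustrated with a small behavioral model. The following is only a sketch under stated assumptions, not the circuit of Fig. 1: the function names `drb_rails` and `drb_receive` are hypothetical, each node is reduced to one ideal buffer per rail, and the far ends of both rails are assumed to be tied to '0' so that everything upstream of the driving node stays at '0'.

```python
def drb_rails(num_nodes, driver, bit):
    """Return the values seen at each node on the right- and left-oriented rails.

    Assumption (not from the paper): the upstream end of each rail is tied to 0.
    At the driving node the repeating buffer is Hi-Z and the driver injects `bit`;
    at every other node the buffer simply repeats what arrives from upstream.
    """
    right = [0] * num_nodes
    carry = 0  # left end of the right-oriented rail, assumed tied to 0
    for i in range(num_nodes):
        right[i] = carry                 # value arriving at node i from the left
        if i == driver:
            carry = bit                  # driver replaces the upstream signal

    left = [0] * num_nodes
    carry = 0  # right end of the left-oriented rail, assumed tied to 0
    for i in reversed(range(num_nodes)):
        left[i] = carry                  # value arriving at node i from the right
        if i == driver:
            carry = bit

    return right, left


def drb_receive(num_nodes, driver, bit):
    """OR the two rails at every node (the entry at `driver` is not meaningful)."""
    right, left = drb_rails(num_nodes, driver, bit)
    return [r | l for r, l in zip(right, left)]


if __name__ == "__main__":
    print(drb_receive(num_nodes=5, driver=2, bit=1))
```

In this model, every receiving node sees the driven bit on exactly one rail and '0' on the other, whichever node is driving.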

One might consider a ring-structured bus formed by shorting the right-oriented line and the left-oriented line at both ends. The ring-structured bus, however, is slow because in the worst case the signal must travel twice the length of the bus.

Figure 3 compares the noise resiliency of the proposed DRB scheme with that of the previously published schemes [2][3]. The ratio of the coupling capacitance to the grounding capacitance ($C_C/C_G$) is now about 1.5 but is expected to increase to more than 3 in the future, which means that the noise induced by coupling becomes larger. In the DRB, the noise resiliency is high because the line is shunted to '0' or '1' at each section. The other schemes, however, have lower noise resiliency because the line is not shunted to '0' or '1' at each section. When coupling noise is applied, a victim line that should stay static at '0' flips to '1' and returns to '0' only after a while. The transition takes several ns, which is too slow to be treated as a glitch, so an error occurs in the other schemes.
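To give a feel for why a larger $C_C/C_G$ ratio matters, the sketch below evaluates the standard capacitive-divider estimate of the noise coupled onto an undriven (floating) victim. This is a first-order upper bound under an assumption that is not from the paper, namely that the victim is not held by any driver during the aggressor transition; a line that is shunted at each section sees a smaller disturbance that decays. The function name `floating_victim_noise` is hypothetical.

```python
def floating_victim_noise(cc_over_cg, aggressors=2, vdd=1.0):
    """Peak noise on a floating victim from a capacitive divider.

    cc_over_cg : ratio C_C / C_G for one neighboring line
    aggressors : number of neighbors switching in the same direction (1 or 2)
    vdd        : aggressor voltage swing (normalized to 1.0 by default)
    """
    coupled = aggressors * cc_over_cg      # total coupling capacitance in units of C_G
    return vdd * coupled / (coupled + 1.0)


if __name__ == "__main__":
    for ratio in (1.5, 3.0):
        print(ratio, floating_victim_noise(ratio, aggressors=1),
              floating_victim_noise(ratio, aggressors=2))
```

Under this estimate, the coupled noise fraction grows monotonically with $C_C/C_G$, consistent with the trend described above.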

# Staggered firing bus (SFB) scheme for uni-directional buses

Figure 4 shows the delay fluctuation caused by the behavior of adjacent lines in capacitively coupled buses. If the adjacent lines switch out of phase with the line of interest, the delay increases by more than 30%. The staggered firing bus (SFB) scheme is effective in decreasing this worst-case bus delay.
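The dependence of delay on neighbor activity is often explained with a switched-capacitance (Miller factor) model. The sketch below uses that textbook approximation, which is an assumption here and not the analysis behind Fig. 4: each neighbor contributes 0, 1, or 2 times $C_C$ depending on whether it switches in phase, stays quiet, or switches out of phase. The function name and default values are hypothetical, and the result covers only the wire capacitance, so it is not expected to reproduce the measured 30% figure.

```python
# Miller factors for one neighbor, relative to the victim transition.
IN_PHASE, QUIET, OUT_OF_PHASE = 0, 1, 2


def effective_capacitance(cg, cc, miller_factor, neighbors=2):
    """Effective load seen by the victim line (same units as cg and cc)."""
    return cg + neighbors * miller_factor * cc


if __name__ == "__main__":
    cg, cc = 1.0, 1.5  # normalized, with C_C / C_G = 1.5 as quoted in the text
    quiet = effective_capacitance(cg, cc, QUIET)
    worst = effective_capacitance(cg, cc, OUT_OF_PHASE)
    print("worst / quiet capacitance ratio:", worst / quiet)
```

Since the RC delay of a wire section scales with its load capacitance in this approximation, the ratio indicates how strongly out-of-phase neighbors can stretch the wire delay.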

The schematic of the SFB scheme is shown in Fig. 5. The interconnects are driven at different timings by applying an additional delay (firing delay) to alternate lines. The firing delay can be tuned in several ways, two of which are depicted in the figure. When the firing delay is applied, the delayed signal does not change while an aggressor signal is propagating, so the aggressor does not slow down the victim, which is the delayed signal. If the firing delay is too large, however, the firing delay itself increases the worst-case delay of the system. There is therefore an optimum firing delay, which is realized by the above-mentioned tunable delay buffer.
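The existence of an optimum can be seen in a toy model. The sketch below is purely illustrative: the piecewise-linear coupling penalty, the function names, and all numbers (`t_quiet`, `penalty_max`, `window`) are assumptions chosen for the example and are not taken from the measurements or simulations in this paper. It assumes that the coupling penalty on the delayed line shrinks linearly as the firing delay grows and vanishes once the two transitions no longer overlap.

```python
def worst_case_delay(firing_delay, t_quiet=4.0, penalty_max=2.0, window=1.0):
    """Arrival time (ns) of the delayed line in a toy staggered-firing model.

    t_quiet     : line delay with quiet neighbors (hypothetical value)
    penalty_max : extra delay when transitions fully overlap out of phase
    window      : firing delay beyond which the transitions no longer overlap
    """
    overlap = max(0.0, 1.0 - firing_delay / window)   # 1 = full overlap, 0 = none
    return firing_delay + t_quiet + penalty_max * overlap


def best_firing_delay(step=0.1, max_delay=2.0):
    """Sweep the firing delay on a grid and return the value minimizing the delay."""
    n = int(round(max_delay / step))
    candidates = [i * step for i in range(n + 1)]
    return min(candidates, key=worst_case_delay)


if __name__ == "__main__":
    d = best_firing_delay()
    print("best firing delay:", d, "ns, worst-case delay:", worst_case_delay(d), "ns")
```

In this model, a smaller firing delay leaves a coupling penalty and a larger one adds pure latency, which gives the same qualitative trade-off as described above.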

Staggered firing is not necessary for the DRB, since the left-oriented and right-oriented lines are placed alternately and there is no chance for adjacent signals to switch out of phase.

# Measurement results

Experimental circuits of the proposed schemes were fabricated using a $0.6\,\mu\mathrm{m}$ CMOS technology. A microphotograph of the test chip is shown in Fig. 6. The bus lines are 60 mm long, and 8 buffers are inserted. The delay was measured with a latch-to-latch delay measurement technique.

For the dual-rail bus, the measured delay of the proposed scheme was 12.2 ns, while that of the conventional bus without any buffers was 17.7 ns. To make the comparison fair, a wider interconnect is used in the conventional bus, as shown in Fig. 8.
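As a quick check of what these two measured values imply, the relative delay reduction and the speed-up ratio work out as follows (simple arithmetic on the numbers quoted above, rounded to two significant digits):

$$
\frac{17.7\,\mathrm{ns} - 12.2\,\mathrm{ns}}{17.7\,\mathrm{ns}} \approx 0.31, \qquad \frac{17.7\,\mathrm{ns}}{12.2\,\mathrm{ns}} \approx 1.45
$$

That is, the measured dual-rail bus is roughly 31% faster than the unbuffered conventional bus on this test chip.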

Figure 7 shows the measured results for the staggered firing bus. The delay reaches its minimum at a firing delay of about 1 ns. If the firing delay is smaller than the optimum value, the delay is increased by the capacitive coupling. If the firing delay is larger than the optimum value, the firing delay itself delays the signal propagation.

# Future trend

Since the measurement was not carried out with deep-submicron interconnects, and the real advantage of the proposed schemes increases as design rules scale down, SPICE simulations were conducted to estimate the future benefit of the proposed schemes.

Figure 8 shows the SPICE simulation results for the dual-rail bus scheme. The interconnect parameters are taken from the ITRS Roadmap [1]. The length of the lines is assumed to be twice the chip size. To make the comparison fair, the line width and spacing are assumed to be doubled for the conventional cases, where no buffers are inserted and a single line is used per bit. As seen from Fig. 8, global interconnects can benefit from the use of the proposed scheme. In 2008, when the $0.07\,\mu\mathrm{m}$ design rule is used, the delay is improved by an order of magnitude.
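The reason buffered global lines gain so much over unbuffered ones can be sketched with a first-order delay model. The code below is not the SPICE setup used for Fig. 8: it assumes the common $0.38\,RC$ approximation for the 50% delay of a distributed RC line, a fixed per-buffer delay, and ignores buffer output resistance and input capacitance. The function name and all parameter values are hypothetical and are not ITRS or test-chip data.

```python
def buffered_delay_ps(num_sections, length_mm=60.0, r_ohm_per_mm=50.0,
                      c_pf_per_mm=0.2, t_buf_ps=200.0):
    """First-order delay (ps) of a line split into equal buffered sections.

    Each section contributes 0.38 * R * C of its own wire (ohm * pF = ps)
    plus one fixed buffer delay. num_sections=1 models a single driver
    with no intermediate buffers.
    """
    section_mm = length_mm / num_sections
    wire_ps = 0.38 * r_ohm_per_mm * c_pf_per_mm * section_mm ** 2
    return num_sections * (wire_ps + t_buf_ps)


if __name__ == "__main__":
    unbuffered = buffered_delay_ps(1)
    best_k = min(range(1, 21), key=buffered_delay_ps)
    print("unbuffered:", unbuffered, "ps")
    print("best section count:", best_k, "->", buffered_delay_ps(best_k), "ps")
```

Because the unbuffered wire term grows with the square of the line length while the buffered delay grows roughly linearly, the advantage of buffering widens as wires become more resistive in scaled technologies.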

Figure 9 shows the projected effectiveness of the staggered firing bus scheme. As seen from this figure, the proposed scheme can reduce the delay by about 20% at the $0.18\,\mu\mathrm{m}$ generation and beyond.

# Acknowledgement

The chip fabrication was supported by the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with NTT Electronics Corp. and Dai Nippon Printing Corp.

# References

[1] International Technology Roadmap for Semiconductors, 1999.

[2] T. Iima et al., "Capacitance coupling immune, transient sensitive accelerator for resistive interconnect signals of sub-quarter micron ULSI," *VLSI Circuit Symp.*, pp.31-32, 1995.

[3] I. Dobbelaere et al., "Regenerative feedback repeaters for programmable interconnects," *Proc. of ISSCC*, pp.116-117, 1995.



Fig. 1 Schematic diagram of dual-rail bus scheme



Fig. 2 Branch circuit for dual-rail bus



Fig. 3 Noise resiliency comparison among dual-rail bus scheme (DRB), transient sensitive accelerator (TSA)[2] and self-timed complementary regenerative feedback repeater (CRF)[3]



Fig. 4 Delay fluctuation by capacitive coupling



Fig. 5 Schematic diagram of staggered firing bus scheme



Fig. 6 Microphotograph

Fig. 7 Measured results of staggered firing bus



Fig. 8 SPICE estimation of benefit of DRB scheme for the future



Fig. 9 SPICE estimation of benefit of SFB scheme for the future