Journal of Engineering and Applied Sciences 14 (10): 3283-3288, 2019
ISSN: 1816-949X
© Medwell Journals, 2019

# Wire-demotion for Static Timing Optimization in Advanced Technology Nodes

<sup>1</sup>Lekbir Cherif, <sup>1</sup>Mohammed Darmi, <sup>1</sup>Jalal Benallal, <sup>2</sup>Rachid Elgouri and <sup>1</sup>Nabil Hmina

<sup>1</sup>Laboratory of Systems Engineering, <sup>2</sup>Laboratory of Electrical Engineering and Telecommunication Systems, National School of Applied Sciences, Ibn Tofail University, 14000 Kenitra, Morocco

**Abstract:** In Integrated Circuit (IC) design, timing optimization techniques continually need enhancement. One way to avoid area increase during hold optimization is to optimize routing in order to reduce the Worst Hold Slack and Total Hold Slack (WHS/THS) violations before regular hold optimization. High performance, low power and small area (PPA) are the main customer requirements for new technology nodes, so any new optimization technique should improve one or all of these requirements. This study treats the high sensitivity of wire resistance to the "Self-Aligned Double Patterning" (SADP) process as an advantage and proposes a new timing optimization technique based on wire demotion. It consists of driving the EDA tool to use highly resistive SADP layers for wires on hold timing paths, which frees up the less resistive non-SADP layers for wires on setup timing paths. The target nets are selected by a statistical approach that identifies the targets with the maximum benefit. Experiments on multiple 7 nm SADP designs show 41% WHS and 37% THS improvement with 0% area increase compared to the baseline flow. As a consequence, the Worst Negative Slack and Total Negative Slack (WNS/TNS) are also preserved and even improved by up to 24 and 83%, respectively, in some test-cases.
**Key words:** Static timing analysis, optimization, global route, wire delay, worst and total hold slack, worst and total negative slack, back-end-of-line, self-aligned double patterning, circuit performance, performance power area

#### INTRODUCTION

Timing closure with respect to all timing constraints is a major challenge in integrated circuit design. To achieve timing closure, Electronic Design Automation (EDA) tools conventionally add delay by inserting buffers or inverters. Meanwhile, in advanced technology nodes many chips take advantage of low-voltage operation, with corners close to 0.4 V. At such low voltages, the place and route tool has to fix a very large number of hold violations (Tu *et al.*, 2013). The resulting large number of inserted buffers and inverters increases area and power consumption.

Multi-patterning lithography, such as Self-Aligned Double Patterning (SADP) or Self-Aligned Quadruple Patterning (SAQP), is considered the innovative technology behind many chips made today. In fact, it is still much cheaper to do Double Patterning (DP) or Triple Patterning (TP) than to do single patterning with Extreme Ultraviolet (EUV) (Abercrombie, 2014). The 7 nm SADP technology is expected to be faster and smaller with low power consumption. However, the task is much more complicated for an EDA tool because of all the new design rules that this technology brings. Examples of complex rules are the end-of-line spacing and all rules related to multiple patterning. To respect all of these design rules and still reach high performance, EDA tools must continually develop their optimization techniques with no regression in the Quality of Results (QoR).

On the other hand, in advanced technology nodes, cell interconnection is one of the main factors affecting Integrated Circuit (IC) performance (Prasad *et al.*, 2016, 2017).
Recent research shows that circuit performance is negatively impacted by interconnect parasitics (Anonymous, 2017; Prasad *et al.*, 2015; Pan and Naeemi, 2015). Indeed, the in-circuit delay is more sensitive to the interconnect parasitic Resistance and Capacitance (RC) delay than to the gate delay. In addition, the variability of the SADP process impacts the wire resistance more than the wire capacitance (Prasad *et al.*, 2016; Pan *et al.*, 2015). A similar study shows that resistance is much more sensitive to SADP process variations than capacitance: about 18% resistance variability compared to 5% capacitance variability (Pan *et al.*, 2015).

From the 22 nm technology node onward, circuit performance degradation is mainly due to interconnect parasitics. This is driving EDA vendors to put additional effort into interconnect optimization in order to target high circuit performance (Prasad *et al.*, 2015). Previous studies have shown that at advanced technology nodes, longer wire lengths and highly resistive metal layers have led to a dramatic increase in interconnect delays (Prasad *et al.*, 2015, 2016).

Based on these observations, this study introduces a routing optimization technique that drives the routing engine to use SADP layers to minimize hold violations. Consequently, non-SADP layers become free for setup closure. The list of target nets is produced by a statistical study that identifies the targets with the highest improvement potential. The final objective is to improve design performance by reducing the Worst Hold Slack, Total Hold Slack, Worst Negative Slack and Total Negative Slack (WHS, THS, WNS and TNS) without impacting area and power.

The rest of the study is organized as follows. Part 1 presents the global trend of delay sensitivity in advanced technology nodes, followed by a description of the relationship between the wire delay and its resistance.
Part 2 describes a new SADP-based routing optimization method that helps timing closure by optimizing net delay. Finally, part 3 presents the results of applying the new method to multiple test-cases, including a comparison between the baseline and the new method.

## MATERIALS AND METHODS

#### Analysis of interconnect resistivity

**Wire resistivity on SADP technology node:** In advanced technologies, a huge difference in wire resistance is observed between the M1:M3 "SADP layers" and the M4:M6 "non-SADP layers"; this difference is much larger in the 7 nm node. Indeed, for the 7 nm node, the wire resistance per unit length goes from 2400 $\Omega/\mu m$ and 2200 $\Omega/\mu m$ for M1 and M2:M3, respectively, to 147 $\Omega/\mu m$ for the M4:M6 metals, as shown in Table 1 (Prasad *et al.*, 2017).

Table 1: Interconnect assumptions for 11 and 7 nm technology nodes. W and T are the wire width and thickness in nanometers

| Scenario/Metals | W (nm) | T (nm) | R (Ω/μm) |
|-----------------|--------|--------|----------|
| **11 nm node**  |        |        |          |
| M1              | 17.4   | 32.5   | 413      |
| M2:M3           | 17.4   | 35     | 378      |
| M4:M6           | 35     | 70     | 35       |
| **7 nm node**   |        |        |          |
| M1              | 10.8   | 20.2   | 2400     |
| M2:M3           | 10.8   | 21.8   | 2200     |
| M4:M6           | 21.8   | 43.6   | 147      |

In the 7 nm SADP technology node used for this study, the spectrum of metal layer resistivity is wide, from the less resistive top layers to the bottom SADP layers (M0-M3). Table 2 shows the sheet resistance of each layer. Metals 0 and 1 are heavily used by standard cells; the router mainly uses layers from metal 2 upward. From Table 2, moving a wire from metal 2 to metal 4 reduces its sheet resistance by 54%, and moving it from metal 3 to metal 4 reduces it by 46%. This is an interesting breakpoint: a potential reduction of the total wire resistance by more than 46%.

Conversely, for a specific net, moving a wire from metal 4 down to metal 2 or metal 3 raises its sheet resistance from 1.279 to 2.774 or 2.389 Ω/sq. Expressed relative to the metal 4 value, this is an increase of about 117% and 87%, respectively (the 54% and 46% figures are relative to the SADP-layer value), so the resistance of the demoted wire segments roughly doubles.
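The percentages derived from Table 2 can be checked with a few lines of Python. This is only an arithmetic sketch: the sheet-resistance values are the M2, M3 and M4 entries of Table 2 (read as 2.774, 2.389 and 1.279 Ω/sq), and the helper name `relative_change` is ours, not part of any tool.

```python
# Sheet resistance in ohm/sq for the three layers discussed, from Table 2
# (M2 and M3 entries read as 2.774 and 2.389).
RS = {"M2": 2.774, "M3": 2.389, "M4": 1.279}


def relative_change(src: str, dst: str) -> float:
    """Percent change in sheet resistance when a wire moves from src to dst."""
    return (RS[dst] - RS[src]) / RS[src] * 100.0


# Promotion (towards M4) and demotion (towards M2/M3) of the same layer pairs.
for src, dst in [("M2", "M4"), ("M3", "M4"), ("M4", "M2"), ("M4", "M3")]:
    print(f"{src} -> {dst}: {relative_change(src, dst):+.1f}%")
```

The same pair of layers gives a different percentage depending on which layer is taken as the reference, which is why the reduction seen when moving up to metal 4 is smaller, in percent, than the increase seen when moving down from it.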
The new route optimization technique exploits this technology property to add delay for hold timing closure.

**Impact of interconnect resistance on the circuit performance:** The net delay is the time between the moment a signal is applied to the net and the moment it reaches the devices connected to it. It is a direct effect of the finite resistance and capacitance of the net and is also known as the wire delay (Darmi *et al.*, 2017). The wire delay is a function of $R_{net}$ and $C_{wire}$ ($C_{net}+C_{input}$) (Sadrusham, 2008). EDA tools calculate delay with complex timing models such as the Non-Linear Delay Model (NLDM) and Composite Current Source (CCS), but a simple model such as the "Elmore delay formula" (Rabaey *et al.*, 2003) in Fig. 1 can be used for a rough estimate. It can be concluded that reducing (or increasing) the wire resistance reduces (or increases) the wire delay ($\tau_{DN}$) (Darmi *et al.*, 2017; Ding *et al.*, 2016).

In conclusion, the wire resistance is a major factor in the wire delay, which has a direct impact on circuit performance. Therefore, an increase of the net resistance helps hold closure, whereas a decrease of the net resistance is favorable for setup closure. A more intelligent balancing of layer usage helps both setup and hold closure without additional area utilization.

**Wire optimization for advanced timing closure:** To successfully implement challenging designs in the 7 nm SADP technology node, new optimization techniques need to be developed, especially for routing. Thus, layer optimization could be one of the innovative solutions for the EDA tool to achieve fast timing closure.

Table 2: Sheet resistance (Rs) in 7 nm node technology

| Layers    | M2    | M3    | M4    | M5    | M6    | M7    | M8    | M9    | M10   | M11   |
|-----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Rs (Ω/sq) | 2.774 | 2.389 | 1.279 | 0.904 | 0.904 | 0.904 | 0.904 | 0.344 | 0.344 | 0.034 |

Fig. 1: Wire delay model: Elmore delay formula

Fig. 2: Post-CTS normal design flow vs. new proposed flow

The physical design implementation flow is a set of organized scripts that cover the different stages from floor-planning to post-route. It includes all the commands and settings needed to implement a large variety of designs of different complexity (Anonymous, 2016a-c). Figure 2 (black rectangles) presents the existing physical design implementation flow. During the post-CTS step, setup timing violations are fixed first by reducing the data path delay. Then, hold timing violations are fixed, mainly by adding delay (Anonymous, 2016a-c).

(a) Net: i_rgx_uscpd_unified_store/rf_d_array_74127, SADP wires statistics

|                  | Total wire | M1  | M2   | M3  | Non-SADP |
|------------------|------------|-----|------|-----|----------|
| Wire length (µm) | 31.92      | 0.0 | 1.32 | 7.2 | 23.4     |
| Wire length (%)  | 100.00     | 0   | 4    | 32  | 73       |

Non_SADP_pc: non_SADP_threshold: 50.0

(b) Net: i_rgx_uscpd_unified_store/n 10431, SADP wires statistics

|                  | Total wire | M1  | M2  | M3  | Non-SADP |
|------------------|------------|-----|-----|-----|----------|
| Wire length (µm) | 81.0       | 0.0 | 0.0 | 0.0 | 81.0     |
| Wire length (%)  | 100.00     | 0   | 0   | 0   | 100      |

Non_SADP_pc: non_SADP_threshold: 50.0

Fig. 3: a) Net#1's wire length statistics and b) Net#2's wire length statistics

The proposed flow (Fig. 2, green rectangles) adds a "Layer-optimization" step that gives the EDA tool the needed guidelines. This drives the global router to make better use of the available layers depending on the timing objective. For critical nets with a hold violation, the global router must use SADP layers. In addition to solving the hold issues, this gives setup closure more freedom to use non-SADP layers.
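To see why routing a hold-critical net on SADP layers adds delay, the Elmore model mentioned earlier can be evaluated for a single wire segment. The sketch below is illustrative only. It uses the common textbook approximation for a distributed RC wire driving a lumped load, τ ≈ R_wire · (C_wire/2 + C_load), and takes the M2 and M4 sheet resistances from Table 2 (read as 2.774 and 1.279 Ω/sq). The wire length, wire width, capacitance per unit length and load capacitance are hypothetical values chosen for the example, not data from this study, and the same width and capacitance are assumed on both layers for simplicity.

```python
def wire_resistance(rs_ohm_sq: float, length_um: float, width_um: float) -> float:
    """Wire resistance from sheet resistance: R = Rs * (L / W), in ohm."""
    return rs_ohm_sq * length_um / width_um


def elmore_delay_ps(rs_ohm_sq, length_um, width_um, c_ff_per_um, c_load_ff):
    """Elmore delay (ps) of a distributed RC wire driving a lumped load."""
    r_wire = wire_resistance(rs_ohm_sq, length_um, width_um)  # ohm
    c_wire = c_ff_per_um * length_um                          # fF
    # ohm * fF = 1e-15 s = 1e-3 ps
    return r_wire * (c_wire / 2.0 + c_load_ff) * 1e-3


# Hypothetical geometry and capacitances (assumptions for illustration only).
LENGTH_UM, WIDTH_UM, C_FF_PER_UM, C_LOAD_FF = 20.0, 0.02, 0.2, 1.0

delay_m4 = elmore_delay_ps(1.279, LENGTH_UM, WIDTH_UM, C_FF_PER_UM, C_LOAD_FF)
delay_m2 = elmore_delay_ps(2.774, LENGTH_UM, WIDTH_UM, C_FF_PER_UM, C_LOAD_FF)
print(f"M4: {delay_m4:.2f} ps, M2: {delay_m2:.2f} ps, "
      f"ratio: {delay_m2 / delay_m4:.2f}")
```

Under these assumptions the wire delay scales with the sheet-resistance ratio of the two layers. That extra delay is what the hold fix relies on, and a setup-critical net moved in the opposite direction would benefit by the same factor.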
The idea is to maximize the benefit of the rerouting transforms during hold optimization before applying sizing and buffering transforms. The following section explains in detail how to achieve the objective of this work: reducing THS by adding a layer optimization pass during the post-CTS stage. Along with improving timing, this step helps to use less area, consume less power and ensure better routability. The new "Layer optimization" step proceeds as follows:

**1. Identify the best targets**

**a) SADP vs. non-SADP layer:** Starting from all paths with a hold violation, a first filter is applied based on statistics that give the percentage of non-SADP layer usage compared to SADP layers. This filter returns all nets whose percentage of non-SADP wire length is above a threshold (50% in this case). Each net with a total non-SADP wire share above 50% is a good target for wire optimization. In the first example (Fig. 3a), out of a total wire length of $31.92 \mu m$, the non-SADP wire length is $23.4 \mu m$ (73%) and the SADP wire length is $8.52 \mu m$ (27%). This net is a good target: by demoting its wires to SADP layers, the wire delay increases, the hold violation shrinks accordingly and can be corrected. The same holds for the second example (Fig. 3b), with 100% of its wire routed on non-SADP layers.

**b) Setup costing:** Hold optimization must take the setup violations into account. Therefore, a "setup costing" is done on each net selected in "a": hold-violated nets located on a violated setup path are removed from the target nets. The nets remaining after filters "a" and "b" are considered the optimal targets, which should give the best hold timing optimization without impacting other metrics.
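The two filters above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the `Net` record, its field names and the `select_targets` function are hypothetical. The wire lengths of the first two nets are those reported for Fig. 3, while the other two nets and all setup-violation flags are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    sadp_length_um: float        # wire length routed on SADP layers (M1-M3)
    non_sadp_length_um: float    # wire length routed on non-SADP layers
    on_violated_setup_path: bool

    @property
    def non_sadp_pct(self) -> float:
        """Share of the net's wire length routed on non-SADP layers, in %."""
        total = self.sadp_length_um + self.non_sadp_length_um
        return 100.0 * self.non_sadp_length_um / total if total else 0.0


def select_targets(hold_violated_nets, non_sadp_threshold=50.0):
    """Filter (a): non-SADP share above the threshold.
    Filter (b): drop nets that sit on a violated setup path."""
    return [
        net for net in hold_violated_nets
        if net.non_sadp_pct > non_sadp_threshold
        and not net.on_violated_setup_path
    ]


nets = [
    Net("net_1", 8.52, 23.4, False),  # Fig. 3a lengths: about 73% non-SADP
    Net("net_2", 0.0, 81.0, False),   # Fig. 3b lengths: 100% non-SADP
    Net("net_3", 30.0, 10.0, False),  # hypothetical: rejected by filter (a)
    Net("net_4", 5.0, 45.0, True),    # hypothetical: rejected by filter (b)
]
print([net.name for net in select_targets(nets)])
```

Keeping the threshold as a parameter mirrors the `non_SADP_threshold` value shown in Fig. 3 and makes it easy to trade the number of target nets against routability.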
**2. Apply appropriate Non-Default-Rules (NDRs):** The next step is to drive the router to use SADP layers as much as possible on the target nets from "1" while keeping good routability. This can be achieved by using Non-Default-Rules (NDRs) to drive the Global Router (GR). These NDRs can be summarized as follows: prefer SADP (M1-M3) layers whenever possible when routing target nets and, if that is not possible because of congestion or a specific routing rule, leave the GR free to use other non-SADP (up to M4) layers.

**3. Run the global route native command:** At this stage, all target nets from "2" have an NDR. Therefore, the global routing of the full design can be started. As described previously, the global router will try to honor all the applied NDRs. A good recipe for better optimization is to route the target nets first and then run the GR on the other nets.

**4. Check timing and reset NDR on nets violated in setup:** This last step is important to recover any setup timing degradation caused by applying the NDRs. It consists of removing the added NDRs from all newly violated nets.

#### RESULTS AND DISCUSSION

The existing "Nitro-SoC 7 nm" reference flow (Anonymous, 2016a-c) described in Fig. 2, which already includes the best physical design recipe to keep a close eye on cell area growth while running native optimization techniques, was updated with the new proposed flow described in the previous section. To measure the benefit of this new optimization method properly, the new flow was applied to diverse test-cases using the 7 nm SADP technology. Three block-level designs and one top-level design are used for this study; their main characteristics are summarized in Table 3 and Fig. 4 shows snapshots of their floorplans.

Fig. 4: Design floorplans: a) Design#1; b) Design#2; c) Design#3 and d) Design#4

Table 3: Characteristics of used designs

| Design's characteristics | Design#1 | Design#2 | Design#3 (Top) | Design#4 |
|--------------------------|----------|----------|----------------|----------|
| Physical area (mm²)      | 0.116    | 0.1      | 0.7            | 0.4      |
| Number of instances      | 347971   | 377590   | 819935         | 1.41e+06 |
| Number of macros         | 12       | 0        | 86             | 80       |
| Number of nets           | 365756   | 374035   | 709089         | 1.49e+06 |
| Max frequency (GHz)      | 1.8      | 1.6      | 1.7            | 2        |

The baseline is the generic physical design implementation flow (Fig. 2) developed by Mentor Graphics to implement the test-case designs. In parallel, the new step presented in this study was inserted after post-CTS optimization, with the rest of the flow kept unchanged; this is named the "Layer Opt" flow. The two sets of runs were started from the same floorplan, and the EDA tool used for the implementation is Nitro-SoC from Mentor Graphics (Anonymous, 2016a-c).

A summary of the timing improvement achieved by "Layer Opt" compared to the baseline is shown in Table 4. Switching from the reference flow to a flow that takes advantage of smart layer optimization dramatically reduces THS and also improves TNS.

Fig. 5: Violated nets number and non-SADP layer reduction: a) Design #1; b) Design #2; c) Design #3 and d) Design #4

Table 4: Quality of Results (QoR) summary

| Variables    | WNS (ps) | TNS (ns) | WHS (ps) | THS (ns) |
|--------------|----------|----------|----------|----------|
| **Design#1** |          |          |          |          |
| Baseline     | -112.9   | -93975.4 | -72.1    | -14538.2 |
| LayerOpt     | -110.7   | -39143.7 | -50.1    | -2695.8  |
| Gain         | -2%      | -58%     | -31%     | -81%     |
| **Design#2** |          |          |          |          |
| Baseline     | -159.5   | -237396  | -62.4    | -7982.9  |
| LayerOpt     | -164.5   | -239402  | -64      | -1922.1  |
| Gain         | 3%       | 1%       | 3%       | -76%     |
| **Design#3** |          |          |          |          |
| Baseline     | -22.53   | -1905.01 | -24.72   | -95.06   |
| LayerOpt     | -17.19   | -315.74  | -11.96   | -40.95   |
| Gain         | -24%     | -83%     | -52%     | -57%     |
| **Design#4** |          |          |          |          |
| Baseline     | -102     | -87587   | -48      | -573     |
| LayerOpt     | -93      | -83469   | -25      | -336     |
| Gain         | -9%      | -5%      | -48%     | -41%     |

Bold values are optimized values

Layer optimization is very effective at reducing THS, with a gain of up to 81% (Design#1) and at least 41% in all four designs. The TNS is also reduced significantly in some designs, by up to 83% (Design#3), because layer optimization frees non-SADP layers and gives setup optimization more room. In Design#2, WNS, TNS and WHS change by only 1 to 3%. For WNS and WHS, if the worst timing path is filtered out by the target-selection algorithm described above, there is no window to improve that path. In all test-cases, the used area is not impacted and the designs stay routable. Thus, using this feature in the place and route flow is highly recommended.

Figure 5 compares the number of violated nets at the end of the place and route flow: the proposed optimization method yields a significant reduction compared to the reference, native optimization method. The reduction in the number of violated nets is accompanied by a reduction of the non-SADP layers used to route violated nets.
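The Gain rows of Table 4 appear to follow the relative change of each metric with respect to the baseline, so that a negative gain means a smaller violation. The short sketch below reproduces the Design#1 row under that assumption; the function name `gain_pct` is ours and the rounding to whole percent is inferred from the table.

```python
def gain_pct(baseline: float, layer_opt: float) -> int:
    """Relative change of a (negative) slack metric versus the baseline, in %."""
    return round((layer_opt / baseline - 1.0) * 100.0)


# Design#1 values from Table 4: metric -> (Baseline, LayerOpt)
design1 = {
    "WNS (ps)": (-112.9, -110.7),
    "TNS (ns)": (-93975.4, -39143.7),
    "WHS (ps)": (-72.1, -50.1),
    "THS (ns)": (-14538.2, -2695.8),
}
gains = {metric: gain_pct(base, opt) for metric, (base, opt) in design1.items()}
print(gains)
```

Applying the same function to the other designs gives the remaining Gain rows, including the small positive values of Design#2 where the violation grows slightly.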
#### CONCLUSION

This study presented the physical implementation of four circuit designs (Designs #1-4). Each design was placed and routed using an optimized flow with the proposed optimization method added at the end of the post-CTS step, and the resulting timing improvement was reported. All ICs must pass through timing closure before manufacturing: hold timing must be fixed to ensure design functionality, and setup timing must be optimized as far as possible to achieve high circuit performance. In advanced technology nodes, where wire parasitics become a major factor in interconnect delay, any technique that helps to reduce or increase the interconnect resistance is likely to have a good impact on setup or hold optimization.

#### SUGGESTIONS

This study suggests a timing closure approach based on wire optimization. The technique consists of using SADP layers for critical hold-violated nets. As a result, more non-SADP layer resources become free for setup closure. Compared to the existing reference flow, this layer optimization method can reduce THS by 40 to 80% while keeping a comparable or even better WNS, TNS and WHS. Moreover, these experiments show that design performance can be achieved with 0% area increase.

#### ACKNOWLEDGMENT

This research was supported by Mentor Graphics Corporation. The researchers thank Dr. Hazem El Tahawy (Mentor Graphics, Managing Director MENA Region) for initiating and supporting this study.

#### REFERENCES

- Abercrombie, D., 2014. Is Multi-Patterning Good for You?. SMG Inc., Crystal River, Florida. https://semiengineering.com/is-multi-patterning-good-for-you/
- Anonymous, 2016a. Nitro-SoC<sup>™</sup> and olympus-SoC<sup>™</sup> software version. R1 Yamaha Motor Company, Iwata, Shizuoka Prefecture, Japan.
- Anonymous, 2016b. Nitro-SoC<sup>™</sup> and olympus-SoC<sup>™</sup> user's manual. Software Company, San Francisco, California.
- Anonymous, 2016c. Nitro-SoC<sup>™</sup> and olympus-SoC<sup>™</sup> advanced design flows guide. Software Company, San Francisco, California.
- Anonymous, 2017. International Technology Roadmap for Semiconductors. ITRS Group Ltd Company, London, England.
- Darmi, M., L. Cherif, J. Benallal, R. Elgouri and N. Hmina, 2017. Integrated circuit conception: A wire optimization technic reducing interconnection delay in advanced technology nodes. Electron., 6: 1-78.
- Ding, Y., C. Chu and W.K. Mak, 2016. Self-aligned double patterning-aware detailed routing with double via insertion and via manufacturability consideration. Proceedings of the 53rd ACM/EDAC/IEEE International Conference on Design Automation (DAC), June 5-9, 2016, IEEE, Austin, Texas, USA., ISBN:978-1-4673-8730-9, pp: 1-6.
- Pan, C. and A. Naeemi, 2015. A paradigm shift in local interconnect technology design in the era of nanoscale multigate and gate-all-around devices. IEEE. Electr. Device Lett., 36: 274-276.
- Pan, C., R. Baert, I. Ciofi, Z. Tokei and A. Naeemi, 2015. System-level variation analysis for interconnection networks at sub-10-nm technology nodes using multiple patterning techniques. IEEE. Trans. Electr. Devices, 62: 2071-2077.
- Prasad, D., A. Ceyhan, C. Pan and A. Naeemi, 2015. Adapting interconnect technology to multigate transistors for optimum performance. IEEE. Trans. Electr. Devices, 62: 3938-3944.
- Prasad, D., C. Pan and A. Naeemi, 2016. Impact of interconnect variability on circuit performance in advanced technology nodes. Proceedings of the 17th International Symposium on Quality Electronic Design (ISQED), March 15-16, 2016, IEEE, Santa Clara, California, USA., ISBN:978-1-5090-1213-8, pp: 398-404.
- Prasad, D., C. Pan and A. Naeemi, 2017. Modeling interconnect variability at advanced technology nodes and potential solutions. IEEE. Trans. Electr. Devices, 64: 1246-1253.
- Rabaey, J.M., A.P. Chandrakasan and B. Nikolic, 2003. Digital Integrated Circuits: A Design Perspective. 2nd Edn., Pearson, New York, USA., Pages: 761.
- Sadrusham, N.J., 2008. Net delay or interconnect delay or wire delay or extrinsic delay or flight time. ASIC Company, India. http://asic-soc.blogspot.com/2008/10/net-delay.html
- Tu, W.P., C.H. Chou, S.H. Huang, S.C. Chang and Y.T. Nieh *et al.*, 2013. Low-power timing closure methodology for ultra-low voltage designs. Proceedings of the International Conference on Computer-Aided Design (ICCAD), November 18-21, 2013, IEEE Press, Piscataway, New Jersey, USA., ISBN:978-1-4799-1069-4, pp: 697-704.