#### Cache Design Under Spatio-Temporal Variability

#### <u>Shrikanth Ganapathy</u><sup>1</sup>, Ramon Canal<sup>1</sup>, Antonio Rubio<sup>1</sup>, Antonio Gonzalez<sup>1,2</sup>

<sup>1</sup>Universitat Politecnica de Catalunya, Spain <sup>2</sup>Intel Barcelona Research Center, Spain





# Motivation

- Manufacturing processes induce variation in device parameters (spatial variability).
- Adverse operating conditions make reliable operation harder (temporal variability).
- As feature sizes shrink, memory built from minimum-geometry transistors will suffer the most from intrinsic variations.
- Corner-case estimation of energy and delay is imperative at design time.
- We propose combining simulation with multivariate-regression-based curve fitting for the analysis.
- This approach also provides a platform for the simultaneous co-exploration of circuit-centric optimizations.



## **Delay Estimation**



The strong temperature dependence of delay translates into multiple delay PDFs, rather than the single PDF suggested by statistical static timing analysis (SSTA) techniques.

$$D_{i} = \begin{bmatrix} \delta_{l_{eff}}^{p(1)} & \delta_{v_{th}}^{p(1)} & \delta_{l_{eff}}^{n(1)} & \delta_{v_{th}}^{n(1)} \\ \delta_{l_{eff}}^{p(2)} & \delta_{v_{th}}^{p(2)} & \delta_{l_{eff}}^{n(2)} & \delta_{v_{th}}^{n(2)} \\ \vdots & \vdots & \vdots & \vdots \\ \delta_{l_{eff}}^{p(j)} & \delta_{v_{th}}^{p(j)} & \delta_{l_{eff}}^{n(j)} & \delta_{v_{th}}^{n(j)} \end{bmatrix} \times \begin{bmatrix} m_{l_{eff}}^{pmos} \\ m_{v_{th}}^{pmos} \\ m_{l_{eff}}^{nmos} \\ m_{v_{th}}^{nmos} \end{bmatrix} + j \cdot \left( D_{nominal}^{pmos} + D_{nominal}^{nmos} \right) + m_{temp} \cdot \delta_{temp}$$

Because delay depends linearly on threshold voltage and effective channel length, we begin with a first-order polynomial and fit the best curve.
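A first-order fit of this kind can be illustrated with a minimal sketch. It assumes a single device type, with deviations in effective length (nm), threshold voltage (mV), and temperature (K) as regressors, and it uses ordinary least squares via the normal equations. The helper names `solve_linear_system` and `fit_first_order` are hypothetical, and the coefficients and samples are synthetic values chosen for illustration, not results from this work.

```python
from itertools import product


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_first_order(samples, delays):
    """Least-squares fit of delay = c0 + sum_k c_k * delta_k."""
    rows = [[1.0, *s] for s in samples]  # leading 1.0 models the nominal delay
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * d for r, d in zip(rows, delays)) for i in range(k)]
    return solve_linear_system(ata, atb)


# Synthetic (made-up) coefficients:
# [D_nominal (ps), m_leff (ps/nm), m_vth (ps/mV), m_temp (ps/K)]
true_coeffs = [100.0, 2.0, 0.15, 0.1]

# Full-factorial grid of (delta_leff, delta_vth, delta_temp) deviations.
samples = list(product((-2.0, 0.0, 2.0), (-30.0, 0.0, 30.0), (0.0, 50.0, 100.0)))
delays = [
    true_coeffs[0] + sum(m * d for m, d in zip(true_coeffs[1:], s))
    for s in samples
]

coeffs = fit_first_order(samples, delays)
print([round(c, 6) for c in coeffs])
```

In practice the `delays` list would come from circuit simulation rather than from a known linear model. The synthetic data only shows that the fit recovers the sensitivities when the dependence really is first order.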

## **Energy Estimation**

- The static and dynamic energy of every component in the array sub-block is estimated at different instants of time.
- Energy is estimated as a function of the integral of current through the supply (non-capacitive).
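As a minimal sketch of the current-integral estimate, the snippet below computes the energy drawn from the supply as the supply voltage times the integral of the sampled supply current. It assumes a constant supply voltage and uses the trapezoidal rule. The function name `supply_energy` and the triangular current pulse are illustrative assumptions, not data from this work.

```python
import math


def supply_energy(times, currents, vdd):
    """Energy in joules: vdd times the trapezoidal integral of i(t) dt.

    Assumes vdd is constant over the integration window.
    """
    charge = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        charge += 0.5 * (currents[k] + currents[k - 1]) * dt
    return vdd * charge


# Hypothetical triangular current pulse: 0 -> 1 mA -> 0 over 2 ns at 1.0 V.
times = [0.0, 1e-9, 2e-9]
currents = [0.0, 1e-3, 0.0]
energy = supply_energy(times, currents, vdd=1.0)
print(energy)  # about 1e-12 J (area of the triangle times vdd)
```

Evaluating the same integral over different time windows separates the static and dynamic contributions of a component.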

$$\begin{split} Cache_{Energy} &= \left( E_{precharge} + E_{column-mux} + E_{Driver} + E_{Decoder} \right) + p \cdot E_{(active/cell)} \\ &+ (n-p)E_{(wline-active/cell)} + (m-1)E_{(bline-active/cell)} + E_{control} \\ &+ (m-1)(n-p)E_{(inactive-cells/cell)} + f(m,n) \cdot E_{inactive-block} \end{split}$$
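The expression can be evaluated term by term, as in the sketch below. The symbols `m`, `n`, `p` and the factor `f(m, n)` are taken as given from the equation. The container `ComponentEnergies`, the function `cache_energy`, and all numeric values are hypothetical and serve only to show the bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class ComponentEnergies:
    """Per-component energies (arbitrary but consistent units)."""
    precharge: float
    column_mux: float
    driver: float
    decoder: float
    control: float
    active_cell: float        # E_(active/cell)
    wline_active_cell: float  # E_(wline-active/cell)
    bline_active_cell: float  # E_(bline-active/cell)
    inactive_cell: float      # E_(inactive-cells/cell)
    inactive_block: float     # E_inactive-block


def cache_energy(e, m, n, p, f):
    """Evaluate the cache energy expression; f is the callable f(m, n)."""
    peripheral = e.precharge + e.column_mux + e.driver + e.decoder
    return (
        peripheral
        + p * e.active_cell
        + (n - p) * e.wline_active_cell
        + (m - 1) * e.bline_active_cell
        + e.control
        + (m - 1) * (n - p) * e.inactive_cell
        + f(m, n) * e.inactive_block
    )


# Made-up values chosen so the arithmetic is easy to check by hand.
energies = ComponentEnergies(
    precharge=1.0, column_mux=1.0, driver=1.0, decoder=1.0, control=1.0,
    active_cell=2.0, wline_active_cell=0.5, bline_active_cell=0.25,
    inactive_cell=0.125, inactive_block=10.0,
)
total = cache_energy(energies, m=4, n=8, p=2, f=lambda m, n: 3)
print(total)  # 4 + 4 + 3 + 0.75 + 1 + 2.25 + 30 = 45.0
```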

To reduce the dimensionality of the equations produced by the first-order fit, we use main-effect analysis.

$$\begin{split} \boldsymbol{\delta}_{Energy} &\cong \left[ f\left( \vec{x}_{0}, V_{dd-nom}, T \right) - f\left( \vec{x}_{0}, V_{dd-nom}, T_{nom} \right) \right] \\ &+ \left[ f\left( \vec{x}_{0}, V_{dd}, T_{nom} \right) - f\left( \vec{x}_{0}, V_{dd-nom}, T_{nom} \right) \right] \\ &+ \left[ f\left( \vec{x}, V_{dd-nom}, T_{nom} \right) - f\left( \vec{x}_{0}, V_{dd-nom}, T_{nom} \right) \right] \end{split}$$
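The decomposition above varies one factor at a time from the nominal point and sums the resulting differences. A minimal sketch follows. The function name `main_effect_delta` and the toy energy model `f` are assumptions for illustration. The toy model is purely additive, a case in which the main-effect sum matches the true change exactly. With interaction terms the sum is only an approximation.

```python
def main_effect_delta(f, x0, x, vdd_nom, vdd, t_nom, t):
    """Sum of one-at-a-time effects of temperature, supply, and parameters."""
    base = f(x0, vdd_nom, t_nom)
    temp_effect = f(x0, vdd_nom, t) - base
    vdd_effect = f(x0, vdd, t_nom) - base
    param_effect = f(x, vdd_nom, t_nom) - base
    return temp_effect + vdd_effect + param_effect


def f(x, vdd, temp):
    """Toy additive energy model (made up, no interaction terms)."""
    return 2.0 * sum(x) + 4.0 * vdd + 0.5 * temp


x0, x = (0.0, 0.0), (1.0, 2.0)  # nominal and perturbed device parameters
vdd_nom, vdd = 1.0, 0.75        # nominal and scaled supply (V)
t_nom, t = 25.0, 100.0          # nominal and elevated temperature

approx = main_effect_delta(f, x0, x, vdd_nom, vdd, t_nom, t)
exact = f(x, vdd, t) - f(x0, vdd_nom, t_nom)
print(approx, exact)  # 42.5 42.5
```

Each factor needs only its own sweep around the nominal point, so the number of evaluations grows with the sum of the sweep sizes rather than their product.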

## **Error between Simulation & Model**



- The computed error is independent of temperature.
- Model accuracy degrades for structures that drive large loads (address decoder, sense amplifier).
- Access-time failures are more frequent at higher temperatures.
- Higher-order splines can be used to account for the non-linearity observed in the 0.7 V range.

## **Usability in Circuit Optimizations**





- V<sub>th</sub> variability was exploited to perform dual-V<sub>th</sub> assignment.
- A lower threshold was assigned to delay-critical paths.
- Delay was reduced by nearly 18% at high temperatures with a minimal increase in leakage.
- Similarly, the standby supply voltage of unused array sub-blocks was reduced with simultaneous dual-threshold assignment, yielding an energy reduction of around 50% for a single access.
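The dual-threshold idea can be sketched as a simple selection rule: paths whose estimated delay exceeds a target receive the lower threshold, and all others keep the higher one to limit leakage. This is only an illustrative rule under assumed inputs, not necessarily the assignment procedure used in this work. The function name `assign_dual_vth`, the path names, the delays, and the threshold values are all hypothetical.

```python
def assign_dual_vth(path_delays, delay_target, vth_low, vth_high):
    """Give delay-critical paths the low threshold, the rest the high one."""
    return {
        name: (vth_low if delay > delay_target else vth_high)
        for name, delay in path_delays.items()
    }


# Hypothetical per-path delay estimates (ps), e.g. from a fitted delay model.
path_delays = {"decoder": 120.0, "wordline": 80.0, "sense_amp": 140.0}
assignment = assign_dual_vth(path_delays, delay_target=100.0,
                             vth_low=0.20, vth_high=0.35)
print(assignment)
# {'decoder': 0.2, 'wordline': 0.35, 'sense_amp': 0.2}
```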