Low Overhead Time-Multiplexed Online Checking:
A Case Study of an H.264 Decoder
Ming Gao and Kwang-Ting (Tim) Cheng
Department of Electrical and Computer Engineering
University of California, Santa Barbara, Santa Barbara, CA 93106, U.S.A. {mgao, timcheng}@ece.ucsb.edu
Abstract—To cope with increasing in-field failure rates for cost-sensitive electronic products, a low-overhead online checking methodology -- Time-Multiplexed Online Checking (TMOC) -- was proposed and demonstrated in [1]. In this paper, we study the area overhead required for employing TMOC in an embedded Field Programmable Gate Array (eFPGA) core. The overheads caused by the relatively low logic density of eFPGA and the interface routing between a design module and its TMOC checker are examined in detail. In a case study of an H.264 decoder design [2], TMOC is compared to a dedicated duplication-based online checking scheme [3], which typically incurs more than 100% area overhead. Experimental results show that TMOC provides significant chip area overhead reduction for online checkers. A reduction of 68% is achieved when one checker is shared by 62 design partitions, for example. TMOC can also help reduce dynamic power overhead of online checking by increasing the number of partitions, at the cost of increased fault detection latency in some partitions.
Keywords- online testing, time-multiplexed, in-field failure, embedded FPGA, low-cost resilience
I. INTRODUCTION
As semiconductor technology progresses toward the nanoscale, designs become more vulnerable to in-field failure sources such as infant mortality, soft errors, silicon aging, and electromigration. Increasing in-field failures incur expensive product return and service costs if a product is not protected by error-resilient schemes. The 16.4% in-field failure rate of the Xbox 360, for example, has cost Microsoft more than one billion dollars [4]. The company extended its product warranty on these game consoles from one year to three years in February 2008 after a large number of failures were reported from the initial shipments. Such a high failure rate also hurts brand reputation, which is priceless for businesses. The data provided by Consumer Reports magazine [4] show that consumer electronics typically have an average in-field failure rate of 15%. The situation is even worse for three- to four-year-old laptops, for which failure rates of 43% have been reported [4].
To increase in-field chip availability, in-field self-checking followed by a self-repair process could be a more effective alternative than simply increasing hardware redundancy. For example, a Triple Module Redundancy (TMR) system with a repair rate μ and a failure rate λ has a much greater availability than an identical TMR system without self-repair. When μ = 0.1/hour and λ = 10^-6/hour, the Mean-Time-To-Failure (MTTF) of the TMR system with self-repair is more than 20,000 times that of the one without self-repair [5].
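For reference, the 20,000x figure can be reproduced from the standard Markov-model MTTF expressions for TMR (a back-of-the-envelope check using textbook results of the kind derived in [5], not a derivation from this paper):

\[
\mathrm{MTTF}_{\mathrm{TMR}} = \frac{5}{6\lambda}, \qquad
\mathrm{MTTF}_{\mathrm{TMR+repair}} \approx \frac{5\lambda+\mu}{6\lambda^{2}}, \qquad
\frac{\mathrm{MTTF}_{\mathrm{TMR+repair}}}{\mathrm{MTTF}_{\mathrm{TMR}}}
= 1+\frac{\mu}{5\lambda}
= 1+\frac{0.1}{5\times10^{-6}} \approx 2\times10^{4}.
\]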
As the preliminary step towards creating a self-repairable design, both online and offline in-field self-testing techniques have been studied in the last decade. Several in-field Built-In Self-Test (BIST) schemes [6][7] could offer a certain level of fault detection capability with a lower hardware overhead compared to the Concurrent Error Detection (CED) schemes. However, some inherent characteristics of BIST limit the application of these offline in-field self-test methods. The limitations include: 1) the need to interrupt system operations; 2) a low fault coverage due to limited on-chip test pattern generation; 3) required system idle time; 4) non-trivial fault detection latency; 5) inability to detect transient and intermittent faults; and 6) the limited applicability of some BIST schemes to testing FPGAs or processors only.
Therefore, some recent research has focused on cost-efficient online testing schemes for SoC in-field self-testing. However, CED-based online checking schemes in general incur high area overheads. Various online checker designs (e.g., parity checking, error detection codes) have been proposed that trade some fault detection capability for reduced hardware overheads [8][9][10]. Some researchers focused on ad hoc design optimization for low-cost checker implementation on a specific platform such as an FPGA [11] or a Network-on-Chip [12]. Such methods cannot be easily extended to general heterogeneous SoCs. In addition, some schemes proposed for heterogeneous SoCs are only suitable for either coarse-grained [13] or fine-grained [14] implementations.
A recent cost analysis [15] shows there is great demand for high-quality in-field testing for overall product failure cost minimization. To achieve cost reduction without compromising the fault coverage, a Time-Multiplexed Online Checking (TMOC) scheme was proposed which reduces the chip area overhead for online checking at the cost of increased fault detection latency [1]. The basic idea of TMOC is to utilize dynamic, in-field reconfiguration to periodically check various parts of the system in a time-multiplexed fashion. Fig. 1 shows an example of a TMOC implementation where an SoC design is partitioned into three sub-systems, A, B, and C. The on-chip reconfigurable checker space is configured as the online checker that monitors sub-system i when t = Ti.
Figure 1. A conceptual example of TMOC.
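As an illustration of the time-multiplexed schedule, the following minimal Python sketch models the periodic reconfigure-synchronize-check loop of Fig. 1. The routines load_checker_bitstream, synchronize, and compare_outputs are hypothetical placeholders standing in for the platform-specific reconfiguration and comparison mechanisms; they are not APIs defined in [1].

```python
import itertools

PARTITIONS = ["A", "B", "C"]      # sub-systems sharing one reconfigurable checker space
SLOT_CYCLES = 1_000_000           # length of each checking time slot T_i, in clock cycles

def tmoc_schedule(load_checker_bitstream, synchronize, compare_outputs):
    """Round-robin TMOC: in each slot T_i, the shared checker space is reconfigured
    into the checker for sub-system i, synchronized, and then checks concurrently."""
    for part in itertools.cycle(PARTITIONS):
        load_checker_bitstream(part)      # dynamic, in-field reconfiguration
        synchronize(part)                 # align checker state with the running module
        for _ in range(SLOT_CYCLES):      # concurrent checking during slot T_i
            if not compare_outputs(part):
                raise RuntimeError(f"mismatch detected in partition {part}")
```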
TMOC offers some compelling advantages compared to existing approaches. There is no interruption to system operations and no requirement for the system to be idle. It is applicable not only to FPGAs and homogeneous SoCs, but also to heterogeneous SoCs, and it works in both coarse- and fine-grained implementations. There is no fault coverage loss for permanent and intermittent faults; the only fault coverage loss would be for some transient errors. However, an evaluation of the area overheads that takes into account 1) the logic density gap between the embedded FPGA (eFPGA) and the ASIC implementations, and 2) the interface between a TMOC checker and the modules under checking was not conducted in [1]. The power consumption of employing TMOC was not investigated either.
This paper attempts to answer the following questions: 1) How should the TMOC checker be implemented in a heterogeneous logic design? Can commercial partitioning tools be used for TMOC partitioning? 2) What is the logic density gap between eFPGA cores and ASICs? Given such a gap, can TMOC implemented in an eFPGA still provide significant area reduction in comparison with a dedicated checker implemented in an ASIC? 3) When many design modules share one TMOC checker, how can the checker be effectively interfaced with the logic modules? Will the routing overhead be a problem? 4) What is the power consumption overhead of employing TMOC? Are checker reconfiguration and synchronization power-hungry processes? Will the power consumption of an online checker be reduced in proportion to the area overhead reduction?
In the next section, the design driver, the checker scheme and the partitioning tool will be briefly introduced. Section 3 presents the area overhead estimation model and experimental results. Section 4 discusses the dynamic power overhead of employing TMOC. Section 5 concludes the paper and introduces future work.
II. PRELIMINARY
A. Design Partitioning
As the first step in the TMOC design flow [1], a design is partitioned into a number of sub-systems and an online checker is designed for each of them. These checkers will be later deployed in a time-multiplexed fashion. Either functional partitioning or structural partitioning can be applied as long as the partitioning
criteria, balanced partitioning and min-cut partitioning, are met for checker area overhead minimization. A balanced partitioning algorithm partitions the design into similar-sized sub-modules. A balanced partitioning minimizes the checker space area since the size of a TMOC checker space is determined by the size of the largest checker that the checker space accommodates. A min-cut partitioning is also preferred. The signals connecting two partitions become part of the I/Os of these partitions and thus need to be connected to the checker space. For most checker schemes, the more signals that need to be monitored, the more hardware resources required.
In this study, Synplicity’s multi-FPGA prototyping tool Certify is used for design partitioning. Its Quick Partition Technique (QPT) can automatically partition an RTL level design into a user-defined number of RTL clusters with the total cut-size minimized. Each of the partitions will be pushed to meet the user-defined area constraint in the form of a percentage of the entire design.
B. Checker Design
In practice, there might be more than one reconfigurable checker space in a design which employs the TMOC scheme. Each checker space is only shared by its neighboring design modules in order to minimize the area overhead and the delay penalty incurred by long module-checker interconnections. Since the area overhead of multiple TMOC checker spaces can be estimated one by one, without loss of generality, we examine the scenario in which only one checker space is shared by multiple modules in this study. To facilitate the area overhead comparison, Duplex Checking is employed for the experiments. A duplex checker consists of a duplication of the Module Under Checking (MUC) and its output comparison logic. The analysis methods and the tool flow discussed in the following, which are based on the above two assumptions (i.e. a single checker space and using duplex checking), can be easily extended for more complex scenarios.
C. Design Driver
To target cost-sensitive applications, we use an open-source H.264 decoder, NOVA [2], as the design driver to develop and illustrate the TMOC design. The synthesis report shows that the entire decoder consists of 35,235 four-input Look-Up Tables (LUTs), based on the Xilinx Virtex II Pro device cell library. Its ASIC implementation, which uses a six-metal-layer 180nm CMOS standard cell library, consists of around 170K gates. We used only the logic portion of the decoder for the synthesis and partitioning experiments in this paper.
III. AREA OVERHEAD
The Area Overhead (AO) includes all the extra chip area incurred by employing a TMOC scheme in a logic design. Since on-chip area overhead is the focus of this study, we do not include, in the overhead calculations, the off-chip hardware (mainly memory and some simple control logic) used for checker reconfiguration.
The area overhead of a duplex TMOC checker can be estimated by considering three main parts: duplication of design partitions (Sdp), output comparison logic (Soc), and checker interface (Sif):

AO = Sdp + Soc + Sif (1)
TABLE I. ESTIMATION OF α THROUGH QPT PARTITIONING

Number of Partitions N | Largest Partition Size (LUT) | Average Partition Size (LUT) | α
1   | 35,235 | 35,235 | 1.00
4   | 9,730  | 8,809  | 1.10
8   | —      | 4,404  | 1.11
12  | 3,263  | 2,936  | 1.11
16  | 2,501  | 2,202  | 1.13
32  | 1,266  | 1,101  | 1.14
62  | —      | 568    | 1.13
119 | 336    | 296    | 1.13

A. Duplication of Design Partitions
The reconfigurable space for implementing duplications of design partitions, called duplication space, should be able to accommodate the duplication of the largest design partition. Therefore, two design parameters will significantly affect the chip area of the duplication space Sdp. One is the circuit partitioning strategy, which includes the number of partitions and the effort for balanced partitioning. The other is the choice of the hardware fabrics for implementing the duplication space. In general, greater hardware flexibility requires more hardware overhead.
Sdp = α*β*U/N (2)
Ideally, if a design were evenly partitioned into N modules of equal size, the duplication space size could be estimated as β*U/N, where U denotes the size of the design and each partition is of size U/N. The duplication space requires a chip area that is β times the size of a partition. The variable β, greater than 1, reflects the fact that reconfigurable fabrics have a lower logic density than an ASIC. However, β*U/N only gives a lower-bound estimate of the duplication space, since current CAD tools often cannot evenly partition a complex design into equal-sized modules. Thus, the variable α is introduced in (2) for a more realistic estimation of the duplication space size, where the largest partition duplication is α times as large as the average size of all partition duplications.
Given a design with a known size U, it is necessary to know typical values of α and β in order to estimate Sdp. To examine α, we partition NOVA into N modules; in the experiments, N varies from 1 to 119. In practice, it would be unrealistic to have as many as a hundred modules sharing one TMOC checker. For each number of partitions, α is estimated as the maximum-to-average ratio of partition sizes. The results are shown in Table I; for each row, the value in column two is divided by that in column three to generate the corresponding α. There are two important observations from this table: 1) α varies in a small range, from 1.10 to 1.14; and 2) limited by the capability of the QPT algorithm, finer-grained partitioning is likely to produce results farther from a balanced partitioning.
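For illustration, α is simply the maximum-to-average ratio of the partition sizes reported by the partitioner; a minimal sketch using the N = 4 row of Table I:

```python
# Example: the N = 4 row of Table I.
largest_partition = 9_730       # LUTs in the largest of the four partitions
average_partition = 8_809       # average LUTs per partition (35,235 / 4, rounded)
alpha = largest_partition / average_partition
print(f"alpha = {alpha:.2f}")   # 1.10, as reported in Table I
```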
The size of the duplication space is also very sensitive to the implementation fabric used. In fact, there are many different possible scenarios. The checker could be: 1) placed in spare regions of the entire chip in an FPGA design; 2) constructed from one or more spare cores in a multi-core SoC design; 3) created using an FPGA die in a SiP or 3-D IC design; or 4) placed in an embedded, field-reconfigurable hard block in an SoC design. Note that the duplication space has the same logic density as the design in scenarios 1 and 2, while the situation is quite different in scenarios 3 and 4 because of the significant gap in logic density between FPGA and ASIC implementations. In this study, we focus on placing the checker in an embedded, field-reconfigurable hard block in an SoC design. All experimental results and analysis are based on this assumption.
According to the latest comparison between generic FPGAs and standard-cell ASICs [16], for circuits containing only combinational logic and flip-flops, a generic 90nm FPGA implementation is on average 40 times larger than a 90nm ASIC implementation. However, modern FPGAs also contain hard blocks, such as multipliers, accumulators, and block memories, and it was found that the existence of these hard blocks reduces the density gap significantly. The ASIC-to-FPGA logic density ratio can be as little as 21 on average, according to the investigation reported in [16].
To minimize the logic density gap between FPGA and ASIC, some leading embedded FPGA IP vendors provide SoC designers with eFPGA cores that have a much higher logic density and a similar logic capacity compared to commercial FPGA chips. For instance, Abound Logic (formerly M2000) announced its 90nm FlexEOS eFPGA macros, built on an improved reconfigurable cell architecture, in 2005 [17]. The silicon density for logic functions of FlexEOS is up to 1,350 LUTs per square millimeter, with a capacity of up to 98,304 LUTs. To convert the density unit from LUTs per mm2 to an equivalent gate count per mm2, a conversion ratio for the LUT-to-gate mapping is needed. This ratio is highly dependent on the efficiency of the eFPGA synthesis and mapping tools. Due to the lack of eFPGA EDA tools, we take the conversion ratio from the industry practice of another eFPGA vendor, Menta [18]. Menta's 90nm eFPGA core, with a silicon density of around 600 LUTs per mm2, can accommodate random logic functions equivalent to between 8,000 and 15,000 gates through their eFPGA tool Niagara. Combined with the silicon density of FlexEOS, this implies that state-of-the-art eFPGA cores can achieve a logic density of up to 33,750 gates per mm2 in a 90nm CMOS technology. According to the datasheets from major foundries such as TSMC, UMC, NEC, and Toshiba, 90nm standard-cell ASIC designs may achieve logic densities of up to around 400K gates per mm2. In fact, many realistic implementations are not able to achieve such a high density without custom optimizations. For example, the DSP core reported in [19] has a logic density of around 240K gates per mm2. In summary, the logic density gap ratio between an eFPGA and a standard-cell ASIC implementation is around 12, so we use β = 12 in Equation (2) for the estimation of Sdp.
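To make the density arithmetic explicit, the following sketch reproduces the back-of-the-envelope calculation above from the cited vendor figures and then plugs the result into Equation (2) for the N = 62 case. The gates-per-LUT ratio is the upper bound implied by Menta's figures, an assumption rather than a vendor-published constant.

```python
# eFPGA logic density: FlexEOS offers up to 1,350 LUTs/mm^2; Menta's practice of
# mapping ~600 LUTs/mm^2 to 8,000-15,000 gates implies roughly 13-25 gates per LUT.
gates_per_lut = 15_000 / 600            # take the upper bound, 25 gates per LUT (assumption)
efpga_density = 1_350 * gates_per_lut   # ~33,750 gates per mm^2
asic_density = 400_000                  # ~400K gates per mm^2 for 90nm standard cells
beta = asic_density / efpga_density
print(f"beta ~ {beta:.1f}")             # ~11.9, rounded to 12 in Equation (2)

# Plugging into Equation (2) for the 170K-gate NOVA decoder with N = 62 partitions:
U, N, alpha = 170_000, 62, 1.13         # alpha for N = 62 from Table I
S_dp = alpha * 12 * U / N               # duplication-space size in gate equivalents
print(f"S_dp ~ {S_dp:,.0f} gates ({100 * S_dp / U:.2f}% of the design)")  # ~21.87%, as in Table III
```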
B. Interface of Embedded FPGA
The challenges and models of the eFPGA interface in SoCs for general applications have been studied in [20]. The interface architectures proposed in previous studies are mainly switching networks that can provide all possible mappings from inputs to outputs. Although the concentrator networks proposed in [20] were able to route 7,200 signals with only 5% area overhead in a 20-million-gate SoC design, they offer more flexibility than needed for TMOC implementations. First, almost all the signals between a TMOC checker and the design modules are inputs to the checker, with no output traffic. Second, the number of TMOC checker interface configurations is relatively small and the configurations are known in advance. Therefore, the required re-configurability of signal connectivity can be pre-determined by the partitioning process, and any flexibility beyond these known configurations is redundant. Due to these
reasons, in our investigation, we use multiplexers instead of switching networks.

TABLE II. AREA OVERHEAD OF DESIGN-TO-CHECKER INTERFACE

Number of Partitions N | Total Cut Size | # of Signal Groups | Width of MUX Output | MUX Size (gates) | Interface Overhead
1   | 174    | 1 | 0     | 0      | 0%
4   | 2,020  | 2 | 1,210 | 2,420  | 1.42%
8   | 3,017  | 2 | 1,603 | 3,206  | 1.89%
12  | 3,604  | 2 | 1,830 | 3,660  | 2.15%
16  | 3,995  | 3 | 1,405 | 5,620  | 3.31%
32  | 5,611  | 4 | 1,513 | 9,078  | 5.34%
62  | 7,402  | 4 | 1,932 | 11,592 | 6.82%
119 | 10,639 | 6 | 1,837 | 18,370 | 10.81%

TABLE III. OVERALL AREA OVERHEAD FOR EMPLOYING TMOC

Number of Partitions N | Sdp (%) | Soc (%) | Sif (%) | AO (%)
1   | 1200.00 | 0.24 | 0.00  | 1200.24
4   | 330.00  | 1.66 | 1.42  | 333.08
8   | 166.50  | 2.20 | 1.89  | 170.59
12  | 111.00  | 2.51 | 2.15  | 115.66
16  | 84.75   | 1.93 | 3.31  | 89.98
32  | 42.75   | 2.08 | 5.34  | 50.17
62  | 21.87   | 2.65 | 6.82  | 31.34
119 | 11.39   | 2.52 | 10.81 | 24.72
Ideally, if the placement and routing tool is able to route all the I/O signals to an eFPGA core through direct hardwiring without causing congestion, the area overhead for routing should be zero. In this case, all the connection signals will be routed directly through the metal layers on top of the logic cells or by utilizing the white spaces of the cell layout. This statement is also verified by our experiments using Cadence SoC Encounter with a TSMC 180nm standard cell library. We created a rectangular black box of 1 mm2 to mimic an eFPGA core in a large design. When the number of connection signals increases from 300 to 2,950, the post-place-and-route reports do not show any increase in area. The reported core density remains 88.6% when the utilization target is set up to 90% of the entire die size, and reaches about 97% when the target is pushed to 99%. However, the routing complexity may increase significantly when there are too many connection signals. Fig. 2 shows that the total cut size increases roughly as the square root of N. A proper grouping of signals and a proper use of multiplexer trees can help significantly reduce the routing complexity while maintaining a low area overhead. Assuming that an eFPGA core can afford, at most, 2,000 direct hardwired connections, the signals can be grouped following these criteria: 1) each group should have no more than 2,000 signals; 2) groups should be as close in size as possible, in terms of number of signals; and 3) all signals from a partition should be placed in the same group. Some signals may be duplicated across groups when they are shared by too many partitions.
Columns three and four of Table II show details of the multiplexer tree created for each partitioning case. For example, when N = 32, the 32 partitions are assigned to 4 groups. One of the 4 groups has the largest number of connection signals to the checker, which is 1,513. Therefore, the total number of gates consumed by the multiplexer tree can be estimated as the sum of 1,513 units of the 4-to-1 MUX. Synopsys Design Compiler is used to generate the gate count, which is given in Column 5. The synthesis results show that when no more than 62 partitions share one TMOC checker, the checker interface overhead is only around 7% for this 170K-gate decoder.
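The Table II estimates can be approximated by a simple model in which each signal of the widest group passes through an n-to-1 multiplexer built from (n - 1) two-input MUXes. The two-gate cost per 2-to-1 MUX below is an illustrative assumption (the cited figures come from Design Compiler synthesis), but it reproduces the reported gate counts:

```python
DESIGN_GATES = 170_000   # NOVA decoder size in gate equivalents
GATES_PER_MUX2 = 2       # assumed cost of one 2-to-1 MUX in gate equivalents

def interface_gates(widest_group_signals: int, n_groups: int) -> int:
    """Each signal of the widest group goes through an n_groups-to-1 multiplexer
    built from (n_groups - 1) two-input MUXes."""
    return widest_group_signals * GATES_PER_MUX2 * (n_groups - 1)

# (widest group width, number of groups) for a few rows of Table II
for n, (width, groups) in {4: (1210, 2), 32: (1513, 4), 62: (1932, 4), 119: (1837, 6)}.items():
    g = interface_gates(width, groups)
    print(f"N = {n:3d}: {g:6d} gates, {100 * g / DESIGN_GATES:.2f}% overhead")
```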
C. Output Comparison Logic
In a duplex checker implementation, the logic that compares the design's outputs with its duplication's outputs can be simply implemented using a balanced binary tree of exclusive-OR gates. This XOR tree can be implemented using standard-cell XOR gates and reused for every configuration of a TMOC checker. The two groups of signals to be compared are: 1) the output signals of the eFPGA interface MUX, which carry the design outputs; and 2) the outputs of the design duplication in the eFPGA core. Assuming that the checker needs to compare, say, at most 2,000 signals at a time, 3,999 XOR gates will be needed. The comparator incurs an area overhead of about 5.5% for NOVA using a TSMC 180nm standard-cell library. Although the overhead is relatively trivial in this example, designers should try to avoid comparing too many signals at a time, since the overhead grows linearly with the number of compared signals.
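One way to read the 3,999-gate figure (an assumed structure for illustration, not spelled out in the text): comparing two 2,000-bit vectors takes 2,000 bit-wise XOR gates plus a 1,999-gate balanced reduction tree, i.e. 2n - 1 gates in total.

```python
def duplex_comparator_gates(n_signals: int) -> int:
    """Compare two n-bit output vectors: n bit-wise XOR gates
    plus an (n - 1)-gate balanced reduction tree."""
    return 2 * n_signals - 1

assert duplex_comparator_gates(2000) == 3999   # the 3,999-gate figure cited above
```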
D. Overall Area Overhead
The overall area overhead incurred by employing a TMOC scheme for the NOVA decoder is summarized in Table III by incorporating the results from subsections 3.1 to 3.3. Fig. 3 highlights the contributions to overall area overhead from three main factors. There are three important observations based on these results.
First and foremost, the experimental results verify the claim that the TMOC scheme can significantly reduce the area overhead of employing concurrent online checkers in SoC designs [1]. In this case study, for example, when the design is partitioned into 62 modules, the overall area overhead is 31.34%, which is less than one-third of that of a traditional duplex online checker. Such a large cost reduction makes expensive traditional online checking schemes suitable for use in cost-sensitive applications to cope with in-field failures.
Second, note that the interface overhead is almost the same as the duplication space size when the number of partitions, N, is equal to the largest number, 119, in our experiment. The trend indicates that the interface overhead Sif will dominate the duplication overhead Sdp once N grows beyond 119. This implies that there will be an optimal N for area overhead minimization of a given design. However, area overhead is not the only design optimization goal. Designers should also consider the increasing design complexity of eFPGA interfacing, signal grouping, and place-and-route when a large number of partitions share one checker. For instance, one may not want to choose 119 partitions sharing one checker, even though it achieves a lower AO than N = 62.

Third, the large logic density gap between eFPGA and ASIC, i.e., β = 12, offsets the benefit of area reduction a great deal. However, the hardware flexibility provided by generic eFPGA cores is much more than what is needed for TMOC checking. Once the partitioning process is finished, a finite number of checking functions to be mapped into the reconfigurable checker are determined. Any additional flexibility/reconfigurability beyond these checker configurations would not be used. In principle, a dynamically reconfigurable fabric customized for a small set of known functions should have logic density, power consumption, and performance closer to those of an ASIC, and better than those of a completely flexible/reconfigurable eFPGA. Such a multiple Application Specific Integrated Circuit (mASIC) implementation would make TMOC even more attractive for low-overhead online checking. In addition, the data volume of the configuration bit stream would be significantly reduced if the reconfiguration could be implemented using on-chip logic instead of off-chip memory. On the other hand, the smaller the β value, the smaller the N needed to achieve the same level of area reduction; fewer partitions sharing one checker results in less interface overhead and less design complexity. Moreover, the higher performance of mASIC would help avoid having a slower checker for a faster design. Note that mASIC is a unique design problem, different from those targeted by domain-specific FPGAs or mask-programmable structured ASICs. The requirement of dynamic reconfigurability distinguishes mASIC from a mask-programmable ASIC, which can only be configured once. A domain-specific FPGA is optimized for any possible function in a known application domain rather than for a limited number of known random-logic functions. Researchers have extended their optimization methodology for domain-specific FPGAs to this mASIC design problem; the case study results reported in [21] claim hardware up to 12.3 times smaller than generic FPGA solutions.
Figure 2. Total cut size of partitioning NOVA using QPT, plotted against N (number of partitions); the fitted trend is y = 1159·x^0.456.
Figure 3. Area overhead contributions from Sdp, Soc, and Sif versus the number of partitions N.
Figure 4. Operation time line of a TMOC checker: reconfiguration, synchronization, and checking repeat in each time slot (..., TA, TB, TC, TA, ...).
IV. POWER OVERHEAD
The impact on power consumption must be taken into account when employing online checking for cost-sensitive designs. Due to the lack of eFPGA design tools and power specification documents, we use measurements of generic FPGA fabrics to estimate an upper bound on the power overheads. As shown in the example in Fig. 4, the power consumption of a TMOC checker can be estimated by considering the power consumed during reconfiguration, synchronization, and checking.
1) The time required for each reconfiguration of a TMOC checker is highly dependent on the reconfiguration mechanism and the size of the checker. For an SRAM-based eFPGA core, it is usually on the order of milliseconds. During reconfiguration, an FPGA core consumes more power than during functional operation. According to the measurements of Xilinx Virtex FPGAs reported in [22], on average, the power consumption during dynamic reconfiguration is about 12% higher than that of functional operation. To minimize power, reconfigurations should not be triggered too frequently.
2) In the functional simulation of NOVA decoding QCIF frames, all duplex checkers can be synchronized with their design partitions within 10 clock cycles after reconfiguration when N is greater than 4. The power consumption during synchronization should be similar to that of normal checking operation.
3) According to the measurement results in [16], the dynamic power consumed by a function running on an FPGA is about 12 times that of the same function running on an ASIC. The implementation results of NOVA show that this FPGA-to-ASIC dynamic power gap ratio is around 10 for 180nm technology. The power overhead decreases as N increases, similar to the area overhead reduction in Section 3. Moreover, even after the number of partitions is determined, designers can still adjust the average power overhead by carefully scheduling the checking time for each partition. Take an extreme case as an example: given a design whose ASIC implementation consumes dynamic power Pd and whose FPGA implementation consumes, say, 12*Pd, suppose the design is partitioned into two modules, A and B, where the dynamic power of checker A is 5%*12*Pd, that of checker B is 95%*12*Pd, and the reconfiguration and synchronization times are negligible compared to TA or TB. If we schedule TA = TB, the average power for checking on the FPGA will be 600% of the checking power on the ASIC. If, however, we assign much more time to checker A, e.g. TA = 49*TB, the average dynamic power overhead ratio becomes 81.6%. Such a low dynamic power overhead comes at the cost of a significant increase in fault detection latency for module B.
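The two percentages follow from a simple time-weighted average of the checking power (a worked check of the arithmetic above):

\[
\bar{P} = \frac{P_A T_A + P_B T_B}{T_A + T_B}, \qquad
P_A = 0.05 \cdot 12P_d = 0.6P_d, \qquad
P_B = 0.95 \cdot 12P_d = 11.4P_d.
\]
With \(T_A = T_B\), \(\bar{P} = (0.6 + 11.4)P_d/2 = 6P_d\), i.e. 600% of the ASIC checking power; with \(T_A = 49\,T_B\), \(\bar{P} = (49 \cdot 0.6 + 11.4)P_d/50 = 0.816P_d\), i.e. 81.6%.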
V. CONCLUSION
We evaluate the area and power overheads of a recently proposed low-overhead online checking methodology – Time-Multiplexed Online Checking. Using an H.264 decoder as the
design driver, the area overheads, including the overhead caused by the lower eFPGA logic density and the module-to-checker interface overhead, were examined in detail. The experimental results showed that the TMOC technique can lead to a significant chip area overhead reduction for online checking. Compared to the more than 100% overhead incurred by a dedicated duplex checker, the overhead of a TMOC duplex checker was only 31.34% when the checker, implemented in an embedded FPGA core, is shared by 62 design partitions. The study of dynamic power overhead shows a similar trend. It is also shown that designers can trade increased fault detection latency in power-hungry partitions for reduced overall power consumption.
Our future research will focus on how to reuse the on-chip reconfigurable checking facilities to improve signal testability for validation and debug.
VI. ACKNOWLEDGMENTS
The authors acknowledge various valuable suggestions made by Peter Lisherness and Sherman Chang at the University of California, Santa Barbara. The authors also acknowledge the support of the Gigascale Systems Research Center (GSRC), one of five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program.
VII. REFERENCES
[1] M. Gao, H.-M. Chang, P. Lisherness, and K.-T. Cheng, "Time-Multiplexed Online Checking: A Feasibility Study," IEEE Asian Test Symposium (ATS), pp. 371-376, Nov. 2008.
[2] K. Xu, et al., "A 5-Stage Pipeline, 204 Cycles/MB, Single-Port SRAM Based Deblocking Filter for H.264/AVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, issue 3, pp. 363-374, 2008.
[3] S. Mitra and E. J. McCluskey, "Which Concurrent Error Detection Scheme to Choose?" Proceedings of the 2000 IEEE International Test Conference, pp. 985-994, Oct. 2000.
[4] B. Kuchera, "Xbox 360 failure rates worse than most consumer electronics," http://arstechnica.com/news.ars/post/20080214-xbox-360-failure-rates-worse-than-most-consumer-electornics.html, February 14, 2008.
[5] D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems: Design and Evaluation, Digital Press, 2nd ed., 1992.
[6] M. Abramovici, C. Stroud, S. Wijesuriya, C. Hamilton, and V. Verma, "Using roving STARs for on-line testing and diagnosis of FPGAs in fault-tolerant applications," Proc. Int. Test Conf., 1999, pp. 973-982.
[7] P. Bernardi, M. Rebaudengo, and M. S. Reorda, "Exploiting an I-IP for In-Field SoC Test," Proceedings of the 19th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), Oct. 2004, pp. 404-412.
[8] C. Zeng, N. Saxena, and E. J. McCluskey, "Finite State Machine Synthesis with Concurrent Error Detection," Proceedings of the 1999 IEEE International Test Conference, pp. 672-679, Oct. 1999.
[9] D. Das and N. A. Touba, "Synthesis of Circuits with Low-Cost Concurrent Error Detection based on Bose-Lin codes," Proc. IEEE VLSI Test Symp., pp. 309-315, 1998.
[10] N. A. Touba and E. J. McCluskey, "Logic Synthesis of Multilevel Circuits with Concurrent Error Detection," IEEE Trans. CAD, vol. 16, pp. 783-789, July 1997.
[11] S. Mitra, W.-J. Huang, N. R. Saxena, S.-Y. Yu, and E. J. McCluskey, "Reconfigurable Architecture for Autonomous Self-Repair," IEEE Design and Test of Computers, vol. 21, no. 3, pp. 228-240, May-June 2004.
[12] P. S. Bhojwani and R. N. Mahapatra, "A robust protocol for concurrent on-line test (COLT) of NoC-based systems-on-a-chip," Proceedings of the 44th Design Automation Conference (DAC '07), ACM, pp. 670-675.
[13] K. Katoh, A. Doumar, and H. Ito, "Design of On-Line Testing for SoC with IEEE P1500 Compliant Cores Using Reconfigurable Hardware and Scan Shift," 11th IEEE International On-Line Testing Symposium, pp. 203-204, 2005.
[14] V. V. Kumar and J. Lach, "Heterogeneous redundancy for fault and defect tolerance with complexity independent area overhead," Proceedings of the International Symposium on Defect and Fault Tolerance in VLSI Systems, 2003, pp. 571-578.
[15] S. Shamshiri, P. Lisherness, S.-J. Pan, and K.-T. Cheng, "A Cost Analysis Framework for Multi-core Systems with Spares," IEEE International Test Conference (ITC), Oct. 2008, pp. 1-8.
[16] I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays (FPGA '06), ACM, pp. 21-30.
[17] M2000 Inc., "M2000 Intros Largest 90nm eFPGA," Design and Reuse, http://www.design-reuse.com/news/9614/m2000-intros-largest-90nm-efpga.html, Feb. 7, 2005.
[18] Menta Inc., "MENTA eFPGA Core-II Data Sheet Brief," http://www.menta.fr/down/DatasheetBrief_eFPGA-core-II.pdf, Feb. 16, 2009.
[19] M. Hsieh and C. Huang, "An embedded infrastructure of debug and trace interface for the DSP platform," Proceedings of the 45th Design Automation Conference (DAC '08), ACM, pp. 866-871.
[20] B. Quinton and S. Wilton, "Concentrator Access Networks for Programmable Logic Cores on SoCs," IEEE International Symposium on Circuits and Systems, May 2005, pp. 45-48.
[21] K. Compton and S. Hauck, "Automatic Design of Area-Efficient Configurable ASIC Cores," IEEE Trans. Computers, vol. 56, no. 5, pp. 662-672, May 2007.
[22] J. Becker, M. Huebner, and M. Ullmann, "Power Estimation and Power Measurement of Xilinx Virtex FPGAs: Trade-Offs and Limitations," Proceedings of the 16th Symposium on Integrated Circuits and Systems Design (SBCCI 2003), p. 283.