A Virtual Prognostic Tool for Nuclear Power Electronics Reliability

Amer B. Dababneh¹, Ben Goerdt², Timothy Marler³ and Ibrahim T. Ozbolat⁴

¹The University of Iowa, amer-dababneh@uiowa.edu
²The University of Iowa, benjamin-goerdt@uiowa.edu
³The University of Iowa, tmarler@engineering.uiowa.edu
⁴The University of Iowa, ibrahim-ozbolat@uiowa.edu

ABSTRACT

This research highlights new developments in a high-fidelity virtual environment that allows prediction of the total lifetime, overall reliability, and maintainability of circuit cards and their components through a new simulation methodology. This work demonstrates the application of statistical models to circuit cards and the ability to predict system and sub-system performance based on component data. Quantitative accelerated life tests are designed to quantify the life of circuit cards under different thermal stresses. This research allows the user to identify the components that contribute the most to downtime and to determine the effect of design alternatives on system performance in a cost-effective manner. Most significantly, this work has proven the feasibility of a novel platform for physics-based reliability analysis.

Keywords: printed circuit boards, reliability, maintainability, virtual manufacturing.

1. INTRODUCTION

Design for reliability in electronic products was accomplished with the introduction of reliability prediction tools in the 1960s. One of the most widely used tools was MIL-HDBK-217 [4,12]. Based on this standard, various commercial software applications were implemented to facilitate the estimation of product reliability [10]. However, the lack of accuracy and the slow pace of database updates have limited the usage of these methods [4]. Currently, MIL-HDBK-217F Notice 2, dated 28 Feb 1995, is an active military handbook; however, it has not been modified since 1995.

Although a plethora of research has been conducted to improve printed circuit board (PCB) reliability in the context of mechanical reliability (i.e., solder joint reliability and its fatigue life) [1,5], limited research has been performed at the board level (i.e., the entire PCB). Circuit cards under operation fail mainly due to the stressors to which their parts are subjected, including thermal stress, vibration, electromagnetic interference, aging, and corrosion. In particular, an increase in thermal stress directly increases the failure rate and ultimately decreases the reliability dramatically [3]. High temperatures impose a severe stress on most electronic items since they can cause not only catastrophic failure (such as melting of solder joints), but also slow, progressive deterioration of performance levels due to chemical degradation effects. Kallis and Norris stated that “excessive temperature is the primary cause of poor reliability in electronic equipment” [9]. For example, as a rule of thumb, the failure rate of most electronic components doubles for every 10 °C rise in temperature [11].

This research is an extension of our work that was originally reported at the ISERC conference in 2013 [4]. It aims to develop a predictive model for analyzing the remaining lifetime, reliability, and maintainability of circuit cards at both the component and system levels. A new board-level methodology is developed to predict the reliability, maintainability, and lifetime of PCB components and is then integrated within an immersive visual environment called Predictive Environment for Visualization of Electromechanical Virtual Validation (PREVIEW). PREVIEW is an interactive 3D environment that includes predictive, physics-based capabilities to support virtual testing of PCBs [13]. It enables product designers to assess potential design shortcomings based on virtual physics-based test capabilities, thus reducing the time and cost associated with developing and testing several iterations of prototypes prior to production. This gives the benefit of flexibility and the capability to perform a large number of "what-if" computations for early evaluation of occurrences and analysis of their causes, minimizing the risk of flight test activities, simulating hazardous conditions, evaluating the manufacturing process, and performing capacity analysis. In this research, PREVIEW is used as a software package that displays the developed model and offers a versatile environment that accepts modifications. This will enable new applications and interfaces with tiered solutions that can be easily implemented and that eventually provide significant improvement in the reliability, maintainability, and lifetime of the PCB and its components.

In this paper, the reliability and lifetime of PCBs and their components are modeled, and PREVIEW is used to display the results in a visual environment that gives the user the ability to predict the reliability, lifetime, and failure rate of the PCB under thermal stresses at any time. The rest of the paper is organized as follows. Section 2 discusses model development. Section 3 presents implementation and interface development. Section 4 draws conclusions.

2. MODEL DEVELOPMENT

2.1. System- and Component-level Lifetime Analysis Using Simulation Methodology

In nature, the occurrence of some events often appears imperfect; indeed, events may seem to occur at random. However, when an event is observed over a large sample or a long period of time, a definitive "mechanism" that causes the event may appear. When such a mechanism is not known, the alternative is to estimate the behavior using techniques involving data sampling. One simple strategy for characterizing the behavior of the data is to identify its distribution. Therefore, the collected time to fail (TTF) data are fitted to candidate cumulative distribution functions $F(t)$ to find the best-fit distribution [4]. In this research, the goodness-of-fit test approach is used to check distribution assumptions. This approach is considered more formal and reliable for assessing the underlying distribution of a data set. The Kolmogorov-Smirnov (KS) test is a distance goodness-of-fit test that can be used for small or large sample sizes. The KS test uses the cumulative distribution function (CDF) [7]. In a distance test, when the assumed distribution is correct, the theoretical (assumed) CDF (denoted by $F_0$) closely follows the empirical CDF (denoted by $F_n$) [7]. First, the TTF data for each PCB component are sorted in ascending order, and the empirical CDF ($F_n$) is found for each PCB component. Then, Weibull, exponential, normal, and lognormal distribution parameters are estimated in order to find the theoretical CDF ($F_0$) for each PCB component [4]. The maximum absolute distance between the theoretical and empirical distributions, $|F_0 - F_n|$, is then found for each of these distributions. Following the KS logic, the component TTF data set is most likely to follow the assumed distribution whose maximum absolute distance $|F_0 - F_n|$ is smaller than that of the other candidate distributions. That distribution is therefore taken as the best-fit distribution of the component TTF data [4].

Next, a random number (between 0 and 1) was generated using a random number generator code (Monte Carlo simulation). This number was used as a cumulative probability under the component's best-fit distribution to find a new TTF (using the inverse CDF) that represents the PCB component's upcoming time to fail. That is, after obtaining the random number, we inserted it into the performance function and computed a new TTF. The lifetime range for each component was calculated from the TTF data sample using the following equation:

$$\bar{X} \pm 1.5(S) \quad (2.1.1)$$

where $\bar{X}$ is the mean time to fail (MTTF) and $S$ is the standard deviation of the TTF data for each component.
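
The fitting and sampling steps of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in this research: it compares just two of the four candidate distributions (exponential and normal), uses simple moment estimates for their parameters, and runs on made-up TTF values. The function names `ks_distance`, `best_fit`, `sample_ttf`, and `lifetime_range` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def ks_distance(sorted_data, cdf):
    """Maximum absolute distance between the empirical CDF and a theoretical CDF."""
    n = len(sorted_data)
    d = 0.0
    for i, x in enumerate(sorted_data, start=1):
        f0 = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, abs(i / n - f0), abs(f0 - (i - 1) / n))
    return d

def best_fit(ttf):
    """Return (name, cdf, inverse_cdf) of the candidate with the smallest KS distance."""
    data = sorted(ttf)
    m, s = mean(data), stdev(data)
    normal = NormalDist(m, s)
    candidates = {
        "exponential": (lambda t: 1.0 - math.exp(-t / m) if t > 0 else 0.0,
                        lambda p: -m * math.log(1.0 - p)),
        "normal": (normal.cdf, normal.inv_cdf),
    }
    name = min(candidates, key=lambda k: ks_distance(data, candidates[k][0]))
    return name, candidates[name][0], candidates[name][1]

def sample_ttf(inverse_cdf, rng):
    """Monte Carlo step: a uniform random number is used as a cumulative probability."""
    p = rng.random()
    # Keep p strictly inside (0, 1) so that the inverse CDF is defined.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return inverse_cdf(p)

def lifetime_range(ttf):
    """Equation (2.1.1): mean plus or minus 1.5 standard deviations."""
    m, s = mean(ttf), stdev(ttf)
    return m - 1.5 * s, m + 1.5 * s

# Hypothetical TTF sample (hours) for one component.
ttf_hours = [980.0, 1010.0, 1025.0, 1050.0, 1090.0, 1120.0, 1150.0, 1200.0]
name, cdf, inv = best_fit(ttf_hours)
new_ttf = sample_ttf(inv, random.Random(0))
low, high = lifetime_range(ttf_hours)
```

A fuller version would add Weibull and lognormal candidates with proper parameter estimation; the selection rule (smallest maximum distance) stays the same.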

In this research, a new simulation methodology was established to calculate the lifetime, reliability, and maintainability of the entire circuit, starting with the age of each of its components. The time to the next failure of each component can be any time between its current age and its maximum TTF [4]. If the maximum TTF is less than the age, a new TTF is generated by the developed random number generator code. Once the simulation timer starts, the component with the smallest TTF fails first, which reduces the remaining time until the other components fail in sequence. Once the component with the smallest TTF fails, its position on the circuit card is checked. Series, parallel, series-parallel, parallel-series, and bridge configurations are considered during this simulation, because if the component is part of a parallel cluster or bridge configuration, its failure does not stop the operation of the entire card. Connectivity information for a PCB is obtained by directly reading the Standard for the Exchange of Product model data (STEP) file.

The component repair and replacement approaches provided by this simulation give the user the ability to choose between them based on experience and component history. Once the position of the component with the minimum TTF is determined, a "time to repair" or "time to replace" value is assigned to the component if its failure causes the failure of the entire card, as follows:

- If the failed component is in a series configuration and not in any parallel cluster, then a “time to repair” or “time to replace” value is assigned to that component, and the assigned value decreases until it reaches zero. Then a new TTF value and a new “time to repair” or “time to replace” value are assigned to that component, and the entire circuit resumes operation. While the circuit is in failure mode, the TTF values of the other components freeze at the time when the failed component stopped operating. Once the circuit resumes operation, those components continue operating from that same point.

- On the other hand, if the failed component is part of a parallel cluster, then its failure does not cause the failure of the entire parallel cluster, as the cluster fails if and only if all of its networks stop operating. Each network includes one or more components, and the failure of one component in a network causes the failure of that entire network. However, the failure of one network does not cause the failure of the entire parallel cluster; the cluster keeps operating until the last network fails, which is the network with the maximum TTF.

In conclusion, the entire card does not fail unless one of the components in series fails or an entire parallel cluster fails. A “time to repair” or “time to replace” value is assigned to the failed component if its failure causes the failure of the entire PCB, using the random number generator code and the best-fit distribution of that component's time to repair (TTR). The discrete event simulation keeps running until the end of the simulation run length, which is assigned at the beginning of the run. The above methodology is presented in the flow chart in Figure 1 [4] to clarify the procedure.
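
The series and parallel failure rules described above reduce to a min/max calculation over component TTF values. The following Python sketch shows that rule for a single run without repair; the data layout (a list of series components plus a list of clusters, each cluster being a list of networks) and the function names are assumptions made for illustration, and bridge configurations are not covered.

```python
def network_ttf(network, ttf):
    """A network is a list of components in series: it fails at its first failure."""
    return min(ttf[c] for c in network)

def cluster_ttf(cluster, ttf):
    """A parallel cluster is a list of networks: it fails when its last network fails."""
    return max(network_ttf(net, ttf) for net in cluster)

def card_ttf(series_components, clusters, ttf):
    """The card fails at the earliest failure of a series component or a whole cluster."""
    times = [ttf[c] for c in series_components]
    times += [cluster_ttf(cl, ttf) for cl in clusters]
    return min(times)

# Hypothetical card: R1 and U1 in series with one parallel cluster of two networks.
ttf = {"R1": 900.0, "U1": 1200.0, "C1": 300.0, "C2": 700.0, "D1": 1000.0}
clusters = [[["C1", "C2"], ["D1"]]]
card_failure_time = card_ttf(["R1", "U1"], clusters, ttf)
```

Here the failure of C1 at 300 h takes down only its own network; the cluster survives until D1 fails at 1000 h, so the card fails when R1 fails at 900 h.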

![Simulation flow chart](image)

Fig. 1: Simulation flow chart that represents the new methodology for estimating overall system reliability and lifetime.

2.2. Component and System Level Reliability

Reliability is a quantitative measure of non-failure, which is expressed by $R(t) = 1 - F(t)$, where $F(t)$ is the best-fit CDF for each component based on the TTF data set. Using the CDF at a specific time for each component, reliability can be calculated and assigned for all PCB components. The use of appropriate data can help in ensuring adequate component life in a specific application, as well as in projecting anticipated component reliability. The reliability is therefore the probability of no failures in the interval [0, t] or, equivalently, the probability that failure occurs after time t. In this research, in-service PCBs are considered for reliability modeling, where components have been used for a period of time so that each component has an age. Thus, reliability is calculated beyond the age of each component, not after time zero. The reliability of a component of age x over an additional time t is a conditional reliability, obtained from the conditional probability (Bayes’ rule): P[no failure in [0, x + t]] / P[no failure in [0, x]]. The reliability after time t, thus, is equal to:

\[ \frac{R(x + t)}{R(x)} \quad (2.2.1) \]
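
As a worked illustration of Equation (2.2.1), the sketch below evaluates the conditional reliability of an aged component. The Weibull parameters and the component age are hypothetical values chosen for the example, and the function names are not taken from PREVIEW.

```python
import math

def weibull_reliability(t, shape, scale):
    """R(t) = 1 - F(t) for a two-parameter Weibull distribution."""
    return math.exp(-((t / scale) ** shape))

def conditional_reliability(t, age, reliability):
    """Equation (2.2.1): R(age + t) / R(age)."""
    return reliability(age + t) / reliability(age)

# Hypothetical component: Weibull shape 2.0, scale 1000 h, already 400 h old.
R = lambda t: weibull_reliability(t, shape=2.0, scale=1000.0)
r_new = R(200.0)                                   # new component, next 200 h
r_aged = conditional_reliability(200.0, 400.0, R)  # 400 h old component, next 200 h
```

With a shape parameter greater than 1 (wear-out behavior), the aged component is less reliable over the same 200 h window than a new one, which is why the age must enter the calculation. For an exponential distribution the two values would coincide.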

PREVIEW is used to display the component reliability in its visual environment, where the user has the ability to choose any component on the PCB by clicking that component, and its reliability over a specific period of time appears (see Figure 2).

![Fig. 2: A snapshot from PREVIEW showing PCB component reliability.](image)

At the system level, we make use of the probabilistic definition of reliability for the entire PCB, which allows reliability to be calculated quantitatively. Since one of the simulation outputs is the next failure time of the circuit card, the equal rank method (i/n) is used to calculate the reliability: we count the replications in which the card fails at the desired time or later and divide that count by the total number of replications. The result represents the card-level reliability at that specific time. PREVIEW is used to display the entire PCB reliability. When the user clicks on the “Compute Reliability” button and then clicks on “Graph” under the system tab on the PREVIEW screen, the entire PCB reliability over a period of time appears.
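
The counting rule for card-level reliability can be written directly. The sketch below is a minimal illustration; the failure times are made-up values standing in for the output of ten simulation replications, and `card_reliability` is a hypothetical name.

```python
def card_reliability(failure_times, t):
    """Fraction of replications in which the card fails at time t or later."""
    survived = sum(1 for ft in failure_times if ft >= t)
    return survived / len(failure_times)

# Hypothetical card failure times (hours) from ten simulation replications.
failure_times = [410.0, 520.0, 610.0, 640.0, 700.0,
                 780.0, 810.0, 900.0, 950.0, 1020.0]
curve = [(t, card_reliability(failure_times, t)) for t in (0.0, 500.0, 750.0, 1000.0)]
```

Evaluating the function over a grid of times yields the system reliability curve that is plotted for the user.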

2.3. Calculation of Reliability between Two Nodes

A new criterion that adds a new feature to the reliability prediction model is presented in this paper: the reliability between any two nodes on a PCB can be calculated. This allows the user to calculate the reliability of any partition of interest on the PCB, for example a partition under thermal stress. In a reliability network, often referred to as a reliability block diagram (see Figure 3) [4], components are in series from a reliability point of view if they all must run for system success or, equivalently, if the failure of only one causes system failure. The reliability block diagram in this research is extracted in the PREVIEW virtual environment. Let $R_S$ represent the system reliability and $Q_S = 1 - R_S$ the probability of failure; for $n$ components in series with reliabilities $R_i$, they can be calculated by using the following equation [4, 6, 7]:

\[ R_S = \prod_{i=1}^{n} R_i \quad (2.3.1) \]

First, all series components in each network on the PCB are found: if Terminal 2 of one component and Terminal 1 of another component are on the same network ID, then those two components are considered to be in a series configuration, and Equation (2.3.1) is used to find their total reliability. In this way, all paths between any connected components are found and registered. C++ code was written to find these paths based on the network ID of each component. The reliability of each path at a specific time is then calculated by using Equation (2.3.1), and the highest path reliability is taken as the minimum reliability between the two nodes connected by those paths (the connection works whenever at least one path works, so it is at least as reliable as its best path). The PREVIEW Virtual Meter was created in order to display the reliability between any two nodes on a PCB. The user can place the two probes of the Virtual Meter on any two nodes on a PCB, and the total reliability between those two nodes is then calculated and displayed on the Virtual Meter (see Figure 4).
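
The path search and the series product of Equation (2.3.1) can be sketched as follows. This Python example is not the C++ code referred to above; it assumes that each component is described by the network IDs at its two terminals, treats connections as undirected, and uses a made-up three-component netlist. The names `find_paths` and `path_reliability` are hypothetical.

```python
def find_paths(components, start, end):
    """Enumerate simple paths (lists of component names) between two network IDs."""
    adjacency = {}
    for name, (net_a, net_b) in components.items():
        adjacency.setdefault(net_a, []).append((name, net_b))
        adjacency.setdefault(net_b, []).append((name, net_a))
    paths = []

    def walk(node, visited, path):
        if node == end:
            paths.append(list(path))
            return
        for name, nxt in adjacency.get(node, []):
            if nxt not in visited:
                walk(nxt, visited | {nxt}, path + [name])

    walk(start, {start}, [])
    return paths

def path_reliability(path, reliability):
    """Equation (2.3.1): product of the reliabilities of the components in series."""
    result = 1.0
    for name in path:
        result *= reliability[name]
    return result

# Hypothetical netlist: component -> (network ID at Terminal 1, network ID at Terminal 2).
components = {"R1": ("N1", "N2"), "R2": ("N2", "N3"), "C1": ("N1", "N3")}
reliability = {"R1": 0.95, "R2": 0.90, "C1": 0.80}
paths = find_paths(components, "N1", "N3")
best = max(path_reliability(p, reliability) for p in paths)
```

For this netlist there are two paths between N1 and N3, one through R1 and R2 and one through C1 alone; the larger of the two path reliabilities is the value a two-probe measurement between these nodes would report under the rule described above.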

2.4. Component- and System-level Maintainability

In a repairable system, maintenance actions can be carried out to restore failed components to operation. These actions should be taken into consideration when evaluating the behavior of the system, and monitoring the effectiveness of electronics maintenance is essential for implementing maintenance rules and policies. Maintainability is the probability that a failed component can be restored to its normal operable state within a given time frame [6]. In maintainability, the random variable is “time-to-repair,” in the same manner as “time-to-fail” is the random variable in reliability. Maintainability can be calculated by using the CDF of the “time to repair” (TTR). Since maintainability represents the probability of an event occurring (completion of the repair) while reliability represents the probability of an event not occurring (failure), the maintainability expression is the equivalent of the unreliability expression, \( (1-R) \).

In this research, PREVIEW is used to display the component maintainability: the user can choose any component on a PCB by clicking on it, and its maintainability graph over a specific time period is generated and shown on the PREVIEW display, as illustrated in Figure 5.

For the entire PCB, we make use of the probabilistic definition of maintainability. Since one of the simulation outputs is the “time to repair” (for those components whose failure leads to failure of the entire PCB), we count the “time to repair” values from all replications that are equal to or lower than a desired time and then divide that count by the number of replications. The result represents the maintainability at that desired time.
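
Both maintainability calculations can be illustrated briefly. In this sketch the component-level repair times are assumed to be exponentially distributed purely for simplicity (the best-fit TTR distribution would be used in practice), the system-level function assumes one card-level repair time per replication, and all numbers and names are hypothetical.

```python
import math

def component_maintainability(t, mean_ttr):
    """M(t) is the CDF of TTR, here assuming exponentially distributed repair times."""
    return 1.0 - math.exp(-t / mean_ttr)

def system_maintainability(repair_times, t):
    """Fraction of simulated card-level repairs completed within time t."""
    return sum(1 for ttr in repair_times if ttr <= t) / len(repair_times)

repair_times = [1.5, 2.0, 2.5, 3.0, 4.0, 4.5, 6.0, 8.0]  # hypothetical TTR values (hours)
m_component = component_maintainability(4.0, mean_ttr=4.0)
m_system = system_maintainability(repair_times, 4.0)
```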

2.5. Thermal Acceleration Factor and its Effect on Lifetime

It was stated that excessive temperature is the primary cause of poor reliability in electronic equipment [4,9]. The Arrhenius life-stress model is the most common life-stress relationship utilized in accelerated life testing. It has been widely used when the stimulus or acceleration variable is thermal [2,11]. It is derived from the Arrhenius reaction rate equation [4]:

\[ R(T) = A e^{-E_a/(kT)} \quad (2.5.1) \]

where \( R \) is the speed of reaction; \( A \) is a constant that depends on material characteristics; \( E_a \) is the activation energy (eV) (the energy that a molecule must have to participate in the reaction and a measure of the effect that temperature has on the reaction); \( k \) is Boltzmann’s constant \( (8.617 \times 10^{-5} \text{ eV K}^{-1}) \); and \( T \) (Kelvin) is the absolute temperature. The Arrhenius life-stress model is formulated by assuming that life is proportional to the inverse reaction rate of the process [2,12]. Thus, the Arrhenius life-stress relationship is given by [4]:

\[ L(T) = C \exp\left( \frac{B}{T} \right) \quad (2.5.2) \]

where \( L \) represents a quantifiable life measure, such as mean life; \( T \) represents the stress level (here temperature, expressed in absolute units, i.e., kelvins); \( C \) is one of the model parameters to be determined, where \( C > 0 \); and \( B \) is another model parameter to be determined, where \( B = \frac{E_a}{k} \). In this formulation, the activation energy must be known. One method to alleviate the problem of selecting the most representative activation energy is to estimate the value based on collected data, as the distribution analysis helps in understanding the lifetime characteristics of a PCB. Most practitioners use the term “acceleration factor (AF)” to refer to the ratio of the life at the use level to the life at a higher stress test level. Acceleration factors show how the TTF at a particular operating stress level can be used to predict the equivalent time between failures (TBF) at a different operating stress level. In this paper, quantitative accelerated life tests (QALT) under thermal stresses are designed to quantify the life of the product and generate the data required for accelerated life data analysis [4]. With \( T_0 \) denoting the use temperature, \( T_1 \) the accelerated (stress) temperature, and MTBF the mean time between failures, the acceleration factor is:

\[ AF(T) = \frac{L(\text{use})}{L(\text{accelerated})} = \frac{MTBF(T_0)}{MTBF(T_1)} \quad (2.5.3) \]

The analysis of accelerated tests relies extensively on data. In particular, the analysis relies on life and stress data, or TTF data at a specific stress level. Finding the activation energy value is a complicated issue that depends mainly on the variability of the dominant component failure mechanisms. Some reliability standards, such as the global methodology for reliability engineering in electronics (FIDES), address this complexity by calculating a general activation energy based on all failure mechanisms, failure percentages, and component process technology (e.g., bipolar logic, CMOS logic). Based on such standards, the typical activation energy is generally approximated as 0.7 eV [2]. By using the component lifetime under normal use, the lifetime of each PCB component can be calculated under any thermal stress using Equation (2.5.3). In this research, a methodology is established to determine the effect of temperature on PCB failures. The lifetime of each PCB component under thermal stress ($T_1$) can be found by using the lifetime of that component under normal conditions, which is found with the component-level lifetime methodology presented in Section 2.1 of this paper.
that component and of the entire PCB appears on the PREVIEW display (see Figure 6). In Figure 6(a), a 3D graph shows how temperature (thermal stress) affects component-level reliability: the reliability slope gets steeper as the temperature rises. In other words, component-level reliability decreases as the stress level on the component increases. Figure 6(b) illustrates how temperature affects the lifetime of circuit card components. The lifetime decreases by approximately 50% with a 10 °C increase in temperature, where the x-axis represents the increase in temperature over the normal (use) temperature. Figure 6(c) illustrates the effect of a 10 °C increase in the temperature of components under thermal stress on the system-level reliability.
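
Substituting Equation (2.5.2) into Equation (2.5.3) gives the closed form AF = exp[(E_a/k)(1/T_0 - 1/T_1)], which the following Python sketch evaluates. It uses the 0.7 eV activation energy and the Boltzmann constant quoted in this section; the use temperature, stress temperature, and lifetime are hypothetical example values, and the function names are illustrative only.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K, as given in the text

def acceleration_factor(t_use_c, t_stress_c, activation_energy_ev=0.7):
    """AF = L(T0) / L(T1) = exp[(Ea / k) * (1/T0 - 1/T1)], temperatures in kelvins."""
    t0 = t_use_c + 273.15
    t1 = t_stress_c + 273.15
    return math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t0 - 1.0 / t1))

def stressed_lifetime(lifetime_use, t_use_c, t_stress_c, activation_energy_ev=0.7):
    """Lifetime at the stress temperature, from Equation (2.5.3): L(T1) = L(T0) / AF."""
    return lifetime_use / acceleration_factor(t_use_c, t_stress_c, activation_energy_ev)

# Hypothetical component with a 10,000 h lifetime at 40 degrees C, stressed to 50 degrees C.
af = acceleration_factor(40.0, 50.0)
life_50c = stressed_lifetime(10000.0, 40.0, 50.0)
```

For these assumed values the acceleration factor comes out a little above 2, which is in line with the roughly 50% lifetime reduction per 10 °C noted above. The exact factor depends on the activation energy and on the starting temperature.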

3. IMPLEMENTATION AND INTERFACE DEVELOPMENT

An often neglected challenge with analysis tools that generate large amounts of data is organizing and presenting such data. Thus, in this work, significant effort went into implementing the above-mentioned capabilities within a growing software platform, as well as into the actual interface development. The following summary is based on the work by Goerdt et al. [8]. As a foundation, the proposed software platform leverages advances in gaming technology, which unrelentingly pushes forward the fields of graphics and visualization. The infusion of these technologies into engineering has advanced the development of analysis tools. Thus, PREVIEW has been developed using a game-prototyping rendering engine called Virtools. With this engine, PCBs as well as printed circuit assemblies (PCAs) can be displayed in high-resolution 3D for detailed visual analysis and testing, as shown in Figure 7.

Before actually interacting with a PCB or PCA, it is necessary to import mechanical and electrical models. Thus, a module has been developed to read STEP files. STEP files provide a standardized method of representing product model data and are utilized in PREVIEW. Each STEP file contains a multitude of information about the PCBs and assemblies it refers to, often reaching 50,000 or more lines of data. A method for automated interpretation of these files, called a STEP post-processor, consists of a parser and an object-oriented database (OODB) object creator. For PREVIEW, the STEP-OODB objects that represent geometric objects are retrieved and translated into Open Computer Aided Software for Computer Aided Design and Engineering (Open CASCADE) objects, and then converted to Virtools objects so that they can be displayed within the Virtools environment. PREVIEW has many capabilities that make it unique among STEP visualizers: the ability to store objects in a database for concurrent engineering while still taking advantage of the capabilities of Open CASCADE for rendering; the ability to view each layer of the board separately while being able to rotate and zoom in; and connectivity that is traceable through the product structure tree in a multi-board context. Most significantly, PREVIEW allows one to view both the mechanical and electrical designs, which can often be mismatched as a result of uncoupled design processes.

Despite its advantages for quickly developing and testing CAD capabilities, Virtools does present some challenges. It is not ideal for developing graphical user interfaces (GUIs). Virtools also does not support parallel development for projects with multiple team members. Consequently, updates, fixes, and additions must be consolidated manually. Finally, there is no explicit functionality for creating graphs in Virtools, so external tools for displaying reliability data had to be explored and tested. Various tools were investigated, including Mathematica, Boost, MathGL, and gnuplot. MATLAB was selected due to the ability to write code in C++ that calls the MATLAB engine in order to graph data externally, as shown in Figure 6.

Given the ability to import various electromechanical systems and visualize them, a new interface was developed for the proposed reliability-analysis capabilities and is shown in Figure 8. Note that the numerical feedback has been expanded in this figure for clearer illustration. Once a PCB and its components have been rendered, the user can click 'Compute Reliability' to run the reliability simulation and 'Compute Maintainability' for further output. Useful data will be displayed for the system as well as for any component that is selected. System and component reliability and maintainability plots are provided with the graphing feature.

One of the advantages of the newly developed capabilities is the ability to study cause-and-effect relationships in real time. Any component’s average lifetime, time to repair, time to replace, age, and thermal stress temperature can be manipulated in order to see the resulting effects on the reliability data for the components and the system, as well as in plots displaying the effect of thermal stress on reliability and lifetime. In this way, the effects of potential replacements or stresses can be tested quickly in a risk-free environment.

A significant amount of useful information can be ascertained using PREVIEW’s component-based thermal analysis feature. For any thermally modeled component, temperature-distribution data are obtainable. Each type of circuit card component is unique in its construction, so each has different material properties and internal geometries. The components shown in the PREVIEW interface are cylindrical resistors, cylindrical diodes, cylindrical capacitors, and a specific integrated circuit.

In addition to visualizing reliability data, it can be useful to see the results of each component-based thermal analysis, which provides input to the reliability model. However, there were several challenges in the integration of the component thermal analysis formulation with PREVIEW, the foremost being the visualization of point clouds within the Virtools rendering environment. In Virtools, the data points within a point cloud are opaque, which means that when all points are displayed, only the outermost layer can actually be seen. To overcome this, a method called point cloud segmentation was used. Point cloud segmentation is a sorting algorithm used to represent one large point cloud as a series of smaller point clouds. It was implemented slightly differently for each method of visualizing temperature distribution. For the first method, the data points were segmented by temperature. For the second method, the data points were segmented by their coordinates.

PREVIEW offers two primary methods of visualizing temperature distribution within a component. The first method involves viewing each temperature range as a colored point cloud (a 3D mapping of data points) and overlaying all of the point clouds on top of each other. This formulation allows the user to view individual temperature ranges simply by selecting the corresponding check boxes on the sidebar. Displayed next to each check box is the value of the temperature range it represents (Figure 9(a)). This can be useful in situations where it is important to know if the component, or even an element within the component, has reached a certain temperature.

The second method of visualization involves splitting the temperature data into point clouds organized by location (see Figure 9(b)). In this case, the check boxes on the sidebar allow the user to select which quadrant of the point cloud they wish to view. The advantage of this method is that it allows the user to create geometric slices through the part.
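
The two segmentation schemes can be sketched as simple grouping operations. The Python example below is an illustration only and does not reproduce the Virtools implementation: the point format (x, y, z, temperature), the fixed-width temperature bins, and the definition of quadrants in the x-y plane about a chosen center are all assumptions.

```python
def segment_by_temperature(points, bin_width):
    """Group (x, y, z, temperature) points into clouds keyed by temperature range."""
    clouds = {}
    for x, y, z, temp in points:
        low = (temp // bin_width) * bin_width
        clouds.setdefault((low, low + bin_width), []).append((x, y, z))
    return clouds

def segment_by_quadrant(points, center=(0.0, 0.0)):
    """Group points into four clouds by the sign of their x and y offsets from a center."""
    clouds = {}
    cx, cy = center
    for x, y, z, temp in points:
        key = ("+x" if x >= cx else "-x", "+y" if y >= cy else "-y")
        clouds.setdefault(key, []).append((x, y, z, temp))
    return clouds

# Hypothetical thermal-analysis output: (x, y, z, temperature in degrees C).
points = [(1.0, 1.0, 0.0, 41.0), (-1.0, 1.0, 0.5, 47.5),
          (-1.0, -1.0, 1.0, 52.0), (1.0, -1.0, 1.5, 58.9)]
by_temperature = segment_by_temperature(points, bin_width=10.0)
by_quadrant = segment_by_quadrant(points)
```

Each resulting cloud can then be shown or hidden on its own, which corresponds to the check boxes on the sidebar.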

4. CONCLUSION

In this paper, we have researched, developed, and demonstrated technology that responds to the limitations of current virtual testing and of predictive remaining lifetime, reliability, and maintainability analysis of PCBs. The developed virtual environment allows prediction of the total lifetime, overall reliability, and maintainability of entire circuit cards (system level) and their components (component level) through a new simulation methodology. Component repair and replacement approaches are developed within this simulation tool, which gives the user the ability to choose between them based on experience and component history. In addition, this research provides a better understanding of overall system failure characteristics for any given configuration. It allows the user to identify the components that contribute the most to downtime and to determine the effect of design alternatives on system performance in a cost-effective manner.

This research provides effective methodologies for determining where corrective action may be particularly helpful, and it helps predict the overall system failure characteristics for any given configuration. It provides a powerful process that utilizes failure information from a system's components in order to develop probability distributions for whether the system will be able to perform its intended function. It helps in identifying components that contribute the most to downtime and in determining the effect of design alternatives on system performance in a cost-effective manner (i.e., using virtual modeling rather than prototype testing). Because excessive temperature is the primary cause of poor reliability in electronic equipment, quantitative accelerated life tests are designed to quantify the life of the PCB under different thermal stresses and to produce the data required for accelerated life data analysis. This thermal stress methodology can help in making design decisions that meet the system reliability requirements, as well as in determining the maximum allowable component temperature.

ACKNOWLEDGEMENT

This research is funded by the Electric Power Research Institute (EPRI) with the grant number “EP-P42490/C18498.”

REFERENCES


