A counting process representing the number of failures experienced by a system in a given period of time is proposed as a stochastic model for studying the reliability of developed software. A Non Homogeneous Poisson Process (NHPP) with its mean value function specified by a Pareto model is considered. Its parameters are estimated to assess the reliability of a software system. The results are illustrated with live software failure data.
INTRODUCTION
Several studies have been undertaken to investigate the software error occurrence phenomenon. The objective of such studies is to improve software performance. These studies can be placed in one of the two categories. The first category emphasizes empirical analysis of data collected from software projects. The second category deals with the development of models for quantitative assessment of software performance.
Software reliability engineering is a discipline that ensures failure free operation of software at the user end by employing scientific techniques to remove the maximum number of faults. The quality of the software system has many attributes such as maintainability, portability, usability, security, reliability, availability, etc. Software reliability is the most dynamic attribute which can measure and predict the operational quality of the product.
The process of locating the faults in software in order to remove them is called the debugging process. The chronology of failure occurrence and fault removal can be utilized to provide an estimate of the software reliability and the level of fault content. The theory of probability plays a major role in software reliability model building. A software system is subject to failures at random times caused by errors present in the system. Let {N(t), t ≥ 0} be a counting process representing the cumulative number of failures by time t. Since there are no failures at t = 0, we have:
$$N(0) = 0 \tag{1}$$
It is reasonable to assume that the numbers of software failures during non-overlapping time intervals do not affect each other. In other words, for any finite collection of times t_1 < t_2 < ... < t_n, the n random variables N(t_1), {N(t_2) - N(t_1)}, ..., {N(t_n) - N(t_{n-1})} are independent. This implies that the counting process {N(t), t ≥ 0} has independent increments.
Let m(t) represent the expected number of software failures by time t. Since the expected number of errors remaining in the system at any time is finite, m(t) is a bounded, non-decreasing function of t with the following boundary conditions:
$$m(0) = 0, \qquad \lim_{t \to \infty} m(t) = a \tag{2}$$
where a is the expected number of software errors to be eventually detected. Suppose N(t) is known to have a Poisson probability mass function with parameter m(t):
$$P[N(t) = n] = \frac{[m(t)]^{n}}{n!}\, e^{-m(t)}, \qquad n = 0, 1, 2, \ldots \tag{3}$$
Then N(t) is called an NHPP. Various time domain models have appeared in the literature that describe the stochastic failure process by an NHPP and differ only in the mean value function m(t). Some of them are due to Goel and Okumoto (1979), Littlewood (1981), Yamada et al. (1986), Musa et al. (1987), Hossain and Dahiya (1993) and Pham (2005). The mean value function is a non-negative, non-decreasing function of t with a finite limit as t → ∞. The cumulative distribution function of a continuous non-negative random variable shares all these characteristics, with the additional property that its limit is 1 as t → ∞. Exploiting this property, many researchers have taken a distribution function multiplied by a positive constant as the mean value function and so developed a number of software reliability growth models (SRGMs) through NHPP. On these lines, following Wood (1996) and motivated by Littlewood (1981), the researchers consider an NHPP with mean value function defined through the cumulative distribution function of a Pareto random variable and propose it as an SRGM.
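The Poisson probability mass function that defines the NHPP can be evaluated directly once a value of m(t) is available. The following is a minimal sketch, not part of the original study; the function name `nhpp_pmf` and the value `m_t = 12.5` are hypothetical and chosen only for illustration.

```python
import math


def nhpp_pmf(n, mean_value):
    """P[N(t) = n] for an NHPP whose mean value function equals `mean_value` at time t."""
    if n < 0:
        return 0.0
    if mean_value == 0:
        # No failures are expected, so all probability mass sits on n = 0.
        return 1.0 if n == 0 else 0.0
    # Work in log space so that large n does not overflow the factorial.
    log_p = n * math.log(mean_value) - mean_value - math.lgamma(n + 1)
    return math.exp(log_p)


m_t = 12.5  # hypothetical value of m(t) at some fixed time t
probs = [nhpp_pmf(n, m_t) for n in range(200)]
```

Summing `probs` gives a value very close to 1, which is a quick sanity check that the mass function is normalised.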
THE PROPOSED SRGM
In this study, we consider m(t) as given by:
![]() |
(4) |
where m(t)/a is the cumulative distribution function of the Pareto distribution of type IV (Johnson et al., 2004). For the present choice:
$$\lim_{t \to \infty} P[N(t) = n] = \frac{a^{n} e^{-a}}{n!}, \qquad n = 0, 1, 2, \ldots \tag{5}$$
Which is also a Poisson model with mean a. Let be the number of errors remaining in the system at time t:
![]() |
(6) |
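A mean value function of the form a times a Pareto distribution function can be sketched as follows. The exact form of Eq. 4 is not reproduced here, so the code assumes the two-parameter Pareto type II (Lomax) distribution function F(t) = 1 - (1 + t/σ)^(-α), which is the special case of the type IV family with zero location and unit inequality parameter. The function names and the numerical values are hypothetical.

```python
def pareto_cdf(t, alpha, sigma):
    """Assumed Pareto type II (Lomax) CDF: 1 - (1 + t/sigma)^(-alpha) for t >= 0."""
    if t < 0:
        raise ValueError("time must be non-negative")
    return 1.0 - (1.0 + t / sigma) ** (-alpha)


def mean_value(t, a, alpha, sigma):
    """Expected number of failures by time t: m(t) = a * F(t)."""
    return a * pareto_cdf(t, alpha, sigma)


def expected_remaining(t, a, alpha, sigma):
    """Expected number of errors still in the system at time t: a - m(t)."""
    return a - mean_value(t, a, alpha, sigma)


# Hypothetical parameter values, used only to show the boundary behaviour.
a, alpha, sigma = 30.0, 3.0, 100.0
m_at_zero = mean_value(0.0, a, alpha, sigma)
m_far_out = mean_value(1e12, a, alpha, sigma)
```

Under this assumption `m_at_zero` is 0 and `m_far_out` approaches a, in line with the boundary conditions stated for m(t).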
Let Sk be the time between the (k - 1)th and kth failures of the software product and let Xk be the time up to the kth failure. Then the probability that Sk exceeds a real number s, given that the total time up to the (k - 1)th failure equals x, is:
$$P[S_k > s \mid X_{k-1} = x] = e^{-[m(x+s) - m(x)]} \tag{7}$$
This expression is called software reliability and is denoted by:
![]() |
(8) |
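For an NHPP, the conditional probability of surviving a further time s after a failure at time x is the standard expression exp{-[m(x + s) - m(x)]}. The sketch below evaluates it under the assumption that m(t) = a[1 - (1 + t/σ)^(-α)] (a Pareto type II form, assumed here because the explicit mean value function is not reproduced in the text). All names and parameter values are hypothetical.

```python
import math


def mean_value(t, a, alpha, sigma):
    """Assumed mean value function: a * (1 - (1 + t/sigma)^(-alpha))."""
    return a * (1.0 - (1.0 + t / sigma) ** (-alpha))


def reliability(s, x, a, alpha, sigma):
    """P[S_k > s | X_{k-1} = x] = exp(-(m(x + s) - m(x))) for an NHPP."""
    expected_new_failures = mean_value(x + s, a, alpha, sigma) - mean_value(x, a, alpha, sigma)
    return math.exp(-expected_new_failures)


# Hypothetical parameters: reliability over the next 10 and 50 time units
# after a failure observed at time x = 100.
a, alpha, sigma = 30.0, 3.0, 100.0
r_10 = reliability(10.0, 100.0, a, alpha, sigma)
r_50 = reliability(50.0, 100.0, a, alpha, sigma)
```

The reliability starts at 1 for s = 0 and decreases in s towards exp{-[a - m(x)]}, the probability that no further failure ever occurs.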
ESTIMATION OF PARAMETERS
Let S1, S2, ... be a sequence of times between successive software failures associated with an NHPP N(t). Let Xk be equal to:
$$X_k = \sum_{i=1}^{k} S_i, \qquad k = 1, 2, \ldots$$
which represents the time to failure k. Suppose we are given n software failure times, say x1, x2, ..., xn, i.e., there are n time instants at which the 1st, 2nd, 3rd, ..., nth failures of the software are observed. This is a special case of a life testing experiment in which only one product is put on test and its successive failures are recorded, separated by error detection and debugging.
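The relation between the times between failures and the cumulative failure times is a running sum. A minimal sketch follows; the helper names and the inter-failure values are hypothetical and are not taken from the NTDS data.

```python
from itertools import accumulate


def cumulative_times(inter_failure_times):
    """Failure times x_k as running sums of the times between failures S_1, ..., S_k."""
    return list(accumulate(inter_failure_times))


def inter_failure_times(failure_times):
    """Inverse operation: recover S_k = x_k - x_(k-1), taking x_0 = 0."""
    previous = [0] + list(failure_times[:-1])
    return [x - p for x, p in zip(failure_times, previous)]


gaps = [3, 5, 2, 8, 13]  # hypothetical times between failures, in days
failure_times = cumulative_times(gaps)  # [3, 8, 10, 18, 31]
```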
The likelihood function of such sample data is:
$$L = e^{-m(x_n)} \prod_{i=1}^{n} m'(x_i) \tag{9}$$
![]() |
(10) |
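For failure-time data observed up to the nth failure, the log likelihood of an NHPP is the sum of the log intensities m'(x_i) minus m(x_n). The sketch below evaluates it under the assumption that m(t) = a[1 - (1 + t/σ)^(-α)], so that m'(t) = (aα/σ)(1 + t/σ)^(-(α+1)); this Pareto type II form is an assumption, since the explicit likelihood is not reproduced in the text. The data and parameter values are hypothetical.

```python
import math


def mean_value(t, a, alpha, sigma):
    """Assumed mean value function m(t) = a * (1 - (1 + t/sigma)^(-alpha))."""
    return a * (1.0 - (1.0 + t / sigma) ** (-alpha))


def intensity(t, a, alpha, sigma):
    """Failure intensity m'(t) for the assumed mean value function."""
    return a * alpha / sigma * (1.0 + t / sigma) ** (-(alpha + 1.0))


def log_likelihood(times, a, alpha, sigma):
    """NHPP log likelihood for ordered failure times x_1 <= ... <= x_n."""
    log_intensities = sum(math.log(intensity(x, a, alpha, sigma)) for x in times)
    return log_intensities - mean_value(times[-1], a, alpha, sigma)


failure_times = [3, 8, 10, 18, 31]  # hypothetical cumulative failure times
value = log_likelihood(failure_times, a=10.0, alpha=3.0, sigma=50.0)
```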
Then, the log likelihood equations to estimate the unknown parameters a, α and σ are given by:
![]() |
(11) |
![]() |
(12) |
![]() |
(13) |
Equations 11-13 have to be solved iteratively. When α is assumed to be known, only one equation, that for σ, has to be solved by numerical methods before proceeding to evaluate the reliability measures. To avoid the iterative technique for σ, one may use the modified ML procedure with the following approximations:
![]() |
where:
![]() |
The log likelihood Eq. 13 for σ can be written as:
![]() |
With the suggested approximation it becomes:
![]() |
(14) |
We can solve Eq. 14 for σ to get the modified ML estimate (MMLE). The corresponding estimate of a can be obtained from Eq. 11. An estimator of software reliability can then be obtained from Eq. 7.
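As a rough numerical cross-check on any estimate of σ, the likelihood can also be maximised directly when α is held fixed. The sketch below is not the modified ML approximation described above; it is a plain grid search, and it assumes the Pareto type II form m(t) = a[1 - (1 + t/σ)^(-α)]. Under that assumption, setting the derivative of the log likelihood with respect to a to zero gives a = n / F(x_n) for each trial σ. The data, the grid and all names are hypothetical.

```python
import math


def pareto_cdf(t, alpha, sigma):
    """Assumed Pareto type II (Lomax) CDF."""
    return 1.0 - (1.0 + t / sigma) ** (-alpha)


def log_likelihood(times, a, alpha, sigma):
    """NHPP log likelihood with m(t) = a * pareto_cdf(t, alpha, sigma)."""
    log_intensities = sum(
        math.log(a * alpha / sigma) - (alpha + 1.0) * math.log1p(x / sigma)
        for x in times
    )
    return log_intensities - a * pareto_cdf(times[-1], alpha, sigma)


def fit_given_alpha(times, alpha, sigma_grid):
    """Grid search over sigma; for each sigma, a is set to n / F(x_n)."""
    n = len(times)
    best = None
    for sigma in sigma_grid:
        a_hat = n / pareto_cdf(times[-1], alpha, sigma)
        ll = log_likelihood(times, a_hat, alpha, sigma)
        if best is None or ll > best[2]:
            best = (a_hat, sigma, ll)
    return best


failure_times = [3, 8, 10, 18, 31, 52, 90, 150]  # hypothetical data
sigma_grid = [10.0 * 1.1 ** k for k in range(80)]  # hypothetical search grid
a_hat, sigma_hat, ll_hat = fit_given_alpha(failure_times, 3.0, sigma_grid)
```

If the best σ lands on the edge of the grid, the grid should be widened; for data that show no reliability growth the likelihood may keep increasing in σ and no finite maximiser need exist.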
APPLICATIONS
The procedures derived above are illustrated with software failure data taken from Jelinski and Moranda (1972). The data are originally from the U.S. Navy Fleet computer programming centre and consist of the errors found in the development of software for the real-time, multi-computer complex which forms the core of the Naval Tactical Data System (NTDS).
The NTDS software consisted of some 38 different modules. Each module is supposed to follow 3 stages: the production (development) phase, the test phase and the user phase.
The data are based on the trouble reports or software anomaly reports for one of the larger modules, denoted as A-module. The times (days) between software failures and additional information for this module are shown in Table 1.
A total of 26 software errors were found during the production phase and 5 additional errors during the test phase. One error was observed during the user phase and 2 more errors were noticed in a subsequent test phase, indicating that a rework of the module had taken place after the user error was found. The 26 data points are treated as an ordered random sample of size 26 assumed to come from a Pareto distribution with parameters α, σ.
Table 1: NTDS data
![]()
Columns: (1) error number n; (2) time between errors Sk (days); (3) cumulative time xn = ΣSk (days)
This supposition is verified using the q-q plot correlation coefficient method at α = 2, 3, 4. It is found that the Pareto distribution fits the present data reasonably well. Hence, the log likelihood equations for a and σ are solved simultaneously by the Newton-Raphson method at α = 2, 3, 4 in succession. The ML estimates so obtained are:
![]() |
The value of L, the likelihood function is maximum at the triplet:
![]() |
Hence, we may accept the 3 values as ML estimates of a, α, σ. The estimator of the reliability function from Eq. 7 at any time x beyond 250 days is given by:
![]() |
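The q-q plot correlation coefficient check used above can be sketched as the correlation between the ordered sample and the corresponding theoretical quantiles. The code assumes the Pareto type II quantile function Q(p) = σ[(1 - p)^(-1/α) - 1] and the plotting positions i/(n + 1); both are assumptions, since the study does not state its choices. Because correlation is unchanged by rescaling, σ can be set to 1. All names and the test sample are hypothetical.

```python
import math


def lomax_quantile(p, alpha, sigma=1.0):
    """Quantile function of the assumed Pareto type II (Lomax) distribution."""
    return sigma * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def qq_correlation(sample, alpha):
    """Correlation between the ordered sample and Lomax quantiles at i/(n+1)."""
    ordered = sorted(sample)
    n = len(ordered)
    quantiles = [lomax_quantile(i / (n + 1), alpha) for i in range(1, n + 1)]
    return pearson(ordered, quantiles)


# A sample placed exactly on the alpha = 3 quantiles (scaled by 7) should give
# a correlation of essentially 1 at alpha = 3.
ideal_sample = [7.0 * lomax_quantile(i / 11, 3.0) for i in range(1, 11)]
r_ideal = qq_correlation(ideal_sample, 3.0)
```

Values of the coefficient close to 1 at a given α indicate that the sample is roughly linear in the theoretical quantiles, which is the sense in which the fit is judged reasonable.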
COMPARISON
For the NTDS data Goel and Okumoto (1979) have obtained ML estimates of a and σ, respectively as:
![]() |
For the same data with the SRGM (Pareto approach):
![]() |
A choice between these two models for the data under consideration can be made with reference to a number of criteria. One of them is to prefer the model that gives the maximum joint probability for the data at the corresponding estimates. In other words, the model that gives the larger value of the likelihood function can be judged the better fit. We evaluated the likelihood functions under the Goel and Okumoto (1979) NHPP and its estimates and under the Pareto structure and its estimates. They are as follows. The natural log of the likelihood under the Goel and Okumoto (1979) structure, at the sample and the associated estimates, is:
![]() |
The corresponding value for the proposed model is -72.295455. This indicates that the present model is a better fit than that of Goel and Okumoto (1979).
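The comparison by log likelihood can be sketched generically for any two NHPP models once their mean value and intensity functions are given. The Goel-Okumoto mean value function a(1 - e^(-bt)) is the standard one; the Pareto form a[1 - (1 + t/σ)^(-α)] is assumed here. The data and the parameter values below are hypothetical placeholders, not the estimates reported in this study, so the outcome of this particular run says nothing about the NTDS data.

```python
import math


def nhpp_log_likelihood(times, mean_value, intensity):
    """Log likelihood of ordered failure times under an NHPP."""
    return sum(math.log(intensity(x)) for x in times) - mean_value(times[-1])


def goel_okumoto(a, b):
    """Mean value and intensity functions of the Goel-Okumoto model."""
    return (lambda t: a * (1.0 - math.exp(-b * t)),
            lambda t: a * b * math.exp(-b * t))


def pareto(a, alpha, sigma):
    """Mean value and intensity functions of the assumed Pareto type II model."""
    return (lambda t: a * (1.0 - (1.0 + t / sigma) ** (-alpha)),
            lambda t: a * alpha / sigma * (1.0 + t / sigma) ** (-(alpha + 1.0)))


failure_times = [3, 8, 10, 18, 31, 52, 90, 150]  # hypothetical data
candidates = {
    "Goel-Okumoto": goel_okumoto(a=10.0, b=0.02),         # hypothetical values
    "Pareto": pareto(a=10.0, alpha=3.0, sigma=120.0),     # hypothetical values
}
scores = {name: nhpp_log_likelihood(failure_times, m, lam)
          for name, (m, lam) in candidates.items()}
preferred = max(scores, key=scores.get)  # model with the larger log likelihood
```

Such a comparison is only meaningful when each model is evaluated at its own ML estimates for the same data.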
CONCLUSION
The researchers proposed an SRGM through NHPP based on a Pareto type mean value function and suggested a modified ML estimator of its parameters to assess software reliability from software failure data. Its goodness of fit to a live data set is compared with that of a popular SRGM, namely that of Goel and Okumoto (1979), by means of the likelihood functions. We found that the proposed model has a larger likelihood value than that of Goel and Okumoto (1979), suggesting that it is a better fit.
Khurshid Ahmad Mir. A Software Reliability Growth Model.
DOI: https://doi.org/10.36478/jmmstat.2011.13.16
URL: https://www.makhillpublications.co/view-article/1994-5388/jmmstat.2011.13.16