In: IEEE Trans. Comput., Vol. 38, No. 6, pages 775-787. June 1989.
Abstract: Several different models are developed for predicting coverage (the probability that a system recovers when a fault occurs) in a fault-tolerant system, in particular models based on extended stochastic Petri nets. Two types of events that interfere with recovery are examined, and methods for modeling such events are given. The sensitivity of system reliability/availability to the coverage parameter and the sensitivity of the coverage parameter to various error-handling strategies are investigated. It is found that a policy of attempting transient recovery upon detection of an error can actually increase the unreliability of the system.
Keywords: coverage prediction; failure prediction; dependability analysis; fault-tolerant system; extended stochastic Petri net; error-handling strategy.