In: Journal of Parallel and Distributed Computing 15, pages 238-254. 1992.
Abstract: Dependability evaluation is an important, but difficult, aspect of the design of fault-tolerant parallel and distributed computing systems. One possible technique is to use Markov models but, if applied directly to realistic designs, this often results in large and intractable models. Many authors have investigated methods to avoid this explosive state-space growth, but have typically either solved the problem for a specific system design, or required manipulation of the model at the state-space level. Stochastic activity networks (SANs), a stochastic extension of Petri nets, together with recently developed reduced based model construction techniques, have the potential to avoid this state-space growth at the SAN level for many parallel and distributed systems. This paper investigates this claim by considering their application to three different systems: a fault-tolerant parallel computing system, a distributed database architecture, and a multiprocessor-multimemory system. We show that this method does indeed result in tractable Markov models for these systems, and argue that it can be applied to the dependability evalution of many parallel and distributed systems.