Software-intensive systems have become pervasive in media facilities, and as systems and their components fail in new ways, the traditional design strategy of achieving reliability through hardware redundancy has become less effective. Designers must consider not only hardware failures arising from physical phenomena, which tend to be independent between units or correlated only with factors such as equipment age and temperature, but also software failures, which are more likely to affect multiple units simultaneously and to propagate spontaneously between units. This tutorial paper briefly reviews reliability engineering fundamentals: terminology and basic measures; the reliability of series, parallel, and R-out-of-N systems; failure probability distributions and the “bathtub” life-characteristic curve; and traditional reliability prediction methods for electronic equipment and systems based on part failure probabilities.
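
As a rough illustration of the series, parallel, and R-out-of-N cases named above, the following sketch computes system reliability from unit reliabilities using the standard textbook formulas. It assumes that unit failures are statistically independent and, for the R-out-of-N case, that all units are identical; these are exactly the assumptions that common-mode software failures tend to violate. The function names are hypothetical and chosen only for this example.

```python
from math import comb, prod


def series_reliability(unit_reliabilities):
    """System works only if every unit works (independent failures assumed)."""
    return prod(unit_reliabilities)


def parallel_reliability(unit_reliabilities):
    """System works if at least one unit works (independent failures assumed)."""
    return 1.0 - prod(1.0 - r for r in unit_reliabilities)


def r_out_of_n_reliability(r_required, n_units, unit_reliability):
    """System works if at least r_required of n_units identical, independent units work.

    Sums the binomial probabilities of exactly k working units for k = r_required..n_units.
    """
    p = unit_reliability
    return sum(
        comb(n_units, k) * p**k * (1.0 - p) ** (n_units - k)
        for k in range(r_required, n_units + 1)
    )


if __name__ == "__main__":
    # Illustrative unit reliability only; not a figure from the paper.
    p = 0.9
    print("series (2 units):  ", series_reliability([p, p]))
    print("parallel (2 units):", parallel_reliability([p, p]))
    print("2 out of 3:        ", r_out_of_n_reliability(2, 3, p))
```

Under these assumptions, 1-out-of-N reduces to the parallel case and N-out-of-N to the series case. When failures are correlated across units, as with a shared software fault, the parallel and R-out-of-N results overstate the achieved reliability.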
