Software fault tolerance by design diversity

Despite steady improvements in fault prevention techniques, faults remain in every complex software system. Fault tolerance is needed for two reasons: in spite of fault avoidance, design errors will exist in both hardware and software components, and system testing can never be exhaustive enough to remove all potential faults. Existing methods of providing fault tolerance at execution time rely on redundant software written to the same specification. The underlying assumption is that this software is diverse in design, which is itself difficult to achieve. In the simplest case this gives the well-known duplex system. Data diversity (DD) has been said to be orthogonal to design diversity [8]. Three major design issues need to be considered when building fault-tolerant software. Fault-tolerant systems are also widely used in sectors such as distribution and logistics, electric power plants, heavy manufacturing, and industrial control systems.

Nowadays the reliability of software is often the main goal in the software development process. Multiple computations are implemented by N-fold (N ≥ 2) replications in three domains. Although replication is a very important concept in fault-tolerant systems, replication alone is insufficient when fault tolerance has to be created in software. N-version programming (NVP) is one of the software fault tolerance techniques based on design diversity. To complement design diversity in the quest for fault-tolerant software, there exist several data diversity techniques that are similar to those of the design diversity approach. This chapter concentrates on software fault tolerance based on design diversity. It considers the theoretical and experimental research undertaken in this field.

If design faults must be detected, design diversity has to be used in the software as well. Data diversity can also be applied through checkpoint and restart with input re-expression. Testing alone is not enough: a test can only show the presence of faults, not their absence.
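Checkpoint and restart with input re-expression can be sketched as a retry block. The following is a minimal illustration, not a definitive implementation: `faulty_sin`, `reexpress`, `acceptance_test` and `retry_block` are hypothetical names, the fault is seeded on purpose, and the re-expression relies on the periodicity of the sine function.

```python
import math


def faulty_sin(x):
    """Hypothetical program with a residual fault in one small input region."""
    if 1.0 <= x < 1.1:  # seeded fault, for illustration only
        raise ArithmeticError("residual design fault triggered")
    return math.sin(x)


def reexpress(x, attempt):
    """Exact re-expression for sine: sin(x) equals sin(x + 2*pi*k)."""
    return x + 2.0 * math.pi * attempt


def acceptance_test(result):
    """Plausibility check: a sine value must lie in [-1, 1]."""
    return -1.0 <= result <= 1.0


def retry_block(program, x, max_attempts=3):
    """Run program on x; on failure, restart from the checkpoint with a re-expressed input."""
    checkpoint = x  # state saved before the first attempt
    for attempt in range(max_attempts):
        candidate = reexpress(checkpoint, attempt)  # attempt 0 is the original input
        try:
            result = program(candidate)
        except ArithmeticError:
            continue  # roll back to the checkpoint and try the next re-expression
        if acceptance_test(result):
            return result
    raise RuntimeError("all re-expressed inputs failed")
```

Calling `retry_block(faulty_sin, 1.05)` fails on the original input, restarts from the checkpoint, and succeeds on the input shifted by one period. The same program is executed each time; only the data changes.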

To tolerate faults, both N-version programming and recovery blocks rely on design diversity. The different areas of software diversity are discussed in surveys on diversity for fault tolerance or for security. Software fault tolerance is a necessary part of a system with high reliability. Software fault tolerance techniques are designed to allow a system to tolerate software faults that remain in the system after its development, and such techniques use design diversity to tolerate these residual faults. This comes at a cost: a fault-tolerant system can be expensive, as it requires the continuous operation and maintenance of additional, redundant components. In a design-diverse system, each channel is designed to provide the same function, and a method is provided to identify whether one channel deviates unacceptably from the others.
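One simple way to identify a channel that deviates unacceptably is to compare every channel output against the median of all outputs. This is only a sketch under assumptions: the function name `deviating_channels`, the numeric outputs and the fixed `tolerance` are hypothetical choices, and real systems may use other comparison methods.

```python
from statistics import median


def deviating_channels(outputs, tolerance):
    """Return the indices of channels whose output differs from the median by more than tolerance."""
    reference = median(outputs)
    return [
        index
        for index, value in enumerate(outputs)
        if abs(value - reference) > tolerance
    ]
```

With three channels, `deviating_channels([10.0, 10.1, 14.0], 0.5)` flags only the third channel. With two channels (the duplex case), a disagreement is detected but both channels are flagged, because the comparison alone cannot tell which one is wrong.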

Fault tolerance is a required design specification for computer equipment used in online transaction processing systems, such as airline flight control and reservation systems. Design diversity is a solution to software fault tolerance only insofar as it is possible to create diverse and equivalent specifications, so that programmers can create software with designs different enough that they do not share similar failure modes. Both schemes are based on software redundancy, assuming that coincident software failures are rare. While there is clear evidence that design diversity can be expected to deliver some increase in reliability compared to a single version, there is no agreement about the extent of this. Other techniques use data diversity to tolerate residual faults. This works because program faults often cause failure only under certain conditions.
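The assumption that coincident failures are rare can be made concrete with a small calculation. As an idealized sketch, assume three versions that each fail on a given input with probability p, independently of one another (an assumption that is not guaranteed in practice), combined by a 2-out-of-3 majority vote with a perfect voter. Counting every case in which at least two versions fail as a system failure gives:

```latex
P_{\text{sys}} = 3p^{2}(1-p) + p^{3} = 3p^{2} - 2p^{3}
```

For p = 0.01 this yields 3(10^-4) - 2(10^-6) = 2.98 x 10^-4, roughly a factor of 30 better than a single version. Positively correlated failures between versions reduce this gain, which is one reason the extent of the benefit is debated.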

One proposal is the combined use of so-called systematic diversity and design diversity in a time-redundant system. Early experiments with software diversity in the mid-1970s investigated N-version programming and recovery blocks to increase the reliability of embedded systems (Baudry and Monperrus). Fault tolerance can be achieved by the following techniques.

They include the recovery block scheme (RBS), consensus recovery blocks, N-version programming (NVP), N self-checking programming (NSCP), and data diversity. Design diversity has been used for many years as a means of achieving a degree of fault tolerance in software-based systems. It is sometimes impossible to test under realistic conditions. Fault tolerance can also be achieved by using diversity in the data space: the software fault tolerance (FT) architecture in this research uses data diversity (DD), a complementary approach to design diversity.

Software engineers assume that the different implementations use different designs. Both avionics and space systems tend to use design diversity. There are two basic techniques for obtaining fault-tolerant software: the two best-known methods are N-version programming [3] and recovery blocks [7]. Unlike hardware faults, all software faults are design and implementation errors, and such systems focus strongly on design faults. Surveys of software-implemented fault tolerance cover past and present approaches that rely on software design diversity. Tolerance of design faults in software and in hardware was called the challenge of the eighties. The design diversity experiments testbed, DEDIX, thus has two aspects.
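Of the two methods named above, the recovery block can be sketched as follows: a primary alternate runs first, its result is checked by an acceptance test, and on failure the state is restored and the next alternate is tried. This is a minimal sketch under assumptions; `recovery_block`, `primary_sort`, `backup_sort` and `is_sorted_permutation` are hypothetical names, and the fault in the primary is seeded for illustration.

```python
import copy
from collections import Counter


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate on a fresh copy of the state until one passes the acceptance test."""
    for alternate in alternates:
        working_copy = copy.deepcopy(state)  # restore the checkpoint for every attempt
        try:
            result = alternate(working_copy)
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(state, result):
            return result
    raise RuntimeError("no alternate passed the acceptance test")


def primary_sort(items):
    """Hypothetical primary with a seeded fault: it silently drops the last element."""
    return sorted(items[:-1])


def backup_sort(items):
    """Independently written alternate: a plain insertion sort."""
    ordered = []
    for item in items:
        position = len(ordered)
        while position > 0 and ordered[position - 1] > item:
            position -= 1
        ordered.insert(position, item)
    return ordered


def is_sorted_permutation(original, result):
    """Acceptance test: result is in order and holds exactly the original elements."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and Counter(original) == Counter(result)
```

Here `recovery_block([3, 1, 2], [primary_sort, backup_sort], is_sorted_permutation)` rejects the primary's shortened result and returns the backup's output. The quality of the acceptance test determines which faults the block can catch.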

These faults are usually found in either the software or the hardware of the system in which the software is running. Fault tolerance can also mask problems: for example, if component B performs some operation based on the output from component A, then fault tolerance in B can hide a problem with A. Diversity in the data space can also provide fault tolerance. Design diversity was not a concept applied to the solutions for hardware fault tolerance; N-way redundant systems solved many single errors through replication. An approach called design diversity combines hardware and software fault tolerance by implementing a fault-tolerant computer system using different hardware and software in redundant channels. Closer to us, after its reinception in the 1970s [Elm72, Fis75, Ran75, Che78], software fault tolerance by design diversity has become a reality, as witnessed by the real-life systems and the experiments reported in [Vog88]. There are several technical problems with design-redundant software that should be recognized by practitioners but frequently are not.

If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Compounding the problems in building correct software is the difficulty of assessing the correctness of software for highly complex systems.

The adoption of software fault tolerance techniques based on design diversity has been advocated as a means of coping with residual software design faults in operational software (Lee and Anderson 1990). In order to discuss software fault tolerance, we must first establish or obtain an abstract model.

In order to make measurements in a multi-version software experiment, a testbed was needed. A wide range of issues affects software reliability. In previous work, we conducted a software project with a real-world application to investigate software testing and fault tolerance for design diversity. Such redundancy can be implemented in static, dynamic, or hybrid configurations.

Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault tolerance in a computer system is achieved through redundancy in hardware, software, information, and/or time. NVP is based on the principle of design diversity, that is, coding a software module as multiple independently developed versions.
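N-version programming can be illustrated by running several versions on the same input and taking a majority vote over their results. The sketch below is an assumption-laden example, not a reference design: the three `version_*` functions and `n_version_execute` are hypothetical, one version carries a seeded off-by-one fault, and the voter requires exactly matching, hashable results.

```python
from collections import Counter


def version_formula(n):
    """Version 1: closed-form sum of 1..n."""
    return n * (n + 1) // 2


def version_loop(n):
    """Version 2: explicit accumulation."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def version_faulty(n):
    """Version 3: seeded off-by-one fault, omits n itself."""
    return sum(range(n))


def n_version_execute(versions, argument):
    """Run every version on the same input and return the strict-majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(argument))
        except Exception:
            pass  # a crashed version simply casts no vote
    if results:
        value, votes = Counter(results).most_common(1)[0]
        if votes * 2 > len(versions):  # majority of all versions, not only of survivors
            return value
    raise RuntimeError("no majority among the versions")
```

With the three versions above and input 10, two versions agree on 55 and outvote the faulty one. If the versions shared the same fault, the vote would confirm the wrong answer, which is why diversity between versions matters.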

As more and more complex systems get designed and built, especially safety-critical systems, software fault tolerance and the next generation of hardware fault tolerance will need to evolve to be able to solve the design fault problem. Design-fault tolerance by means of design diversity is a concept that traces back to the very early age of informatics. Fault tolerance techniques for coping with the occurrence and effects of anticipated hardware component failures are now well established and form a vital part of any reliable computing system. Multiple computation is a fundamental method employed to attain fault tolerance, and design diversity extends it. Fault-tolerant software assures system reliability by using protective redundancy at the software level. Data-diverse software fault tolerance techniques complement design diversity by compensating for its limitations. Data diversity involves obtaining a related set of points in the program data space, executing the same software on those points, and then using a decision algorithm to determine the resulting output.
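The three steps of data diversity (obtain related points in the data space, execute the same software on each, decide on the output) can be sketched in the style of N-copy programming. Everything named here is hypothetical: `faulty_total` has a seeded fault, `reexpressions` uses list rotation, which is an exact re-expression only because a sum does not depend on element order, and the decision algorithm is a plain majority vote.

```python
from collections import Counter


def faulty_total(values):
    """Hypothetical program with a seeded fault: wrong whenever the first element is 0."""
    if values and values[0] == 0:
        return 0
    return sum(values)


def reexpressions(values, copies):
    """Related points in the data space: rotations of the input list."""
    return [values[k:] + values[:k] for k in range(copies)]


def n_copy_execute(program, values, copies=3):
    """Run the same program on each re-expressed input and vote on the outputs."""
    outputs = [program(point) for point in reexpressions(values, copies)]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 > copies:
        return value
    raise RuntimeError("no majority among the copies")
```

For the input `[0, 4, 5]` the first copy hits the fault and returns 0, while the two rotated copies return 9 and win the vote. Only one program was written; the diversity comes entirely from the data.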

Fault tolerance relies on power supply backups, as well as hardware or software that can detect failures and instantly switch to redundant components. Reliability and fault correlation are two main concerns for design diversity, yet empirical data for investigating the two are limited. Software fault tolerance is an immature area of research. Index terms: data diversity, design diversity, N-copy programming, N-version programming, recovery blocks, retry blocks, software faults, software fault tolerance.

Design diversity is the generation of different implementations (codes) from the same specification. This chapter focuses specifically on fault tolerance techniques, rather than the myriad of fault avoidance techniques. Several software fault tolerance schemes, such as those proposed in [46, 47, 48, 49, 50], are based on software design diversity in order to tolerate software design bugs. Fault masking is any process that prevents faults in a system from introducing errors.

The first part presents in a unified way the methods for software fault tolerance by design diversity. The root cause of software design errors is the complexity of the systems. Limited degrees of fault tolerance in software (defensive programming) are common, but the systematic application of fault tolerance for design faults is still rare and mostly limited to highly critical systems. Software fault tolerance, in the context of this paper, is concerned with all the techniques necessary to enable a system to tolerate software design faults.

Software fault tolerance is the ability of software to detect and recover from a fault that is happening or has already happened, in either the software or the hardware of the system in which the software is running, in order to provide service in accordance with the specification. Software fault tolerance techniques are employed during the procurement, or development, of the software. When a fault occurs, these techniques provide mechanisms to prevent system failure. It is assumed that such diverse faults will minimize the likelihood of coincident failures. Fault tolerance can also hide problems: if a fault-tolerant component B that has been hiding a problem in component A is later changed to a less fault-tolerant design, the system may fail suddenly, making it appear that the new component B is the problem. Over recent years, software developers have been evaluating the benefits of both service-oriented architecture (SOA) and software fault tolerance techniques based on design diversity. This is achieved by creating fault-tolerant composite services that leverage functionally equivalent services. A basic requirement for a testbed was to simulate the environments in which design diversity should be used. A recent survey emphasizes the most recent advances in the field.
