Amazon documents well-architected AWS solutions in terms of the following SPORC perspectives: security, performance efficiency, operational excellence, reliability, and cost optimization.
Reliability refers to the ability of a workload to perform its intended function correctly and consistently. Fault tolerance and resiliency are dependent factors that contribute to reliability. Availability is a common quantitative measure of reliability.
A workload is “a collection of resources and code that delivers business value, such as a customer-facing application or a backend process. A workload is available for use means that it performs its agreed function successfully when required.” (Amazon)
Fault-tolerant designs avoid disruptions by sustaining services in alternative modes when components fail.
Fault tolerance is “a property of a system that allows proper operation even if components fail.” (NISTIR 8202)
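One way to picture "proper operation even if components fail" is a caller that falls back to a standby when the primary fails. The following is only a minimal sketch under that assumption; the names call_with_failover, failing_primary, and healthy_standby are hypothetical and do not come from Amazon or NIST.

```python
from typing import Callable, Optional, Sequence


def call_with_failover(replicas: Sequence[Callable[[], str]]) -> str:
    """Try each replica in order and return the first successful response."""
    last_error: Optional[Exception] = None
    for replica in replicas:
        try:
            return replica()
        except ConnectionError as error:
            # This component failed; remember why and try the next one.
            last_error = error
    # Every component failed, so the fault can no longer be tolerated.
    raise RuntimeError("all replicas failed") from last_error


def failing_primary() -> str:
    # Hypothetical primary that is currently down.
    raise ConnectionError("primary unavailable")


def healthy_standby() -> str:
    # Hypothetical standby that serves the request in an alternative mode.
    return "served by standby"


result = call_with_failover([failing_primary, healthy_standby])
print(result)
```

The caller still gets an answer although one component failed, which is the property the definition describes. Real designs usually add health checks, timeouts, and redundancy across failure domains.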
Resiliency is the ability of a workload to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions, such as misconfigurations or transient network issues. (Amazon)
Availability is “the percentage of time that a workload is available for use.” (Amazon) It is a commonly used metric for measuring resiliency quantitatively. Specifically, availability is the percentage of uptime over a given period. For example, the “five nines” shorthand denotes 99.999% availability.
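The arithmetic behind an availability figure can be sketched as follows. This is an illustrative calculation, not an Amazon formula quoted verbatim; the function names and the 365-day year are assumptions made for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a 365-day year


def availability_percent(uptime_seconds: float, period_seconds: float) -> float:
    """Return uptime as a percentage of the observation period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return 100.0 * uptime_seconds / period_seconds


def downtime_budget_seconds(target_percent: float, period_seconds: float) -> float:
    """Return the downtime a target availability allows within the period."""
    return period_seconds * (1.0 - target_percent / 100.0)


# Downtime that "five nines" permits over one year.
five_nines_budget = downtime_budget_seconds(99.999, SECONDS_PER_YEAR)
print(f"{five_nines_budget / 60:.2f} minutes per year")

# Availability of a workload that was down for one second in a day.
one_day = 24 * 60 * 60
print(f"{availability_percent(one_day - 1, one_day):.4f}%")
```

Under these assumptions, five nines leaves a budget of roughly 315 seconds, a little over five minutes, of downtime per year.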
Infrastructure as Code
Infrastructure as Code (IaC) uses declarative programming or scripting to automate the provisioning and management of infrastructure.
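As a hedged illustration of the declarative style, the sketch below builds a small AWS CloudFormation template as data and prints it as JSON. CloudFormation is only one possible tool and is an assumption here, since the text does not name one; the logical ID ExampleBucket and the helper build_template are hypothetical.

```python
import json


def build_template(bucket_logical_id: str) -> dict:
    """Describe the desired infrastructure as data instead of imperative steps."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template: one versioned S3 bucket",
        "Resources": {
            # The logical ID is a hypothetical name chosen for this example.
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }


template = build_template("ExampleBucket")
print(json.dumps(template, indent=2))
```

The template states what should exist, and the provisioning service works out how to create or update it. Keeping such definitions in version control is what makes infrastructure changes reviewable and repeatable.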