All servers are virtual machines. The total infrastructure consists of 1 VPN server, 1 Management & Monitoring server, 2 Cardano relays and 2 Cardano block producers (1 in standby mode). The Cardano nodes each have 4 GB of RAM, 2 CPUs (Intel Skylake, 3 GHz) and a 128 GB NVMe SSD. The other servers run on the same kind of CPUs and storage, but their memory, number of CPUs and amount of storage have been scaled according to their function. If necessary, the servers can easily be upgraded to a higher plan with more CPUs and memory.
All servers run CentOS 8 and have been hardened according to best practices. They are protected by their built-in firewalls (operating system level) and by firewalls at the cloud provider level. The Cardano relays are furthermore protected by a DDoS mitigation service. The Cardano relays allow incoming traffic from the internet on the port where the cardano-node service is running. The VPN server allows traffic from a limited number of static IP addresses, and only after I have changed the firewall rule set (just-in-time principle). The Management & Monitoring server allows traffic from a limited number of static IP addresses to provide live views of how the environment is doing. All other traffic runs over a private network. Only the keys that are necessary are kept on the Cardano block-producing nodes.
The Management & Monitoring server sends me alerts by email and SMS. The VPN server tests the availability of the Management & Monitoring server and also sends alerts by email and SMS. A Talariax Entera at my home office analyzes the syslogs that are continuously sent to it and alerts me by SMS and email if necessary.
The cloud provider that I have chosen maintains a large worldwide network with several 10 Gbps fiber internet connections per datacenter, spread across 2 or more telecom carriers.
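The availability test that the VPN server runs against the Management & Monitoring server could look roughly like the Python sketch below. It is only an illustration, not the check that is actually deployed: the function name is hypothetical, and it assumes that a successful TCP connection to a known port is a good enough signal that the server is up.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout or routing failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A scheduler on the VPN server could call this function every minute with the private address and port of the Management & Monitoring server (both would have to be filled in) and trigger the email and SMS alerts when it returns False several times in a row.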
Snapshots of the Cardano nodes are made every 4 hours, and the 2 latest snapshots are kept. Each Cardano node syncs its blockchain every minute to additional SSD-based block storage. Snapshots of all other servers are taken every month. Disaster recovery has been tested a couple of times and gave me an RTO (recovery time objective) of 1 hour and an RPO (recovery point objective) of 5 minutes.
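The retention rule for the Cardano node snapshots (keep the 2 latest, drop the rest) can be sketched in a few lines of Python. This is a minimal illustration and not the tooling that is actually used; the function name and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def snapshots_to_delete(taken_at: List[datetime], keep: int = 2) -> List[datetime]:
    """Return the snapshot timestamps that fall outside the newest `keep` snapshots."""
    newest_first = sorted(taken_at, reverse=True)
    return newest_first[keep:]


# Hypothetical history: four snapshots taken 4 hours apart
start = datetime(2021, 1, 1, 0, 0)
history = [start + timedelta(hours=4 * i) for i in range(4)]

# The two newest snapshots (08:00 and 12:00) are kept, the two older ones expire
expired = snapshots_to_delete(history)
```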
The relay nodes and block-producing nodes are spread over 2 different datacenters within Europe. This should give the stake pool an availability of 99.95%, because the stake pool would still be up and running even when 2 servers are down at the same time, as long as one relay and one block producer remain available.
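The availability figure can be sanity-checked with a small calculation. The Python sketch below assumes that servers fail independently of each other and that the pool counts as up while at least one relay and at least one block producer are running. The time needed to activate the standby block producer is not modeled, and the per-server availability passed in is a hypothetical input, not a measured value.

```python
def pool_availability(server_availability: float, relays: int = 2, producers: int = 2) -> float:
    """Probability that at least one relay and at least one block producer are up,
    assuming every server fails independently with the same probability."""
    down = 1.0 - server_availability
    relays_up = 1.0 - down ** relays        # not all relays down at once
    producers_up = 1.0 - down ** producers  # not all block producers down at once
    return relays_up * producers_up


# Hypothetical input: every server is up 99% of the time
estimate = pool_availability(0.99)
```

Under these assumptions the redundancy matters far more than the quality of any single server: the pool is only unavailable when both relays or both block producers are down at the same moment.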