Cluster Resources Planning

The runtime environment's compute resources mainly include Worker node resources, Proxy node resources, and Controller node resources.

The system scales Proxy nodes and Controller nodes elastically based on resource usage. When a runtime environment is created, it starts with the minimal configuration that can run:

  • Controller nodes: 2 machines by default, distributed across two availability zones.
  • Proxy nodes: one machine by default in each availability zone, using the same availability zones that were selected for Worker nodes.

In both cases, the instance type is determined by the settings in the configuration center.

Worker nodes currently must be scaled manually. In addition to planning your cache instances, you therefore also need to evaluate the runtime environment resources those caches will use.

This guide primarily helps you plan your runtime environment resources to meet business needs.

Choose Your VPC

When planning your runtime environment resources, you first need to select the VPC network where your business runs. This ensures that your business and Montplex Caches run within the same VPC.

Select Availability Zones for Business Use

When your cache requires high availability at the availability zone level, select at least two availability zones. The more availability zones you use, the greater the cross-zone traffic, so you need to balance business availability against traffic cost. It's recommended that the Worker availability zones include the availability zone where your business runs, which reduces cross-zone traffic costs for your business. Create at least one Worker workload in each availability zone, with each workload running multiple EC2 instances of the same type.

Choose an Appropriate Spot Instance Percentage

Reference for setting Spot percentage:

  1. Workload Type: For stateless, interruptible workloads, a higher proportion of Spot instances can be used. For stateful, interruption-sensitive workloads, the proportion of Spot instances should be lower.
  2. Spot Instance Availability: Check the availability of Spot instances for your AWS region and instance type, and choose sufficiently stable Spot instances.
  3. Cost Saving Goals: Determine how much you want to save by using Spot instances, since this affects the proportion of Spot instances used.
  4. Fault Tolerance: Decide how much Spot interruption is tolerable based on the application's fault tolerance. More fault-tolerant applications can use a higher proportion of Spot instances.

Taking these factors together, the following guidelines generally apply:

  • For stateless batch processing workloads, 80-100% Spot instances can be used.
  • For stateless web services, 50-80% Spot instances can be used.
  • For stateful DB services, 20-50% Spot instances can be used.

Setting the Spot Percentage for Montplex Cache

In Montplex Cache, only the Proxy group currently supports mixed instance deployment (including Spot). Proxy is a stateless application.

  • The default configuration is 50% Spot instances.
  • This ratio can be adjusted on request.
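
To make the 50% default concrete, here is a minimal sketch of how a Proxy group of a given size would divide between On-Demand and Spot instances. The function name `proxy_spot_split` and the rounding rule (fractional Spot instances are rounded down, so the remainder stays On-Demand) are assumptions for illustration only; the source does not describe how the system rounds.

```python
def proxy_spot_split(total_instances: int, spot_percent: int = 50) -> tuple[int, int]:
    """Return (on_demand, spot) counts for a Proxy group.

    Hypothetical helper: rounds the Spot share down, which is an
    assumption and not documented system behavior.
    """
    if total_instances < 0:
        raise ValueError("total_instances must be non-negative")
    if not 0 <= spot_percent <= 100:
        raise ValueError("spot_percent must be between 0 and 100")
    spot = total_instances * spot_percent // 100  # integer math, rounds down
    on_demand = total_instances - spot
    return on_demand, spot


print(proxy_spot_split(4))  # (2, 2)
print(proxy_spot_split(3))  # (2, 1)
```

With an odd instance count, this rounding keeps the extra machine On-Demand, which is the more conservative choice if interruptions are a concern.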

Set the Memory Capacity of Workloads

The instance type and number of workloads directly determine the memory capacity of the runtime environment. The Controller node group uses m6i series instance types by default and does not run cache instances. The Worker node group uses r5n series instance types by default. These settings are stored in the configuration center, and modifying them yourself is not recommended. You only need to set the memory capacity for each availability zone.

Runtime Environment Memory Capacity Assessment Reference

For example, suppose you expect to run 10 Montplex Cache instances in one VPC, with each cache expected to use 100 GB of memory.

Calculation formula:

Actual required storage memory = 10 * 100 GB = 1000 GB

Reserved memory = 1000 GB * 20% = 200 GB

Runtime environment memory capacity = Actual required storage memory + Reserved memory

Thus, the runtime environment memory capacity is 1000 GB + 200 GB = 1200 GB.

Note: Reserved memory covers business traffic surges, system processes, cache memory fragmentation, and similar overhead.

It's recommended that reserved memory be at least 20% of the actual required storage memory.
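
The calculation above can be sketched as a small helper. The function name `runtime_memory_capacity_gb` and its parameters are hypothetical and not part of any Montplex tooling; the 20% default is the recommended minimum reserve ratio stated in this guide.

```python
def runtime_memory_capacity_gb(cache_count: int,
                               memory_per_cache_gb: float,
                               reserve_percent: float = 20) -> float:
    """Runtime environment memory = required storage memory + reserved memory."""
    if cache_count < 0 or memory_per_cache_gb < 0 or reserve_percent < 0:
        raise ValueError("inputs must be non-negative")
    required_gb = cache_count * memory_per_cache_gb
    # Reserve for surges, system processes, and fragmentation.
    reserved_gb = required_gb * reserve_percent / 100
    return required_gb + reserved_gb


# The example from this guide: 10 caches of 100 GB each, 20% reserve.
print(runtime_memory_capacity_gb(10, 100))  # 1200.0
```

Raising `reserve_percent` is the simplest way to model a workload with larger traffic surges.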

How to Distribute Across Multiple Availability Zones

If you expect all caches to be stored in the same availability zone and don't need disaster recovery across availability zones, select a single availability zone and set its memory capacity to 1200 GB. The system automatically calculates the number of machines needed.

If you expect caches to be spread evenly across two availability zones and cross-zone traffic is not a concern, select two availability zones and set the memory capacity of each to 600 GB. The system automatically calculates the number of machines needed for each availability zone.

If you expect the majority of caches to be stored in availability zone A and a small share in availability zone B, select these two availability zones and set the memory capacity of availability zone A to 1000 GB and availability zone B to 200 GB.
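
The distribution scenarios above can be sketched as a weighted split of the total capacity. Everything here is illustrative: the function names, the zone labels, and the 128 GB of usable memory per machine are assumptions. The machine estimate is a rough planning aid only, since the system performs the actual calculation and the source does not describe its method.

```python
import math


def split_capacity_gb(total_gb: float, zone_weights: dict[str, float]) -> dict[str, float]:
    """Split a total memory capacity across zones in proportion to their weights."""
    total_weight = sum(zone_weights.values())
    if total_weight <= 0:
        raise ValueError("zone weights must sum to a positive number")
    return {zone: total_gb * weight / total_weight
            for zone, weight in zone_weights.items()}


def estimate_machines(zone_capacity_gb: float, usable_gb_per_machine: float) -> int:
    """Round up, because a partially filled machine must still be provisioned."""
    if usable_gb_per_machine <= 0:
        raise ValueError("usable_gb_per_machine must be positive")
    return math.ceil(zone_capacity_gb / usable_gb_per_machine)


even = split_capacity_gb(1200, {"zone-a": 1, "zone-b": 1})
skewed = split_capacity_gb(1200, {"zone-a": 5, "zone-b": 1})
print(even)    # {'zone-a': 600.0, 'zone-b': 600.0}
print(skewed)  # {'zone-a': 1000.0, 'zone-b': 200.0}

# Hypothetical per-machine usable memory of 128 GB.
print(estimate_machines(even["zone-a"], usable_gb_per_machine=128))  # 5
```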