
Architecture of an Aria Operations Cluster: Nodes, availability methods and more

  • Post last modified: November 30, 2024

Welcome back to our learning series on VMware Aria Operations! In this episode, we will look at the basic architecture of an Aria Operations Cluster.

To design an optimal cluster architecture, you need to understand the availability methods and the different node types. We clarify the differences between the availability concepts, the roles of the individual nodes in the cluster, the maximum number of nodes and the configuration of these availability methods. By the end of this episode, you will not only understand the architecture of an Aria Operations cluster but also be able to design it effectively.

What types of nodes are there?
An Aria Operations Cluster consists of different types of nodes that fulfill specific tasks to ensure optimal performance and scalability. Each node type has its own role:

  • Primary node (master):
    The primary node is created automatically as the first node of a cluster and forms the heart of every cluster: all other nodes are managed by it. In a standalone cluster, i.e. an environment with only one node, it handles everything on its own: managing the cluster, analyzing the data and managing the database.
  • Replica Node:
    The replica node serves as a backup for the primary node and takes over its tasks automatically if the primary fails. This concept is similar to the backup role in a vSAN cluster (vSAN also distinguishes between master, backup and agent nodes). The replica node also analyzes data and contributes to cluster availability. A replica node can only be used once HA (High Availability) or CA (Continuous Availability) is activated. More on this later.
  • Data nodes:
    These nodes are the “workers” of the cluster. Their main task is to collect, analyze and process data. Data nodes can be added flexibly (within limits) to keep pace with growing data volumes.
  • Witness Node:
    The witness node provides the quorum to guarantee cluster consistency in the event of a fault domain failure. We only need a witness node if we activate CA.
  • Cloud Proxy:
    Formerly known as the Remote Collector, the Cloud Proxy (CP) collects data in remote networks and forwards it to the analytics cluster. Cloud proxies need fewer firewall rules, but they have no UI and cannot analyze data. They are typically deployed to collect data at remote locations or in environments where opening firewall ports is complicated, which can save a lot of work. CPs do not store data: if a CP fails, the data it was collecting at that moment is lost.

In addition, every primary, replica and data node can analyze data and provide a UI for accessing the cluster. Cloud proxies can do neither.
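
To make these roles more tangible, here is a small Python model of them. It is a simplified mental model of what this post describes, not an official VMware capability matrix or API. The host names are hypothetical, and treating the witness as a pure quorum member without UI or analytics is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    PRIMARY = "primary"
    REPLICA = "replica"
    DATA = "data"
    WITNESS = "witness"
    CLOUD_PROXY = "cloud proxy"


@dataclass(frozen=True)
class NodeTraits:
    analyzes_data: bool  # can run analytics on collected data
    provides_ui: bool    # can serve the UI for accessing the cluster


# Simplified model of the roles described in this post (not an official matrix).
TRAITS = {
    NodeType.PRIMARY: NodeTraits(analyzes_data=True, provides_ui=True),
    NodeType.REPLICA: NodeTraits(analyzes_data=True, provides_ui=True),
    NodeType.DATA: NodeTraits(analyzes_data=True, provides_ui=True),
    # Assumption: the witness only provides quorum for CA.
    NodeType.WITNESS: NodeTraits(analyzes_data=False, provides_ui=False),
    # Cloud proxies only collect and forward data.
    NodeType.CLOUD_PROXY: NodeTraits(analyzes_data=False, provides_ui=False),
}


def ui_entry_points(nodes: list[tuple[str, NodeType]]) -> list[str]:
    """Return the names of all nodes that can serve the UI."""
    return [name for name, kind in nodes if TRAITS[kind].provides_ui]


# Hypothetical host names for illustration.
cluster = [
    ("ops-01", NodeType.PRIMARY),
    ("ops-02", NodeType.REPLICA),
    ("ops-03", NodeType.DATA),
    ("cp-01", NodeType.CLOUD_PROXY),
]
print(ui_entry_points(cluster))  # ['ops-01', 'ops-02', 'ops-03']
```

The model makes the asymmetry explicit: every node that can analyze data can also serve the UI, while the cloud proxy does neither.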

Availability methods – what are they?
Availability methods ensure that SLAs (for VMs, hosts and applications) can be maintained and that the cluster remains available. Similar to vSphere HA or comparable clustered solutions, they make sure that the failure of a node can be absorbed: the remaining cluster members keep the cluster available. We essentially differentiate between three basic approaches:

  • HA (High Availability): This method is used within a single data center or rack. If the primary node fails, the replica node automatically takes over its tasks. The concept is comparable to a single-site vSAN cluster, because the latency between the nodes must be extremely low.
  • CA (Continuous Availability): This method enables a kind of stretched cluster across two sites. The nodes are divided into two fault domains, and a witness node is deployed at a third location. CA is particularly suitable for environments that need to be protected against the failure of an entire data center.
  • Standalone (no availability): If you only want to operate a single node, this works as well. However, there is then no protection against the failure of that node.
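
Why does CA need a witness at a third location? A generic majority-quorum check illustrates the idea. This is a textbook quorum model assumed for illustration, not the documented internal algorithm of Aria Operations, and the voter names are hypothetical.

```python
def has_quorum(reachable: set[str], voters: set[str]) -> bool:
    """A partition may keep running only if it reaches a strict majority of voters."""
    return len(reachable & voters) * 2 > len(voters)


# Illustrative voters: two fault domains plus the witness at a third site.
VOTERS = {"fault-domain-1", "fault-domain-2", "witness"}

# Site 2 fails completely: fault domain 1 still reaches the witness (2 of 3 votes).
print(has_quorum({"fault-domain-1", "witness"}, VOTERS))  # True

# Fault domain 2 is cut off from everything else (1 of 3 votes): no majority.
print(has_quorum({"fault-domain-2"}, VOTERS))  # False

# Without a witness, a clean split gives each site 1 of 2 votes,
# so neither side has a majority and there is no tie-breaker.
TWO_SITES = {"fault-domain-1", "fault-domain-2"}
print(has_quorum({"fault-domain-1"}, TWO_SITES))  # False
```

The third vote is what lets exactly one side of a split continue, which is why the witness should live at a location independent of both fault domains.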

How to configure HA or CA?
Configuring availability methods in Aria Operations is easy. First, decide on the desired design. HA needs no additional components apart from a node that acts as the replica, whereas CA also requires a witness node. The witness is deployed like a normal node and then added to the cluster.
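
Before starting the wizard, it can help to write the planned design down and check it against the requirements from this post. The following Python sketch does that. `PlannedNode` and `check_design` are hypothetical helpers, not part of Aria Operations, and the rule that primary and replica belong in different fault domains under CA is an assumption derived from CA's purpose (surviving the loss of a whole site).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlannedNode:
    name: str
    role: str  # "primary", "replica", "data" or "witness"
    fault_domain: Optional[str] = None  # only relevant for CA


def check_design(method: str, nodes: list[PlannedNode]) -> list[str]:
    """Return the problems found in a planned design (empty list if none)."""
    roles = [n.role for n in nodes]
    problems = []
    if roles.count("primary") != 1:
        problems.append("exactly one primary node is required")

    if method == "standalone":
        if len(nodes) != 1:
            problems.append("standalone means a single node")
    elif method in ("HA", "CA"):
        if roles.count("replica") != 1:
            problems.append(f"{method} requires a replica node")
        if method == "HA" and "witness" in roles:
            problems.append("a witness node is only needed for CA")
        if method == "CA":
            if roles.count("witness") != 1:
                problems.append("CA requires a witness node")
            domains = {n.fault_domain for n in nodes if n.role != "witness"}
            if len(domains) != 2 or None in domains:
                problems.append("CA nodes must be split into two fault domains")
            # Assumption: primary and replica must not share a fault domain,
            # otherwise losing one site would take both down.
            placement = {n.role: n.fault_domain for n in nodes
                         if n.role in ("primary", "replica")}
            primary_fd = placement.get("primary")
            if primary_fd is not None and primary_fd == placement.get("replica"):
                problems.append("primary and replica should sit in different fault domains")
    else:
        problems.append(f"unknown availability method: {method}")
    return problems


# Hypothetical CA plan: one node per site plus the witness at a third location.
ca_plan = [
    PlannedNode("ops-01", "primary", "site-a"),
    PlannedNode("ops-02", "replica", "site-b"),
    PlannedNode("ops-witness", "witness"),
]
print(check_design("CA", ca_plan))  # []
print(check_design("HA", ca_plan))  # ['a witness node is only needed for CA']
```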

The configuration is done in the Admin UI. After logging in, you can activate the desired availability method (HA or CA); a wizard guides you through the process and applies the configuration to the cluster. The following screenshot shows an example of what a finished cluster with an HA configuration can look like.

Conclusion
With this understanding of the architecture of an Aria Operations Cluster, you can optimally adapt your cluster to your requirements. The different node types and availability methods offer a high degree of flexibility to support both small and large environments. If you have any questions, please feel free to contact me via the contact form or LinkedIn.

The next post will be the first deep dive post on my blog. However, it will only appear after my vacation. Until then, I wish you all the best from Florida!