Agents and gateways

Agents and gateways are distributed platform components that discover and monitor infrastructure resources. Agents monitor servers and applications, and gateways monitor non-server devices, such as network and storage devices.

An agent is an executable application that runs on managed resources in both on-premises and cloud infrastructures.

A gateway is a virtual appliance that provides secure communication, non-server resource monitoring, and limited data storage during connectivity failure.

Automation

You can use automation to act on resource faults, remediate issues in response to events, or perform routine maintenance tasks. There are two automation models:

  • Automate discrete tasks: Use this model for a single task that needs to be executed on multiple servers.
  • Automate a sequence of tasks: Use this model to execute a sequence of tasks across multiple resources. This model is called a process automation workflow.

Availability

An up/down state indicates whether a resource is available to provide its prescribed service. The up/down state can be determined by evaluating metrics or by using a simple acknowledgment from the resource.
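
The two approaches can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Sample` record, the `error_rate` metric, and the 0.5 limit are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Hypothetical record of one availability check.
    acknowledged: bool                  # did the resource answer a simple check?
    error_rate: Optional[float] = None  # assumed metric: fraction of failed requests


def availability(sample: Sample, max_error_rate: float = 0.5) -> str:
    """Return "up" or "down" for a single sample."""
    # Approach 1: a missing acknowledgment means the resource is down.
    if not sample.acknowledged:
        return "down"
    # Approach 2: evaluate a metric, if one was collected.
    if sample.error_rate is not None and sample.error_rate > max_error_rate:
        return "down"
    return "up"
```

A resource that answers the check but fails most requests is reported as down, which is why a metric evaluation can be stricter than an acknowledgment alone.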

Correlation

Alerts that can be inferred to be due to the same cause are grouped together. Two types of alert correlation are performed:

  • Deduplication: Repeated alerting occurs for an alert that is currently unresolved, such as network devices sending SNMP traps for as long as an issue persists. Repeated alerts are deduplicated.
  • Inferencing: Different alerts that originate from different IT resources are grouped when it can be inferred that they are likely due to the same cause.
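
Deduplication can be sketched as collapsing repeats of the same alert into a single entry with a repeat count. The `Alert` fields and the choice of `(resource, metric)` as the identity of an alert are assumptions made for this example; the key the platform actually uses is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert record.
    resource: str
    metric: str
    message: str


def deduplicate(alerts):
    """Collapse repeated alerts into (first_alert, repeat_count) pairs.

    Alerts are treated as repeats when they share the same resource and
    metric (an assumed key). First-seen order is preserved.
    """
    counts = {}
    for alert in alerts:
        key = (alert.resource, alert.metric)
        if key in counts:
            counts[key][1] += 1
        else:
            counts[key] = [alert, 1]
    return [(alert, count) for alert, count in counts.values()]
```

For example, three `linkDown` traps from the same switch would be returned as one entry with a count of 3.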

Dashboard

A dashboard is a collection of widgets that provide visualizations of collected metrics.

Partner-scoped dashboards are visible only to users defined for the partner. Client-scoped dashboards are visible only to users who are client members.

Discovery

Discovery is the process of finding resources deployed in the enterprise. Resources need to be discovered before they can be monitored and their metrics collected. During discovery, a model that includes all resources is built dynamically and is used to interpret and present the state of the environment.

Event management

Events are activities of operational significance that occur on a monitored resource. Examples of events include:

  • Hardware failures
  • Server CPU utilization thresholds exceeded
  • Application failures
  • Configuration changes

The following mechanisms are used to detect events:

  • Native instrumentation
  • Self-diagnostics
  • Reporting by integrated third-party monitors

The goal of event management is to minimize the time spent responding to an event. The following event management lifecycle standardizes and automates the efficient handling of events:

  1. Ingestion
  2. Interpretation
  3. Correlation
  4. First response
  5. Escalation

First response

The initial alert response can be governed by:

  • Inferred seasonal patterns, so the alert might be suppressed if it remains open past a historical norm.
  • Learning algorithms, which can be trained to suppress alerts that match specific patterns.

Metric threshold

Metrics can be evaluated against threshold limits. Two types of thresholds are supported. A static threshold is a fixed value that represents a fault condition when exceeded. A change-based threshold is a computed value that measures unexpected changes in the metric value. Change-based thresholds are better suited to metrics for which a static value is difficult to determine.
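
The difference between the two threshold types can be sketched as follows. How change-based thresholds are actually computed is not described here, so this example substitutes a simple stand-in: a value is flagged when it deviates from the mean of recent history by more than a chosen number of standard deviations. The function names and the `sensitivity` parameter are hypothetical.

```python
from statistics import mean, stdev


def breaches_static(value, limit):
    """Static threshold: a fault when the value exceeds a fixed limit."""
    return value > limit


def breaches_change(history, value, sensitivity=3.0):
    """Change-based threshold (assumed technique: deviation from recent mean).

    Flags the value when it is more than `sensitivity` standard deviations
    away from the mean of `history`.
    """
    if len(history) < 2:
        return False  # not enough data to estimate normal variation
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return value != baseline
    return abs(value - baseline) > sensitivity * spread
```

With a history of `[50, 52, 49, 51, 50]`, a reading of 80 is flagged as an unexpected change even though it would pass a static limit of 95.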

Monitoring

The goal of monitoring is to assess the availability and performance of managed resources. This is done by collecting, storing, and evaluating resource metrics.

Performance

Resource performance is the measure of whether the resource is operating within user-defined limits. Fault conditions such as exceeding predefined thresholds can indicate performance issues.

Service maps

Service maps organize resources into a hierarchical structure. This makes it possible to associate resource health with the level of user and business impact.
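
One way to picture how a hierarchy ties resource health to business impact is a health rollup, where each node reports the worst health found beneath it. The worst-case rule, the `Node` type, and the three health levels are assumptions for this sketch; the platform may use a different rollup policy.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed health levels, ordered from least to most severe.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class Node:
    # Hypothetical service map node: a service or a resource.
    name: str
    health: str = "ok"
    children: List["Node"] = field(default_factory=list)


def rolled_up_health(node: Node) -> str:
    """Return the most severe health of the node and all its descendants."""
    worst = node.health
    for child in node.children:
        child_health = rolled_up_health(child)
        if SEVERITY[child_health] > SEVERITY[worst]:
            worst = child_health
    return worst
```

Under this rule, a critical database beneath an otherwise healthy service marks the whole service as critical.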

Tenancy

Tenancy divides the enterprise into independent management domains, called tenants, where each tenant is a logical container of managed resources. Dashboards, management policies, and integrations are scoped to a tenant.

The tenancy model defines two core constructs:

  • A partner is a master tenant and is associated with your account.
  • A client is a partner sub-tenant. Different management policies can be applied to different clients.

Partners and clients can each have separate sets of user accounts, and a user account can be part of one and only one partner or client tenant.

User privileges within a tenant can be specified using the following RBAC criteria:

  • User: An account within a tenant.
  • User Group: A group of users.
  • Permission: Authorization controls limiting user access and activities.
  • Role: An association of a user or user group with permissions against managed resources. A user or user group can be permitted specific actions on specific resources.
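
The relationship among these four criteria can be sketched as a simple access check. The `Role` and `User` types, the lookup tables, and the `is_allowed` function are hypothetical and only illustrate the idea that a role ties permissions to specific resources for a user or a user group.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Role:
    # Hypothetical role: permitted actions on specific resources.
    permissions: FrozenSet[str]
    resources: FrozenSet[str]


@dataclass
class User:
    # Hypothetical user account with its group memberships.
    name: str
    groups: Set[str] = field(default_factory=set)


def is_allowed(user: User, action: str, resource: str,
               user_roles: Dict[str, List[Role]],
               group_roles: Dict[str, List[Role]]) -> bool:
    """Return True if any role held directly or through a group permits the action."""
    roles = list(user_roles.get(user.name, []))
    for group in user.groups:
        roles.extend(group_roles.get(group, []))
    return any(action in role.permissions and resource in role.resources
               for role in roles)
```

A user with no role covering a resource is denied by default, because the check only succeeds when a matching role is found.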

Topology maps

A topology map is built from relationships determined during discovery. Each node in a topology map represents a managed resource and an edge between nodes represents the type of connection between those resources. With a topology map, you can visualize and explore your infrastructure, drilling down to an increasingly greater level of detail. Topology maps can also be used to model the impact of planned changes.
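
As a rough illustration of modeling the impact of a planned change, a topology map can be treated as a graph and searched for every resource connected to the one being changed. This is a simplified sketch under stated assumptions: edges are `(resource, resource, connection_type)` tuples, connections are treated as undirected, and any connected resource counts as potentially impacted. The `impacted` function is hypothetical.

```python
from collections import deque


def impacted(edges, changed):
    """Return the set of resources reachable from `changed`.

    `edges` is a list of (resource_a, resource_b, connection_type) tuples.
    The connection type is kept on the edge but ignored by this traversal.
    """
    adjacency = {}
    for a, b, _kind in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search outward from the changed resource.
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)

    seen.discard(changed)
    return seen
```

A more realistic model would follow only dependency directions or filter by connection type, but the traversal pattern stays the same.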