Metric Policy
Overview
The Metric policy can be configured to send an alert whenever a metric of a monitor goes above or below a certain threshold value. Metric policies are divided into two types, each described below with a scenario in which it can be used:
- Threshold Alert
- Baseline Alert
Threshold Alert
Use-Case
Suppose you want to monitor the performance of an EC2 instance in your AWS cloud infrastructure. You need Motadata AIOps to raise an alert whenever the metric measuring the CPU percentage goes above a certain threshold value. You can create a metric policy to do this.
Navigation
Go to Menu and select Settings. Then go to Policy Settings and select Metric/Log/Flow policy. The list of created policies is displayed.
Click on to start creating a policy. From the panel on the left side of the screen, click on the Metric tab to start creating a metric policy. The interface to create a Metric Policy is now displayed.
Configuring Threshold Metric policy
Enter the details of the following parameters to create a threshold metric policy:
Field | Description |
---|---|
Policy Name | Enter a unique name of the policy you want to create. |
Tag | Enter a name to logically categorize the policy. You can quickly and easily identify a policy based on the tag assigned to it. |
Threshold Alert/Baseline Alert | Select the parameter as per the type of policy you want to create. In this case, we will select Threshold Alert to move forward. |
Set Conditions
Field | Description |
---|---|
Counter | Select the metric for which you want to create the policy. Click on the dropdown to view the available options. You can also search the specific metric you are looking for from the search bar. |
Source Filter | - Select Monitor if you want to create the policy for specific monitor(s). - Select Group if you want to create the policy for a group of monitors. In case you create the policy for a group, it is configured for all the monitors present in the group individually. - Select Everywhere if you want to create the policy for all the monitors created in the system. This option is selected by default. - Select Tag if you want to create the policy for all the monitors assigned with the same Tags. |
Source | Select the specific Monitor, Group, or Tag for which you want to create the policy. This dropdown shows results based on the option you selected in the Source Filter field. You can leave this field blank if you selected 'Everywhere' in the Source Filter field. |
Critical/Major/Warning | Use these fields to set the conditions under which the alert is triggered. The field in which you set a condition determines the severity of the resulting alert. |
Assumption Based Scenario
In the context of a Threshold alert, consider the following scenario:
Abnormality Occurrence: Set to 3, indicating that the threshold breach should happen for three consecutive occurrences.
Notify if Threshold Value Breach Within: Configured as 5 Minutes, defining the time window within which the consecutive threshold breaches must occur.
When threshold is breached, the alert will trigger with varying severity levels:
Warning: Triggered when the CPU utilization percent goes above 60% thrice consecutively within 5 minutes.
Critical: Triggered when the CPU utilization percent goes above 80% thrice consecutively within 5 minutes.
If the metric in the policy crosses the thresholds of multiple severities, the alert is raised with the highest applicable severity. As shown in the diagram above, values above 80% CPU utilization qualify for both Warning and Critical severity. In this case, the alert is raised with Critical severity because it is the highest qualifying severity.
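The scenario above can be sketched in Python. This is a minimal illustration, not the actual evaluation logic of Motadata AIOps: the function names, the data layout (a time-ordered list of timestamp and value pairs), and the choice to inspect only the most recent polls are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Thresholds taken from the scenario (CPU utilization, percent).
THRESHOLDS = {"Critical": 80.0, "Warning": 60.0}
SEVERITY_ORDER = ["Critical", "Major", "Warning"]  # highest severity first


def severity_of(value, thresholds=THRESHOLDS):
    """Return the highest severity whose threshold the value exceeds, or None."""
    for severity in SEVERITY_ORDER:
        limit = thresholds.get(severity)
        if limit is not None and value > limit:
            return severity
    return None


def evaluate(polls, occurrences=3, window=timedelta(minutes=5)):
    """Hypothetical evaluator for a list of (timestamp, value) polls.

    Returns a severity when the last `occurrences` polls all breach a
    threshold and fall inside `window`; otherwise returns None.
    """
    if len(polls) < occurrences:
        return None
    recent = polls[-occurrences:]
    if recent[-1][0] - recent[0][0] > window:
        return None  # the breaches are spread over more than the window
    severities = [severity_of(value) for _, value in recent]
    if any(s is None for s in severities):
        return None  # at least one poll did not breach any threshold
    # Highest severity that every one of the recent polls satisfies.
    return max(severities, key=SEVERITY_ORDER.index)


start = datetime(2024, 1, 1, 0, 0)
polls = [(start + timedelta(minutes=i), v) for i, v in enumerate([85, 90, 95])]
print(evaluate(polls))
```

With three polls of 85, 90, and 95 taken one minute apart, every poll exceeds both thresholds, so the sketch reports Critical. If one of the three polls were 65, only the Warning threshold would be breached three times in a row, and the result would be Warning.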
The remaining conditions for triggering the alert are described below.
Field | Description |
---|---|
Notify if Threshold value breach within | Specify the time period during which the policy checks the metric you selected above. This is the evaluation window in which Motadata AIOps checks whether the polled value crosses the threshold values configured in the policy. |
Abnormality occurrence | Specify the number of times the conditions set within the policy should be met consecutively within the evaluation window specified in the previous field. |
Auto Clear | Enter the time after which the alert is cleared, irrespective of any other conditions. |
Notify Team
Field | Description |
---|---|
Notify | There are two ways you can populate this field: |
If severity is | Select the severity level using the individual checkboxes in the dropdown. You can select one, several, or all options as required. You can also have different recipients notified at different severity levels. For instance, you can notify johndoe@motadata.com when the severity level reaches Critical and janedoe@motadata.com when the severity level is Major. |
Play Sound | Activate this toggle to enable sound notifications when an alert is triggered. |
If Severity is | Choose the severity level at which the sound notification should be triggered. This option becomes visible only when the Play Sound toggle is switched ON. |
Renotification | Turning on the toggle will resend the alert at a specific interval defined by the user if the alert severity is not changed for the time specified. If turned off, Motadata AIOps will not renotify about the alert. |
Renotify | As in the Notify Team section, enter the username or email address of the recipient. Also choose a preset duration for renotification, along with the severity level at which the system renotifies the recipient if the alert severity has not changed. |
Do not renotify if acknowledged | If the toggle is turned on, Motadata AIOps will not send a renotification to the recipient if they mark the alert as acknowledged. |
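The notification settings in this table can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the rule dictionary, the function names, the 30-minute interval, and the treatment of a severity change as a fresh notification are hypothetical and do not describe how Motadata AIOps is implemented. The two email addresses are the ones used in the example above.

```python
from datetime import datetime, timedelta

# Hypothetical rules: each recipient is notified only at the listed severities.
NOTIFY_RULES = {
    "johndoe@motadata.com": {"Critical"},
    "janedoe@motadata.com": {"Major"},
}


def recipients_for(severity, rules=NOTIFY_RULES):
    """Return the recipients configured for the given severity."""
    return sorted(r for r, severities in rules.items() if severity in severities)


def should_renotify(now, last_notified, severity_changed, acknowledged,
                    interval=timedelta(minutes=30), skip_if_acknowledged=True):
    """Decide whether to resend an alert (assumed semantics).

    Renotification applies only while the severity is unchanged; when
    `skip_if_acknowledged` is on, an acknowledged alert is not resent.
    """
    if severity_changed:
        return False  # assumption: a new severity is sent as a normal notification
    if skip_if_acknowledged and acknowledged:
        return False
    return now - last_notified >= interval


now = datetime(2024, 1, 1, 12, 0)
print(recipients_for("Critical"))
print(should_renotify(now, now - timedelta(minutes=45), False, False))
```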
Take Action
Field | Description |
---|---|
Action to be taken | Select a runbook from the dropdown to be executed when the alert is triggered. |
Create New | Select this button to start creating a new runbook which you might want to assign to the policy you are creating. |
Declare Incident
Field | Description |
---|---|
When Alert Severity is | Select an Alert severity from the dropdown. |
Select Integration Profile to Trigger | Choose an Integration profile to be executed from the dropdown when the chosen alert severity is reached. |
Create Integration Profile | Select this button to start creating a new Integration profile that you might want to assign to the policy you are creating. |
You can click on to define a new Alert severity and Integration Profile combination. You can have a separate Integration Profile triggered at each level of Alert severity.
Select the Create Policy button to create the policy based on the details entered.
Select the Reset button to erase all the current field values, if required.
Now let us look into the Baseline Alert.
Baseline Alert
The Baseline Alert is a powerful feature within Motadata AIOps that enables proactive monitoring of metrics by comparing their real-time values to a dynamically generated baseline. By analyzing historical data and establishing a baseline range, this alerting mechanism helps identify deviations in metric behavior, allowing organizations to detect potential issues and take preventive action.
With the Baseline Alert, organizations gain deeper insights into the normal behavior of their metrics by leveraging historical data from the past 15 days. By dynamically adjusting the baseline range using advanced statistical techniques, the alerting mechanism adapts to changes in metric behavior, ensuring accurate detection of deviations in real-time.
When a metric value breaches the dynamic baseline threshold during a policy evaluation, the Baseline Alert triggers a policy violation. Organizations can customize the actions taken when a violation occurs, such as sending notifications to stakeholders, or executing Runbooks to ensure corrective action.
Use-Case
The Baseline Alert is particularly useful in scenarios where it's crucial to maintain optimal performance and prevent critical problems. By continuously evaluating metric values against the established baseline, it provides early warning signs of performance bottlenecks, abnormal patterns, or unexpected variations in key metrics. This proactive approach empowers IT teams to address potential issues before they escalate, minimizing downtime, optimizing resource utilization, and enhancing overall operational efficiency.
Baseline Metric Policy Mechanism
- Policy Evaluation
The Baseline policy evaluations start as soon as the policy is created. During each evaluation, the system considers the metric data from the last 15 days to create a baseline range.
- Baseline Calculation
To generate the baseline range, the system uses advanced statistical methods. These techniques analyze the historical data points from the last 15 days and produce a dynamically adjusted baseline.
- Baseline Threshold Violation
Baseline threshold violation occurs when a metric's data point deviates from the expected behavior, surpassing the predefined acceptable range known as the baseline threshold. This violation triggers an alert based on the configured parameters within the baseline policy.
In the baseline policy configuration, users set the criteria for triggering an alert by specifying the number of times a data point can deviate from the baseline threshold within a defined time window, as configured in the Abnormality Occurrence and Notify if the Threshold value breach within fields, respectively.
- Actions on Baseline Policy Triggering
Once the baseline alert is triggered, you can define specific actions to be taken. These actions may include sending notifications to relevant stakeholders or executing scripts to automate remedial tasks.
By leveraging the Baseline Alert, organizations can proactively monitor critical metrics, detect unusual patterns, and take prompt action to ensure smooth operations and optimal performance.
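To make the mechanism concrete, here is a minimal Python sketch of a baseline range and a violation check. The document does not say which statistical method Motadata AIOps uses, so the choice of mean plus or minus `k` standard deviations is purely an assumption for illustration, as are the function names and the sample data.

```python
from statistics import mean, pstdev


def baseline_range(history, k=2.0):
    """Return (lower, upper) as mean +/- k standard deviations.

    `history` stands in for the metric values of the last 15 days.
    This formula is an assumed stand-in, not the product's method.
    """
    center = mean(history)
    spread = pstdev(history)
    return center - k * spread, center + k * spread


def violates(value, history, k=2.0):
    """True when the value falls outside the assumed baseline range."""
    lower, upper = baseline_range(history, k)
    return value < lower or value > upper


history = [10, 12, 11, 9, 10, 11, 10, 9, 12, 10]  # made-up sample values
print(baseline_range(history))
print(violates(20, history), violates(11, history))
```

Because the range is recomputed from the history on every call, it shifts as new data replaces old data, which mirrors the idea of a dynamically adjusted baseline.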
Navigation
Go to Menu and select Settings. Then go to Policy Settings and select Metric/Log/Flow policy. The list of created policies is displayed.
Click on to start creating a policy. From the panel on the left side of the screen, click on the Metric tab to start creating a metric policy. The interface to create a Metric Policy is now displayed.
Configuring Baseline Metric policy
Enter the details of the following parameters to create a Baseline Metric Policy:
Field | Description |
---|---|
Policy Name | Enter a unique name of the policy you want to create. |
Tag | Enter a name to logically categorize the policy. You can quickly and easily identify a policy based on the tag assigned to it. |
Threshold Alert/Baseline Alert | Select the parameter as per the type of policy you want to create. In this case, we will select Baseline Alert to move forward. |
Set Conditions
Field | Description |
---|---|
Select Metric | Select the metric for which you want to create the policy. Click on the dropdown to view the available options. |
Source Filter | - Select Monitor if you want to create the policy for specific monitor(s). - Select Group if you want to create the policy for a group of monitors. In case you create the policy for a group, it is configured for all the monitors present in the group individually. - Select Everywhere if you want to create the policy for all the monitors created in the system. This option is selected by default. - Select Tag if you want to create the policy for all the monitors assigned with the same Tags. |
Select Monitor/Select Group | Select the specific Monitor or Group for which you want to create the policy. This dropdown shows results based on the option you selected in the Source Filter field. You can leave this field blank if you selected 'Everywhere' in the Source Filter field. |
Absolute/Relative | - Select Absolute to define specific numerical values. When selecting this option, users specify the exact values that metrics should deviate from the baseline to trigger an alert. - Select Relative to set thresholds based on percentages. This option allows users to define deviations from the baseline as a percentage, triggering alerts when metrics deviate by the specified percentage. |
Critical/Major/Warning | Use these fields to set the conditions under which the alert is triggered. The field in which you set a condition determines the severity of the resulting alert. |
Notify if the Threshold value breach within | Specify the time window during which the policy checks whether the monitor/instance breaches the threshold value for the metric you selected above. |
Abnormality occurrence | Specify the number of times the threshold value should be breached consecutively within the evaluation window specified above to trigger an alert. |
Auto Clear | Enter the time after which the alert is cleared, irrespective of any other conditions. |
Assumption Based Scenario
In the context of a Relative Baseline alert, consider the following scenario:
Abnormality Occurrence: Set to 2, indicating that the metric must deviate from the baseline threshold for two consecutive occurrences.
Notify if Threshold Value Breach Within: Configured as 5 Minutes, defining the time window within which the consecutive deviations must occur.
If the actual value of the used memory bytes exceeds specific percentage thresholds compared to the baseline value, the alert will trigger with varying severity levels:
Critical: Triggered when the actual value exceeds the baseline value by more than 50% twice consecutively within 5 minutes.
Major: Triggered when the actual value exceeds the baseline value by more than 30% twice consecutively within 5 minutes.
Warning: Triggered when the actual value exceeds the baseline value by more than 10% twice consecutively within 5 minutes.
If the metric in the policy crosses the thresholds of multiple severities, the alert is raised with the highest applicable severity. As shown in the diagram above, a value that exceeds the baseline by more than 50% qualifies for both Major and Critical severity. In this case, the alert is raised with Critical severity because it is the highest qualifying severity.
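The relative scenario can be sketched in Python as follows. This is an illustrative sketch rather than the product's logic: the function names are hypothetical, all three severities are treated as upward deviations from a non-zero baseline, and the 5-minute window check is left out to keep the example short. The `mode` argument shows how an Absolute policy would compare raw differences instead of percentages.

```python
# Percentage thresholds from the scenario, highest severity first.
RELATIVE_THRESHOLDS = [("Critical", 50.0), ("Major", 30.0), ("Warning", 10.0)]
ORDER = [name for name, _ in RELATIVE_THRESHOLDS]


def deviation(actual, baseline, mode="relative"):
    """Deviation from the baseline: percent if relative, raw units if absolute."""
    diff = actual - baseline
    if mode == "absolute":
        return diff
    return diff / baseline * 100.0  # assumes a non-zero baseline


def relative_severity(actual, baseline):
    """Highest severity whose percentage threshold the deviation exceeds."""
    percent = deviation(actual, baseline)
    for severity, limit in RELATIVE_THRESHOLDS:
        if percent > limit:
            return severity
    return None


def alert_severity(samples, baseline, occurrences=2):
    """Severity reached by each of the last `occurrences` samples, or None."""
    recent = samples[-occurrences:]
    if len(recent) < occurrences:
        return None
    severities = [relative_severity(value, baseline) for value in recent]
    if any(s is None for s in severities):
        return None
    # Highest severity that all of the recent samples satisfy.
    return max(severities, key=ORDER.index)


# Made-up numbers: baseline of 100 units of used memory.
print(alert_severity([105, 160, 170], baseline=100))
print(alert_severity([160, 135], baseline=100))
```

In the first call the last two samples are 60% and 70% above the baseline, so the sketch reports Critical. In the second call only one sample passes the 50% mark while both pass 30%, so the result is Major.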
Notify Team
Field | Description |
---|---|
Notify | There are two ways you can populate this field: |
If severity is | Select the severity level using the individual checkboxes in the dropdown. You can select one, several, or all options as required. You can also have different recipients notified at different severity levels. For instance, you can notify johndoe@motadata.com when the severity level reaches Critical and janedoe@motadata.com when the severity level is Major. |
Play Sound | Activate this toggle to enable sound notifications when an alert is triggered. |
If Severity is | Choose the severity level at which the sound notification should be triggered. This option becomes visible only when the Play Sound toggle is switched ON. |
Renotification | Turning on the toggle will resend the alert at a specific interval defined by the user if the alert severity is not changed for the time specified. If turned off, Motadata AIOps will not renotify about the alert. |
Renotify | As in the Notify Team section, enter the username or email address of the recipient. Also choose a preset duration for renotification, along with the severity level at which the system renotifies the recipient if the alert severity has not changed. |
Do not renotify if acknowledged | If the toggle is turned on, Motadata AIOps will not send a renotification to the recipient if they mark the alert as acknowledged. |
Take Action
Field | Description |
---|---|
Action to be taken | Select a runbook from the dropdown to be executed when the alert is triggered. |
Create New | Select this button to start creating a new runbook which you might want to assign to the policy you are creating. |
Select the Create Policy button to create the policy based on the details entered.
Select the Reset button to erase all the current field values, if required.