Event Profile Alert Rule Types
All rules within the Events Profile fall under one of two rule types:

- Boolean rule type
- Threshold rule type

Boolean Rule Type

The Boolean rule type is used for static states, where there is no meaningful scale of degradation and the state is simply ‘OK’ or not. For example, the Broadcaster Configuration Error rule is triggered when a broadcaster’s configuration does not match the configuration expected by ZEN Master. Historical data and previous confirmation results are not used in this case. When this error occurs, it must be addressed.

This type of rule offers three options that can be enabled or disabled:

- Generate OK notification: if enabled, the system generates an ‘OK’ alert email and sends it to the users and groups who have email privileges for this object.
- Generate error notification: if enabled, the system generates an ‘Error’ alert email and sends it to the users and groups who have email privileges for this object.
- Ignore object state for notification generation: ZEN Master objects can have multiple issues, e.g., a broadcaster with high CPU usage and a full hard drive. An object already in an error state may later have cause to error for a secondary reason. By default, the initial email notification is considered sufficient, but in cases where a specific issue warrants a secondary email, select this option to force a notification even if a previous email has been sent.

Boolean Rule Type with Escalation Control

Some Boolean rules provide further controls for how the system manages and alerts on error states within a definable time window. These rules are designed to serve two purposes:

- To address flapping states. Flapping states occur when a problem is intermittent, for example when a monitored object jumps continually between good and bad states. In that situation users can be inundated with email.
- To address continuous warning states. The rule guarantees that an object will not sit in a continual warning state, never escalating to an error status or generating an alert email. This ensures that operators are made aware of the issue.

The escalation rule provides additional controls under the Event Type toggle to tune ZEN Master’s propensity to either subdue warnings or escalate them to full errors and send an email notification. When the Event Type toggle is set to ‘Error’, the rule immediately escalates an issue to an error status, bypassing any intermediate warning state. When it is set to ‘Warning’, the rule provides two additional controls:

- Event Escalation Count: the number of events that must occur within the window for the system to escalate to a full error.
- Escalation/De-escalation Window: the timeframe, in minutes, within which to escalate or de-escalate.

If the Event Type is set to ‘Warning’, the following will occur:

- The object status is elevated to ‘Warning’ upon the first and any subsequent continuous occurrences of the event. It returns to ‘OK’ if the state recovers.
- Subsequent events within the allotted time window again limit the escalation to a warning status.
- The object escalates to an error status when the total number of event occurrences within the configured time window first reaches the Event Escalation Count. Escalation is therefore based on both the number of occurrences and the time window.
- If the event does not occur again within the time window, the object de-escalates back to an ‘OK’ state.

If the Event Type is set to ‘Error’, the following will occur:

- The event is treated as an error upon its first occurrence.
- There is no de-escalation process.
- A notification is sent.

The ‘Track only, do not update object state’ option specifies that any escalation event is registered in the event log only. This option is useful when the occurrence of the issue is not in itself a primary reason to alert the operator, but is useful information to log for later reference.

If the Event Type is set to ‘Track only’, the following will occur:

- The ZEN Master alert system obeys the escalation settings, as described above.
- The issue triggering the rule, and any resulting escalation, will not escalate the object itself to a warning or error.

Threshold Rule Type

The threshold rule provides controls for events where both thresholds (warning and error) are relevant. There are two categories of threshold rules.

The first category covers metrics that cannot be noisy, or flap above and below the defined threshold. For example, the Broadcaster License Expiration warning can be set to a specific number of days in advance of expiration, and the number of days remaining on the license can only decrease. In such a case the object is escalated to a warning status when the first threshold is met, and escalated to an error, with corresponding notification emails, when the second threshold is met.

The second category is used for volatile metrics, such as CPU usage, where both thresholds are relevant and de-escalation can occur. For this type of event, ZEN Master again escalates the object to a warning or error status when the corresponding threshold is crossed. However, once the value falls back and stays below the threshold for the specified de-escalation window duration, the object de-escalates back to the state of the most recent confirmation result. When user notifications are enabled, they occur on ‘Error’ and/or ‘OK’ events.
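
To make the warning-mode escalation for Boolean rules more concrete, the following is a minimal sketch of counting events inside a sliding time window. It is not ZEN Master's implementation. The class name `EscalationTracker`, its fields, and the status strings are hypothetical, and the sketch simplifies recovery by treating an empty window as ‘OK’.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EscalationTracker:
    """Hypothetical model of a Boolean rule with Event Type set to 'Warning'."""

    escalation_count: int    # stand-in for the Event Escalation Count setting
    window_minutes: float    # stand-in for the Escalation/De-escalation Window
    _events: deque = field(default_factory=deque, init=False, repr=False)

    def _prune(self, now: float) -> None:
        # Drop event timestamps that have aged out of the window.
        cutoff = now - self.window_minutes
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()

    def record_event(self, now: float) -> str:
        # Register one occurrence at time `now` (in minutes) and return the status.
        self._events.append(now)
        return self.status(now)

    def status(self, now: float) -> str:
        # Assumption: no events left in the window means the rule is back to 'ok'.
        self._prune(now)
        if not self._events:
            return "ok"
        if len(self._events) >= self.escalation_count:
            return "error"
        return "warning"


tracker = EscalationTracker(escalation_count=3, window_minutes=10)
first = tracker.record_event(0)    # first occurrence: warning
second = tracker.record_event(4)   # still below the count: warning
third = tracker.record_event(8)    # third event inside 10 minutes: error
later = tracker.status(30)         # window has emptied: back to ok
```

With these hypothetical settings, a flapping object that fails only once every 15 minutes would stay at a warning and never generate an error email, while three failures inside 10 minutes would escalate.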