This guide walks you through setting up alerts for manifest deployment failures with Slack notifications. You’ll create a Slack webhook and configure an alert rule to monitor specific clusters.
What You’ll Build
By the end of this guide, you’ll have:
- A Slack webhook integration connected to Ankra
- An alert rule that triggers when manifest jobs fail
- Notifications sent to your Slack channel when deployments fail
- Automatic AI-powered root cause analysis for every failure
Prerequisites
Before you begin, ensure you have:
- An Ankra account with access to your organization’s alerts
- Admin access to a Slack workspace (to create an app)
- At least one cluster with manifests deployed
Part 1: Create a Slack Webhook
Step 1: Create a Slack App
- Go to api.slack.com/apps
- Click Create New App
- Choose From scratch
- Enter the app details:
  - App Name: Ankra Alerts
  - Workspace: Select your workspace
- Click Create App

Step 2: Enable Incoming Webhooks
- In your app settings, click Incoming Webhooks in the left sidebar
- Toggle Activate Incoming Webhooks to On
- Click Add New Webhook to Workspace
- Select the channel for alerts (e.g., `#deployments` or `#alerts`)
- Click Allow
Step 3: Copy the Webhook URL
After authorization, copy your webhook URL. It will look like:
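```
https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
```
The identifiers above are placeholders. Treat the real URL as a secret: anyone who has it can post messages to your channel.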
Step 4: Create the Webhook in Ankra
- In Ankra, navigate to Alerts in the sidebar
- Click the Integrations tab
- Click Create Webhook
- Fill in the details:
| Field | Value |
|---|---|
| Name | Slack Deployment Alerts |
| URL | Paste your Slack webhook URL |
| Description | Notifications for manifest failures |
| Template | Select Slack |
- Click Create
Step 5: Test the Webhook
- On the webhook detail page, click Send Test
- Check your Slack channel for the test message
- If it appears, your webhook is ready!
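If you want to verify the webhook outside Ankra (for example, to rule out the webhook itself while troubleshooting), you can post to it directly. A minimal sketch in Python, assuming you replace the placeholder URL with the one you copied in Step 3:

```python
# Post a test message straight to the Slack incoming webhook.
# Replace the placeholder URL with the webhook URL from Step 3.
import json
import urllib.request

WEBHOOK_URL = "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"

payload = {"text": "Test alert from Ankra webhook setup"}
request = urllib.request.Request(
    WEBHOOK_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    # Slack answers an accepted message with a plain-text "ok" body.
    print(response.status, response.read().decode())
```

Slack replies with a 200 status and an `ok` body when the message is accepted; anything else usually points at a wrong URL or revoked channel access.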
Part 2: Create the Alert Rule
Now create an alert rule that monitors manifest deployments on specific clusters.
Step 1: Start Creating an Alert
- Go to Alerts in Ankra
- Click the Alert Rules tab
- Click Create Alert
Step 2: Configure Basic Details
In the Details step:
| Field | Value |
|---|---|
| Name | Manifest Deployment Failed |
| Description | Alert when manifest jobs fail on production clusters |
| Enabled | Toggle On |
Step 3: Select Resource Type
In the Resource step:
- Resource Type: Select Cluster Resource
- Cluster Resource Kind: Select Manifest
Step 4: Select Specific Clusters
Under Clusters, select the clusters you want to monitor:
- Click the cluster dropdown
- Check the boxes for your target clusters (e.g., `production-cluster`, `staging-cluster`)
- Only selected clusters will trigger this alert
Leave all clusters unchecked to monitor manifests across your entire organization.
Step 5: Configure the Condition
In the Conditions section, set up the failure condition:
Condition 1:
| Field | Value |
|---|---|
| Condition Type | Job Status |
| Operator | Equals |
| Value | Failed |
This condition matches whenever a manifest job enters the Failed state.
Optional: Add additional conditions
You can add multiple conditions with AND/OR logic:
Example: Alert on both failures and timeouts
- Condition 1: Job Status = Failed
- OR
- Condition 2: Job Status = Timeout
Example: Alert on failures that have been stuck for more than 5 minutes
- Condition 1: Job Status = Failed
- AND
- Condition 2: Stuck in State > 5 minutes
Step 6: Set Severity and Cooldown
In the Severity step:
| Field | Value | Description |
|---|---|---|
| Severity | critical | High priority for deployment failures |
| Cooldown | 15 minutes | Prevent repeated alerts for the same issue |
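Conceptually, the cooldown suppresses repeat notifications for the same resource until the window has passed. A small sketch of that idea (an illustration only, not Ankra's implementation):

```python
# Illustration of cooldown semantics only - not Ankra's actual implementation.
from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=15)
_last_fired: dict[str, datetime] = {}  # resource id -> time the last alert fired

def should_notify(resource_id: str, now: datetime) -> bool:
    """Return True when no alert has fired for this resource inside the cooldown window."""
    last = _last_fired.get(resource_id)
    if last is not None and now - last < COOLDOWN:
        return False  # still cooling down, suppress the duplicate
    _last_fired[resource_id] = now
    return True
```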
Step 7: Attach the Slack Webhook
In the Notifications step:
- Under Webhooks, find your `Slack Deployment Alerts` webhook
- Check the box to enable it
- The alert will now send notifications to Slack when triggered
What Happens When a Manifest Fails
When a manifest deployment fails on one of your selected clusters, Ankra runs through the sequence below.
Step-by-Step Breakdown
- Ankra detects the failure - Job status changes to Failed
- Alert rule evaluates - Condition matches (Job Status = Failed)
- Cooldown check - If no recent alert for this resource, proceed
- Webhook fires - Slack receives the notification immediately
- AI Analysis Resource created - Ankra creates a tracking resource for the analysis
- AI Analysis Job scheduled - A background job starts collecting data
- Data collection - The job gathers (a manual equivalent is sketched after this breakdown):
- Pod status and container states
- Kubernetes events (warnings, errors)
- Container logs (last 50 lines)
- Job results and error messages
- AI Incident generated - Claude AI analyzes the data and produces:
- Root cause identification
- Severity assessment
- Recommended actions
- Links to affected resources
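The collection and analysis happen inside Ankra, but if you want to pull roughly the same evidence by hand while reviewing an incident, a sketch along these lines works; the namespace and label selector are placeholders for the workload behind your failing manifest:

```python
# Gather roughly the same evidence Ankra's analysis job collects, using kubectl.
# NAMESPACE and LABEL_SELECTOR are placeholders - point them at your workload.
import subprocess

NAMESPACE = "my-app"
LABEL_SELECTOR = "app=my-app"

def kubectl(*args: str) -> str:
    """Run a kubectl command and return its stdout."""
    return subprocess.run(
        ["kubectl", *args], capture_output=True, text=True, check=False
    ).stdout

# Pod status and container states
print(kubectl("get", "pods", "-n", NAMESPACE, "-l", LABEL_SELECTOR, "-o", "wide"))

# Kubernetes warning events
print(kubectl("get", "events", "-n", NAMESPACE, "--field-selector", "type=Warning"))

# Last 50 lines of logs from every container in the matching pods
pods = kubectl(
    "get", "pods", "-n", NAMESPACE, "-l", LABEL_SELECTOR,
    "-o", "jsonpath={.items[*].metadata.name}",
).split()
for pod in pods:
    print(kubectl("logs", pod, "-n", NAMESPACE, "--tail=50", "--all-containers"))
```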
View AI Incidents: Navigate to Alerts → AI Incidents tab to see all generated analyses. Each incident includes the AI’s findings, affected resources, and suggested next steps. You can also ask follow-up questions directly in the incident view.
Example Slack Notification
Your Slack channel will receive a message like:
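The exact formatting comes from the Slack template you selected, so treat this as a rough illustration (the values are placeholders drawn from the rule configured above):

```
Manifest Deployment Failed
Severity: critical
Cluster: production-cluster
Condition: Job Status = Failed
Alert when manifest jobs fail on production clusters
```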
Verifying Your Setup
Test your configuration end-to-end:
- Create a test manifest with an intentional error (e.g., invalid YAML; see the example after this list)
- Deploy the manifest to one of your monitored clusters
- Watch for the failure in the Operations page
- Check Slack for the alert notification
- Review the AI Incident for root cause analysis
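A minimal sketch of such a test manifest, assuming a simple Deployment is safe to apply in the monitored cluster; the name is hypothetical and the `replicas` value is deliberately invalid so the apply job fails:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alert-test            # hypothetical name - remove the manifest after the test
spec:
  replicas: "three"           # deliberately invalid: replicas must be an integer
  selector:
    matchLabels:
      app: alert-test
  template:
    metadata:
      labels:
        app: alert-test
    spec:
      containers:
        - name: app
          image: nginx:1.27
```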
Additional Alert Scenarios
Alert on manifest state changes
To alert when manifests enter an unhealthy state (not just job failures):
| Field | Value |
|---|---|
| Condition Type | Resource State |
| Operator | Equals |
| Value | Down |
This triggers when a manifest’s health status becomes “Down”.
Alert on stuck deployments
To alert when manifests are stuck in a deploying state:
| Field | Value |
|---|---|
| Condition Type | Stuck in State (duration) |
| Operator | Greater than |
| Value | 10 (minutes) |
This triggers when a manifest has remained in a transitional state for more than 10 minutes.
Alert on repeated failures
To alert only after multiple failures:
| Field | Value |
|---|---|
| Condition Type | Failed Job Count |
| Operator | Greater than or equal |
| Value | 3 |
This triggers after 3 or more failed jobs on the same manifest.
Troubleshooting
Alert not triggering
- Verify the alert rule is Enabled
- Check that the correct clusters are selected
- Ensure the manifest is in a cluster you’re monitoring
- Confirm the condition matches (Job Status = Failed)
Slack message not appearing
- Test the webhook using the Send Test button
- Verify the webhook URL is correct
- Check that the Slack app still has channel access
- Look for errors in the webhook’s delivery history
Too many alerts
- Increase the cooldown period (e.g., 30 minutes)
- Add more specific conditions to filter alerts
- Consider monitoring only production clusters
Related
- Alerts - Full alert rule configuration reference
- Webhooks - Template variables and custom payloads
- AI Incidents - Automatic root cause analysis
- Operations - View job history and logs
Still have questions? Join our Slack community and we’ll help out.