CPU Overload Protection (Kubernetes)
Introduction
By default, this policy detects when the Kubernetes Pod is overloaded using the pod CPU utilization metric. The policy is based on the adaptive load scheduling component.
All Kubernetes-related metrics are collected by the Kubeletstats OpenTelemetry Collector, so if the system under observation needs different metrics for overload confirmation, the list of available metrics can be used to configure the policy. The following PromQL query (with appropriate filters) is used as the SIGNAL for the load scheduler:
avg(k8s_pod_cpu_utilization_ratio)
Configuration
Blueprint name: load-scheduling/cpu-overload-protection-k8s
Parameters
policy
Parameter | policy.components |
Description | List of additional circuit components. |
Type | Array of Object (aperture.spec.v1.Component) |
Default Value | Expand
|
Parameter | policy.policy_name |
Description | Name of the policy. |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Parameter | policy.resources |
Description | Additional resources. |
Type | Object (aperture.spec.v1.Resources) |
Default Value | Expand
|
policy.load_scheduling_core
Parameter | policy.load_scheduling_core.dry_run |
Description | Default setting for dry run mode on the Load Scheduler. In dry run mode, the Load Scheduler acts as a passthrough and does not throttle flows. This setting can be updated at runtime without restarting the policy. |
Type | Boolean |
Default Value | false |
Parameter | policy.load_scheduling_core.kubelet_overload_confirmations |
Description | Overload confirmation signals from kubelet. |
Type | Object (kubelet_overload_confirmations) |
Default Value | Expand
|
Parameter | policy.load_scheduling_core.overload_confirmations |
Description | List of overload confirmation criteria. Load scheduler can throttle flows when all of the specified overload confirmation criteria are met. |
Type | Array of Object (overload_confirmation) |
Default Value | Expand
|
Parameter | policy.load_scheduling_core.aiad_load_scheduler |
Description | Parameters for AIAD throttling strategy. |
Type | Object (aperture.spec.v1.AIADLoadSchedulerParameters) |
Default Value | Expand
|
Parameter | policy.load_scheduling_core.setpoint |
Description | Setpoint that the signal (by default, average pod CPU utilization) is compared against. |
Type | Number (double) |
Default Value | __REQUIRED_FIELD__ |
policy.kubernetes_object_selector
Parameter | policy.kubernetes_object_selector.api_version |
Description | API version of the object to protect. |
Type | string |
Default Value | apps/v1 |
Parameter | policy.kubernetes_object_selector.kind |
Description | Kind of the object to protect. |
Type | string |
Default Value | Deployment |
Parameter | policy.kubernetes_object_selector.name |
Description | Name of the object to protect. |
Type | string |
Default Value | __REQUIRED_FIELD__ |
Parameter | policy.kubernetes_object_selector.namespace |
Description | Namespace of the object to protect. |
Type | string |
Default Value | __REQUIRED_FIELD__ |
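As a rough illustration of how the parameters above fit together, the following values sketch fills in every `__REQUIRED_FIELD__` entry. It assumes the blueprint's values file nests keys the same way the dotted parameter names do; the policy name, workload name, namespace, and setpoint value are hypothetical placeholders, not recommendations.

```yaml
# Hypothetical values for load-scheduling/cpu-overload-protection-k8s.
# Key nesting is assumed to mirror the dotted parameter names above.
policy:
  policy_name: checkout-cpu-protection   # hypothetical name
  load_scheduling_core:
    setpoint: 0.8                        # illustrative; tune against the observed signal
    dry_run: true                        # observe first without throttling flows
  kubernetes_object_selector:
    api_version: apps/v1                 # documented default
    kind: Deployment                     # documented default
    name: checkout                       # hypothetical workload
    namespace: default                   # hypothetical namespace
```

Starting with `dry_run` set to `true` lets the policy be observed without throttling any flows; the setting can later be changed at runtime without restarting the policy.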
Schemas
driver_criteria
Parameter | enabled |
Description | Enables the driver. |
Type | Boolean |
Default Value | __REQUIRED_FIELD__ |
Parameter | threshold |
Description | Threshold for the driver. |
Type | Number (double) |
Default Value | __REQUIRED_FIELD__ |
overload_confirmation_driver
Parameter | pod_cpu |
Description | The driver for using CPU usage as overload confirmation. |
Type | Object (driver_criteria) |
Default Value | Expand
|
Parameter | pod_memory |
Description | The driver for using memory usage as overload confirmation. |
Type | Object (driver_criteria) |
Default Value | Expand
|
kubelet_overload_confirmations
Parameter | criteria |
Description | Criteria for overload confirmation. |
Type | Object (overload_confirmation_driver) |
Default Value | __REQUIRED_FIELD__ |
Parameter | infra_context |
Description | Kubernetes selector for scraping metrics. |
Type | Object (aperture.spec.v1.KubernetesObjectSelector) |
Default Value | __REQUIRED_FIELD__ |
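A possible shape for this schema, nested under `policy.load_scheduling_core.kubelet_overload_confirmations`, is sketched below. The threshold values and the workload name are hypothetical, and the fields under `infra_context` are assumed to match those listed for `policy.kubernetes_object_selector`.

```yaml
kubelet_overload_confirmations:
  criteria:
    pod_cpu:
      enabled: true
      threshold: 0.8      # illustrative value
    pod_memory:
      enabled: false
      threshold: 0.9      # illustrative; threshold is listed as a required field
  infra_context:          # assumed to take the same fields as policy.kubernetes_object_selector
    api_version: apps/v1
    kind: Deployment
    name: checkout        # hypothetical workload
    namespace: default    # hypothetical namespace
```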
overload_confirmation
Parameter | operator |
Description | The operator for the overload confirmation criteria. One of: `gt | lt | gte | lte | eq | neq`. |
Type | string |
Default Value |
|
Parameter | query_string |
Description | The Prometheus query to be run. Must return a scalar or a vector with a single element. |
Type | string |
Default Value |
|
Parameter | threshold |
Description | The threshold for the overload confirmation criteria. |
Type | Number (double) |
Default Value |
|
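To make the three fields concrete, here is a minimal sketch of a single entry under `policy.load_scheduling_core.overload_confirmations`. It reuses the signal query from the introduction; the operator choice and the threshold are hypothetical.

```yaml
overload_confirmations:
  - query_string: "avg(k8s_pod_cpu_utilization_ratio)"  # must yield a scalar or a single-element vector
    operator: gte                                       # one of gt, lt, gte, lte, eq, neq
    threshold: 0.5                                      # illustrative value
```

If the list holds several entries, all of them must be met before the load scheduler can throttle flows.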
Dynamic Configuration
Note: The following configuration parameters can be dynamically configured at runtime, without reloading the policy.
Parameters
Parameter | dry_run |
Description | Dynamic configuration for setting dry run mode at runtime without restarting this policy. In dry run mode, the scheduler acts as a passthrough for all flows and does not queue them. This is useful for observing the behavior of the load scheduler without disrupting any real traffic. |
Type | Boolean |
Default Value | __REQUIRED_FIELD__ |
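A dynamic configuration for this policy would therefore contain a single key. The sketch below assumes it is supplied as a small YAML document of its own; how it is applied depends on the tooling in use and is not covered here.

```yaml
# Dynamic configuration sketch: switch the running policy to dry run mode.
dry_run: true
```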