Configuring Knative Services
Every Knative service has a configuration and one or more revisions. The following tasks are significant when configuring a service:
- Scaling a Knative service
- Routing traffic to revisions
Scaling Knative Services
The Knative Pod Autoscaler (KPA) is the primary component responsible for ensuring that a Knative service has sufficient capacity to handle the current demand. Knative services automatically scale down to zero by default. This setting saves resources during periods of light traffic. However, the autoscaling behavior is configurable to better suit the needs of the service.
The KPA has many configuration options, some of which can only be set system wide. The scaling limits are a practical subset of those options, all of which can be set at the service level to control the autoscaling behavior.
Set Scaling Limits
The scaling limits provide control over the number of instances created to handle the requests for a revision. Although scaling down to zero saves resources, it comes at the cost of a delay when going from zero instances to one instance. Going from zero to one instance is a cold start, because the request must wait for Kubernetes to create a new pod for the service. This results in a delay before the first request is processed. Subsequent requests are processed without the initial delay because the pod is already available to handle them. If the business requirements of an application cannot tolerate this delay, then you can use the scale-min option to set a lower bound other than zero.
Setting scale-min to a nonzero value turns off a significant cost-saving feature of Knative. Avoid changing this value if an application can tolerate the slow response time from a cold start. However, the scale-max option is nearly always worth using. For example, setting the scale-max value protects the system from creating too many instances of a service in the case of a bug that causes slow responses or a DDoS attack. It can prevent a single piece of software from consuming excessive amounts of cluster resources.
The following example sets both of these scaling options:
kn service update example-svc --scale-min 1 --scale-max 100
After running the command, use the kn revision describe command to verify the settings, as illustrated below.
kn revision describe example-svc-00004
Name:         example-svc-00004
Namespace:    my_namespace-kn-app
Annotations:  autoscaling.knative.dev/max-scale=100, autoscaling.knative.dev/min-scale=1
... output omitted ...
In the output, the max-scale and min-scale annotations are set by the --scale-max and --scale-min options, respectively.
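If a service must always run a fixed number of instances, recent versions of the kn CLI also provide a --scale option that sets both bounds to the same value. The following is a minimal sketch, assuming the option is available in your kn version:
kn service update example-svc --scale 3
This is equivalent to setting --scale-min and --scale-max to 3, so the revision neither scales to zero nor grows beyond three instances.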
Set Concurrency Limits
The scaling limits control the number of instances created for a revision. At the next level of granularity, the concurrency limits control the number of requests that an instance can process at any given time.
If the average number of requests per instance rises above the concurrency-limit, then the KPA creates a new instance to handle the current load. This is an approximate description of how the KPA works. The KPA determines the number of available instances based upon the traffic patterns for the service over time. The following example sets the concurrency-limit to 10 because the service has proven to reliably handle 10 requests per instance:
kn service update example-svc --concurrency-limit 10
In this example, the KPA attempts to ensure there is at least one revision instance for every 10 concurrent requests. For example, if the application receives approximately 50 concurrent requests, then Knative attempts to scale the revision up to five instances. In effect, the parameter determines the approximate minimum number of instances needed for a given load.
The scale-target option sets the desired number of concurrent requests per revision instance. For some applications, hitting the concurrency limit could result in unacceptable response times. Setting the scale-target to a lower value instructs the KPA to start creating more instances before hitting the limit.
When the concurrency-limit is set, Knative sets the scale-target to the same value by default. In the following example, the scale-target is explicitly set to a smaller value to instruct the KPA to scale up before reaching the limit value:
kn service update example-svc --concurrency-limit 10 --scale-target 5
In this example, the KPA considers creating more instances of a revision when the number of requests per instance exceeds 5, even though an instance can handle up to 10 concurrent requests. In this way, the service starts scaling up in advance, before the concurrency limit causes problems.
Note: This description of the KPA is an approximation, because the actual behavior differs based on a number of factors, including the number of current instances and the rate of requests received.
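You can verify the new target with the kn revision describe command, which shows the corresponding annotation on the latest revision. The following is an illustrative sketch; the revision name is assumed and the output is abbreviated, so the exact fields vary by version:
kn revision describe example-svc-00005
Name:         example-svc-00005
Namespace:    my_namespace-kn-app
Annotations:  autoscaling.knative.dev/target=5
... output omitted ...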
Routing Traffic
Upon creating or updating a service, Knative creates a new revision for the service. Use the kn revision list command to see a list of available revisions. The output of this command has a column called TRAFFIC that shows the percentage of requests routed to each revision:
kn revision list
NAME                SERVICE       TRAFFIC   TAGS   GENERATION   ...
example-svc-00003   example-svc   100%             3
example-svc-00002   example-svc                    2
example-svc-00001   example-svc                    1
By default, Knative maps all requests to the latest revision of the service.
Routes describe how to map incoming HTTP requests to a specific revision in Knative. Use the kn route list command to see the routes:
kn route list
NAME          URL                                                READY
example-svc   https://<service-name>-<project>-<host-name>.com   True
After developing a new version of the service, you can update the service with the update subcommand as follows:
kn service update example-svc --image <your-registry-service>/example-svc-new
This command creates a new revision for the service and routes all traffic to the new revision. However, you might want only 5% of all traffic to hit the new revision, which would give you a small test sample to ensure that the revision is stable. The following command illustrates how to use the --traffic option to achieve this desired state:
kn service update example-svc --traffic example-svc-00003=95 --traffic example-svc-00004=5
In each --traffic option of this command, you specify the percentage of traffic routed to the given revision. After executing this command, the previous revision receives 95% of all traffic, while the new revision receives 5%. The command can use as many --traffic options as you need, but the values must sum to 100.
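You can also assign tags to revisions and route traffic by tag rather than by revision name. The following is a minimal sketch that assumes a tag named candidate; the kn CLI supports tagging with the --tag option:
kn service update example-svc --tag example-svc-00004=candidate
kn service update example-svc --traffic example-svc-00003=95 --traffic candidate=5
A tagged revision also receives its own dedicated URL, which is useful for testing a revision directly before routing any production traffic to it.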
Use the kn route describe command to confirm the results of the change.
kn route describe example-svc
Name:       example-svc
Namespace:  my_namespace-kn-app
Age:        18d
URL:        https://example-svc-my_namespace-kn-app.hostname.com
Service:    example-svc

Traffic Targets:
   95%  example-svc-00003
    5%  example-svc-00004
... output omitted ...
The Traffic Targets section of the output confirms that the new revision receives only 5% of all requests, while the previous revision still handles the rest. Once you are satisfied that the latest revision is stable, use the following command to route all traffic to the latest revision:
kn service update example-svc --traffic @latest=100
The --traffic option enables many deployment patterns, including blue/green, canary, and progressive deployments.
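For example, a simple blue/green switch could look like the following sequence. This is a sketch that assumes the blue and green tag names and the revision numbers shown earlier, not a prescribed workflow:
kn service update example-svc --tag example-svc-00003=blue --tag example-svc-00004=green
kn service update example-svc --traffic blue=100 --traffic green=0
kn service update example-svc --traffic blue=0 --traffic green=100
Because each step is only a --traffic update, you can also shift the percentages gradually, which is the basis of canary and progressive deployments.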