Configuring Knative Services

Every Knative service has a configuration and one or more revisions. The following aspects are significant when configuring a service:

  • Scaling a Knative service

  • Routing traffic to revisions

Scaling Knative Services

The Knative Pod Autoscaler (KPA) is the primary component responsible for ensuring that a Knative service has sufficient capacity to handle the current demand. Knative services automatically scale down to zero by default. This setting saves resources during periods of light traffic. However, the autoscaling behavior is configurable to better suit the needs of the service.

The KPA has many configuration options, some of which can be set only system-wide. The scaling limits are a practical subset of those options that you can set at the service level to control the autoscaling behavior.

Set Scaling Limits

The scaling limits provide control over the number of instances created to handle the requests for a revision. Although scaling down to zero saves resources, it comes at the cost of a delay when going from zero instances to one instance. Going from zero to one instance is a cold start: the first request must wait for Kubernetes to create a new pod for the service, which delays processing of that request. Subsequent requests are processed without the initial delay because the pod is already available to handle them. If the business requirements of an application cannot tolerate this delay, then you can use the scale-min option to set a lower bound other than zero.

Setting scale-min to a nonzero value turns off a significant cost-saving feature of Knative. Avoid changing this value if an application can tolerate the slower response time of a cold start. In contrast, the scale-max option is nearly always worth setting. For example, setting a scale-max value protects the system from creating too many instances of a service when a bug causes slow responses or during a DDoS attack. It can prevent a single piece of software from consuming excessive amounts of cluster resources.

The following example sets both of these scaling options:

kn service update example-svc --scale-min 1 --scale-max 100

After running the command, use the kn revision describe command to verify the settings, as illustrated below.

kn revision describe example-svc-00004
Name:         example-svc-00004
Namespace:    my_namespace-kn-app
Annotations:  autoscaling.knative.dev/max-scale=100, autoscaling.knative.dev/min-scale=1
... output omitted ...

In the output, the max-scale and the min-scale annotations are set by the --scale-max and --scale-min options, respectively.
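If you manage the service as a YAML manifest rather than with the kn CLI, the same limits can be expressed as annotations on the revision template. The following fragment is a minimal sketch; the image reference is a placeholder:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-svc
spec:
  template:
    metadata:
      annotations:
        # Equivalent to --scale-min 1 and --scale-max 100
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "100"
    spec:
      containers:
        - image: <your-registry-service>/example-svc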

Set Concurrency Limits

The scaling limits control the number of instances created for a revision. At the next level of granularity, the concurrency limits control the number of requests that an instance can process at any given time.

If the average number of requests per instance rises above the concurrency-limit, then the KPA creates new instances to handle the current load. This description is an approximation: the KPA determines the number of instances it needs based on the traffic patterns for the service over time. The following example sets the concurrency-limit to 10 because the service has proven to reliably handle 10 requests per instance:

kn service update example-svc --concurrency-limit 10

In this example, the KPA attempts to ensure that there is at least one revision instance for every 10 concurrent requests. For example, if the application receives approximately 50 concurrent requests, then Knative attempts to scale the revision up to five instances. In effect, the parameter approximately determines the minimum number of instances of a revision for a given load.
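In the service manifest, the --concurrency-limit option corresponds to the containerConcurrency field of the revision template rather than to an annotation. The following fragment is a minimal sketch, assuming the same example-svc service:

spec:
  template:
    spec:
      # Hard limit set by --concurrency-limit 10
      containerConcurrency: 10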

The scale-target option sets the target level of concurrency for requests per revision instance. For some applications, hitting the concurrency limit could result in unacceptable response times. Setting the scale-target to a lower value instructs the KPA to start creating more instances before hitting the limit.

When the concurrency-limit is set, Knative sets the scale-target to the same value by default. In the following example, the scale-target is explicitly set to a smaller value to instruct the KPA to scale up before reaching the limit value:

kn service update example-svc --concurrency-limit 10 --scale-target 5

In this example, the KPA considers creating more instances of a revision when the number of requests per instance exceeds 5, even though an instance can handle up to 10 concurrent requests. In this way, the service starts scaling up in advance, before the concurrency limit causes problems.

Note

This description of the KPA is an approximation because the actual behavior differs based on a number of factors including the number of current instances and the rate of requests received.
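In the manifest, the scale-target value appears as the autoscaling.knative.dev/target annotation on the revision template, alongside the hard limit. The following fragment is a sketch of the result of the previous command, under the same assumptions as the earlier examples:

spec:
  template:
    metadata:
      annotations:
        # Soft target set by --scale-target 5
        autoscaling.knative.dev/target: "5"
    spec:
      # Hard limit set by --concurrency-limit 10
      containerConcurrency: 10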

Routing Traffic

Upon creating or updating a service, Knative creates a new revision for the service. Use the kn revision list command to see a list of available revisions. The output of this command has a column called TRAFFIC that shows the percentage of requests routed to that revision:

kn revision list
NAME                SERVICE       TRAFFIC   TAGS   GENERATION   ...
example-svc-00003   example-svc   100%             3
example-svc-00002   example-svc                    2
example-svc-00001   example-svc                    1

By default, Knative maps all requests to the latest revision of the service.
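In the service manifest, this default corresponds to a single traffic target that follows the latest ready revision. A minimal sketch of that fragment:

spec:
  traffic:
    - latestRevision: true
      percent: 100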

In Knative, routes map incoming HTTP requests to specific revisions. Use the kn route list command to see the routes:

kn route list
NAME          URL                                                 READY
example-svc   https://<service-name>-<project>-<host-name>.com    True

After developing a new version of the service, you can update the service with the update subcommand as follows:

kn service update example-svc --image <your-registry-service>/example-svc-new

This command creates a new revision for the service and routes all traffic to it. However, perhaps you want only 5% of all traffic to hit the new revision, which gives you a small test sample to ensure that the revision is stable. The following command illustrates how to use the --traffic option to achieve this desired state:

kn service update example-svc --traffic example-svc-00003=95 --traffic example-svc-00004=5

In each --traffic option of this command, you specify the percentage of traffic routed to the given revision. After executing this command, the previous revision receives 95% of all traffic, while the new revision receives 5%. You can use as many --traffic options as you need, but the values must sum to 100.
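The same split can also be expressed declaratively in the service manifest as a list of traffic targets. The following fragment is a sketch that assumes the revision names from the previous command:

spec:
  traffic:
    - revisionName: example-svc-00003
      percent: 95
    - revisionName: example-svc-00004
      percent: 5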

Use the kn route describe command to confirm the results of the change.

kn route describe example-svc
Name:       example-svc
Namespace:  my_namespace-kn-app
Age:        18d
URL:        https://example-svc-my_namespace-kn-app.hostname.com
Service:    example-svc

Traffic Targets:
   95%  example-svc-00003
    5%  example-svc-00004

... output omitted ...

The Traffic Targets section of the output confirms that the new revision receives only 5% of all requests while the previous revision still handles the rest. Once you are satisfied that the latest revision is stable, use the following command to route all traffic to the latest revision:

kn service update example-svc --traffic @latest=100

The --traffic option enables many deployment patterns, including blue/green, canary, and progressive deployments.
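For example, a blue/green style rollout can assign a tag to each revision, so that each revision remains addressable at its own tagged URL while traffic shifts between them. The following sketch reuses the revision names from the earlier examples and assumes hypothetical tag names blue and green:

kn service update example-svc --tag example-svc-00003=blue --tag example-svc-00004=green
kn service update example-svc --traffic green=100 --traffic blue=0

The first command only tags the revisions; the second switches all traffic to the green revision while the blue revision remains available for a quick rollback.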