Every Knative service has a configuration and one or more revisions. The following tasks are significant when configuring a service:
Scaling a Knative service
Routing traffic to revisions
Scaling Knative Services
The Knative Pod Autoscaler (KPA) is the primary component responsible for ensuring that a Knative service has sufficient capacity to handle the current demand. Knative services automatically scale down to zero by default. This setting saves resources during periods of light traffic. However, the autoscaling behavior is configurable to better suit the needs of the service.
The KPA has many configuration options, some of which can only be set system-wide. The scaling limits are a practical subset of those options, all of which can be set at the service level to control the autoscaling behavior.
Set Scaling Limits
The scaling limits provide control over the number of instances created to handle the requests for a revision. Although scaling down to zero saves resources, it comes at the cost of a delay when going from zero instances to one. Going from zero to one instance is a cold start, because the request must wait for Kubernetes to create a new pod for the service. This results in a delay before the first request is processed. Subsequent requests are processed without the initial delay because the pod is already available to handle them. If the business requirements of an application cannot tolerate this delay, then you can use the scale-min option to set a lower bound other than zero.
Setting scale-min to a non-zero value turns off a significant cost-saving feature of Knative, so avoid changing this value if an application can tolerate the slow response time of a cold start. However, the scale-max option is nearly always worth using. For example, setting a scale-max value protects the system from creating too many instances of a service in the case of a bug that causes slow responses, or during a DDoS attack. It can prevent a single piece of software from consuming excessive amounts of cluster resources.
The following example sets both of these scaling options:
kn service update example-svc --scale-min 1 --scale-max 100
After running the command, use the kn revision describe command to verify the settings, as illustrated below.
kn revision describe example-svc-00004
Name:         example-svc-00004
Namespace:    my_namespace-kn-app
Annotations:  autoscaling.knative.dev/max-scale=100,
              autoscaling.knative.dev/min-scale=1
... output omitted ...
In the output, the max-scale and min-scale annotations are set by the --scale-max and --scale-min options, respectively.
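The same limits can also be set declaratively in the service manifest, where the kn flags become annotations on the revision template. The following is a minimal sketch, assuming the service name and annotation values from the example above:

```yaml
# Hypothetical manifest fragment: the scaling limits expressed as
# revision-template annotations instead of kn flags.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-svc
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "100"
```

Applying a manifest like this produces a new revision with the same annotations that the kn command set.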
Set Concurrency Limits
The scaling limits control the number of instances created for a revision. At the next level of granularity, the concurrency limits control the number of requests that an instance can process at any given time.
If the average number of requests per instance rises above the concurrency-limit, then the KPA creates a new instance to handle the current load. This is an approximate description of how the KPA works: the KPA determines the number of available instances based on the traffic patterns for the service over time. The following example sets the concurrency-limit to 10 because the service has proven to reliably handle 10 requests per instance:
kn service update example-svc --concurrency-limit 10
In this example, the KPA attempts to ensure there is at least one revision instance for every 10 concurrent requests. For example, if the application receives approximately 50 concurrent requests, then Knative attempts to scale the revision up to five instances. In effect, the parameter sets the approximate minimum number of instances of a revision for a given load.
The scale-target option sets the required level of concurrency for requests per revision instance. For some applications, hitting the concurrency limit could result in unacceptable response times, and setting the scale-target to a lower value instructs the KPA to start creating more instances before hitting the limit. When the concurrency-limit is set, Knative sets the scale-target to the same value by default. In the following example, the scale-target is explicitly set to a smaller value to instruct the KPA to scale up before reaching the limit value:
kn service update example-svc --concurrency-limit 10 --scale-target 5
In this example, the KPA considers creating more instances of a revision when the number of requests per instance exceeds five, even though an instance can handle up to ten concurrent requests. In this way, the service starts scaling up in advance, before the concurrency limit causes problems.
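For reference, these two settings also map to the revision template: the hard limit becomes the containerConcurrency field, and the soft target becomes an annotation. A sketch, assuming the same service name and values as the command above:

```yaml
# Hypothetical manifest fragment: the concurrency settings expressed
# declaratively instead of via kn flags.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-svc
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "5"   # soft target (scale-target)
    spec:
      containerConcurrency: 10                # hard limit (concurrency-limit)
```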
This description of the KPA is an approximation because the actual behavior differs based on a number of factors including the number of current instances and the rate of requests received.
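As a rough illustration of the sizing rule described above, the steady-state instance count can be approximated as the observed concurrency divided by the scale target, rounded up and clamped to the scaling limits. The following is a simplified sketch, not the actual KPA algorithm, which also averages traffic over time windows:

```shell
# Approximation (an assumption, not the real KPA implementation):
# desired = ceil(observed_concurrency / scale_target), clamped to [min, max].
observed=50    # concurrent requests across the revision
target=5       # --scale-target
min=1          # --scale-min
max=100        # --scale-max

desired=$(( (observed + target - 1) / target ))   # integer ceiling division
if [ "$desired" -lt "$min" ]; then desired=$min; fi
if [ "$desired" -gt "$max" ]; then desired=$max; fi
echo "desired instances: $desired"                # 50 requests at target 5 -> 10
```

With the values from the earlier example, 50 concurrent requests at a target of 5 yields 10 instances, matching the one-instance-per-target-requests intuition.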
Routing Traffic to Revisions
Upon creating or updating a service, Knative creates a new revision for the service. Use the kn revision list command to see a list of available revisions. The output of this command has a column called TRAFFIC that shows the percentage of requests routed to each revision:
kn revision list
NAME                SERVICE       TRAFFIC   TAGS   ...
example-svc-00003   example-svc   100%             3
example-svc-00002   example-svc                    2
example-svc-00001   example-svc                    1
By default, Knative maps all requests to the latest revision of the service.
Routes describe how incoming HTTP requests are mapped to a specific revision in Knative. Use the kn route list command to see the routes:
kn route list
NAME          URL                                                READY
example-svc   https://<service-name>-<project>-<host-name>.com   True
After developing a new version of the service, you can update the service with the update subcommand as follows:
kn service update example-svc --image <your-registry-service>/example-svc-new
This creates a new revision for the service and routes all traffic to it. However, perhaps you want only 5% of all traffic to hit the new revision, to give yourself a small test sample for verifying that the revision is stable. The following command illustrates how to use the --traffic option to achieve this desired state:
kn service update example-svc --traffic example-svc-00003=95 --traffic example-svc-00004=5
With the --traffic option of this command, you specify the percentage of traffic routed to the given revision. After executing this command, the previous revision receives 95% of all traffic, while the new revision receives 5%. The command can use as many --traffic options as you need, but the values must sum to 100. Use the kn route describe command to confirm the results of the change:
kn route describe example-svc
Name:       example-svc
Namespace:  my_namespace-kn-app
Age:        18d
URL:        https://example-svc-my_namespace-kn-app.hostname.com
Service:    example-svc

Traffic Targets:
   95%  example-svc-00003
    5%  example-svc-00004
... output omitted ...
The Traffic Targets section of the output confirms that the new revision receives only 5% of all requests, while the previous revision still handles the rest. Once you are satisfied that the latest revision is stable, use the following command to route all traffic to the latest revision:
kn service update example-svc --traffic @latest=100
The --traffic option enables many deployment patterns, including blue/green, canary, and progressive deployments.
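The same split can also be expressed declaratively in the traffic block of the service spec. The following sketch assumes the revision names from the example, and adds optional tags, which give each target its own addressable URL for testing a candidate before shifting traffic to it:

```yaml
# Hypothetical traffic block for a canary split with tagged targets.
spec:
  traffic:
    - revisionName: example-svc-00003
      percent: 95
      tag: stable      # reachable at its own tagged URL
    - revisionName: example-svc-00004
      percent: 5
      tag: candidate
```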