Observer Service¶
This core service is used to observe the progress of workflows and more generally to provide observability points.
It exposes user-facing endpoints that are used by clients.
There are three service-specific configuration file options.
Environment variables¶
Logging¶
You can set the OBSERVER_DEBUG_LEVEL (all upper-cased) or DEBUG_LEVEL environment variables to DEBUG to add additional information in the console for the launched service. It defaults to INFO. (Please note that setting DEBUG_LEVEL to DEBUG will produce tons of logs.)
The possible values are NOTSET, DEBUG, INFO, WARNING, ERROR, and FATAL. Those values are ordered from the most verbose, NOTSET, which shows all logs, to the least verbose, FATAL, which only shows fatal errors.
If OBSERVER_DEBUG_LEVEL is not defined, then the value of DEBUG_LEVEL is used (or INFO if DEBUG_LEVEL is not defined either).
Access logs are only shown at NOTSET and DEBUG levels.
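The fallback order described above can be sketched as follows. This is only an illustration of the documented rules, not the service's actual code: the function names `resolve_level` and `shows_access_logs` are hypothetical, and the environment is passed in as a plain mapping so the sketch can be tried without touching real variables.

```python
import os

def resolve_level(env=None):
    """Return the effective log level name (hypothetical helper)."""
    env = os.environ if env is None else env
    # The service-specific variable takes precedence when it is defined.
    if "OBSERVER_DEBUG_LEVEL" in env:
        return env["OBSERVER_DEBUG_LEVEL"]
    # Otherwise fall back to DEBUG_LEVEL, then to INFO.
    return env.get("DEBUG_LEVEL", "INFO")

def shows_access_logs(level):
    """Access logs are only shown at the NOTSET and DEBUG levels."""
    return level in ("NOTSET", "DEBUG")

if __name__ == "__main__":
    level = resolve_level()
    print(level, shows_access_logs(level))
```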
Configuration options¶
You can also set the three configuration options using environment variables:
OBSERVER_MAX_RETENTION_PERIOD_MINUTES sets the maximum retention period for workflow events.
OBSERVER_MAX_RETENTION_WORKFLOW_COUNT sets the maximum number of completed workflows the service keeps.
OBSERVER_RETENTION_PERIOD_MINUTES sets the default retention period for workflow events.
If those environment variables are defined, they override the values in the configuration file.
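As a rough sketch of this precedence (environment variable, then configuration file, then built-in default), assuming the context has already been loaded into a dictionary: the helper name `effective_options` is hypothetical, and the defaults are the ones documented for the configuration file.

```python
# Documented defaults for the three service-specific options.
DEFAULTS = {
    "retention_period_minutes": 60,
    "max_retention_period_minutes": 0,
    "max_retention_workflow_count": 0,
}

# Environment variable name -> configuration option name.
ENV_TO_OPTION = {
    "OBSERVER_RETENTION_PERIOD_MINUTES": "retention_period_minutes",
    "OBSERVER_MAX_RETENTION_PERIOD_MINUTES": "max_retention_period_minutes",
    "OBSERVER_MAX_RETENTION_WORKFLOW_COUNT": "max_retention_workflow_count",
}

def effective_options(context, env):
    """Merge defaults, configuration file values, and environment overrides."""
    options = dict(DEFAULTS)
    for name in DEFAULTS:
        if name in context:
            options[name] = context[name]
    for variable, name in ENV_TO_OPTION.items():
        if variable in env:
            # Values must be integers; int() raises ValueError otherwise.
            options[name] = int(env[variable])
    return options
```

For example, a context that sets retention_period_minutes to 30 combined with OBSERVER_RETENTION_PERIOD_MINUTES=120 yields 120 under this sketch.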
Retention policy specification¶
You can set the OBSERVER_RETENTION_POLICY environment variable to specify the retention policy.
If defined, it must refer to an existing file that contains retention policy definitions.
If no retention policy is specified, the default retention policy is used: completed workflows are kept for 60 minutes (or for the duration specified by retention_period_minutes).
See “Retention policy” below for more information.
Tip
If the content of this referred file changes, the retention policy will be updated automatically.
Configuration file¶
This service has a configuration file (observer.yaml by default) that describes the host, port, ssl_context, and trusted_authorities to use. It can also enable insecure logins.
If no configuration file is found it will default to the following values:
apiVersion: opentestfactory.org/v1beta2
kind: ServiceConfig
current-context: default
contexts:
- context:
    port: 443
    host: 127.0.0.1
    ssl_context: adhoc
    eventbus:
      endpoint: https://127.0.0.1:38368
      token: invalid
    retention_period_minutes: 60
    max_retention_period_minutes: 0
    max_retention_workflow_count: 0
  name: default
The configuration included in the ‘allinone’ image is described in “Common settings.” The listening port is 7775 and the bind address is 0.0.0.0, as the service exposes user-facing endpoints.
It has three service-specific configuration options besides the common ones.
Default retention period¶
This period must be an integer. If the entry is missing, the default value will be assumed. It must be defined in the currently-used context.
retention_period_minutes sets the default retention period for workflow events. Events for a given workflow are kept for retention_period_minutes minutes after the reception of an end-of-workflow event. If not specified, defaults to 60 minutes.
If set to 0, the retention period is infinite (subject to maximum retention period and maximum workflow limit settings).
Maximum retention period¶
This period must be an integer. If the entry is missing, the default value will be assumed. It must be defined in the currently-used context.
max_retention_period_minutes sets the maximum retention period for workflow events. Events for a given workflow are kept for at most max_retention_period_minutes minutes after the reception of an end-of-workflow event, even if a longer retention period would otherwise apply. If not specified, defaults to 0 (no maximum limit).
Maximum workflow limit¶
This limit must be an integer. If the entry is missing, the default value will be assumed. It must be defined in the currently-used context.
max_retention_workflow_count sets the maximum number of completed workflows the service keeps. If not specified, defaults to 0 (no limit).
If there are more completed workflows than the limit, the oldest ones will be deleted.
Retention policy¶
Note
The retention policy is applied on all known completed workflows whenever a workflow completes.
Please note that the just-completed workflow is not taken into account, hence it will be kept till another workflow completes.
The default retention policy is to keep a completed workflow for retention_period_minutes minutes.
This is the policy applied if no other policy applies.
A completed workflow is a workflow that is no longer running. It includes canceled workflows and failed workflows.
If the number of kept completed workflows exceeds the limit set by max_retention_workflow_count, completed workflows with the oldest completion time are forgotten, in order, until the count reaches the maximum retention count.
If this default retention policy does not fit your needs, you can specify a retention policy file with the OBSERVER_RETENTION_POLICY environment variable.
Using a retention policy file¶
A retention policy file is a YAML file with a list of policies.
You can specify a retention policy file by setting the OBSERVER_RETENTION_POLICY environment variable while deploying the service so that it refers to a file.
Tip
If the content of this referred file changes, the retention policy will be updated automatically.
If the content of the file is invalid, or if the file does not exist, or is removed, the default retention policy will be used (and there will be error messages in the service logs).
Each policy is an object with the following entries:
name (required): a string, the name of the policy
scope (required): an expression evaluating to true or false
weight (optional): an integer (1 if not specified, can be negative)
retentionPeriod (required): a string, either forever or of the form {x}d{y}h{z}m (for example, 5d, 1h, 90m, or 1h30m)
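To make the retentionPeriod format concrete, here is a minimal parsing sketch. It assumes, based on the examples above, that each of the day, hour, and minute components is optional but at least one must be present. The function name `parse_retention_period` is hypothetical and the service may validate values differently.

```python
import re

# Optional days, hours, and minutes components, in that order.
_PERIOD = re.compile(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?")

def parse_retention_period(text):
    """Return the period in minutes, or None for 'forever'."""
    if text == "forever":
        return None
    match = _PERIOD.fullmatch(text)
    # Reject strings that do not match or that have no component at all.
    if match is None or not any(match.groups()):
        raise ValueError(f"invalid retention period: {text!r}")
    days, hours, minutes = (int(part) if part else 0 for part in match.groups())
    return (days * 24 + hours) * 60 + minutes

if __name__ == "__main__":
    for sample in ("5d", "1h", "90m", "1h30m", "forever"):
        print(sample, parse_retention_period(sample))
```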
The policies only apply to completed workflows. The first policy that applies defines the retention period.
scope is an expression (as described in “Expressions”) and uses the workflow context, which contains information about the completed workflow.
If the retention period is forever, the workflow is never forgotten as long as the number of kept completed workflows does not exceed the limit set by max_retention_workflow_count.
If the retention period is a duration, the workflow is kept for that duration after its completion time (or for the duration specified by max_retention_period_minutes if it is defined and lower).
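The capping rule for durations can be illustrated with a few lines. This sketch assumes durations are already expressed in minutes and treats a max_retention_period_minutes of 0 as "not defined", in line with the documented default; the function name is hypothetical.

```python
def effective_retention_minutes(policy_minutes, max_retention_period_minutes=0):
    """Cap a policy duration by the maximum retention period, if one is set."""
    # 0 means no maximum is defined, so the policy duration is used as-is.
    if 0 < max_retention_period_minutes < policy_minutes:
        return max_retention_period_minutes
    return policy_minutes
```

With a 3d policy (4320 minutes) and a maximum of 1440 minutes, the workflow would be kept for 1440 minutes under this reading.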
If no policy applies to a completed workflow, its retention period will be the default retention period, retention_period_minutes, and its weight will be 1.
Completed workflows that have exceeded their retention period are forgotten.
If the number of kept completed workflows exceeds the limit set by max_retention_workflow_count, completed workflows with the lowest weight and the oldest completion time are forgotten, in order, until the count reaches the maximum retention count.
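The two steps above (expiry first, then the count limit) can be sketched as one pass over the completed workflows. This is an interpretation of the documented rules, not the service's code: the `CompletedWorkflow` class and `apply_retention` function are hypothetical, times are plain numbers of minutes, and "lowest weight and oldest completion time" is read as sorting by weight first and completion time second.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompletedWorkflow:
    name: str
    completed_at: float                 # completion time, in minutes
    retention_minutes: Optional[float]  # None stands for 'forever'
    weight: int = 1

def apply_retention(workflows, now, max_retention_workflow_count=0):
    """Return (kept, forgotten) after expiry and count-limit checks."""
    kept, forgotten = [], []
    # Step 1: forget workflows that have exceeded their retention period.
    for wf in workflows:
        expired = (
            wf.retention_minutes is not None
            and now - wf.completed_at > wf.retention_minutes
        )
        (forgotten if expired else kept).append(wf)
    # Step 2: if a limit is set and exceeded, forget the lowest-weight,
    # oldest workflows until the count matches the limit.
    excess = len(kept) - max_retention_workflow_count
    if max_retention_workflow_count > 0 and excess > 0:
        kept.sort(key=lambda wf: (wf.weight, wf.completed_at))
        forgotten.extend(kept[:excess])
        kept = kept[excess:]
    return kept, forgotten
```

The sketch ignores the detail that the just-completed workflow is not taken into account during a pass.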
A policy file looks like this:
retentionPolicy:
- name: junk workflows
  scope: workflow.namespace == 'junk'
  retentionPeriod: 5m
- name: weekend failed workflows
  scope: >-
    workflow.status == 'failure'
    && ((dayOfWeek(workflow.completionTimestamp) == 'saturday')
    || (dayOfWeek(workflow.completionTimestamp) == 'sunday'))
  weight: 123
  retentionPeriod: 3d
- name: weekend ok workflows
  scope: >-
    (dayOfWeek(workflow.completionTimestamp) == 'saturday')
    || (dayOfWeek(workflow.completionTimestamp) == 'sunday')
  retentionPeriod: 3d
A workflow context is available for expressions. It has the following entries: name, namespace, status, creationTimestamp, and completionTimestamp.
Examples¶
No max_retention_workflow_count¶
Assuming the default configuration and the above-mentioned policy file:
A workflow that completes on a Saturday or a Sunday will be kept for 3 days (unless it ran in the ‘junk’ namespace).
A workflow that completes on any other day will be kept for 1 hour (the default retention period).
Workflows that ran in the ‘junk’ namespace will only be kept for 5 minutes.
Warning
The order of the policies in the file is important. The first policy that applies defines the retention period.
In the example above, if the policy for weekend workflows were placed before the policy for junk workflows, junk workflows completing over the weekend would be kept for 3 days, not 5 minutes.
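The first-match behavior can be reproduced with a short sketch. Plain Python functions stand in for the scope expressions here, which is an assumption made for illustration only; the real service evaluates the expressions from the policy file. The names `POLICIES`, `is_weekend`, and `select_policy` are hypothetical.

```python
from datetime import datetime

def is_weekend(wf):
    # datetime.weekday(): 5 is Saturday, 6 is Sunday.
    return wf["completionTimestamp"].weekday() >= 5

# (name, scope predicate, retentionPeriod), in file order.
POLICIES = [
    ("junk workflows", lambda wf: wf["namespace"] == "junk", "5m"),
    ("weekend failed workflows",
     lambda wf: wf["status"] == "failure" and is_weekend(wf), "3d"),
    ("weekend ok workflows", is_weekend, "3d"),
]

def select_policy(wf, policies=POLICIES):
    """Return (policy name, retention period) of the first matching policy."""
    for name, scope, period in policies:
        if scope(wf):
            return name, period
    # No policy applies: fall back to the default retention period.
    return "default", "60m"

if __name__ == "__main__":
    junk_on_saturday = {
        "namespace": "junk",
        "status": "success",
        "completionTimestamp": datetime(2024, 6, 1),  # a Saturday
    }
    print(select_policy(junk_on_saturday))
    # Moving the weekend policy first changes the outcome for the same workflow.
    reordered = [POLICIES[2], POLICIES[0], POLICIES[1]]
    print(select_policy(junk_on_saturday, reordered))
```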
Weight and max_retention_workflow_count¶
Assuming a max_retention_workflow_count of 10 and the above-mentioned policy file:
If 5 workflows failed over the weekend and 10 succeeded over the weekend, the 5 failed workflows would be kept for 3 days, and the 5 most recently completed successful workflows would be kept for 3 days too.
If there were 15 failed workflows over the weekend, the 5 failed workflows with the oldest completion time would be forgotten, and all 10 successful workflows would be forgotten too.
This is because failed workflows have a weight of 123, while successful ones have no specified weight, and hence have the default weight of 1.
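The two scenarios above can be checked with a small simulation. It is a sketch under the assumption that workflows are forgotten in ascending order of weight, then of completion time; the helper names `forget_excess` and `weekend` are hypothetical, and completion times are represented by simple counters.

```python
LIMIT = 10  # max_retention_workflow_count

def forget_excess(workflows, limit):
    """workflows: list of (name, weight, completion_order) tuples.

    Returns (kept, forgotten), forgetting lowest weight and oldest first.
    """
    excess = max(len(workflows) - limit, 0)
    ordered = sorted(workflows, key=lambda wf: (wf[1], wf[2]))
    return ordered[excess:], ordered[:excess]

def weekend(n_failed, n_ok):
    """Build failed (weight 123) and successful (weight 1) weekend workflows."""
    failed = [(f"failed-{i}", 123, i) for i in range(n_failed)]
    ok = [(f"ok-{i}", 1, i) for i in range(n_ok)]
    return failed + ok

if __name__ == "__main__":
    for n_failed in (5, 15):
        kept, forgotten = forget_excess(weekend(n_failed, 10), LIMIT)
        print(n_failed, "failed:", sorted(name for name, _, _ in kept))
```

In the first scenario the kept set is the 5 failed workflows plus the 5 most recent successful ones; in the second it is the 10 most recently completed failed workflows.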
Subscriptions¶
The observer service subscribes to the following events:
kind | apiVersion |
---|---|
ExecutionCommand | opentestfactory.org/v1 |
ExecutionError | opentestfactory.org/v1alpha1 |
ExecutionResult | opentestfactory.org/v1alpha1 |
GeneratorCommand | opentestfactory.org/v1alpha1 |
GeneratorResult | opentestfactory.org/v1 |
Notification | opentestfactory.org/v1alpha1 |
ProviderCommand | opentestfactory.org/v1beta1 |
ProviderResult | opentestfactory.org/v1 |
Workflow | opentestfactory.org/v1 |
WorkflowCanceled | opentestfactory.org/v1 |
WorkflowCancellation | opentestfactory.org/v1 |
WorkflowCompleted | opentestfactory.org/v1 |
WorkflowResult | opentestfactory.org/v1alpha1 |
The observer service exposes an /inbox endpoint that is used by the event bus to post relevant events.
Launch command¶
If you want to manually start the observer service, use the following command:
python -m opentf.core.observer [--context context] [--config configfile]
Additional command-line options are available and described in “Command-line options.”