Observer Service

This core service is used to observe the progress of workflows and more generally to provide observability points.

It exposes user-facing endpoints that are used by clients.

There are three service-specific configuration file options.

Environment variables

Logging

You can set the OBSERVER_DEBUG_LEVEL (all upper-cased) or DEBUG_LEVEL environment variables to DEBUG to show additional information in the console for the launched service. It defaults to INFO. (Please note that setting DEBUG_LEVEL to DEBUG will produce a large volume of logs.)

The possible values are NOTSET, DEBUG, INFO, WARNING, ERROR, and FATAL. These values are ordered from the most verbose, NOTSET, which shows all logs, to the least verbose, FATAL, which only shows fatal errors.

If OBSERVER_DEBUG_LEVEL is not defined then the value of DEBUG_LEVEL is used (or INFO if DEBUG_LEVEL is not defined either).

Access logs are only shown at NOTSET and DEBUG levels.
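The precedence described above can be sketched in Python. This is only an illustration of the documented rules, not the service's actual code: the helper names resolve_log_level and shows_access_logs are hypothetical, and the handling of unrecognized values is not modeled.

```python
import os


def resolve_log_level(environ=os.environ):
    """Return OBSERVER_DEBUG_LEVEL if defined, else DEBUG_LEVEL, else INFO."""
    for name in ("OBSERVER_DEBUG_LEVEL", "DEBUG_LEVEL"):
        value = environ.get(name)
        if value is not None:
            return value
    return "INFO"


def shows_access_logs(level):
    """Access logs are only shown at the NOTSET and DEBUG levels."""
    return level in ("NOTSET", "DEBUG")
```

With both variables set, the service-specific one wins, so OBSERVER_DEBUG_LEVEL=WARNING together with DEBUG_LEVEL=DEBUG yields WARNING for the observer service.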

Retention policy specification

You can set the OBSERVER_RETENTION_POLICY environment variable to specify the retention policy. If defined, it must refer to an existing file that contains retention policy definitions.

If no retention policy is specified, the default retention policy is used: completed workflows are kept for 60 minutes (or for the duration specified by retention_period_minutes).

See “Retention policy” below for more information.

Tip

If the content of this referred file changes, the retention policy will be updated automatically.

Configuration file

This service has a configuration file (observer.yaml by default) that describes the host, port, ssl_context, and trusted_authorities to use. It can also enable insecure logins.

If no configuration file is found it will default to the following values:

apiVersion: opentestfactory.org/v1beta2
kind: ServiceConfig
current-context: default
contexts:
- context:
    port: 443
    host: 127.0.0.1
    ssl_context: adhoc
    eventbus:
      endpoint: https://127.0.0.1:38368
      token: invalid
    retention_period_minutes: 60
    max_retention_period_minutes: 0
    max_retention_workflow_count: 0
  name: default

The configuration included in the ‘allinone’ image is described in “Common settings.” The listening port is 7775 and the bind address is 0.0.0.0 as the service exposes user-facing endpoints.

It has three service-specific configuration options besides the common ones.

Default retention period

This period must be an integer. If the entry is missing, the default value will be assumed. It must be defined in the currently-used context.

retention_period_minutes sets the default retention period for workflow events. Events for a given workflow are kept for retention_period_minutes minutes after the reception of an end-of-workflow event. If not specified, defaults to 60 minutes.

If set to 0, the retention period is infinite (subject to maximum retention period and maximum workflow limit settings).
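As an illustration of how this option could be read, here is a minimal Python sketch. It assumes the configuration file has already been parsed into a dictionary that mirrors the YAML shown earlier. The function names current_context and retention_period_minutes are hypothetical, and returning None for an infinite period is a convention of this sketch only.

```python
DEFAULT_RETENTION_PERIOD_MINUTES = 60


def current_context(config):
    """Return the settings of the context named by 'current-context'."""
    wanted = config["current-context"]
    for entry in config["contexts"]:
        if entry["name"] == wanted:
            return entry["context"]
    raise KeyError(f"context {wanted!r} not found")


def retention_period_minutes(config):
    """Return the retention period in minutes, or None when it is infinite."""
    value = current_context(config).get(
        "retention_period_minutes", DEFAULT_RETENTION_PERIOD_MINUTES
    )
    # The option must be an integer (bool is rejected although it subclasses int).
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError("retention_period_minutes must be an integer")
    # 0 means "keep forever", still subject to the two maximum settings.
    return None if value == 0 else value
```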

Maximum retention period

This period must be an integer. If the entry is missing, the default value will be assumed. It must be defined in the currently-used context.

max_retention_period_minutes sets the maximum retention period for workflow events. Events for a given workflow are kept for at most max_retention_period_minutes minutes after the reception of an end-of-workflow event. If not specified, defaults to 0 (no maximum limit).

Maximum workflow limit

This limit must be an integer. If the entry is missing, the default value will be assumed. It must be defined in the currently-used context.

max_retention_workflow_count sets the maximum number of completed workflows the service keeps. If not specified, defaults to 0 (no limit).

If there are more completed workflows than the limit, the oldest ones will be deleted.

Retention policy

Note

The retention policy is applied on all known completed workflows whenever a workflow completes.

Please note that the just-completed workflow is not taken into account, hence it will be kept till another workflow completes.

The default retention policy is to keep a completed workflow for retention_period_minutes minutes. This is the policy applied if no other policy applies.

A completed workflow is a workflow that is no longer running. It includes canceled workflows and failed workflows.

If the number of kept completed workflows exceeds the limit set by max_retention_workflow_count, completed workflows with the oldest completion time are forgotten, in order, until the count reaches the maximum retention count.
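The default policy can be sketched as follows. This is a simplified Python model, not the service's implementation: the function apply_default_policy is hypothetical, workflows are represented as a mapping from an identifier to a completion time, the just-completed workflow is assumed to have been left out by the caller, max_retention_period_minutes is ignored, and treating a workflow whose age equals the period exactly as still kept is an assumption.

```python
from datetime import datetime, timedelta


def apply_default_policy(completed, now, retention_minutes=60, max_count=0):
    """Return the identifiers of the completed workflows that are kept.

    completed maps a workflow identifier to its completion datetime.
    """
    kept = dict(completed)
    if retention_minutes:  # 0 means no time-based expiry
        limit = timedelta(minutes=retention_minutes)
        kept = {wid: done for wid, done in kept.items() if now - done <= limit}
    if max_count and len(kept) > max_count:  # 0 means no count limit
        # Forget the oldest completions first: keep the max_count newest ones.
        newest_first = sorted(kept, key=kept.get, reverse=True)
        kept = {wid: kept[wid] for wid in newest_first[:max_count]}
    return set(kept)
```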

If this default retention policy does not fit your needs, you can specify a retention policy file with the OBSERVER_RETENTION_POLICY environment variable.

Using a retention policy file

A retention policy file is a YAML file with a list of policies.

You can specify a retention policy file by setting the OBSERVER_RETENTION_POLICY environment variable while deploying the service so that it refers to a file.

Tip

If the content of this referred file changes, the retention policy will be updated automatically.

If the content of the file is invalid, or if the file does not exist, or is removed, the default retention policy will be used (and there will be error messages in the service logs).

Each policy is an object with the following entries:

  • name (required): a string, the name of the policy
  • scope (required): an expression evaluating to true or false
  • weight (optional): an integer (1 if not specified, can be negative)
  • retentionPeriod (required): a string (forever or of the form {x}d{y}h{z}m, that is, for example, 5d, 1h, 90m, or 1h30m)
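To make the retentionPeriod format concrete, here is a small Python sketch that parses such a string. It is an illustration only: the function parse_retention_period is hypothetical, and the exact grammar accepted by the service (for instance, whether whitespace or upper-case units are allowed) is not specified here, so the sketch accepts only the forms listed above.

```python
import re
from datetime import timedelta

# Optional days, hours, and minutes parts, in that order.
_PERIOD = re.compile(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?")


def parse_retention_period(text):
    """Return a timedelta for a duration, or None for 'forever'."""
    if text == "forever":
        return None
    match = _PERIOD.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"invalid retentionPeriod: {text!r}")
    days, hours, minutes = (int(part or 0) for part in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes)
```

In this sketch, 90m and 1h30m both parse to the same 90-minute duration.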

The policies only apply to completed workflows. The first policy that applies defines the retention period.

scope is an expression (as described in “Expressions”) and uses the workflow context, which contains information about the completed workflow.

If the retention period is forever, the workflow is never forgotten as long as the number of kept completed workflows does not exceed the limit set by max_retention_workflow_count.

If the retention period is a duration, the workflow is kept for that duration after its completion time (or for the duration specified by max_retention_period_minutes if it is defined and lower).
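The interaction between a policy's duration and max_retention_period_minutes can be sketched in Python as follows. The function effective_retention is hypothetical. It follows the two rules above literally: a forever period (represented here as None, a convention of this sketch) is left untouched, and a duration is capped by the maximum when one is defined.

```python
from datetime import timedelta


def effective_retention(period, max_retention_period_minutes=0):
    """Return the duration actually applied, or None for 'forever'."""
    if period is None:
        return None
    if max_retention_period_minutes:  # 0 means no maximum
        return min(period, timedelta(minutes=max_retention_period_minutes))
    return period
```

For example, a 3d policy combined with a max_retention_period_minutes of 1440 gives an effective retention of one day.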

If no policy applies to a completed workflow, its retention period will be the default retention period, retention_period_minutes, and its weight will be 1.

Completed workflows that have exceeded their retention period are forgotten.

If the number of kept completed workflows exceeds the limit set by max_retention_workflow_count, completed workflows with the lowest weight and the oldest completion time are forgotten, in order, until the count reaches the maximum retention count.
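The count limit with weights can be modeled as below. This Python sketch is an interpretation of the rule above, not the service's code: the Completed class and the enforce_count_limit function are hypothetical, and the sketch assumes that weight is compared first and completion time second.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Completed:
    name: str
    completed_at: datetime
    weight: int = 1  # default weight when no policy sets one


def enforce_count_limit(workflows, max_count):
    """Return the workflows kept once the count limit is applied."""
    if not max_count or len(workflows) <= max_count:  # 0 means no limit
        return list(workflows)
    # Rank by highest weight, then most recent completion, and keep the top
    # max_count entries. The lowest-weight, oldest workflows are forgotten.
    ranked = sorted(
        workflows, key=lambda w: (w.weight, w.completed_at), reverse=True
    )
    return ranked[:max_count]
```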

A policy file looks like this:

retentionPolicy:
- name: junk workflows
  scope: workflow.namespace == 'junk'
  retentionPeriod: 5m
- name: weekend failed workflows
  scope: >-
    workflow.status == 'failure'
    && ((dayOfWeek(workflow.completionTimestamp) == 'saturday')
        || (dayOfWeek(workflow.completionTimestamp) == 'sunday'))
  weight: 123
  retentionPeriod: 3d
- name: weekend ok workflows
  scope: >-
    (dayOfWeek(workflow.completionTimestamp) == 'saturday')
    || (dayOfWeek(workflow.completionTimestamp) == 'sunday')
  retentionPeriod: 3d

A workflow context is available for expressions. It has the following entries: name, namespace, status, creationTimestamp, and completionTimestamp.

Examples

No max_retention_workflow_count

Assuming the default configuration and the above-mentioned policy file:

A workflow completing on a Saturday or a Sunday will be kept for 3 days (unless it ran in the ‘junk’ namespace).

A workflow completing on any other day will be kept for 1 hour (the default retention period).

Workflows in the ‘junk’ namespace will only be kept for 5 minutes.

Warning

The order of the policies in the file is important. The first policy that applies defines the retention period.

In the example above, if the policy for weekend workflows was before the policy for junk workflows, the junk workflows running over the weekend would be kept for 3 days, not 5 minutes.
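The effect of ordering can be illustrated with a short Python sketch. It is a model only: the scope expressions of the policy file are replaced by plain Python predicates, the workflow context is represented as a dictionary whose completionTimestamp is a datetime (an assumption of this sketch), and the names is_weekend and select_policy are hypothetical.

```python
from datetime import datetime


def is_weekend(workflow):
    # datetime.weekday(): 5 is Saturday, 6 is Sunday.
    return workflow["completionTimestamp"].weekday() >= 5


POLICIES = [
    {"name": "junk workflows",
     "scope": lambda w: w["namespace"] == "junk",
     "retentionPeriod": "5m"},
    {"name": "weekend failed workflows",
     "scope": lambda w: w["status"] == "failure" and is_weekend(w),
     "weight": 123,
     "retentionPeriod": "3d"},
    {"name": "weekend ok workflows",
     "scope": is_weekend,
     "retentionPeriod": "3d"},
]


def select_policy(workflow, policies):
    """Return (name, retentionPeriod, weight) of the first policy that applies."""
    for policy in policies:
        if policy["scope"](workflow):
            return policy["name"], policy["retentionPeriod"], policy.get("weight", 1)
    return None  # the caller falls back to retention_period_minutes and weight 1
```

Calling select_policy with the list reversed shows the pitfall: a junk workflow that completed on a Saturday then matches a weekend policy first and gets 3d instead of 5m.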

Weight and max_retention_workflow_count

Assuming a max_retention_workflow_count of 10 and the above-mentioned policy file:

If 5 workflows failed over the weekend and 10 succeeded over the weekend, the 5 failed workflows would be kept for 3 days, and the 5 most recently completed successful workflows would be kept for 3 days too.

If there were instead 15 failed workflows over the weekend, the 5 failed workflows with the oldest completion time would be forgotten, and all 10 successful workflows would be forgotten too.

This is because failed workflows have a weight of 123, while successful ones have no specified weight, and hence have the default weight of 1.

Subscriptions

The observer service subscribes to the following events:

kind               apiVersion
Workflow           opentestfactory.org/v1beta1
WorkflowCompleted  opentestfactory.org/v1alpha1
WorkflowCanceled   opentestfactory.org/v1alpha1
GeneratorCommand   opentestfactory.org/v1alpha1
GeneratorResult    opentestfactory.org/v1beta1
ProviderCommand    opentestfactory.org/v1beta1
ProviderResult     opentestfactory.org/v1beta1
ExecutionCommand   opentestfactory.org/v1beta1
ExecutionResult    opentestfactory.org/v1alpha1
ExecutionError     opentestfactory.org/v1alpha1
Notification       opentestfactory.org/v1alpha1

The observer service exposes an /inbox endpoint that is used by the event bus to post relevant events.

Launch command

If you want to manually start the observer service, use the following command:

python -m opentf.core.observer [--context context] [--config configfile]

Additional command-line options are available and described in “Command-line options.”