Observer Service

The observer service tracks the progress of workflows. It exposes user-facing endpoints that can be queried for status information on workflows and execution environments.

Configuration

This module has a configuration file (observer.yaml by default) that describes the host, port, ssl_context, trusted_authorities, and logfile to use. It can also enable insecure logins.

If no configuration file is found, the service falls back to the following values:

apiVersion: opentestfactory.org/v1beta2
kind: ServiceConfig
current-context: default
contexts:
- context:
    port: 443
    host: 127.0.0.1
    ssl_context: adhoc
    eventbus:
      endpoint: https://127.0.0.1:38368
      token: invalid
    retention_period_minutes: 60
  name: default

ssl_context is either adhoc, a list of two items (certificate file path and private key file path), or disabled (not recommended, will switch to plain HTTP).
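
The three forms of ssl_context can be told apart with a small helper. This is only a sketch of the rule stated above, not the service's own parsing code; the function name interpret_ssl_context and the returned labels are hypothetical.

```python
def interpret_ssl_context(value):
    """Classify an ssl_context setting (hypothetical helper, for illustration).

    Returns a (kind, files) pair where kind is 'adhoc', 'disabled', or 'files'.
    """
    if value == 'adhoc':
        return ('adhoc', None)
    if value == 'disabled':
        # Not recommended: the service would switch to plain HTTP.
        return ('disabled', None)
    if isinstance(value, (list, tuple)) and len(value) == 2:
        certificate_path, private_key_path = value
        return ('files', (certificate_path, private_key_path))
    raise ValueError(f'Unsupported ssl_context: {value!r}')
```

Any other value (for example a list with one item) is rejected here; how the service itself reacts to such a value is not covered by this page.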

A context can also contain a trusted_authorities entry, a list of public key files used for token validation.

A context can also allow for insecure (token-less) logins, if enable_insecure_login is set to true (by default, insecure logins are disabled).

Insecure logins, if enabled, are only allowed from a given address (127.0.0.1 by default). This can be overridden by specifying insecure_bind_address.

Retention limit

retention_period_minutes limits the retention period for workflow events. Events for a given workflow are kept for retention_period_minutes minutes after the end-of-workflow event is received.

The value must be an integer. If the entry is missing, it defaults to 60 minutes.
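
As an illustration of the retention rule, the following sketch computes the time until which a workflow's events would be kept. It is not the service's implementation; the function name retention_deadline and the way the context is passed as a dictionary are assumptions made for the example.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION_MINUTES = 60  # default value stated in this section


def retention_deadline(end_event_received_at, context):
    """Return the time until which events are kept (hypothetical helper).

    end_event_received_at: datetime at which the end-of-workflow event arrived.
    context: a dictionary holding the entries of the current context.
    """
    minutes = context.get('retention_period_minutes', DEFAULT_RETENTION_MINUTES)
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(minutes, bool) or not isinstance(minutes, int):
        raise ValueError('retention_period_minutes must be an integer')
    return end_event_received_at + timedelta(minutes=minutes)
```

With the default configuration, events for a workflow that ended at 10:00 would be kept until 11:00.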

Usage

python3 -m opentf.core.observer [--context context] [--config configfile]

Endpoints

This module exposes the following endpoints:

  • /inbox (POST)
  • /channelhandlers (GET)
  • /channels (GET)
  • /workflows (GET)
  • /workflows/status (GET)
  • /workflows/{workflow_id}/status (GET)
  • /workflows/{workflow_id}/workers (GET)

When calling these endpoints, a signed token must be specified via the Authorization header.

This header has the following form:

Authorization: Bearer xxxxxxxx

It must be signed with one of the trusted authorities specified in the current context.
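
A client can build this header once and reuse it for every call. The sketch below assumes the token is already available as a string; the helper name auth_headers is hypothetical.

```python
def auth_headers(token):
    """Build the Authorization header for a signed token (hypothetical helper)."""
    if not token:
        raise ValueError('a signed token is required')
    return {'Authorization': f'Bearer {token}'}
```

The resulting dictionary can be passed as the headers argument of an HTTP client call, for example requests.get(url, headers=auth_headers(token)).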

What is returned can be constrained by what the specified token is allowed to access. For example, if the specified token only has access to a specific namespace, resources not accessible from this namespace will not be shown.

/inbox

The eventbus posts relevant events to this endpoint.

/channelhandlers

This endpoint returns a status manifest (a JSON document) with the following entries:

  • message: the Known channel handlers string
  • details: an object with an items element, a possibly empty list of strings (the IDs of the known channel handlers).

Known channel handlers example

In the following example there are two known channel handlers:

{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Known channel handlers",
    "details": {
        "items": [
            "50c7e5d1-7bdc-46f0-9422-3cc6660d00c0",
            "dd43b0bb-b359-4e9d-9459-c42ee7dc16d9"
        ]
    },
    "code": 200,
    "reason": "OK"
}

/channels

This endpoint returns a status manifest (a JSON document) with the following entries:

  • message: the Known channels string
  • details: an object with an items element, a possibly empty list of objects (the known channels).

Each item in the list has the following structure:

apiVersion: opentestfactory.org/v1alpha1
kind: Channel
metadata:
  name: foo
  namespaces: bar
  channelhandler_id: uuid
spec:
  tags: [a, b, c]
status:
  lastCommunicationTimestamp: atimestamp
  phase: 'IDLE' or 'BUSY' or 'PENDING'
  currentJobID: uuid or null

namespaces is either *, a name, or a comma-separated list of names:

  • *
  • foo
  • foo,bar,baz

If namespaces is *, the channel is accessible from all namespaces. Otherwise, it is accessible from the listed namespace(s).
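
The accessibility rule can be expressed as a short check. This is a client-side sketch of the rule as described here, not the service's own code; the function name is hypothetical, and trimming spaces around names is an assumption.

```python
def channel_accessible_from(namespaces, namespace):
    """Tell whether a channel is accessible from a namespace (hypothetical helper).

    namespaces: the channel's metadata.namespaces value ('*', 'foo', 'foo,bar,baz').
    namespace: the namespace to check.
    """
    if namespaces.strip() == '*':
        return True
    # Assumption: spaces around the comma-separated names are not significant.
    allowed = {name.strip() for name in namespaces.split(',')}
    return namespace in allowed
```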

Known channels example

{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Known channels",
    "details": {
        "items": [
            {
                "apiVersion": "opentestfactory.org/v1alpha1",
                "kind": "Channel",
                "metadata": {
                    "channelhandler_id": "2af7585e-73dc-4f63-855e-4072fc9aa635",
                    "name": "robotframework.agents",
                    "namespaces": "default"
                },
                "spec": {
                    "tags": ["ssh", "linux", "robotframework"]
                },
                "status": {
                    "currentJobID": null,
                    "lastCommunicationTimestamp": "2022-05-02T10:00:49.933028",
                    "phase": "IDLE"
                }
            },
            {
                "apiVersion": "opentestfactory.org/v1alpha1",
                "kind": "Channel",
                "metadata": {
                    "channelhandler_id": "2af7585e-73dc-4f63-855e-4072fc9aa635",
                    "name": "cypress-agent.agents",
                    "namespaces": "*"
                },
                "spec": {
                    "tags": ["ssh", "linux", "cypress"]
                },
                "status": {
                    "currentJobID": "2af7585e-73dc-4f63-855e-4072fc9aa635",
                    "lastCommunicationTimestamp": "2022-05-02T10:00:49.933028",
                    "phase": "PENDING"
                }
            }
        ]
    },
    "code": 200,
    "reason": "OK"
}
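
A client that has retrieved such a manifest can, for instance, list the idle channels that carry a given set of tags. The sketch below works on the decoded JSON document; the function name idle_channels is hypothetical.

```python
def idle_channels(manifest, required_tags=()):
    """Return the names of IDLE channels carrying all required tags.

    manifest: the decoded status manifest returned by /channels.
    required_tags: tags that each returned channel must have.
    """
    required = set(required_tags)
    return [
        channel['metadata']['name']
        for channel in manifest['details']['items']
        if channel['status']['phase'] == 'IDLE'
        and required <= set(channel['spec']['tags'])
    ]
```

Applied to the example above with the tags ssh and linux, only robotframework.agents qualifies, since the other channel is in the PENDING phase.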

/workflows

This endpoint returns a status manifest (a JSON document) with the following entries:

  • message: the Running and recent workflows string
  • details: an object with an items element, a possibly empty list of strings (the IDs of the running and recent workflows).

Orchestrator running and recent workflows example

In the following example there are two running or recent workflows:

{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Running and recent workflows",
    "details": {
        "items": [
            "50c7e5d1-7bdc-46f0-9422-3cc6660d00c0",
            "dd43b0bb-b359-4e9d-9459-c42ee7dc16d9"
        ]
    },
    "code": 200,
    "reason": "OK"
}

/workflows/status

This endpoint returns a status manifest (a JSON document) with the following entries:

  • message: a summary of the orchestrator status (a string)
  • details: an object with two elements: items, a list of all still running workflows, and status, one of the following strings: IDLE or BUSY.

If there are no running workflows and no active workers on any workflow, .details.status is IDLE. It is BUSY otherwise.

Orchestrator status examples

In the following example, a workflow is still running (or has at least one active worker):

{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "1 workflows in progress",
    "details": {
        "items": ["50c7e5d1-7bdc-46f0-9422-3cc6660d00c0"],
        "status": "BUSY"
    },
    "code": 200,
    "reason": "OK"
}

In this example, all workflows have completed, and no worker is still active. It is safe to stop the orchestrator service.

{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "No workflow in progress",
    "details": {
        "items": [],
        "status": "IDLE"
    },
    "code": 200,
    "reason": "OK"
}
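
A script that wants to stop the orchestrator only when it is idle could poll this endpoint. The following is a minimal sketch under stated assumptions: fetch_status is a hypothetical callable supplied by the caller that returns the decoded manifest of /workflows/status, and the polling interval and retry count are arbitrary.

```python
import time


def wait_until_idle(fetch_status, max_polls=10, poll_seconds=5.0, sleep=time.sleep):
    """Poll until .details.status is IDLE; return False if it never is.

    fetch_status: callable returning the decoded /workflows/status manifest.
    sleep: injectable for testing, defaults to time.sleep.
    """
    for attempt in range(max_polls):
        manifest = fetch_status()
        if manifest['details']['status'] == 'IDLE':
            return True
        # Do not wait after the last attempt.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False
```

Passing the fetching function in keeps the polling logic independent of the HTTP client and of the way the token is obtained.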

/workflows/{workflow_id}/status

This endpoint returns a status manifest (a JSON document) with the following entries:

  • message: a summary of the workflow status (a string)
  • details: an object with two elements: items, a list of all events pertaining to the workflow, and status, one of the following strings: RUNNING, DONE, or FAILED.

If the workflow is not known, a Failure status is returned instead, with a 404 code.
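
Client code may want to distinguish an unknown workflow from a known one. The sketch below assumes that a Failure manifest carries the same top-level status, code, and message entries as the Success manifests shown on this page; that layout, the exception class, and the function name are assumptions made for the example.

```python
class UnknownWorkflow(Exception):
    """Raised when the service reports that the workflow is not known."""


def workflow_state(manifest):
    """Return RUNNING, DONE, or FAILED from a workflow status manifest.

    Assumes a Failure manifest has top-level 'status', 'code', and 'message'.
    """
    if manifest.get('status') == 'Failure':
        message = manifest.get('message', 'request failed')
        if manifest.get('code') == 404:
            raise UnknownWorkflow(message)
        raise RuntimeError(message)
    return manifest['details']['status']
```

Note that a failed workflow is still reported with a Success manifest: FAILED describes the workflow, while Failure describes the request.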

The list of events, items, may be paginated. The pagination, if used, follows RFC 8288 and relies on the Link response header:

import requests

base_url = '...'
token = '...'  # a signed token, sent via the Authorization header
headers = {'Authorization': f'Bearer {token}'}

url = f'{base_url}/workflows/4420a8ad-1880-45dd-af07-9162583efcff/status'
response = requests.get(url, headers=headers)
events = response.json()['details']['items']
# Follow the 'next' links until the last page has been read
while 'next' in response.links:
    response = requests.get(response.links['next']['url'], headers=headers)
    events += response.json()['details']['items']

# All events have been collected

By default, the page size is determined by the observer service, but this can be changed by using the per_page parameter. There may be a maximum value allowed for per_page. (In the current implementation the default value is 100 and the maximum value is 1000.)

If a specific per_page value is set, and assuming it is within the allowed limits, the links will use this value.

A specific page can be specified using the page parameter. page starts at 1, as per RFC8288.

Even if items is paginated, the status is for the whole workflow.
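
The page and per_page parameters are passed as query parameters. The helper below is a sketch that builds such a URL; its name is hypothetical, and it only checks that the values are positive, leaving the enforcement of the maximum page size to the service.

```python
from urllib.parse import urlencode


def workflow_status_url(base_url, workflow_id, page=None, per_page=None):
    """Build the URL of a workflow status page (hypothetical helper)."""
    params = {}
    if page is not None:
        if page < 1:
            raise ValueError('page starts at 1')
        params['page'] = page
    if per_page is not None:
        if per_page < 1:
            raise ValueError('per_page must be positive')
        params['per_page'] = per_page
    url = f'{base_url}/workflows/{workflow_id}/status'
    # Only append a query string when at least one parameter is set.
    return f'{url}?{urlencode(params)}' if params else url
```

When neither parameter is given, the URL has no query string and the service applies its default page size.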

Workflow status examples

This is a status manifest of a completed workflow:

{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Workflow completed",
    "details": {
        "items": [...],
        "status": "DONE"
    },
    "code": 200,
    "reason": "OK"
}

This is a status manifest of a failed workflow:

{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Workflow failed",
    "details": {
        "items": [...],
        "status": "FAILED"
    },
    "code": 200,
    "reason": "OK"
}

/workflows/{workflow_id}/workers

This endpoint returns a status manifest (a JSON document) with the following entries:

  • message: a summary of the workflow status (a string)
  • details: an object with two elements: items, a list of active workers on the workflow, and status, one of the following strings: BUSY or IDLE.

If the workflow is not known, a Failure status is returned instead, with a 404 code.

Workflow workers example

{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "2 active workers on workflow",
    "details": {
        "items": [
            "3137d9e2-0873-4cba-b5d3-f36dff8d9b3f",
            "e276b403-6d27-4d63-b3f1-1415c258736c"
        ],
        "status": "BUSY"
    },
    "code": 200,
    "reason": "OK"
}