Observer Service¶
The observer service tracks the progress of workflows. It exposes user-facing endpoints that can be queried for status information on workflows and execution environments.
Configuration¶
This module has a configuration file (observer.yaml by default) that describes the host, port, ssl_context, trusted_authorities, and logfile to use. It can also enable insecure logins.
If no configuration file is found it will default to the following values:
apiVersion: opentestfactory.org/v1beta2
kind: ServiceConfig
current-context: default
contexts:
- context:
    port: 443
    host: 127.0.0.1
    ssl_context: adhoc
    eventbus:
      endpoint: https://127.0.0.1:38368
      token: invalid
    retention_period_minutes: 60
  name: default
ssl_context is either adhoc, a list of two items (certificate file path and private key file path), or disabled (not recommended, will switch to plain HTTP).
A context can also contain a trusted_authorities entry, which is a list of public key files used for token validation.
A context can also allow insecure (token-less) logins if enable_insecure_login is set to true (by default, insecure logins are disabled).
Insecure logins, if enabled, are only allowed from a given address (127.0.0.1 by default). This can be overridden by specifying insecure_bind_address.
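The three accepted forms of ssl_context described above can be told apart with a few lines of Python. This is only an illustrative sketch: describe_ssl_context is a hypothetical helper, not part of the observer service, and the returned strings are made up for the example.

```python
def describe_ssl_context(ssl_context):
    """Describe how an ssl_context value reads according to the text above.

    Hypothetical helper for illustration; the service's own parsing
    may differ.
    """
    if ssl_context == 'adhoc':
        return 'https (ad hoc certificate)'
    if ssl_context == 'disabled':
        return 'plain http (not recommended)'
    if isinstance(ssl_context, (list, tuple)) and len(ssl_context) == 2:
        # Two items: certificate file path, then private key file path.
        certificate, private_key = ssl_context
        return f'https (certificate={certificate}, key={private_key})'
    raise ValueError(f'Unsupported ssl_context: {ssl_context!r}')
```

For instance, describe_ssl_context(['cert.pem', 'key.pem']) reports an HTTPS setup using those two files, while any other shape raises a ValueError.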
Retention limit¶
retention_period_minutes limits the retention period for workflow events. Events for a given workflow are kept for retention_period_minutes minutes after the reception of an end of workflow event.
This limit must be an integer. If the entry is missing, it defaults to 60 minutes.
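The retention rule amounts to simple date arithmetic, sketched below. The function names are hypothetical, and the behavior at the exact expiry instant is an assumption (events are treated as expired once the period has fully elapsed).

```python
from datetime import datetime, timedelta


def events_expire_at(end_of_workflow_received, retention_period_minutes=60):
    """Return the moment after which a workflow's events may be dropped."""
    return end_of_workflow_received + timedelta(minutes=retention_period_minutes)


def is_retained(now, end_of_workflow_received, retention_period_minutes=60):
    """Tell whether events are still within their retention period.

    Assumption: the period is exclusive of its end instant.
    """
    return now < events_expire_at(end_of_workflow_received, retention_period_minutes)
```

With the default of 60 minutes, events of a workflow whose end event was received at 10:00 remain available until 11:00.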
Usage¶
python3 -m opentf.core.observer [--context context] [--config configfile]
Endpoints¶
This module exposes the following endpoints:
/inbox (POST)
/channelhandlers (GET)
/channels (GET)
/workflows (GET)
/workflows/status (GET)
/workflows/{workflow_id}/status (GET)
/workflows/{workflow_id}/workers (GET)
When calling these endpoints, a signed token must be specified via the Authorization header.
This header has the following form:
Authorization: Bearer xxxxxxxx
The token must be signed by one of the trusted authorities specified in the current context.
What is returned can be constrained by what the specified token is allowed to access. For example, if the specified token only has access to a specific namespace, resources not accessible from this namespace will not be shown.
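As a minimal sketch, a request carrying the Authorization header can be prepared with the Python standard library. The build_request helper and the observer.example.com host are assumptions made for the example; the request is only built here, not sent.

```python
import urllib.request


def build_request(base_url, path, token):
    """Prepare a GET request carrying the signed token as a Bearer token.

    Hypothetical helper; base_url and token must be supplied by the caller.
    """
    url = f"{base_url.rstrip('/')}{path}"
    return urllib.request.Request(url, headers={'Authorization': f'Bearer {token}'})


request = build_request('https://observer.example.com', '/workflows', 'xxxxxxxx')
# urllib.request.urlopen(request) would then perform the call.
```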
/inbox¶
The eventbus posts relevant events to this endpoint.
/channelhandlers¶
This endpoint returns a status manifest (a JSON document) with the following entries:
msg: the Known channel handlers string
details: an object with an items element, which is a possibly empty list of strings, the known channel handler IDs.
Known channel handlers example¶
In the following example there are two known channel handlers:
{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Known channel handlers",
    "details": {
        "items": [
            "50c7e5d1-7bdc-46f0-9422-3cc6660d00c0",
            "dd43b0bb-b359-4e9d-9459-c42ee7dc16d9"
        ]
    },
    "code": 200,
    "reason": "OK"
}
/channels¶
This endpoint returns a status manifest (a JSON document) with the following entries:
msg: the Known channels string
details: an object with an items element, which is a possibly empty list of objects, the known channels.
Each item in the list has the following structure:
apiVersion: opentestfactory.org/v1alpha1
kind: Channel
metadata:
  name: foo
  namespaces: bar
  channelhandler_id: uuid
spec:
  tags: [a, b, c]
status:
  lastCommunicationTimestamp: atimestamp
  phase: 'IDLE' or 'BUSY' or 'PENDING'
  currentJobID: uuid or null
namespaces is either *, a name, or a comma-separated list of names:
*
foo
foo,bar,baz
If namespaces is *, the channel is accessible from all namespaces. Otherwise, it is accessible from the listed namespace(s).
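The namespaces rule can be expressed as a small check. This is a sketch only: is_accessible_from is a hypothetical name, and stripping whitespace around names is an assumption that the text does not state.

```python
def is_accessible_from(namespaces, namespace):
    """Tell whether a channel is accessible from the given namespace.

    `namespaces` is the channel's metadata value: '*', a single name,
    or a comma-separated list of names.
    """
    if namespaces.strip() == '*':
        return True
    # Assumption: surrounding whitespace in the list is not significant.
    allowed = {name.strip() for name in namespaces.split(',')}
    return namespace in allowed
```

A channel declared with foo,bar,baz is thus accessible from bar but not from qux, while a channel declared with * is accessible from both.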
Known channels example¶
{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Known channels",
    "details": {
        "items": [
            {
                "apiVersion": "opentestfactory.org/v1alpha1",
                "kind": "Channel",
                "metadata": {
                    "channelhandler_id": "2af7585e-73dc-4f63-855e-4072fc9aa635",
                    "name": "robotframework.agents",
                    "namespaces": "default"
                },
                "spec": {
                    "tags": ["ssh", "linux", "robotframework"]
                },
                "status": {
                    "currentJobID": null,
                    "lastCommunicationTimestamp": "2022-05-02T10:00:49.933028",
                    "phase": "IDLE"
                }
            },
            {
                "apiVersion": "opentestfactory.org/v1alpha1",
                "kind": "Channel",
                "metadata": {
                    "channelhandler_id": "2af7585e-73dc-4f63-855e-4072fc9aa635",
                    "name": "cypress-agent.agents",
                    "namespaces": "*"
                },
                "spec": {
                    "tags": ["ssh", "linux", "cypress"]
                },
                "status": {
                    "currentJobID": "2af7585e-73dc-4f63-855e-4072fc9aa635",
                    "lastCommunicationTimestamp": "2022-05-02T10:00:49.933028",
                    "phase": "PENDING"
                }
            }
        ]
    },
    "code": 200,
    "reason": "OK"
}
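A client might use such a manifest to find channels that are currently free and carry a given set of tags. The sketch below assumes the manifest has already been parsed into a Python dict; idle_channels_with_tags is a hypothetical helper.

```python
def idle_channels_with_tags(manifest, required_tags):
    """Return the names of IDLE channels whose tags include all required tags."""
    required = set(required_tags)
    return [
        channel['metadata']['name']
        for channel in manifest['details']['items']
        if channel['status']['phase'] == 'IDLE'
        and required <= set(channel['spec']['tags'])
    ]
```

Applied to the example above with the tags ssh and linux, only robotframework.agents is returned, since cypress-agent.agents is in the PENDING phase.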
/workflows¶
This endpoint returns a status manifest (a JSON document) with the following entries:
msg: the Running and recent workflows string
details: an object with an items element, which is a possibly empty list of strings, the running and recent workflow IDs.
Orchestrator running and recent workflows example¶
In the following example there are two running or recent workflows:
{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Running and recent workflows",
    "details": {
        "items": [
            "50c7e5d1-7bdc-46f0-9422-3cc6660d00c0",
            "dd43b0bb-b359-4e9d-9459-c42ee7dc16d9"
        ]
    },
    "code": 200,
    "reason": "OK"
}
/workflows/status¶
This endpoint returns a status manifest (a JSON document) with the following entries:
msg: a summary of the orchestrator status (a string)
details: an object with two elements: items, a list of all still running workflows, and status, one of the following strings: IDLE or BUSY.
If there are no running workflows and no active workers on any workflow, .details.status is IDLE. It is BUSY otherwise.
Orchestrator status examples¶
In the following example, a workflow is still running (or has at least one active worker):
{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "1 workflows in progress",
    "details": {
        "items": ["50c7e5d1-7bdc-46f0-9422-3cc6660d00c0"],
        "status": "BUSY"
    },
    "code": 200,
    "reason": "OK"
}
In this example, all workflows have completed, and no worker is still active. It is safe to stop the orchestrator service.
{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "No workflow in progress",
    "details": {
        "items": [],
        "status": "IDLE"
    },
    "code": 200,
    "reason": "OK"
}
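A shutdown script could rely on .details.status to decide whether the orchestrator may be stopped. The following sketch works on a manifest already parsed into a dict; is_safe_to_stop is a hypothetical name.

```python
def is_safe_to_stop(manifest):
    """Tell whether the orchestrator is idle according to /workflows/status.

    Only .details.status is checked: it stays BUSY while a worker is
    active, whatever the content of the items list.
    """
    return manifest['details']['status'] == 'IDLE'
```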
/workflows/{workflow_id}/status¶
This endpoint returns a status manifest (a JSON document) with the following entries:
msg: a summary of the workflow status (a string)
details: an object with two elements: items, a list of all events pertaining to the workflow, and status, one of the following strings: RUNNING, DONE, or FAILED.
If the workflow is not known, a Failure status is returned instead, with a 404 code.
The list of events, items, may be paginated. The pagination, if used, is as per RFC8288 and relies on the Link header attribute:
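import requests

base_url = '...'
# Every call needs a signed token in the Authorization header.
headers = {'Authorization': 'Bearer xxxxxxxx'}

response = requests.get(
    f'{base_url}/workflows/4420a8ad-1880-45dd-af07-9162583efcff/status',
    headers=headers,
)
events = response.json()['details']['items']
# Follow the 'next' links until the last page has been read.
while 'next' in response.links:
    response = requests.get(response.links['next']['url'], headers=headers)
    events += response.json()['details']['items']
# All events have been collected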
By default, the page size is determined by the observer service, but this can be changed by using the per_page parameter. There may be a maximum value allowed for per_page. (In the current implementation the default value is 100 and the maximum value is 1000.)
If a specific per_page value is set, and assuming it is within the allowed limits, the links will use this value.
A specific page can be specified using the page parameter. page starts at 1, as per RFC8288.
Even if items is paginated, the status is for the whole workflow.
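The page and per_page parameters are passed as query parameters. The sketch below builds such a URL with the standard library; status_url is a hypothetical helper and the host name is a placeholder.

```python
from urllib.parse import urlencode


def status_url(base_url, workflow_id, page=None, per_page=None):
    """Build the workflow status URL, optionally selecting a page and page size."""
    params = {}
    if per_page is not None:
        params['per_page'] = per_page
    if page is not None:
        if page < 1:
            raise ValueError('page starts at 1')
        params['page'] = page
    url = f"{base_url.rstrip('/')}/workflows/{workflow_id}/status"
    return f'{url}?{urlencode(params)}' if params else url
```

For example, status_url('https://observer.example.com', workflow_id, page=2, per_page=50) requests the second page of 50 events. Values of per_page outside the allowed limits are not handled here, since the text does not say how the service treats them.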
Workflow statuses examples¶
This is a status manifest of a completed workflow:
{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Workflow completed",
    "details": {
        "items": [...],
        "status": "DONE"
    },
    "code": 200,
    "reason": "OK"
}
This is a status manifest of a failed workflow:
{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "Workflow failed",
    "details": {
        "items": [...],
        "status": "FAILED"
    },
    "code": 200,
    "reason": "OK"
}
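Since a workflow stays in the RUNNING state until it reaches DONE or FAILED, a client can poll this endpoint until completion. The following is a sketch under stated assumptions: wait_for_workflow is a hypothetical helper, fetch_status stands for any caller-supplied function returning the parsed status manifest, and the polling interval is arbitrary.

```python
import time


def wait_for_workflow(fetch_status, poll_interval=5.0, sleep=time.sleep):
    """Poll a workflow status manifest until the workflow is DONE or FAILED.

    fetch_status: callable returning the parsed manifest (a dict).
    Raises LookupError if the service answers with a Failure status,
    for example when the workflow is not known.
    """
    while True:
        manifest = fetch_status()
        if manifest['status'] == 'Failure':
            raise LookupError(manifest.get('message', 'Workflow status unavailable'))
        state = manifest['details']['status']
        if state in ('DONE', 'FAILED'):
            return state
        sleep(poll_interval)
```

Passing the fetching function as a parameter keeps the loop independent of the HTTP library in use and makes it easy to exercise with canned manifests.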
/workflows/{workflow_id}/workers¶
This endpoint returns a status manifest (a JSON document) with the following entries:
msg: a summary of the workflow status (a string)
details: an object with two elements: items, a list of active workers on the workflow, and status, one of the following strings: BUSY or IDLE.
If the workflow is not known, a Failure status is returned instead, with a 404 code.
Workflow workers example¶
{
    "kind": "Status",
    "apiVersion": "v1",
    "metadata": {},
    "status": "Success",
    "message": "2 active workers on workflow",
    "details": {
        "items": [
            "3137d9e2-0873-4cba-b5d3-f36dff8d9b3f",
            "e276b403-6d27-4d63-b3f1-1415c258736c"
        ]
        ,
        "status": "BUSY"
    },
    "code": 200,
    "reason": "OK"
}