Core services¶
The core services handle the Receptionist, the Killswitch, and the Observer endpoints, as well as workflow orchestration.
Endpoints¶
Three endpoints of the OpenTestFactory orchestrator are exposed: the Receptionist, the Killswitch, and the Observer. They all return Status messages.
Receptionist¶
This endpoint receives requests to run workflows. It must ensure that the request is authorized and that the workflow is syntactically valid. If both conditions hold, the request is deemed accepted.
Acceptance does not guarantee that the workflow will complete successfully: for example, a problem may occur with an execution environment.
The body of the request must be a JSON or YAML document that complies with the Workflow schema.
The request is a POST request.
POST /workflows
The possible return codes are: Created (201), Invalid (422), Conflict (409), or Unauthorized (401). Conflict is returned if the service failed to publish a syntactically valid request.
If the request is accepted (status is Success), details.workflow_id contains a workflow identifier. This identifier, a string of characters, should be used to refer to this specific workflow when calling the other exposed endpoints. The identifier must be a valid path segment so that it can be integrated into a URI without specific processing.
A Workflow event is published on the event bus if the request is accepted.
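As an illustration only, the request and the accepted response described above could be handled as in the following Python sketch. The host name, the bearer-token `Authorization` header, and the `Content-Type` value are assumptions made for the example; this section only states that the request must be authorized and that the body is JSON or YAML. The helper names are hypothetical.

```python
import json
import urllib.request


def build_run_request(base_url, workflow, token):
    """Build (without sending) a POST /workflows request with a JSON body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/workflows",
        data=json.dumps(workflow).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumption: JSON body
            "Authorization": f"Bearer {token}",  # assumption: bearer token
        },
    )


def workflow_id_from_status(status):
    """Return details.workflow_id from a Status message, if accepted."""
    if status.get("status") != "Success":
        raise ValueError(f"request not accepted: {status.get('reason')}")
    return status["details"]["workflow_id"]


# Hypothetical host; sending would be urllib.request.urlopen(request).
request = build_run_request(
    "https://orchestrator.example", {"jobs": {}}, "my-token"
)
print(request.get_method(), request.full_url)
```

The returned identifier can then be placed directly in the path of the other two endpoints, since it is required to be a valid path segment.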
Receptionist sample response¶
{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "status": "Success",
  "message": "Workflow FooBar created",
  "reason": "Created",
  "details": {"workflow_id": "foobarbazfoobarbaz12-34"},
  "code": 201
}
Killswitch¶
This endpoint allows you to cancel an in-progress workflow.
The abort request is a DELETE request. There is no request body.
DELETE /workflows/{workflow_id}
The possible return codes are: OK (200), NotFound (404), or Unauthorized (401).
The cancellation is not necessarily immediate: some actions that were initiated before the cancellation request may still be ongoing, and cleanup actions, such as defined cleanup jobs or steps, will still run.
A WorkflowCanceled event is published on the event bus if the request is accepted.
Killswitch sample response¶
{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "status": "Success",
  "message": "Workflow FooBar canceled",
  "reason": "OK",
  "details": {
    "workflow_id": "foobarbazfoobarbaz12-34"
  },
  "code": 200
}
Observer¶
This endpoint may be used to monitor the progress of a workflow.
The request is a GET request. It contains a workflow ID, as returned by the Receptionist.
GET /workflows/{workflow_id}/status
The possible return codes are: OK (200), NotFound (404), or Unauthorized (401).
If the response is OK, the body of the response indicates the progress.
The possible details.status values are PENDING, RUNNING, FAILED, and DONE.
details.items is a list containing the relevant messages exchanged pertaining to the requested workflow execution.
No event is published on the event bus.
Observer sample response¶
{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "status": "Success",
  "message": "Workflow FooBar status",
  "reason": "OK",
  "details": {
    "status": "DONE",
    "items": [
      {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "Workflow"
        // ...
      },
      {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "ExecutionCommand"
        // ...
      },
      {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "ExecutionResult"
        // ...
      },
      // ...
      {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "WorkflowCompleted"
        // ...
      }
    ]
  },
  "code": 200
}
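A client polling the Observer might reduce such a response to a few facts, as in the Python sketch below. Treating FAILED and DONE as the only terminal values is an assumption of the example (this section lists the four possible values but does not say which ones are final), and `summarize` is a hypothetical helper name.

```python
from collections import Counter

# Assumption: these two values mean the workflow will not progress further.
TERMINAL_STATUSES = {"FAILED", "DONE"}


def summarize(response):
    """Return (status, finished, kind counts) for an OK Observer response."""
    if response.get("code") != 200:
        raise ValueError(f"unexpected response: {response.get('reason')}")
    details = response["details"]
    kinds = Counter(item["kind"] for item in details.get("items", []))
    status = details["status"]
    return status, status in TERMINAL_STATUSES, dict(kinds)


response = {
    "kind": "Status",
    "status": "Success",
    "reason": "OK",
    "code": 200,
    "details": {
        "status": "RUNNING",
        "items": [{"kind": "Workflow"}, {"kind": "ExecutionCommand"}],
    },
}
print(summarize(response))
```

A polling loop would call the endpoint again, after a delay, as long as the second element of the result is false.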
Orchestration¶
There can be multiple workflows running at any given time. The core services can queue or limit the number of workflows they receive or process simultaneously.
There are no dependencies between workflows.
Workflow handling¶
Each workflow has a jobs section, which is a collection of jobs.
jobs:
  job_a:
    runs-on: linux
    steps:
    - run: echo I am job A
  job_b:
    runs-on: linux
    steps:
    - run: echo I am job B
  job_c:
    runs-on: linux
    needs: job_a
    steps:
    - run: echo I am job C
Each job can have a needs section, which is either a string (a job name) or a list of strings (a list of job names).
A job with a needs section must not start before the completion of the job(s) it depends on.
A job with no needs section has no dependencies and can be processed at any time.
The core services choose an execution order for jobs based on their specified dependencies. As there may be jobs that can run simultaneously, two executions of a given workflow can result in a different execution order for the jobs.
For the above example, job_a and job_b may run in any order, even simultaneously if there is more than one linux execution environment available.
The only guarantee is that job_c will not start before job_a has been completed.
flowchart LR
A3([job_a]) -.-> B3([job_b]) -.-> C3([job_c])
A0([job_a]) -.-> C0([job_c]) -.-> B0([job_b])
B1([job_b]) -.-> A1([job_a]) -.-> C1([job_c])
A2([job_a]) -.-> C2([job_c])
B2([job_b]) -.-> C2
A4([job_a]) -.-> C4([job_c])
A4 -.-> B4([job_b])
The core services should process each job following the chosen execution order.
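One possible way to derive an execution order from the needs sections is a topological sort. The Python sketch below uses the standard `graphlib` module (Python 3.9 or later) and groups jobs into "waves" of jobs that could run simultaneously. It is only one valid schedule among those allowed above, not a description of how the core services actually choose; the function names are hypothetical.

```python
from graphlib import TopologicalSorter


def normalize_needs(needs):
    """A needs section is a job name or a list of job names."""
    if needs is None:
        return []
    if isinstance(needs, str):
        return [needs]
    return list(needs)


def execution_waves(jobs):
    """Group jobs into successive waves that respect their dependencies."""
    sorter = TopologicalSorter(
        {name: normalize_needs(job.get("needs")) for name, job in jobs.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose needs are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves


jobs = {
    "job_a": {"runs-on": "linux"},
    "job_b": {"runs-on": "linux"},
    "job_c": {"runs-on": "linux", "needs": "job_a"},
}
print(execution_waves(jobs))  # [['job_a', 'job_b'], ['job_c']]
```

A scheduler that starts job_c as soon as job_a completes, without waiting for job_b, is equally valid: the only constraint is the one expressed by needs.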
When there are no more jobs to run, the workflow has been completed, and the core services must publish either a WorkflowCompleted event (if no execution error occurred related to the workflow) or a WorkflowCanceled event (if an execution error occurred related to the workflow) to the event bus.
Job handling¶
Each job is either a generator or a sequence of steps. A generator produces jobs that will eventually result in sequences of steps.
job_a:
  runs-on: application-a
  generator: example.com/my_generator@v1
  with:
    my_parameter: my_value
job_b:
  runs-on: [linux, application-a]
  steps:
  - run: echo Hi there
Jobs may have an if conditional. If the specified condition is not met (the if expression evaluates to false), the job is skipped.
Generator jobs¶
When the core services encounter a generator job, they must publish a GeneratorCommand event on the event bus.
Generator plugins will handle this event. More than one generator plugin may answer a given GeneratorCommand event.
Upon receiving a corresponding GeneratorResult event, the core services should select one (and only one) answer and add the contained jobs to the set of jobs to process for the workflow.
Those new jobs may have needs sections, but those sections can only refer to jobs specified in the GeneratorResult event.
Their runs-on sections should be enriched with the tags specified in the original generator job runs-on section, if any.
Jobs that are dependent on the original generator job will be available for processing after the completion of the GeneratorResult jobs.
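The two rules on generated jobs (needs limited to sibling jobs, runs-on enriched with the generator job's tags) can be sketched in Python as follows. How the tags are merged is an assumption of this example: inherited tags are appended and duplicates dropped. The function names are hypothetical.

```python
def as_list(value):
    """runs-on and needs may each be a single string or a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def expand_generator_result(generator_job, generated_jobs):
    """Validate generated jobs and enrich their runs-on tags."""
    inherited = as_list(generator_job.get("runs-on"))
    expanded = {}
    for name, job in generated_jobs.items():
        # needs may only refer to jobs from the same GeneratorResult
        unknown = [n for n in as_list(job.get("needs")) if n not in generated_jobs]
        if unknown:
            raise ValueError(f"{name} needs unknown job(s): {unknown}")
        tags = as_list(job.get("runs-on"))
        # assumption: append inherited tags, skipping those already present
        merged = tags + [tag for tag in inherited if tag not in tags]
        expanded[name] = {**job, "runs-on": merged}
    return expanded


generator_job = {
    "runs-on": "application-a",
    "generator": "example.com/my_generator@v1",
}
generated = {
    "prepare": {"runs-on": "linux", "steps": [{"run": "echo prepare"}]},
    "test": {"needs": "prepare", "steps": [{"run": "echo test"}]},
}
for name, job in expand_generator_result(generator_job, generated).items():
    print(name, job["runs-on"])
```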
Sequence of steps jobs¶
When the core services start to process a sequence of steps job, they should publish an ExecutionCommand event on the event bus to find an available compatible execution environment.
This published ExecutionCommand event must have a metadata.step_sequence_id of -1 and the runs-on section must have the value of the same section in the job definition.
Assuming the above job_b definition, the following event will be published:
apiVersion: opentestfactory.org/v1alpha1
kind: ExecutionCommand
metadata:
  name: ...
  workflow_id: ...
  job_id: ...
  job_origin: ...
  step_sequence_id: -1
runs-on: [linux, application-a]
scripts: []
Channel plugins will handle this event. More than one channel plugin may make an offer (and a given channel plugin may make more than one offer).
Upon receiving a corresponding ExecutionResult event, the core services should select one (and only one) offer, and associate their remaining ExecutionCommand events with this offer.
The offer includes a metadata.channel_id section which should be included in all following ExecutionCommand events published by the core services related to the job.
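The "one and only one offer" rule can be illustrated with a small bookkeeping class. The Python sketch below keeps the first offer received for each job and ignores later ones; first-come selection is an assumption of the example, as this section does not prescribe a selection policy. `OfferBook` is a hypothetical name.

```python
class OfferBook:
    """Remember which channel was selected for each job."""

    def __init__(self):
        self._selected = {}

    def on_offer(self, event):
        """Record an offer; return True if it becomes the selected one."""
        metadata = event["metadata"]
        job_id = metadata["job_id"]
        if job_id in self._selected:
            return False  # an offer was already selected for this job
        self._selected[job_id] = metadata["channel_id"]
        return True

    def channel_for(self, job_id):
        """channel_id to copy into the job's following ExecutionCommand events."""
        return self._selected[job_id]


book = OfferBook()
print(book.on_offer({"metadata": {"job_id": "job-1", "channel_id": "chan-a"}}))
print(book.on_offer({"metadata": {"job_id": "job-1", "channel_id": "chan-b"}}))
print(book.channel_for("job-1"))
```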
Steps in the sequence of steps are then processed in order.
Once all steps have been processed, the core services should publish one last ExecutionCommand event, with a metadata.step_sequence_id of -2. It should also contain the metadata.channel_id section:
apiVersion: opentestfactory.org/v1alpha1
kind: ExecutionCommand
metadata:
  name: ...
  workflow_id: ...
  job_id: ...
  job_origin: ...
  step_sequence_id: -2
  channel_id: ...
runs-on: [linux, application-a]
scripts: []
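Putting the opening (-1) and closing (-2) events together, the sequence of ExecutionCommand events for a job made only of elementary run steps might look like the Python sketch below. Several simplifications are assumptions of the example: the name and job_origin fields are left out because their content is not detailed here, each run command is placed as a single entry in scripts, and all events are built up front, whereas the core services publish each one only after the previous result is known.

```python
API_VERSION = "opentestfactory.org/v1alpha1"


def make_command(workflow_id, job_id, runs_on, step_sequence_id,
                 channel_id=None, scripts=None):
    """Build one ExecutionCommand event as a plain dictionary."""
    metadata = {
        "workflow_id": workflow_id,
        "job_id": job_id,
        "step_sequence_id": step_sequence_id,
    }
    if channel_id is not None:
        metadata["channel_id"] = channel_id
    return {
        "apiVersion": API_VERSION,
        "kind": "ExecutionCommand",
        "metadata": metadata,
        "runs-on": runs_on,
        "scripts": scripts or [],
    }


def job_commands(workflow_id, job_id, job, channel_id):
    """List the commands for a job: -1, then 0..n-1, then -2."""
    runs_on = job["runs-on"]
    # The opening command carries no channel_id: no offer is selected yet.
    commands = [make_command(workflow_id, job_id, runs_on, -1)]
    for index, step in enumerate(job["steps"]):
        commands.append(
            make_command(workflow_id, job_id, runs_on, index,
                         channel_id, [step["run"]])  # assumption: one script line
        )
    commands.append(make_command(workflow_id, job_id, runs_on, -2, channel_id))
    return commands


job_b = {"runs-on": ["linux", "application-a"], "steps": [{"run": "echo Hi there"}]}
for command in job_commands("wf-1", "job-1", job_b, "chan-a"):
    print(command["metadata"]["step_sequence_id"], command["scripts"])
```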
Step handling¶
Each step is either a function or an elementary step. A function produces steps that will eventually result in elementary steps.
steps:
- uses: actions/checkout@v1
  with:
    repository: https://gitlabs.com/demo/sample.git
- run: echo hi there
Steps cannot be processed if the result of the previous step execution is not yet known.
Steps may have an if conditional. If the specified condition is not met (the if expression evaluates to false), the step is skipped.
If a step has no explicit if conditional, it is assumed to be success(): if at least one previous step has failed in the current step sequence, the step will be skipped.
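The default behavior can be sketched in Python as below. The sketch only understands a conditional that consists of exactly one status function, success(), failure(), or always(); the actual expression language is richer, so this restriction is an assumption of the example, and `should_run` is a hypothetical name.

```python
def should_run(step, any_previous_failed):
    """Decide whether a step runs, given the outcome of the previous steps."""
    # A step with no explicit conditional is treated as `if: success()`.
    condition = step.get("if", "success()").strip()
    if condition == "always()":
        return True
    if condition == "failure()":
        return any_previous_failed
    if condition == "success()":
        return not any_previous_failed
    raise NotImplementedError(f"unsupported conditional: {condition}")


steps = [
    {"run": "echo build"},
    {"run": "echo report", "if": "always()"},
    {"run": "echo notify", "if": "failure()"},
]
# After an earlier failure: the first step is skipped, the other two run.
print([should_run(step, any_previous_failed=True) for step in steps])
```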
Function steps¶
When the core services encounter a function step, they must publish a ProviderCommand event on the event bus.
Provider plugins will handle this event. More than one provider plugin may answer a given ProviderCommand event.
Upon receiving a corresponding ProviderResult event, the core services should select one (and only one) answer and add the contained sequence of steps to the beginning of the currently processing sequence of steps.
Their working-directory should be joined with the working-directory of the original function step, if any.
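The expansion of a function step can be sketched as follows in Python. Joining the directories with POSIX path rules is an assumption of the example (the joining rule is not detailed here), and `expand_function_step` is a hypothetical name.

```python
import posixpath


def expand_function_step(function_step, provided_steps, remaining_steps):
    """Put the provider's steps in front of the steps still to process."""
    base = function_step.get("working-directory")
    expanded = []
    for step in provided_steps:
        step = dict(step)  # do not modify the provider's answer
        if base is not None:
            own = step.get("working-directory")
            # assumption: POSIX-style join of the two directories
            step["working-directory"] = posixpath.join(base, own) if own else base
        expanded.append(step)
    return expanded + list(remaining_steps)


function_step = {"uses": "actions/checkout@v1", "working-directory": "repo"}
provided = [
    {"run": "git clone ..."},
    {"run": "ls", "working-directory": "src"},
]
remaining = [{"run": "echo hi there"}]
for step in expand_function_step(function_step, provided, remaining):
    print(step)
```

Because the provided steps may themselves be function steps, the same expansion applies again when they are reached, until only elementary steps remain.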
Elementary steps¶
When the core services encounter an elementary step, they must publish an ExecutionCommand event on the event bus, enriched with the selected metadata.channel_id. Those ExecutionCommand events must contain a metadata.step_sequence_id, which must be a non-negative integer that increases continuously for a given job.
Only one channel plugin should handle this event.
Upon receiving the corresponding ExecutionResult event, the core services should set the step output and proceed with the remaining steps in the sequence.
If the status section of the ExecutionResult event is not 0, and if the step does not have a continue-on-error: true section, the step will be in a failed state.
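Expressed as code, the failure rule is a two-part test. The Python sketch below assumes the step definition and the ExecutionResult event are available as dictionaries; `step_failed` is a hypothetical name.

```python
def step_failed(step, execution_result):
    """A step fails on a non-zero status unless continue-on-error is true."""
    nonzero = execution_result["status"] != 0
    tolerated = step.get("continue-on-error") is True
    return nonzero and not tolerated


print(step_failed({"run": "exit 1"}, {"status": 1}))
print(step_failed({"run": "exit 1", "continue-on-error": True}, {"status": 1}))
print(step_failed({"run": "echo ok"}, {"status": 0}))
```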
Error handling¶
At any time during the processing of a workflow, the core services and the participating plugins may publish ExecutionError events.
Also, ExecutionError events are published by the core services if the specified timeouts are reached.
Upon receiving an ExecutionError event, the core services will cancel the workflow.
Steps in the currently running job(s) that have a conditional that contains always() or failure() should still be processed.
The last ExecutionCommand of each currently running job should still be published.
Not yet processed jobs that have a conditional that contains always() or failure() should still be processed.
Those ‘cleanup’ events will not change the outcome of the workflow. Depending on the cause of the failure, they may or may not succeed.
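The selection of what still runs after an ExecutionError can be sketched as a filter over the remaining steps or jobs. The Python sketch below uses a plain substring check, mirroring the "contains always() or failure()" wording; it does not evaluate the expression, which is a simplification made for the example. `still_processed` is a hypothetical name.

```python
CLEANUP_FUNCTIONS = ("always()", "failure()")


def still_processed(items):
    """Keep the steps or jobs whose conditional mentions always() or failure()."""
    return [
        item
        for item in items
        if any(function in item.get("if", "") for function in CLEANUP_FUNCTIONS)
    ]


pending_steps = [
    {"run": "echo next test"},
    {"run": "echo collect logs", "if": "always()"},
    {"run": "echo alert", "if": "failure()"},
    {"run": "echo publish", "if": "success()"},
]
for step in still_processed(pending_steps):
    print(step["run"])
```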