Core services¶

The core services handle the Receptionist, Killswitch, and Observer endpoints, as well as workflow orchestration.

Endpoints¶

Three endpoints of the OpenTestFactory orchestrator are exposed: the Receptionist, the Killswitch, and the Observer. They all return Status messages.

`Receptionist`¶

This endpoint receives requests to run workflows. It must ensure that the request is authorized and that the workflow is syntactically valid. If this is the case, the request is deemed to have been accepted.

However, the execution of an accepted workflow may still not complete successfully, for example if a problem occurs with an execution environment.

The body of the request must be a JSON or YAML document that complies with the Workflow schema.

The request is a POST request.

POST /workflows

The possible return codes are: Created (201), Invalid (422), Conflict (409), or Unauthorized (401). Conflict is returned if the service failed to publish a syntactically valid request.

If the request is accepted (status is Success), details.workflow_id contains a workflow identifier. This identifier, a string of characters, should be used to refer to this specific workflow when calling the other exposed endpoints. The identifier must be a valid path segment so that it can be integrated into a URI without specific processing.

A Workflow event is published on the event bus if the request is accepted.

Receptionist sample response¶

{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "status": "Success",
  "message": "Workflow FooBar created",
  "reason": "Created",
  "details": {"workflow_id": "foobarbazfoobarbaz12-34"},
  "code": 201
}
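A client usually needs only the workflow identifier from this response. The following is a minimal sketch, assuming the response body is the JSON document shown above; the function name `extract_workflow_id` is hypothetical and not part of the orchestrator.

```python
import json


def extract_workflow_id(status: dict) -> str:
    """Return details.workflow_id from an accepted Receptionist response.

    Raises ValueError if the Status message does not report success.
    """
    if status.get("kind") != "Status":
        raise ValueError("not a Status message")
    if status.get("status") != "Success":
        # Invalid (422), Conflict (409), and Unauthorized (401) end up here.
        raise ValueError(
            f"workflow not accepted: {status.get('reason')} ({status.get('code')})"
        )
    return status["details"]["workflow_id"]


# The sample response shown above, decoded from JSON.
sample = json.loads("""
{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "status": "Success",
  "message": "Workflow FooBar created",
  "reason": "Created",
  "details": {"workflow_id": "foobarbazfoobarbaz12-34"},
  "code": 201
}
""")
print(extract_workflow_id(sample))  # foobarbazfoobarbaz12-34
```

The returned string is the value to use in the paths of the other two endpoints.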

`Killswitch`¶

This endpoint allows you to cancel an in-progress workflow.

The cancellation request is a DELETE request. There is no request body.

DELETE /workflows/{workflow_id}

The possible return codes are: OK (200), NotFound (404), or Unauthorized (401). The cancellation is not necessarily immediate: some actions that were initiated before the cancellation request may still be ongoing, and cleanup actions, such as defined cleanup jobs or steps, will still run.

A WorkflowCanceled event is published on the event bus if the request is accepted.

Killswitch sample response¶

{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "status": "Success",
  "message": "Workflow FooBar canceled",
  "reason": "OK",
  "details": {
    "workflow_id": "foobarbazfoobarbaz12-34"
  },
  "code": 200
}
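As an illustration, the cancellation request can be built with the Python standard library. This is only a sketch: the host name is a placeholder, the helper name `cancel_request` is hypothetical, and whatever authorization header a deployment requires is deliberately left out because it is not described here. The request is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def cancel_request(base_url: str, workflow_id: str) -> Request:
    """Build (but do not send) the DELETE request that cancels a workflow."""
    # The identifier is specified to be a valid path segment, so quoting
    # is only a defensive measure.
    url = f"{base_url.rstrip('/')}/workflows/{quote(workflow_id, safe='')}"
    # No body: the Killswitch expects none.
    return Request(url, method="DELETE")


# Placeholder host; replace with the address of an actual Killswitch.
req = cancel_request("https://orchestrator.example.com", "foobarbazfoobarbaz12-34")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; a 200 response only means the cancellation was accepted, not that every action has already stopped.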

`Observer`¶

This endpoint may be used to monitor the progress of a workflow.

The request is a GET request. Its path contains a workflow ID, as returned by the Receptionist.

GET /workflows/{workflow_id}/status

The possible return codes are: OK (200), NotFound (404), or Unauthorized (401). If the response is OK, the body of the response indicates the progress.

The possible details.status values are PENDING, RUNNING, FAILED, and DONE.

details.items is a list of the relevant messages exchanged for the requested workflow execution.

No event is published on the event bus.

Observer sample response¶

{
  "apiVersion": "v1",
  "kind": "Status",
  "metadata": {},
  "status": "Success",
  "message": "Workflow FooBar status",
  "reason": "OK",
  "details": {
    "status": "DONE",
    "items": [
      {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "Workflow"
        // ...
      },
      {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "ExecutionCommand"
        // ...
      },
      {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "ExecutionResult"
        // ...
      },
      // ...
      {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "WorkflowCompleted"
        // ...
      }
    ]
  },
  "code": 200
}
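A client that waits for a workflow to finish can poll this endpoint until details.status leaves PENDING and RUNNING. The sketch below assumes that FAILED and DONE are the only terminal values; `wait_for_workflow` and its `fetch_status` callable (which stands in for the actual HTTP call) are hypothetical names.

```python
import time
from typing import Callable

# Assumption: these two values are terminal, PENDING and RUNNING are not.
TERMINAL = {"FAILED", "DONE"}


def wait_for_workflow(fetch_status: Callable[[], dict],
                      interval: float = 1.0,
                      max_polls: int = 60) -> str:
    """Poll until the workflow reaches a terminal state and return that state.

    fetch_status must return the decoded Observer response body.
    """
    for _ in range(max_polls):
        state = fetch_status()["details"]["status"]
        if state in TERMINAL:
            return state
        time.sleep(interval)
    raise TimeoutError("workflow still in progress")


# Simulated Observer replies instead of real HTTP calls.
replies = iter({"details": {"status": s}} for s in ("PENDING", "RUNNING", "DONE"))
print(wait_for_workflow(lambda: next(replies), interval=0))  # DONE
```

Bounding the number of polls keeps a client from waiting forever on a workflow that stays queued.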

Orchestration¶

There can be multiple workflows running at any given time. The core services can queue or limit the number of workflows they receive or process simultaneously.

There are no dependencies between workflows.

Workflow handling¶

Each workflow has a jobs section, which is a collection of jobs.

jobs:
  job_a:
    runs-on: linux
    steps:
    - run: echo I am job A
  job_b:
    runs-on: linux
    steps:
    - run: echo I am job B
  job_c:
    runs-on: linux
    needs: job_a
    steps:
    - run: echo I am job C

Each job can have a needs section, which is either a string (a job name) or a list of strings (a list of job names).

A job with a needs section must not start before the completion of the job(s) it depends on.

A job with no needs section has no dependencies and can be processed at any time.

The core services choose an execution order for jobs based on their specified dependencies. As there may be jobs that can run simultaneously, two executions of a given workflow can result in a different execution order for the jobs.

For the above example, job_a and job_b may run in any order, even simultaneously if there is more than one linux execution environment available. The only guarantee is that job_c will not start before job_a has been completed.

flowchart LR
A3([job_a]) -.-> B3([job_b]) -.-> C3([job_c])
A0([job_a]) -.-> C0([job_c]) -.-> B0([job_b])
B1([job_b]) -.-> A1([job_a]) -.-> C1([job_c])
A2([job_a]) -.-> C2([job_c])
B2([job_b]) -.-> C2
A4([job_a]) -.-> C4([job_c])
A4 -.-> B4([job_b])

The core services should process each job following the chosen execution order.
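One way to derive a valid order from the needs sections is a topological sort. The sketch below uses the standard library `graphlib` module and groups jobs into "waves" whose members could run simultaneously. It illustrates the constraint only; it is not how the core services are implemented, and the function names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def normalize_needs(job: dict) -> list:
    """A needs section is either a job name or a list of job names."""
    needs = job.get("needs", [])
    return [needs] if isinstance(needs, str) else list(needs)


def execution_waves(jobs: dict) -> list:
    """Group jobs into waves; every job in a wave has all its needs completed."""
    sorter = TopologicalSorter(
        {name: normalize_needs(job) for name, job in jobs.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular needs
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# The jobs section from the example above (steps omitted).
jobs = {
    "job_a": {"runs-on": "linux"},
    "job_b": {"runs-on": "linux"},
    "job_c": {"runs-on": "linux", "needs": "job_a"},
}
print(execution_waves(jobs))  # [['job_a', 'job_b'], ['job_c']]
```

Any order that runs each wave's jobs before the jobs that need them is acceptable; job_b, for instance, could equally run after job_c.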

When there are no more jobs to run, the workflow is complete. The core services must then publish to the event bus either a WorkflowCompleted event (if no execution error related to the workflow occurred) or a WorkflowCanceled event (if one did).

Job handling¶

Each job is either a generator or a sequence of steps. A generator produces jobs that will eventually result in sequences of steps.

job_a:
  runs-on: application-a
  generator: example.com/my_generator@v1
  with:
    my_parameter: my_value
job_b:
  runs-on: [linux, application-a]
  steps:
  - run: echo Hi there

Jobs may have an if conditional. If the specified condition is not met (the if expression evaluates to false), the job is skipped.

Generator jobs¶

When the core services encounter a generator job, they must publish a GeneratorCommand event on the event bus.

Generator plugins will handle this event. More than one generator plugin may answer a given GeneratorCommand event.

Upon receiving a corresponding GeneratorResult event, the core services should select one (and only one) answer and add the contained jobs to the set of jobs to process for the workflow.

Those new jobs may have needs sections, but those sections can only refer to jobs specified in the GeneratorResult event.

Their runs-on sections should be enriched with the tags specified in the original generator job runs-on section, if any.

Jobs that are dependent on the original generator job will be available for processing after the completion of the GeneratorResult jobs.
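The two rules on generated jobs (needs restricted to sibling jobs, runs-on enriched with the generator job's tags) can be sketched as follows. This assumes the jobs carried by the selected GeneratorResult event are available as a name-to-definition mapping; the exact event layout is not shown here, and the function names are hypothetical.

```python
def as_tags(runs_on) -> list:
    """Normalize a runs-on value (missing, a string, or a list) to a list of tags."""
    if runs_on is None:
        return []
    return [runs_on] if isinstance(runs_on, str) else list(runs_on)


def adopt_generated_jobs(generator_job: dict, generated: dict) -> dict:
    """Validate generated jobs and enrich their runs-on sections."""
    inherited = as_tags(generator_job.get("runs-on"))
    adopted = {}
    for name, job in generated.items():
        needs = job.get("needs", [])
        needs = [needs] if isinstance(needs, str) else list(needs)
        unknown = [n for n in needs if n not in generated]
        if unknown:
            raise ValueError(f"{name} needs jobs outside the GeneratorResult: {unknown}")
        tags = as_tags(job.get("runs-on"))
        # Add the generator job's tags, skipping those already present.
        tags += [t for t in inherited if t not in tags]
        adopted[name] = {**job, "runs-on": tags}
    return adopted


generator_job = {"runs-on": "application-a", "generator": "example.com/my_generator@v1"}
generated = {
    "prepare": {"runs-on": "linux", "steps": [{"run": "echo prepare"}]},
    "execute": {"needs": "prepare", "steps": [{"run": "echo execute"}]},
}
for name, job in adopt_generated_jobs(generator_job, generated).items():
    print(name, job["runs-on"])
```

Rejecting an out-of-scope needs entry is one possible reaction; the text above only states that such references are not allowed.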

Sequence of steps jobs¶

When the core services start to process a sequence of steps job, they should publish an ExecutionCommand event on the event bus to find an available compatible execution environment.

This published ExecutionCommand event must have a metadata.step_sequence_id of -1 and the runs-on section must have the value of the same section in the job definition.

Assuming the above job_b definition, the following event will be published:

apiVersion: opentestfactory.org/v1alpha1
kind: ExecutionCommand
metadata:
  name: ...
  workflow_id: ...
  job_id: ...
  job_origin: ...
  step_sequence_id: -1
runs-on: [linux, application-a]
scripts: []

Channel plugins will handle this event. More than one channel plugin may make an offer (and a given channel plugin may make more than one offer).

Upon receiving a corresponding ExecutionResult event, the core services should select one (and only one) offer, and associate their remaining ExecutionCommand events with this offer.

The offer includes a metadata.channel_id section which should be included in all following ExecutionCommand events published by the core services related to the job.

Steps in the sequence of steps are then processed in order.

Once all steps have been processed, the core services should publish one last ExecutionCommand event, with a metadata.step_sequence_id of -2. It should also contain the metadata.channel_id section:

apiVersion: opentestfactory.org/v1alpha1
kind: ExecutionCommand
metadata:
  name: ...
  workflow_id: ...
  job_id: ...
  job_origin: ...
  step_sequence_id: -2
  channel_id: ...
runs-on: [linux, application-a]
scripts: []
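The opening and closing events differ only in their step_sequence_id and in the presence of channel_id. A minimal sketch that builds both as plain dictionaries is shown below; the helper name, the constant names, and the sample metadata values are hypothetical, and the fields elided above (name, job_origin, and so on) are simply passed through by the caller.

```python
OPENING = -1  # asks channel plugins for an execution environment
CLOSING = -2  # last event of the job


def execution_command(base_metadata: dict, runs_on, step_sequence_id: int,
                      channel_id=None, scripts=()) -> dict:
    """Build an ExecutionCommand event as a dictionary."""
    metadata = {**base_metadata, "step_sequence_id": step_sequence_id}
    if channel_id is not None:
        # Known only once an offer has been selected.
        metadata["channel_id"] = channel_id
    return {
        "apiVersion": "opentestfactory.org/v1alpha1",
        "kind": "ExecutionCommand",
        "metadata": metadata,
        "runs-on": runs_on,
        "scripts": list(scripts),
    }


# Illustrative values only.
base = {"workflow_id": "wf-1", "job_id": "job-1"}
first = execution_command(base, ["linux", "application-a"], OPENING)
last = execution_command(base, ["linux", "application-a"], CLOSING, channel_id="chan-1")
print(first["metadata"])
print(last["metadata"])
```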

Step handling¶

Each step is either a function or an elementary step. A function produces steps that will eventually result in elementary steps.

steps:
- uses: actions/checkout@v1
  with:
    repository: https://gitlabs.com/demo/sample.git
- run: echo hi there

Steps cannot be processed if the result of the previous step execution is not yet known.

Steps may have an if conditional. If the specified condition is not met (the if expression evaluates to false), the step is skipped.

If a step has no explicit if conditional, it is assumed to be success(): if at least one previous step has failed in the current step sequence, the step will be skipped.
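The default rule can be sketched as a small decision function. Expression evaluation itself is out of scope here, so the `evaluate` callable is a hypothetical stand-in for whatever evaluates an if expression; the toy evaluator in the usage lines understands `always()` only.

```python
from typing import Callable


def should_run(step: dict, previous_failed: bool,
               evaluate: Callable[[str], bool]) -> bool:
    """Decide whether a step runs, applying the implicit success() default."""
    condition = step.get("if")
    if condition is None:
        # Implicit success(): skip once a previous step has failed.
        return not previous_failed
    return bool(evaluate(condition))


steps = [
    {"run": "make"},
    {"run": "make test"},
    {"run": "echo cleaning up", "if": "always()"},
]


def toy_evaluate(expression: str) -> bool:
    # Toy evaluator for this example only.
    return expression == "always()"


# Suppose an earlier step in the sequence has already failed.
print([should_run(s, True, toy_evaluate) for s in steps])  # [False, False, True]
```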

Function steps¶

When the core services encounter a function step, they must publish a ProviderCommand event on the event bus.

Provider plugins will handle this event. More than one provider plugin may answer a given ProviderCommand event.

Upon receiving a corresponding ProviderResult event, the core services should select one (and only one) answer and add the contained sequence of steps to the beginning of the currently processing sequence of steps.

Their working-directory should be joined with the working-directory of the original function step, if any.
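As a sketch of this expansion, the function below replaces the function step at the head of the sequence with the steps from the selected ProviderResult. It assumes that "joined" means a path join with the function step's directory as the base and POSIX separators; the function name is hypothetical.

```python
import posixpath


def expand_function_step(sequence: list, provided: list) -> list:
    """Replace the function step at the head of sequence with the provided steps."""
    function_step, remaining = sequence[0], sequence[1:]
    base = function_step.get("working-directory")
    expanded = []
    for step in provided:
        step = dict(step)  # do not modify the caller's data
        if base is not None:
            own = step.get("working-directory")
            step["working-directory"] = posixpath.join(base, own) if own else base
        expanded.append(step)
    # The provided steps go first, ahead of the not yet processed steps.
    return expanded + remaining


sequence = [
    {"uses": "actions/checkout@v1", "working-directory": "src"},
    {"run": "echo hi there"},
]
provided = [
    {"run": "echo cloning"},
    {"run": "ls", "working-directory": "sample"},
]
for step in expand_function_step(sequence, provided):
    print(step)
```

Because a provided step may itself be a function step, the same expansion can apply again when that step reaches the head of the sequence.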

Elementary steps¶

When the core services encounter an elementary step, they must publish an ExecutionCommand event on the event bus, enriched with the selected metadata.channel_id. Each of these ExecutionCommand events must contain a metadata.step_sequence_id, which must be a non-negative integer that is continuously increasing for a given job.

Only one channel plugin should handle this event.

Upon receiving the corresponding ExecutionResult event, the core services should set the step output and proceed with the remaining steps in the sequence.

If the status section of the ExecutionResult event is not 0, and if the step does not have a continue-on-error: true section, the step will be in a failed state.
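The failure rule and the numbering of elementary steps can be sketched in a few lines. The function name is hypothetical, and the events are reduced to the fields that matter for the rule.

```python
from itertools import count


def step_failed(step: dict, result: dict) -> bool:
    """A step fails on a non-zero status unless continue-on-error is true."""
    return result["status"] != 0 and step.get("continue-on-error") is not True


# One counter per job yields non-negative, increasing step_sequence_id values.
step_ids = count(0)

steps = [
    {"run": "echo hi there"},
    {"run": "exit 1", "continue-on-error": True},
    {"run": "exit 1"},
]
statuses = [0, 1, 1]  # status sections of the matching ExecutionResult events

for step, status in zip(steps, statuses):
    print(next(step_ids), step_failed(step, {"status": status}))
```

This prints `0 False`, `1 False`, and `2 True`: only the last step ends up in a failed state.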

Error handling¶

At any time during the processing of a workflow, the core services and the participating plugins may publish ExecutionError events.

Also, ExecutionError events are published by the core services if the specified timeouts are reached.

Upon receiving an ExecutionError event, the core services will cancel the workflow.

Steps in the currently running job(s) that have a conditional that contains always() or failure() should still be processed.

The last ExecutionCommand event of each currently running job should still be published.

Not yet processed jobs that have a conditional that contains always() or failure() should still be processed.

Those ‘cleanup’ events will not change the outcome of the workflow. Depending on the cause of the failure, they may or may not succeed.
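The selection of what still runs after a cancellation can be sketched as a filter on the if conditionals. The check below is a plain substring test, matching the wording "contains always() or failure()"; the names and the sample jobs are hypothetical.

```python
CLEANUP_FUNCTIONS = ("always()", "failure()")


def runs_after_error(item: dict) -> bool:
    """True if a job or step conditional contains always() or failure()."""
    condition = item.get("if", "")
    return any(fn in condition for fn in CLEANUP_FUNCTIONS)


# Jobs not yet processed when the ExecutionError event was received.
pending_jobs = {
    "deploy": {"steps": [{"run": "echo deploy"}]},
    "report": {"if": "always()", "steps": [{"run": "echo report"}]},
    "notify": {"if": "failure()", "steps": [{"run": "echo notify"}]},
}
print([name for name, job in pending_jobs.items() if runs_after_error(job)])
# ['report', 'notify']
```

The same predicate applies to the remaining steps of a currently running job.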