Workflow Service¶
1. Introduction¶
Workflow Service is used to schedule, orchestrate and monitor execution of a sequence of tasks in the TCUP platform.
The service is mostly used to implement data processing pipelines in which the output of one task is the input of another. A series of tasks can be created in Task Service to perform complex data processing, and each task may depend on a previous task. The tasks are then combined into a workflow, which executes them in sequence or in parallel to achieve the final goal.
1.1 Intended Audience¶
This document is intended for anyone who wants an overview of TCUP Workflow Service. After going through this document, the user will understand the capabilities of TCUP Workflow Service in the IoT platform.
2. Key Concepts¶
In order to use the Workflow Service, a user needs to understand some of its basic concepts and building blocks. Refer to the following sections for these concepts:
2.1 Scheduling of Tasks¶
Task Service executes individual tasks. However, tasks may need to be executed as a set, with the output of one task serving as the input of another. There may therefore be dependencies among the tasks in the set. Workflow Service can specify a set of tasks to be executed as a unit, along with their sequence and dependencies. The sequence in which the tasks are executed, respecting the dependencies, is called a schedule.
To schedule a task set, users specify a process name and a process version as parameters. The details of the task set are specified either in JSON when scheduling from Swagger, or in the portal.
2.2 Schedule¶
The schedule specifies the sequence in which the tasks are to be executed.
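One way to picture how a schedule follows from task dependencies is a topological sort. The sketch below is only an illustration using Python's standard `graphlib` module; it is not the Workflow Service's implementation, and the task names and the `compute_schedule` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def compute_schedule(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one execution order in which every task follows its prerequisites."""
    # TopologicalSorter expects {task: set of tasks it depends on}.
    # It raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical task set: "process" needs the output of "clean",
# and "publish" needs the output of "process".
task_set = {
    "clean": set(),
    "process": {"clean"},
    "publish": {"process"},
}

schedule = compute_schedule(task_set)
print(schedule)  # ['clean', 'process', 'publish']
```

Because each task here has exactly one prerequisite, only one valid order exists. With independent tasks, several orders would be equally valid.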
2.3 Process¶
The output of the schedule is known as a process. A process is identified by the process name and version that were specified when the task set was scheduled. The process definition is stored inside the Workflow Service.
2.3.1 Providing Permission to a Process¶
Once the process is created, users need to grant the required permissions so that the respective actions can be performed on that process. To grant a permission, users specify the name and version of the process, the role name and the permitted actions as parameters.
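The permission model can be thought of as a lookup keyed by process name and version, then by role. The following in-memory sketch is only an assumption about the shape of that data. `PermissionTable`, its methods, the role name and the action names are all hypothetical and are not part of the Workflow Service API.

```python
from collections import defaultdict


class PermissionTable:
    """In-memory stand-in for per-process, per-role permissions (hypothetical)."""

    def __init__(self) -> None:
        # (process name, process version) -> role name -> allowed actions
        self._grants: dict[tuple[str, str], dict[str, set[str]]] = defaultdict(dict)

    def grant(self, name: str, version: str, role: str, actions: set[str]) -> None:
        """Allow `role` to perform `actions` on the given process definition."""
        self._grants[(name, version)].setdefault(role, set()).update(actions)

    def is_allowed(self, name: str, version: str, role: str, action: str) -> bool:
        """Check whether `role` may perform `action` on the given process definition."""
        return action in self._grants.get((name, version), {}).get(role, set())


table = PermissionTable()
# Hypothetical role and action names, chosen for illustration only.
table.grant("hdf5-pipeline", "1.0", "operator", {"initiate", "status"})

print(table.is_allowed("hdf5-pipeline", "1.0", "operator", "initiate"))  # True
print(table.is_allowed("hdf5-pipeline", "1.0", "operator", "abort"))     # False
```

Keying on both name and version means that a permission granted for one version of a process does not carry over to another version.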
2.4 Pre-Initiation¶
A pre-initiation API creates copies of the set of tasks required in the workflow and uses them during the execution of the workflow. The user has to provide the process name and process version as parameters to run the pre-initiation API.
Some important concepts regarding pre-initiation are:
Base Tasks: These are the original tasks from which copies are made.
Copied Tasks: These are the tasks copied from the base tasks. A copied task has the same executable as its base task but may have different input/data files, which are uploaded by the user. It is the copied task that runs when a workflow is triggered.
Mapping: The Workflow Service maintains a mapping of each base task ID to its corresponding copied task ID as used in the workflow. The user can retrieve the mapping using the corresponding API.
Once the process is pre-initiated, a process instance identifier (PID) is generated, which can be used to track the state of the workflow.
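The relationship between base tasks, copied tasks, the mapping and the PID can be sketched as follows. This is a minimal model under stated assumptions, not the service's implementation. The `Task` class, the `pre_initiate` function, the ID formats and the choice to start each copy with the base task's input files are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    executable: str  # identical for a base task and its copies
    input_files: list[str] = field(default_factory=list)


def pre_initiate(base_tasks: list[Task]) -> tuple[str, dict[str, str], dict[str, Task]]:
    """Copy every base task; return (pid, base-to-copy ID mapping, copies by ID)."""
    pid = uuid.uuid4().hex  # hypothetical PID format
    mapping: dict[str, str] = {}
    copies: dict[str, Task] = {}
    for base in base_tasks:
        copy_id = f"{base.task_id}-{pid}"  # hypothetical copied-task ID format
        # Same executable as the base task. The input list is a separate list
        # object, so the user can replace the files for this instance only.
        copies[copy_id] = Task(copy_id, base.executable, list(base.input_files))
        mapping[base.task_id] = copy_id
    return pid, mapping, copies


base = [Task("T1", "clean.py", ["raw.h5"]), Task("T2", "process.py")]
pid_a, mapping_a, copies_a = pre_initiate(base)
pid_b, mapping_b, copies_b = pre_initiate(base)

# Each pre-initiation yields its own PID and its own copies of T1 and T2.
print(mapping_a["T1"] != mapping_b["T1"])  # True
```

Since each call produces separate copies, two workflow instances built from the same base tasks do not share a task instance.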
2.5 Initiation¶
In order for the task set to be executed, the process needs to be initiated. During initiation, the first task in the set of tasks is started and an instance of the process is created. The instance has a Process Instance Identifier (PID), which can be used to track the state of the workflow. Only a single instance of a task can execute at a time, yet the same task may need to execute concurrently in multiple workflows running at the same time. This requirement is met by pre-initiation, which gives each workflow its own copies of the tasks. After the process is pre-initiated, the user can run the dependent task set by specifying the PID in the initiate process API.
2.6 Orchestration¶
Once a process is initiated, Workflow Service maintains the state of the process, which can be tracked using the Workflow Service “history” API. The Workflow Service monitors task execution and, when a task finishes, starts the next task in the schedule. In this way, the Workflow Service performs orchestration of the set of tasks.
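The orchestration loop described above can be sketched as a function that walks the schedule, starts each task once the previous one has finished, and records a history. This is a simplified, synchronous illustration. The `orchestrate` function, the `run_task` callback and the state names are hypothetical and do not reflect the output of the “history” API.

```python
from typing import Callable


def orchestrate(schedule: list[str], run_task: Callable[[str], bool]) -> list[tuple[str, str]]:
    """Run tasks in schedule order, recording (task, state) history entries."""
    history: list[tuple[str, str]] = []
    for task in schedule:
        history.append((task, "STARTED"))
        succeeded = run_task(task)  # returns only when the task has ended
        history.append((task, "FINISHED" if succeeded else "FAILED"))
        if not succeeded:
            break  # later tasks in the schedule are never started
    return history


# Stand-in for Task Service: every task succeeds immediately.
history = orchestrate(["T1", "T2"], run_task=lambda task: True)
for entry in history:
    print(entry)
```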
2.7 Recovery¶
Errors may occur during execution of a workflow, including a task being stopped (aborted) by the user. The recovery feature allows the workflow to continue from the point of error instead of being restarted from the beginning.
2.8 Recovery Mode¶
Associated with the concept of recovery is the concept of recovery mode. The recovery feature can be enabled or disabled for a process definition, and an API is provided for this purpose.
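Recovery and recovery mode can be illustrated with a small model in which a process instance remembers which tasks have completed and, if recovery is enabled, continues from the failed task. This is a sketch under assumptions. The `ProcessInstance` class and its fields are hypothetical, and real tasks would run asynchronously in Task Service.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ProcessInstance:
    schedule: list[str]
    recovery_enabled: bool = True
    completed: list[str] = field(default_factory=list)
    failed_at: Optional[str] = None

    def run(self, run_task: Callable[[str], bool]) -> bool:
        """Run the tasks not yet completed; remember where an error occurs."""
        self.failed_at = None
        for task in self.schedule[len(self.completed):]:
            if not run_task(task):
                self.failed_at = task
                return False
            self.completed.append(task)
        return True

    def recover(self, run_task: Callable[[str], bool]) -> bool:
        """Continue from the failed task instead of restarting the workflow."""
        if not self.recovery_enabled:
            raise RuntimeError("recovery mode is disabled for this process")
        if self.failed_at is None:
            raise RuntimeError("nothing to recover")
        return self.run(run_task)


attempts: list[str] = []


def flaky(task: str) -> bool:
    """Stand-in task runner: T2 fails on its first attempt only."""
    attempts.append(task)
    return not (task == "T2" and attempts.count("T2") == 1)


instance = ProcessInstance(["T1", "T2", "T3"])
first_run_ok = instance.run(flaky)     # stops at T2
recovered_ok = instance.recover(flaky)  # resumes at T2, then runs T3
print(attempts)  # ['T1', 'T2', 'T2', 'T3']
```

T1 runs only once: recovery picks up at the failed task rather than at the start of the schedule.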
2.9 Pause and Resume Workflow¶
Workflow Service provides an API to pause a workflow by specifying the process instance ID (PID) of the process to pause. Paused processes can be resumed through a resume API, which also takes the PID as input. Note that the tasks themselves are not paused; they continue until completion or error. On resume, the completed tasks (those that were in the running state when the workflow was paused) are restarted.
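The pause and resume behaviour can be modelled as a state change on the workflow that leaves the tasks untouched. The sketch below is one reading of the description above, not the service's implementation. The `Workflow` class, its state names and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    pid: str
    state: str = "RUNNING"
    running_tasks: set[str] = field(default_factory=set)
    running_at_pause: set[str] = field(default_factory=set)

    def pause(self) -> None:
        # Only the workflow state changes. Running tasks are left alone
        # and continue until they complete or fail.
        self.state = "PAUSED"
        self.running_at_pause = set(self.running_tasks)

    def task_finished(self, task: str) -> bool:
        """Record a task ending; return True if the next task may be started."""
        self.running_tasks.discard(task)
        return self.state == "RUNNING"

    def resume(self) -> set[str]:
        """Resume and return the tasks to restart: those that were running
        when the workflow was paused and have completed since."""
        self.state = "RUNNING"
        restart = self.running_at_pause - self.running_tasks
        self.running_at_pause = set()
        self.running_tasks |= restart
        return restart


workflow = Workflow(pid="pid-1", running_tasks={"T1"})
workflow.pause()
may_continue = workflow.task_finished("T1")  # T1 ran to completion while paused
restarted = workflow.resume()
print(may_continue, restarted)  # False {'T1'}
```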
2.10 Signaling a Workflow¶
The workflow can stop at a certain state, called an IntermediateCatchEvent (ICE) state, and wait for a signal (event) from the external world. The signal can have a message payload associated with it. On receiving the signal, the workflow copies the message into an internal variable and then continues on its path. The value of the internal variable can be used in later steps of the workflow.
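The signalling behaviour can be sketched as a workflow that waits for a named signal and stores the payload in a variable when it arrives. This is a simplified model. The `SignalWaitingWorkflow` class and the state names are hypothetical, and naming the variable after the signal is an assumption made for illustration.

```python
class SignalWaitingWorkflow:
    """Hypothetical model of a workflow parked at an IntermediateCatchEvent."""

    def __init__(self, pid: str, awaited_signal: str) -> None:
        self.pid = pid
        self.awaited_signal = awaited_signal
        self.state = "WAITING"
        self.variables: dict[str, str] = {}

    def signal(self, name: str, message: str = "") -> bool:
        """Deliver a signal; return True if it released the workflow."""
        if self.state != "WAITING" or name != self.awaited_signal:
            return False  # not waiting, or waiting for a different signal
        # Copy the payload into an internal variable for use in later steps.
        self.variables[name] = message
        self.state = "RUNNING"
        return True


workflow = SignalWaitingWorkflow(pid="pid-1", awaited_signal="dataReady")
ignored = workflow.signal("somethingElse", "ignored payload")
released = workflow.signal("dataReady", "batch-42")
print(ignored, released, workflow.variables)
# False True {'dataReady': 'batch-42'}
```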
2.11 Condition Evaluation (decision box)¶
The workflow can evaluate a condition or a set of conditions and take different paths based on the outcome of the evaluation. If none of the conditions is satisfied, the workflow can take a default path. The computation of the condition is done in a task node.
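A decision box with a default path can be sketched as an ordered list of conditions, each paired with the path to take when it holds. The `choose_path` function, the path names and the `task_output` dictionary (standing in for the result computed by a task node) are hypothetical.

```python
from typing import Callable


def choose_path(
    branches: list[tuple[Callable[[dict], bool], str]],
    default: str,
    task_output: dict,
) -> str:
    """Return the path of the first satisfied condition, else the default path."""
    for condition, path in branches:
        if condition(task_output):
            return path
    return default


# Hypothetical branches evaluated against the output of a preceding task.
branches = [
    (lambda out: out["row_count"] > 1000, "process_in_parallel"),
    (lambda out: out["row_count"] > 0, "process_sequentially"),
]

print(choose_path(branches, "notify_empty_input", {"row_count": 5000}))  # process_in_parallel
print(choose_path(branches, "notify_empty_input", {"row_count": 10}))    # process_sequentially
print(choose_path(branches, "notify_empty_input", {"row_count": 0}))     # notify_empty_input
```

The conditions are checked in order, so when several of them hold, the first one listed determines the path.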
3. Functional Capabilities¶
Workflow Service has the following functional capabilities:
Create schedule and process definition - The set of tasks to be executed and the dependencies are to be specified. From this, the Workflow Service computes a schedule, which is a sequence of tasks. From the schedule, an internal representation called a “process definition” is created. A process definition has a name and a version.
Get details of a process definition - Workflow Service can return the details of a process through the details API.
Search a process definition - Workflow Service can search for a particular process that matches a given search condition through the API.
Create permissions - For a process definition, specified by name and version, Workflow Service can grant permission to a role to run the various API calls for that process definition.
Run a process - Given a process definition, Workflow Service can create an instance of the process. Creating an instance of a process implies that the execution of the first task in the set of tasks is started.
Show history of a process instance – Given a PID, Workflow Service can display the various tasks already executed as part of the process instance, along with the task(s) that are currently running.
Abort a process instance - Workflow Service can abort a process along with the running tasks by using the abort API and specifying a PID. Aborting the tasks is optional.
Get status of a process instance - Workflow Service can return the current status of a process by using the status API. The status API can also report the currently running tasks associated with the workflow.
Pre-initiate a process - Workflow Service calls Task Service to create copies of the tasks in the workflow definition. This process is called pre-initiation. Pre-initiation is called prior to initiating the process instance.
Mapping for a process instance - The task mapping of a process instance can be retrieved through the mapping API, which shows the mapping between base task IDs and copied task IDs.
Initiate a process - Given a pre-initiated process instance, Workflow Service can start the execution of the first task in the set of tasks.
Search a process instance - Workflow Service can search for a particular process instance that matches a given search condition through the Process Instance (procins) API available in Swagger. The Process Instance API can also report the processes that were created before a specified point of time in the past.
Set recovery mode - Workflow Service can set the recovery mode of a process as enabled or disabled.
Recover a process instance - Workflow Service can recover a process which has encountered an error, such as when a task in the workflow is aborted (from Task Service).
Pause and resume workflow - APIs are available to pause and resume a workflow. They do not pause or resume the tasks (running in Task Service) associated with the workflow.
Signal workflow - The signal API can be used to send a signal to a workflow (i.e. a process instance). The API takes a signal name and a PID, and the signal can carry a message payload, which is a string.
Delete a process instance – The “remove” API can be used to delete a process instance, so that it no longer appears in the list of process instances.
Show variables – The “showVars” API allows a user to list the variables that were created from the signal’s message payload.
4. Purpose/Usage¶
Workflow Service is primarily used with Task Service for the following activities:
Implement data pipelines where the output of one task is an input of another task
Generate schedules of task execution (sequential or parallel)
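To illustrate how a schedule can mix sequential and parallel execution, the sketch below groups tasks into batches: every task in a batch has all of its prerequisites satisfied by earlier batches, so the tasks within one batch could run in parallel. It uses Python's standard `graphlib` module purely as an illustration. The `parallel_batches` helper and the task names are hypothetical, and this is not how Workflow Service is implemented.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; tasks within a batch have no mutual dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorter.get_ready()     # tasks whose prerequisites are all done
        batches.append(sorted(ready))  # sorted only to make the output stable
        sorter.done(*ready)            # mark the whole batch as finished
    return batches


# Hypothetical pipeline: two independent analyses run after cleaning,
# and publishing waits for both of them.
pipeline = {
    "clean": set(),
    "stats": {"clean"},
    "features": {"clean"},
    "publish": {"stats", "features"},
}

print(parallel_batches(pipeline))
# [['clean'], ['features', 'stats'], ['publish']]
```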
5. Examples¶
Consider an example of a simple data pipeline. An input file in Hierarchical Data Format (HDF5) needs to be cleaned and then processed, and finally the observations are posted to TCUP Data Explorer.
First, a task (T1) is created and deployed with a script that takes the input HDF5 file and outputs cleaned data. Then another task (T2) is created, which takes the output of T1 as its input. T2 is deployed with a script that processes the data and posts the observations to TCUP Data Explorer.
Once both tasks are set up properly, a process needs to be created to represent the workflow definition with a process name and version combination. After that, permission needs to be granted to the same process name and version. Finally the workflow definition can be initiated to execute the tasks in sequence.
The figure below represents the workflow example stated above:
A much more complex workflow design is also possible, as shown in the figure below. In this case, parallel execution within the workflow can be utilized:
6. Reference Document¶
For more details about this service, please refer to the following documents: