Common
Common behaviour for all SDKs.
Basic idea
When generators are deployed, the catalogue builds itself dynamically. Generators are deployed as microservices and register themselves with Sixpack. Sixpack then uses the catalogue to route tasks to the right generator.
Generators come in two types:
- Generator - a component that generates and inserts data in a system
- Orchestrator - a component that manages a more complex process of data generation, often involving multiple generators and other orchestrators.
Generators and Orchestrators are deployed together in a set called a Supplier.
Catalogue structure
The Sixpack catalogue is a 3-level hierarchy:
- Environment - A configuration property of a Supplier
- Supplier - A component, often deployed as a microservice, that manages its Generators and Orchestrators
- Item - Dataset types represented by Generators and Orchestrators
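The hierarchy can be pictured as nested mappings. The following is a minimal sketch, not the Sixpack data model: the `Catalogue` class, its methods, and the sample names are hypothetical and only illustrate the Environment / Supplier / Item nesting.

```python
from collections import defaultdict


class Catalogue:
    """Hypothetical in-memory model of the three-level hierarchy."""

    def __init__(self):
        # environment name -> supplier name -> set of item names
        self._tree = defaultdict(lambda: defaultdict(set))

    def add_item(self, environment, supplier, item):
        self._tree[environment][supplier].add(item)

    def items(self, environment, supplier):
        # Sorted list of items; unknown keys yield an empty list
        return sorted(self._tree.get(environment, {}).get(supplier, set()))


catalogue = Catalogue()
catalogue.add_item("TEST", "crm-supplier", "CustomerGenerator")
catalogue.add_item("TEST", "crm-supplier", "CustomerWithOrdersOrchestrator")
```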
Supplier lifecycle
A Supplier, when started, does the following:
- Analyses the capabilities of its Generators and Orchestrators and builds a manifest representing a partial catalogue
- Registers with Sixpack by sending the manifest
- Listens for incoming tasks from Sixpack
The Sixpack SDK does all of this automatically. The developer has very little to do apart from coding the Generators and Orchestrators.
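To give an idea of what "analysing capabilities" could look like, here is a rough sketch that derives a manifest from component classes by introspection. It is an illustration only: `EmailGenerator`, `build_manifest`, and the manifest layout are hypothetical, and the real SDK may describe components quite differently.

```python
import inspect


class EmailGenerator:
    """Sample Generator; the class and method names are made up."""

    def generate(self, domain: str) -> dict:
        return {"email": f"user@{domain}"}


def build_manifest(environment, supplier_name, components):
    """Describe each component by its name and the inputs of `generate`."""
    items = []
    for component in components:
        signature = inspect.signature(component.generate)
        inputs = [name for name in signature.parameters if name != "self"]
        items.append({"name": component.__name__, "inputs": inputs})
    return {"environment": environment, "supplier": supplier_name, "items": items}


# The manifest is the partial catalogue a Supplier would send on registration
manifest = build_manifest("TEST", "crm-supplier", [EmailGenerator])
```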
Registration
Registration triggers the following actions in Sixpack:
- The catalogue is modified to include the Supplier's capabilities, that is, all the Generators and Orchestrators it manages.
- If some datasets are required to be pre-stocked, Sixpack immediately initiates tasks for their creation and provisioning.
- If the definition of a Generator or Orchestrator changes, Sixpack evaluates whether the datasets already in stock are still compatible and, if not, initiates their cleanup and removal from stock.
The new catalogue is available immediately to all users.
During registration, Sixpack also checks compatibility with the SDK version used.
- If the SDK version is deprecated, warnings are printed in the log.
- If the SDK version is incompatible, the Supplier does not start and an error is printed in the log.
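The two outcomes of the version check can be modelled as follows. This is a sketch under assumptions: the version thresholds, `check_sdk_version`, and `IncompatibleSdkError` are invented for illustration, and the actual compatibility rules are decided by Sixpack.

```python
import logging

logger = logging.getLogger("supplier")

# Hypothetical thresholds; the real rules are decided by Sixpack
MIN_SUPPORTED = (2, 0)
MIN_RECOMMENDED = (3, 0)


class IncompatibleSdkError(RuntimeError):
    """Raised when the Supplier must not start."""


def check_sdk_version(version):
    """Return 'ok' or 'deprecated'; raise if the version is incompatible."""
    if version < MIN_SUPPORTED:
        logger.error("SDK %s is incompatible, Supplier will not start", version)
        raise IncompatibleSdkError(version)
    if version < MIN_RECOMMENDED:
        logger.warning("SDK %s is deprecated, please upgrade", version)
        return "deprecated"
    return "ok"
```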
Task handling
Sixpack sends tasks to the Supplier. The Supplier automatically routes tasks to the right method of the right Generator or Orchestrator.
Sixpack handles retries and failures in an optimised way. Some customisations are possible but not documented so far.
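Routing a task to "the right method of the right Generator or Orchestrator" amounts to a lookup by name. The sketch below shows the idea with a hypothetical task shape (`item`, `method`, `input`) and a hypothetical `Supplier` class. The SDK performs this step itself, so none of this needs to be written by hand.

```python
class EmailGenerator:
    """Sample Generator; the names are made up."""

    def generate(self, domain):
        return {"email": f"user@{domain}"}


class Supplier:
    """Hypothetical dispatcher keyed by component class name."""

    def __init__(self, components):
        self._by_name = {type(c).__name__: c for c in components}

    def handle(self, task):
        component = self._by_name[task["item"]]       # right Generator
        method = getattr(component, task["method"])   # right method
        return method(**task["input"])


supplier = Supplier([EmailGenerator()])
result = supplier.handle(
    {"item": "EmailGenerator", "method": "generate", "input": {"domain": "example.com"}}
)
```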
State handling
Because of retries and the distributed nature of the whole solution, state management is crucial. Sixpack simplifies it considerably.
Generators
Sixpack treats Generator methods as idempotent. This means that if a Supplier is killed while a Generator task is in progress, the task is restarted when the Supplier becomes available again, provided that the Generator is still defined.
To let Sixpack handle failures caused by state inconsistency faster, a Generator can throw a non-retryable exception.
Generators may have some stateful components, like a database connection. The stateful components should be managed by the Generator itself. Sixpack does not provide any support for that.
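As an illustration of both points, the sketch below shows a Generator that owns its database connection and stays safe to restart. Everything in it is an assumption made for the example: the `CustomerGenerator` class, the `task_id` key (the source does not say which identifier a Generator receives), and SQLite standing in for the target system.

```python
import sqlite3


class CustomerGenerator:
    """Hypothetical Generator that owns its database connection."""

    def __init__(self, database=":memory:"):
        self._database = database
        self._connection = None  # opened lazily, managed by the Generator

    def _db(self):
        if self._connection is None:
            self._connection = sqlite3.connect(self._database)
            self._connection.execute(
                "CREATE TABLE IF NOT EXISTS customer (task_id TEXT PRIMARY KEY, name TEXT)"
            )
        return self._connection

    def generate(self, task_id, name):
        # INSERT OR IGNORE makes a restarted task a no-op instead of a duplicate
        with self._db() as db:
            db.execute(
                "INSERT OR IGNORE INTO customer (task_id, name) VALUES (?, ?)",
                (task_id, name),
            )
        row = self._db().execute(
            "SELECT task_id, name FROM customer WHERE task_id = ?", (task_id,)
        ).fetchone()
        return {"task_id": row[0], "name": row[1]}

    def close(self):
        if self._connection is not None:
            self._connection.close()
            self._connection = None


generator = CustomerGenerator()
first = generator.generate("task-1", "Alice")
second = generator.generate("task-1", "Alice")  # simulated restart of the same task
```

Running the same task twice leaves a single row, which is the property a restart relies on.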
Orchestrators
Orchestrators are fully stateless by design and can be killed at any time. Sixpack ensures that execution continues where it left off when the Supplier restarts, provided that the Orchestrator is still defined.
For technical reasons, if the code of an Orchestrator changes structurally while a dataset it handles is in the middle of processing, the dataset may fail. Whether it does depends on how far processing has progressed relative to the location of the structural changes in the code.
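The source does not describe how Sixpack resumes an Orchestrator. One common technique for this kind of resumption, shown here purely as an assumption, is to journal the result of each completed step and replay the journal on restart. The sketch also suggests why a structural change before the point already reached can break an in-flight dataset. `run_orchestration` and the step names are hypothetical.

```python
def run_orchestration(steps, journal):
    """Run (name, function) steps in order, skipping steps already journaled.

    `journal` is a list of (step_name, result) pairs that survives restarts.
    """
    results = []
    for index, (name, function) in enumerate(steps):
        if index < len(journal):
            recorded_name, recorded_result = journal[index]
            if recorded_name != name:
                # The code changed structurally before the point already reached
                raise RuntimeError(f"journal has {recorded_name!r}, code has {name!r}")
            results.append(recorded_result)  # replay, do not execute again
        else:
            result = function()
            journal.append((name, result))
            results.append(result)
    return results


# State left behind by a Supplier that was killed after the first step
journal = [("create_customer", "customer-1")]
steps = [
    ("create_customer", lambda: "customer-2"),
    ("create_order", lambda: "order-1"),
]
resumed = run_orchestration(steps, journal)
```

Here the first step is replayed from the journal and only `create_order` actually runs.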
Retry strategies
Sixpack retries every task several times before considering it permanently failed. This applies to both Generators and Orchestrators. Because Orchestrators may involve a rich tree of Generators and other Orchestrators, the retry strategy may result in complex retry patterns. Sixpack manages all of this automatically, so Generator and Orchestrator developers rarely need to deal with it.
However, for optimal behaviour it is recommended to:
- Make Generators idempotent
- When using random values, verify in the target systems that uniqueness constraints are not violated
- When uniqueness or other fatal constraints are violated, throw a non-retryable exception
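The effect of a non-retryable exception on a bounded retry loop can be sketched as follows. The loop is a simplified stand-in for what Sixpack does internally: `NonRetryableError`, `run_with_retries`, and the limit of three attempts are assumptions made for the example, not SDK names or documented values.

```python
class NonRetryableError(Exception):
    """Hypothetical marker for failures that a retry cannot fix."""


def run_with_retries(task, max_attempts=3):
    """Return ('done', result) or ('failed', error) after bounded retries."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return "done", task()
        except NonRetryableError as error:
            return "failed", error  # give up immediately
        except Exception as error:
            last_error = error  # possibly transient, try again
    return "failed", last_error


attempts = []


def flaky_task():
    # Fails twice with a transient error, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("target system unavailable")
    return "dataset-1"


status, value = run_with_retries(flaky_task)
```

A transient error consumes attempts, whereas a non-retryable one ends the task on the first failure and saves the remaining attempts.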
Constraint violation
When creating a dataset would violate a constraint because of the input received, the generator should throw a non-retryable exception. Sixpack then does not retry the task and marks the dataset as failed. If the generator is part of an Orchestrator flow, the whole flow is retried from the beginning, with a good chance that the constraint is no longer violated this time.
For example:
- Generator A generates random email
- Generator B creates a customer record in the CRM with email as input
- Orchestrator C first calls A then B
In case B fails because the email is already in use, B should throw a non-retryable exception. Orchestrator C will then retry the whole flow, this time with a new email generated by A.
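The A, B, C example can be simulated in a few lines. This is a sketch, not SDK code: `NonRetryableError`, `FakeCrm`, and the function names are hypothetical, the loop inside `orchestrator_c` stands in for the flow retry that Sixpack performs, and a fixed sequence of candidates replaces random generation so the run is repeatable.

```python
class NonRetryableError(Exception):
    """Hypothetical marker exception; the SDK's real type may differ."""


class FakeCrm:
    """In-memory stand-in for the CRM system."""

    def __init__(self, taken):
        self.emails = set(taken)


def generator_a(candidates):
    # Stands in for random generation of an email
    return next(candidates)


def generator_b(crm, email):
    if email in crm.emails:
        raise NonRetryableError(f"email already in use: {email}")
    crm.emails.add(email)
    return {"email": email}


def orchestrator_c(crm, candidates, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            email = generator_a(candidates)
            return attempt, generator_b(crm, email)
        except NonRetryableError:
            continue  # B is not retried alone; the whole flow restarts
    raise RuntimeError("flow permanently failed")


crm = FakeCrm(taken={"ann@example.com"})
candidates = iter(["ann@example.com", "bob@example.com"])
attempt, customer = orchestrator_c(crm, candidates)
```

The first pass fails in B because the email is taken; the second pass gets a fresh email from A and succeeds.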
Note: A better strategy, although not always feasible, is to let Generator A access the CRM system and check whether the email is already in use. This avoids both the exception and the retry.
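A minimal sketch of that alternative, assuming Generator A is given some way to query the CRM: the `is_taken` callback, `generate_unused_email`, and the limit of ten tries are hypothetical, and a set lookup replaces the real CRM call.

```python
import random
import string


def generate_unused_email(is_taken, rng, max_tries=10):
    """Draw random emails until `is_taken` reports a free one."""
    for _ in range(max_tries):
        local_part = "".join(rng.choices(string.ascii_lowercase, k=8))
        email = f"{local_part}@example.com"
        if not is_taken(email):
            return email
    raise RuntimeError("no unused email found")


taken = {"existing@example.com"}  # stands in for a CRM lookup
email = generate_unused_email(lambda candidate: candidate in taken, random.Random(42))
```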