Overview

Synthetic test data are the future. Sixpack will help you get there quickly.

Synthetic test data

Synthetic test data are data created by algorithms and used to test one or more systems. Advanced business rules often depend on specific data, so getting test data right is one of the main goals of any quality assurance practice, whether it is carried out by testers or developers.

Test data may be available externally, in which case testers use them to enter data into the systems being tested. Alternatively, test data may already be available in the test systems, in which case the challenge is getting them there efficiently.

The old way

For a long time, software was often a single piece doing everything. This is what we call a monolithic system, or monolith. In that case the data are most probably stored in one database, so an easy way to get test data is to copy the production database.

The old way already has many issues, of which the following are the most important:

Size: Production databases are too large to copy efficiently. Some downsizing is required, which introduces additional complexity.
Data protection: Production data are often sensitive and cannot be used for testing as they are. Anonymization is required and is often very costly, whether it is done with a specialized tool or with custom scripts.

End of an era

The old way was already a challenge, but it is even more so now, for several reasons:

Legal frameworks are stricter in terms of data privacy and data protection.
Systems are increasingly distributed, so it is no longer as easy as copying one database.
With cloud computing, and especially cloud applications (software as a service), there is often no access to the database at all, and data might be stored using proprietary, undisclosed techniques.
Test automation has made it possible to run many tests in parallel, so it is no longer possible to use a single dataset for all tests. Datasets must be quick to reproduce.

Synthetic test data management is all about that:

Generate test data
Provision them to the systems under test
Allocate them to users (testers, developers, automated tests)
Clean them up when not needed anymore
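
The four steps above can be read as stages that every dataset passes through in order. The following is a minimal sketch of that idea only; the `Stage` and `Dataset` names are hypothetical and are not taken from Sixpack.

```python
from enum import Enum


class Stage(Enum):
    GENERATED = 1
    PROVISIONED = 2
    ALLOCATED = 3
    CLEANED_UP = 4


# Allowed transitions, in the order the four steps are listed above.
NEXT_STAGE = {
    Stage.GENERATED: Stage.PROVISIONED,
    Stage.PROVISIONED: Stage.ALLOCATED,
    Stage.ALLOCATED: Stage.CLEANED_UP,
}


class Dataset:
    """Hypothetical dataset record used for illustration; not a Sixpack type."""

    def __init__(self, reference: str) -> None:
        self.reference = reference
        self.stage = Stage.GENERATED

    def advance(self) -> Stage:
        """Move to the next lifecycle stage, or fail if already cleaned up."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"{self.reference} is already cleaned up")
        self.stage = NEXT_STAGE[self.stage]
        return self.stage
```

A real implementation would likely also allow cleaning up datasets that were never allocated; the sketch keeps a single linear path for brevity.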

What is Sixpack

Sixpack solves all the above problems at once by taking an innovative approach. It is not a revolution in itself; rather, it is a combination of best practices packaged as a platform that frames your test data management so that it becomes trivial, scalable and sustainable.

Core concepts

The core concepts of Sixpack are:

Make test data available immediately by producing them up front
Make test data available in a consistent way across systems under test
Decentralise the test data generation logic to the same teams that maintain the business logic
Reduce the need for additional configuration to strictly zero

Test data catalogue

A test data catalogue is a structured tree of possible datasets with the following levels of navigation:

Environment - e.g. DEV, TEST, UAT
Supplier - e.g. team name, project name, application name
Item - e.g. dataset name, dataset type

Items can have a configuration, that is, attributes of the datasets such as country, language, currency, etc.

Any user or any automated test can request datasets within the catalogue. Sixpack delivers the datasets back immediately and ensures that each dataset is delivered only once.
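
To make the tree and the "delivered only once" guarantee concrete, here is a small in-memory sketch. It assumes a catalogue path is simply an (environment, supplier, item) tuple; the `Catalogue` class and its methods are hypothetical illustrations, not the Sixpack API.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

# Catalogue path: (environment, supplier, item),
# e.g. ("TEST", "crm-team", "customer"). The example values are made up.
Path = Tuple[str, str, str]


class Catalogue:
    """Hypothetical in-memory catalogue; names are illustrative only."""

    def __init__(self) -> None:
        self._stock: Dict[Path, Deque[str]] = defaultdict(deque)

    def add(self, path: Path, reference: str) -> None:
        """Put a pre-generated dataset reference in stock under a path."""
        self._stock[path].append(reference)

    def request(self, path: Path) -> Optional[str]:
        """Hand out one dataset, or None if nothing is in stock.

        The reference is removed from stock when it is returned,
        so the same dataset is never delivered twice.
        """
        queue = self._stock.get(path)
        if not queue:
            return None
        return queue.popleft()
```

Two testers requesting the same item therefore receive two different datasets, which is what allows tests to run in parallel without interfering with each other.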

Suppliers

Suppliers are just logical groups of Generators provided by the same team or application and managed as a set.

Generator

Generators are tiny scripts that anybody can write. A strong benefit comes from agreeing with development teams that they write generators for the systems they maintain, because for them it is trivial to know the structure of data in their systems. To simplify the task as much as possible, generators can be written in any language and can be bundled with the systems.

Generators receive requests carrying the wanted configuration. Based on that, they generate test data and provision them to the system instance on the given test environment. They then return a response that carries a basic reference to the test data, e.g. the username of a test customer.

Generators may or may not have additional features, such as cleaning up test data after some time.
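
The request and response flow of a generator might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Request` and `Response` shapes, the `customer_generator` function, and the `FAKE_SYSTEMS` dictionary that stands in for real system instances. It does not reflect the actual Sixpack generator interface.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Request:
    environment: str                                       # e.g. "TEST"
    config: Dict[str, str] = field(default_factory=dict)   # e.g. {"country": "FR"}


@dataclass
class Response:
    reference: str                                          # e.g. a username


# Stand-in for the system under test, keyed by environment (hypothetical).
FAKE_SYSTEMS: Dict[str, Dict[str, dict]] = {"TEST": {}, "UAT": {}}


def customer_generator(request: Request) -> Response:
    """Generate one test customer and provision it to the requested environment."""
    country = request.config.get("country", "US")
    username = f"cust-{country.lower()}-{uuid.uuid4().hex[:8]}"
    customer = {"username": username, "country": country}
    # Provisioning: a real generator would call the system's API or database here.
    FAKE_SYSTEMS[request.environment][username] = customer
    # Only a basic reference is returned, not the full dataset.
    return Response(reference=username)
```

Calling `customer_generator(Request("TEST", {"country": "FR"}))` would store a French test customer in the stand-in TEST system and return its username as the reference.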

Sixpack

Sixpack ensures orchestration of all the Generators and of the requests and responses. It exposes everything on a portal and also via an API so that automated tests can consume test datasets too.

While the business logic of how test data are generated and provisioned to systems under test is kept with the teams, a lot of complexity is centralized in Sixpack and hidden. This includes issues like:

Providing a self-service portal usable also by non-IT personnel
Stocking test datasets upfront and allocating them to users just in time
Invalidating test datasets when the structure of test data inputs or outputs has changed
Ensuring cleanup procedures are called after some time
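
As a rough illustration of stocking, invalidation and time-based cleanup working together, consider the sketch below. It assumes that a change in data structure is signalled by a schema version number and that age is measured in seconds; the `Stock` class and all of its methods are hypothetical and do not describe how Sixpack is implemented.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StockedDataset:
    reference: str
    schema_version: int
    created_at: float


class Stock:
    """Hypothetical stock of pre-generated datasets for one catalogue item."""

    def __init__(self, generate: Callable[[], str], target_size: int,
                 schema_version: int = 1) -> None:
        self._generate = generate
        self._target_size = target_size
        self.schema_version = schema_version
        self._items: List[StockedDataset] = []

    def refill(self, now: Optional[float] = None) -> None:
        """Generate datasets up front until the stock reaches its target size."""
        now = time.time() if now is None else now
        while len(self._items) < self._target_size:
            self._items.append(
                StockedDataset(self._generate(), self.schema_version, now))

    def allocate(self) -> Optional[str]:
        """Hand out the oldest stocked dataset, or None if the stock is empty."""
        return self._items.pop(0).reference if self._items else None

    def invalidate(self, new_schema_version: int) -> int:
        """Drop datasets built for an older structure; return how many were dropped."""
        self.schema_version = new_schema_version
        before = len(self._items)
        self._items = [d for d in self._items
                       if d.schema_version == new_schema_version]
        return before - len(self._items)

    def expire(self, max_age_seconds: float,
               now: Optional[float] = None) -> List[str]:
        """Remove datasets older than max_age_seconds and return their
        references, so that the caller can run cleanup for each of them."""
        now = time.time() if now is None else now
        old = [d for d in self._items if now - d.created_at > max_age_seconds]
        self._items = [d for d in self._items
                       if now - d.created_at <= max_age_seconds]
        return [d.reference for d in old]
```

Because `refill` runs ahead of demand, `allocate` returns immediately even when generating a dataset is slow.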

Benefits recapitulation

Based on the concepts above, all the issues are solved by design.

Data protection

Data are fully synthetic, with no need to access the production database at all, so the data protection discussion no longer arises.

Test data consistency

By orchestrating the Generators mimicking a real business flow, Sixpack ensures that test data are consistent across all the systems under test.

Test data availability

Test data are available immediately because they are pre-generated and stocked. This comes in handy in two situations:

If test data generation takes time, it is no longer an issue
If some systems are down on the test environment, test data are still available, which might be sufficient to test other systems

Sustainability

By decentralising the logic of the Generators to the same teams that actually maintain systems, the effort to maintain test data may be trivially bound to the standard software development lifecycle. As opposed to any other approach (e.g. testers adjusting their test data scripts once developers have deployed a new version of the system on the UAT environment), the test data definition follows the same lifecycle as the systems under test.

Scalability

With zero configuration required, there is no limit to the number of Generators. Either a Generator exists or it does not; everything else is fully automated.

Democratisation

To get test data, one does not need to understand the business process that creates them. So anybody in the organisation can take test data immediately, without any onboarding.

Additional benefits

Transformation driver

If the organisation has a goal of getting rid of production data on non-production environments, the Sixpack approach lets it get there progressively. It is easy to start with one test data item (entity) and add others as needs arise. Costs stay low because they are driven by those needs.

Advanced features beyond

Test data loading

A specific optional feature of generators is producing a continuous flow of transactional data. In that use case Sixpack provides signalling and telemetry: switching on/off, monitoring and statistics. Since generators reside in the client's infrastructure, it is very easy to use Sixpack as a load testing tool.
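
The on/off switch and the statistics could be pictured as in the sketch below. It assumes a scheduler that calls `tick()` at regular intervals; `FlowGenerator`, `FlowStats` and `tick` are hypothetical names chosen for this example and are not part of Sixpack.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FlowStats:
    produced: int = 0   # transactions emitted successfully
    failed: int = 0     # transactions that raised an error


class FlowGenerator:
    """Hypothetical generator that emits one transaction per tick while switched on."""

    def __init__(self, produce_one: Callable[[], None]) -> None:
        self._produce_one = produce_one
        self.enabled = False
        self.stats = FlowStats()

    def switch(self, on: bool) -> None:
        """Switch the continuous flow on or off."""
        self.enabled = on

    def tick(self) -> None:
        """Intended to be called by a scheduler loop; does nothing while switched off."""
        if not self.enabled:
            return
        try:
            self._produce_one()
            self.stats.produced += 1
        except Exception:
            # Failures are counted rather than propagated, so the flow keeps running.
            self.stats.failed += 1
```

Raising the tick frequency, or running several such generators side by side, is what would turn the same mechanism into a source of load.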

Synthetic monitoring

Synthetic monitoring has been around for years, with specialised tools of its own. It is based on mimicking user behaviour and assessing success or failure, and possibly other metrics, of systems in production. Sixpack by design provides 90% of the required features, and the missing advanced telemetry is usually already in place.
