How can we provide test data that safeguards privacy in the Norwegian Labour and Welfare Administration? The answer lies in artificial intelligence and synthetic test data.
The original article was published on our Norwegian blog in 2019.
For the Norwegian Labour and Welfare Administration (Norwegian abbreviation: NAV), it is crucial to test computer systems with data that is as realistic as possible. For a long time, realistic test data has been the very basis of IT development in the organisation.
In many instances, masking or anonymisation of production data has been tried without finding a solution that meets this goal. By that, we mean that the data produced did not have the same statistical spread as the real data, and that personal data could leak out through unreliable anonymisation techniques.
These techniques were therefore not sustainable, and we had to look for a new solution for test data.
A new technique based on artificial intelligence, combined with a value chain approach (as opposed to a database approach), has proven to be the recipe for success in providing test data that safeguards privacy. All new development now takes place on synthetic test data.
The value chain approach became the solution
The Norwegian Labour and Welfare Administration had 1,200 database tables in its Arena system. In order to create synthetic test data, we needed an overview of these tables. We therefore looked at where data comes in, where data goes out, and what we had to add in order to incorporate machine learning.
Using a value chain approach means creating artificial data in the interfaces between systems, rather than far down in databases. This approach brings with it several practical benefits:
- There are fewer interfaces than database tables. This leads to less work, as there are fewer links to keep track of.
- Data comes in exactly the same way as real data. This allows us to reuse existing data flows. The applications also apply their own business logic, which keeps our data consistent.
- When we use the existing value chains to bring data into the system, we automatically get the value chains out of the system as well. Distribution mechanisms work out of the box.
A value chain approach, therefore, became the solution for creating an overview.
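The difference between the two approaches can be illustrated with a small Python sketch. Everything in it is hypothetical: `ToySystem`, its tables, the age rule and the message fields are invented for illustration and do not describe Arena. The point is only that synthetic messages sent through the intake interface are validated and distributed by the application's own logic, so the test data generator never has to know the table layout.

```python
from dataclasses import dataclass, field


@dataclass
class ToySystem:
    """Hypothetical application with one intake interface and several tables."""
    persons: dict = field(default_factory=dict)
    benefits: list = field(default_factory=list)
    outbox: list = field(default_factory=list)

    def receive_application(self, message: dict) -> None:
        """Intake interface: business logic validates the message and
        fans it out to the internal tables."""
        if message["age"] < 18:
            raise ValueError("applicant must be an adult")
        person_id = len(self.persons) + 1
        self.persons[person_id] = {"age": message["age"]}
        self.benefits.append({"person_id": person_id, "type": message["benefit"]})
        # Downstream distribution happens as a side effect of normal intake.
        self.outbox.append({"event": "benefit_registered", "person_id": person_id})


# Synthetic data is created at the interface, not written into the tables.
system = ToySystem()
synthetic_messages = [
    {"age": 34, "benefit": "unemployment"},
    {"age": 61, "benefit": "disability"},
]
for msg in synthetic_messages:
    system.receive_application(msg)
```

After the loop, all three internal structures are populated and consistent with each other, even though the generator only produced two flat messages.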
Artificial intelligence in the centre
The core of this solution consists of artificial intelligence provided by us at Visma.
This core creates data that “mimics” the production data. This means that the machine-learned models generate synthetic test data with the same characteristics as the original dataset.
Because the data stays as close to real data as possible, it covers the variation needed for testing and developing IT systems.
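As a rough illustration of what "the same characteristics" can mean, here is a minimal Python sketch. It is not the model used in the project, which the article does not describe. It swaps in a deliberately simple stand-in: count how often each value occurs in each column of a training set, then sample new records with the same frequencies. The column names and training rows are made up, and because each column is sampled independently, relationships between columns are not preserved, which a real model would need to handle.

```python
import random
from collections import Counter


def fit_marginals(records):
    """Count how often each value occurs in each column."""
    columns = records[0].keys()
    return {col: Counter(r[col] for r in records) for col in columns}


def sample_profiles(marginals, n, seed=0):
    """Draw n synthetic records, column by column, with probabilities
    proportional to the observed frequencies."""
    rng = random.Random(seed)
    profiles = []
    for _ in range(n):
        profile = {}
        for col, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            profile[col] = rng.choices(values, weights=weights, k=1)[0]
        profiles.append(profile)
    return profiles


# Made-up training data (already stripped of identifiers).
training = [
    {"age_band": "18-29", "status": "job seeker"},
    {"age_band": "30-49", "status": "job seeker"},
    {"age_band": "30-49", "status": "employed"},
    {"age_band": "50-66", "status": "on sick leave"},
]
model = fit_marginals(training)
synthetic = sample_profiles(model, n=10_000)
```

Half of the training rows have the age band "30-49", so roughly half of the synthetic profiles will too, while no synthetic record is a copy of a specific person.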
If this data does not meet your needs, we have also created a self-service solution for generating your own. It allows anyone to create and customise synthetic data by specifying the properties they need. For example, if you want to test an unusual situation, you can create test data for exactly that situation and then run your test on it.
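The idea behind such customisation can be sketched in a few lines of Python. This is not the self-service solution's actual interface; `generate_profile`, the property names and the value ranges are all assumptions made for illustration. Properties the tester specifies are pinned to the chosen value, and everything else is generated as usual.

```python
import random

# Hypothetical properties and samplers; a real generator would use a trained model.
DEFAULTS = {
    "age": lambda rng: rng.randint(18, 66),
    "status": lambda rng: rng.choice(["job seeker", "employed", "on sick leave"]),
    "municipality": lambda rng: rng.choice(["Oslo", "Bergen", "Trondheim"]),
}


def generate_profile(rng, **fixed):
    """Generate one synthetic profile; any keyword argument pins that
    property to the given value instead of sampling it."""
    unknown = set(fixed) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    return {
        name: fixed[name] if name in fixed else sampler(rng)
        for name, sampler in DEFAULTS.items()
    }


# Test data for one specific, unusual situation.
rng = random.Random(42)
edge_cases = [generate_profile(rng, age=66, status="on sick leave") for _ in range(5)]
```

Every profile in `edge_cases` has the pinned age and status, while the municipality still varies from profile to profile.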
What about privacy in the solution?
The test data produced by this artificial intelligence model is completely safe and impossible to trace back to individuals. The process of preparing the training data for the model is quality assured as follows:
- All directly identifying properties are removed (such as birth number and name).
- A qualitative analysis of the remaining data is performed, and statistical anomalies are removed.
- We are now left with anonymous data that cannot be linked back to any individuals. This data is used to train artificial intelligence.
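The first two steps can be sketched in Python under stated assumptions. The field names and records below are invented. The article describes the second step as a qualitative analysis, so the frequency threshold used here is a swapped-in, simplified stand-in: it drops any record whose combination of remaining values occurs fewer than `min_count` times. A threshold like this does not by itself guarantee anonymity; it only shows the kind of filtering involved.

```python
from collections import Counter

# Assumed field names for the directly identifying properties.
DIRECT_IDENTIFIERS = {"birth_number", "name"}


def strip_identifiers(record):
    """Step 1: remove directly identifying properties."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def drop_rare_combinations(records, min_count):
    """Step 2 (simplified): remove records whose combination of values
    occurs fewer than min_count times in the data set."""
    keys = [tuple(sorted(r.items())) for r in records]
    counts = Counter(keys)
    return [r for r, k in zip(records, keys) if counts[k] >= min_count]


# Made-up records; the birth numbers are placeholders, not real ones.
raw = [
    {"birth_number": "00000000001", "name": "A", "age_band": "30-49", "status": "job seeker"},
    {"birth_number": "00000000002", "name": "B", "age_band": "30-49", "status": "job seeker"},
    {"birth_number": "00000000003", "name": "C", "age_band": "50-66", "status": "on sick leave"},
]
training = drop_rare_combinations([strip_identifiers(r) for r in raw], min_count=2)
```

Here the third record is dropped because its combination of values is unique in the data set, and the two remaining records no longer carry a name or birth number.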
What are the results of this project?
So what results did we see after working on this project? Before the project, the Norwegian Labour and Welfare Administration spent a great deal of time and resources creating test data.
Now, this process has become a lot more efficient: it is possible to generate up to 10,000 synthetic person profiles in just a few minutes.
In addition to saving time and resources, the solution is completely secure and anonymous. Because the data cannot be traced back to individuals, their privacy is preserved and systems can be tested safely.
Curious to learn more about how we work with artificial intelligence to make work processes more effective and create better solutions for a wide range of companies and organisations? Visit our Technology category.