Manage your test data like a PRO

Creating and managing test data is one of the biggest challenges a tester faces. According to IBM research from 2016, test engineers spend roughly 30% to 60% of their time producing data. Many of the resulting difficulties start with sourcing the data.

Some of the most common challenges of test data sourcing are:

  • The testing team doesn’t have access to the data sources
  • Access to the data sources is delayed
  • Access is delayed further when the data source belongs to a third-party provider
  • The team doesn’t know enough about the test data source
  • Migrating a large data set from one environment to another takes time and resources
  • Multiple teams share the same data source, and each may need to perform CRUD operations on it
  • Multiple environments and data versions have to be supported
  • Data in the testing environment differs from the LIVE environment
  • Data is updated on a recurring schedule (hourly, daily, weekly, and so on)

Test Data Management

Several test data preparation techniques exist, but choosing one or more of them is not as straightforward as it sounds. It takes a strategy, and following one helps streamline the testing process.

1. Analysis of data

The first and most important step is to understand the test data. A thorough analysis should cover:
* Where the data is located
* Whether the data is static
* If the data is dynamic, how frequently it needs to be refreshed
* Whether the data comes from a single source, e.g. a single DB
* Whether the data comes from multiple sources, e.g. APIs, multiple databases, and third parties
* Whether the data is real-time
* The format of the data
* Any rules applied to the data
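
One way to keep the outcome of this analysis usable is to record it as a small inventory. The sketch below is only an illustration: the class `DataSourceProfile`, its field names, and the two example sources are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataSourceProfile:
    """Hypothetical record of what the analysis found for one data source."""
    name: str
    location: str                               # where the data lives
    kind: str                                   # e.g. "database", "api", "third-party"
    data_format: str                            # e.g. "relational rows", "JSON"
    refresh_every: Optional[timedelta] = None   # None means the data is static
    rules: Tuple[str, ...] = ()                 # rules applied to the data

    @property
    def is_static(self) -> bool:
        return self.refresh_every is None


# Example inventory; both entries are made up for illustration.
inventory = [
    DataSourceProfile(
        name="customers",
        location="orders database (hypothetical)",
        kind="database",
        data_format="relational rows",
        rules=("email must be unique",),
    ),
    DataSourceProfile(
        name="exchange_rates",
        location="partner API (hypothetical)",
        kind="third-party",
        data_format="JSON",
        refresh_every=timedelta(hours=1),
    ),
]

# Sources that need a refresh plan because their data is dynamic.
dynamic_sources = [profile.name for profile in inventory if not profile.is_static]
```

An inventory like this makes it easy to see at a glance which sources are dynamic and therefore need a refresh schedule.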

2. Data generation

After analyzing the data in the first step, we have a better idea of the extent to which we can generate data. We can use one or more data generation techniques:
* Import all the data from the PROD environment
* Import a subset of the data from the PROD environment
* Import data from another testing environment
* Delete or update data in the testing environment, which might be enough on its own
* Add data to a central repository and then use it from there
* Decide whether the data generation process has to be automated

In short, generate the data needed to sign off the project, meaning that most of the test cases can be executed in the testing environment.
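
As an illustration of the "import a subset" option, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables, the `copy_subset` helper, and the in-memory database standing in for PROD are all assumptions made for the example; a real environment would have its own schema and connection details.

```python
import sqlite3

# Hypothetical schema shared by the source and the test environment.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
"""


def copy_subset(source, target, customer_ids):
    """Copy the chosen customers and all of their orders from source to target."""
    ids = list(customer_ids)
    if not ids:
        return
    # One "?" placeholder per id keeps the query parameterized.
    placeholders = ",".join("?" for _ in ids)
    customers = source.execute(
        f"SELECT id, name FROM customers WHERE id IN ({placeholders})", ids
    ).fetchall()
    orders = source.execute(
        f"SELECT id, customer_id, total FROM orders "
        f"WHERE customer_id IN ({placeholders})",
        ids,
    ).fetchall()
    # The connection context manager commits on success, rolls back on error.
    with target:
        target.executemany("INSERT INTO customers VALUES (?, ?)", customers)
        target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)


# In-memory database standing in for PROD (illustration only).
prod = sqlite3.connect(":memory:")
prod.executescript(SCHEMA)
prod.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")]
)
prod.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 9.5), (11, 2, 20.0), (12, 3, 5.0), (13, 1, 7.25)],
)

# Empty test environment that receives only customers 1 and 3.
test_env = sqlite3.connect(":memory:")
test_env.executescript(SCHEMA)
copy_subset(prod, test_env, [1, 3])
```

Copying the related orders together with the customers keeps the subset consistent, so tests that join the two tables still find matching rows.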

3. Identify sensitive data and protect it

Masking helps protect sensitive corporate, client, and employee data. It also helps ensure compliance with government and industry regulations. Masked or de-identified data should still look realistic.
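
One possible way to keep masked values realistic is deterministic pseudonymization: the same input always maps to the same fake value, so relationships between records survive. The sketch below uses a keyed hash from the standard library. The field names, the `SECRET_KEY` placeholder, and the helper functions are assumptions for illustration, and this is not a complete compliance solution.

```python
import hashlib
import hmac

# Placeholder key (assumption): a real key would be stored outside the test environment.
SECRET_KEY = b"replace-with-a-real-secret"


def _token(value, key=SECRET_KEY, length=10):
    """Return a short keyed-hash token for a normalized (trimmed, lowercased) value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:length]


def mask_email(email):
    """Replace an email with a realistic-looking address on a reserved domain."""
    return f"user_{_token(email)}@example.com"


def mask_name(name):
    """Replace a name with a stable placeholder label."""
    return f"Customer {_token(name, length=6).upper()}"


def mask_record(record):
    """Return a copy of the record with the sensitive fields masked."""
    masked = dict(record)
    masked["email"] = mask_email(record["email"])
    masked["name"] = mask_name(record["name"])
    return masked


# Hypothetical record; non-sensitive fields such as "id" pass through unchanged.
row = {"id": 7, "name": "Jane Doe", "email": "Jane.Doe@corp.test"}
masked_row = mask_record(row)
```

Because the mapping is deterministic, the same customer gets the same masked email in every table, which keeps joins and lookups working in the test environment.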

4. Maintenance

Sometimes data needs to be updated at regular intervals. Up-to-date data gives more reliable results and improves testing efficiency. Refreshing data at regular intervals also keeps test data manageable. Keep in mind, though, that obsolete data can still be useful for negative scenarios.
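
A refresh schedule can be checked with very little code. The following sketch assumes a hypothetical mapping of data set names to refresh intervals and a record of when each set was last refreshed; the names and intervals are invented for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical refresh intervals per data set.
REFRESH_INTERVALS = {
    "prices": timedelta(hours=1),
    "customers": timedelta(days=7),
}


def datasets_needing_refresh(last_refreshed, now, intervals=REFRESH_INTERVALS):
    """Return the sorted names of data sets that are overdue or never refreshed."""
    stale = []
    for name, interval in intervals.items():
        refreshed_at = last_refreshed.get(name)
        if refreshed_at is None or now - refreshed_at > interval:
            stale.append(name)
    return sorted(stale)


# Example: prices were refreshed two hours ago, customers one day ago.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_refreshed = {
    "prices": now - timedelta(hours=2),
    "customers": now - timedelta(days=1),
}
overdue = datasets_needing_refresh(last_refreshed, now)
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the check itself easy to test.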

5. Automation

When running repetitive test cases, or running the same test cases on different versions and in different environments, automation is usually the best solution. Automating the comparison of actual results with baseline data is an efficient way to detect anomalies and get accurate results. These comparisons save time and identify problems that might otherwise go undetected.
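
The baseline comparison can be sketched in a few lines. This example assumes that both the actual results and the baseline are lists of dictionaries sharing a unique `id` field; the function name `compare_with_baseline` and the sample rows are hypothetical.

```python
def compare_with_baseline(actual, baseline, key="id"):
    """Report rows that are missing, unexpected, or changed relative to the baseline."""
    actual_by_key = {row[key]: row for row in actual}
    baseline_by_key = {row[key]: row for row in baseline}

    # Keys present in only one of the two data sets.
    missing = sorted(baseline_by_key.keys() - actual_by_key.keys())
    unexpected = sorted(actual_by_key.keys() - baseline_by_key.keys())

    # For keys present in both, record each field as (baseline value, actual value).
    changed = {}
    for row_key in baseline_by_key.keys() & actual_by_key.keys():
        expected_row = baseline_by_key[row_key]
        actual_row = actual_by_key[row_key]
        diffs = {
            field: (expected_row.get(field), actual_row.get(field))
            for field in expected_row.keys() | actual_row.keys()
            if expected_row.get(field) != actual_row.get(field)
        }
        if diffs:
            changed[row_key] = diffs

    return {"missing": missing, "unexpected": unexpected, "changed": changed}


# Hypothetical baseline and actual results.
baseline = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
actual = [{"id": 1, "total": 10}, {"id": 2, "total": 25}, {"id": 3, "total": 5}]
report = compare_with_baseline(actual, baseline)
```

An empty report (no missing, unexpected, or changed rows) means the run matches the baseline; anything else points directly at the rows worth investigating.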

For suggestions, feedback, or queries, please feel free to contact me by email.