• Introduction

Test data plays an important role in the successful completion of testing. Test teams must not only follow exact test methodologies, but also ensure the accuracy of the data so that it correctly reflects production situations, both functionally and technically.

The test data scope starts with receiving the request for test data and its criteria from the testing team and ends with delivering the data. This includes setting up the data environment and either loading test data from the production environment or creating test data (these activities vary from project to project). Test data is a critical need for Functional, Integration, System, Acceptance, Automation, and Performance Testing, depending on the agreement among the various stakeholders.

A well-defined test data management strategy can rapidly reduce inefficiencies, help extract greater value from expensive data, and make validated test data available in an organized, secure, consistent, and controlled manner.

The exact nature and representation of the production data need to be understood through data profiling and discussions with business analysts. This helps in understanding what makes the data valuable.

Some key pieces of information should be the focus during this process:

  • Domain values: The full range of valid and meaningful values for a data field
  • Data ranges and limits: Especially those that define our equivalence classes
  • Data relationships: Data characteristics including cross-system data mappings and sources for derived or calculated data
  • Upstream and downstream: Data dependencies from upstream and downstream systems
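The profiling step described above can be sketched with a small pass over extracted records that surfaces domain values and numeric ranges. The field names and sample records below are hypothetical:

```python
from collections import defaultdict

# Hypothetical sample of records extracted from production.
records = [
    {"status": "ACTIVE", "balance": 120.50, "region": "EU"},
    {"status": "CLOSED", "balance": 0.00, "region": "US"},
    {"status": "ACTIVE", "balance": 9850.00, "region": "US"},
]

def profile(rows):
    """Collect domain values per field and min/max ranges for numeric fields."""
    domains = defaultdict(set)
    ranges = {}
    for row in rows:
        for field, value in row.items():
            domains[field].add(value)
            if isinstance(value, (int, float)):
                lo, hi = ranges.get(field, (value, value))
                ranges[field] = (min(lo, value), max(hi, value))
    # Report categorical domains separately from numeric ranges.
    return {f: sorted(v) for f, v in domains.items() if f not in ranges}, ranges

domains, ranges = profile(records)
print(domains)  # {'status': ['ACTIVE', 'CLOSED'], 'region': ['EU', 'US']}
print(ranges)   # {'balance': (0.0, 9850.0)}
```

The observed ranges are exactly what later defines the equivalence classes and limits used for data selection.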

The test data management team should understand how the above-mentioned vital features are consumed by the business users.

The type of testing also affects test data requirement management. For example, automation testing requires highly stable, predictable data sets, whereas manual testing can afford some variability. Performance tests usually require test data that is replicated or sampled from the production environment so that real production scenarios can be reproduced.

The following are the important activities that are performed as part of Test Data Management:

  • Initial setup of test data: This is a one-time job that requires initial setup and synchronization of test data.
  • Service projects for test data requirements: Provide test data to projects based on the requests received. Projects raising data requests may be either new projects, which require the data to be created from scratch, or maintenance projects, where the data needs to be refreshed.
  • Continuous support to projects for test data requirements: Maintain test data based on requests sent by the projects, such as simple data-creation requests for changed data requirements, or requests to rectify problems with delivered test data.
  • Maintenance of data: Scheduled maintenance of test data beds at defined frequencies (weekly, monthly, quarterly, or annually)
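The scheduled refresh activity above can be sketched as restoring a test bed from a stored baseline snapshot. The file locations and record contents below are hypothetical:

```python
import json
import shutil
from pathlib import Path

# Hypothetical locations for the baseline snapshot and the live test bed.
BASELINE = Path("baseline/customers.json")
TESTBED = Path("testbed/customers.json")

# Create a small baseline snapshot for demonstration purposes.
BASELINE.parent.mkdir(parents=True, exist_ok=True)
BASELINE.write_text(json.dumps([{"id": 1, "name": "Test Customer"}]))

def refresh_testbed():
    """Reset the test bed to its baseline snapshot (scheduled maintenance)."""
    TESTBED.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(BASELINE, TESTBED)
    return json.loads(TESTBED.read_text())

print(refresh_testbed())  # [{'id': 1, 'name': 'Test Customer'}]
```

In practice the "snapshot" is usually a database backup or export job scheduled at the agreed frequency, but the reset-from-baseline pattern is the same.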

Test Data Management Characteristics

Test data management elements help not only to organize information about test data, but also to maintain this information over time. The following data characteristics or attributes are critical to building the framework:

  • Data classification
  • Data sources
  • Data selection

Data Classification

Data classification has to be considered an important parameter for effective test data management. Data classification includes the following:

  • Environmental data
  • Baseline data
  • Input data

Environmental data defines the application's operational environment and is a foundational component of the test effort, since it establishes our execution context. Environmental data includes:

  • System configuration
  • User authorization, authentication, and credentials
  • Configuration options
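A minimal sketch of environmental data captured as a versionable configuration document follows; the keys and values are hypothetical, and real credentials would never be stored this way:

```python
import json

# Hypothetical environmental data for one test environment.
environment = {
    "system": {
        "db_url": "jdbc:postgresql://test-db:5432/app",  # assumed test endpoint
        "app_version": "2.4.1",
    },
    "credentials": {"user": "qa_user", "role": "TESTER"},  # placeholders, not secrets
    "options": {"feature_flags": {"new_checkout": True}, "locale": "en_US"},
}

# Serialize so the execution context can be stored in source control
# and restored identically for every test run.
config_text = json.dumps(environment, indent=2)
restored = json.loads(config_text)
assert restored == environment  # round-trips without loss
```

Keeping this data external to the test cases lets the same tests run against multiple environments by swapping one file.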

Baseline data has two fundamental purposes: to establish a meaningful starting point for testing and to establish a set of expected results. The initial baseline is derived from test case prerequisites using reliable data, which provides an ideal environment for attaining the expected results. Automating the evaluation of results against an expected-results baseline helps the test team reduce effort.
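Automated evaluation of actual results against an expected-results baseline can be sketched as a simple diff. The record keys and values here are hypothetical:

```python
def compare_to_baseline(actual, expected):
    """Return every field where actual results diverge from the baseline."""
    mismatches = {}
    for key, want in expected.items():
        got = actual.get(key)
        if got != want:
            mismatches[key] = {"expected": want, "actual": got}
    return mismatches

# Hypothetical baseline of expected results and one observed run.
baseline = {"order_count": 3, "total": 150.0}
actual = {"order_count": 3, "total": 149.5}

print(compare_to_baseline(actual, baseline))
# {'total': {'expected': 150.0, 'actual': 149.5}}
```

An empty result means the run matched the baseline; anything else is reported field by field for triage.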

Input data is the data entered into the system under test to evaluate how it responds to the provided input. The observed behavior establishes the actual results, which must then be compared to the expected results to determine the correctness of the behavior. Input data is typically a component of the test case itself.
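Bundling input data into the test case itself is commonly done with table-driven tests. The `validate_age` function and its validity rules below are hypothetical stand-ins for the system under test:

```python
def validate_age(age):
    """Hypothetical system under test: accepts integer ages from 18 to 120."""
    return isinstance(age, int) and 18 <= age <= 120

# Each test case pairs input data with its expected result.
cases = [
    (18, True),    # lower boundary of the valid class
    (120, True),   # upper boundary of the valid class
    (17, False),   # just below the valid range
    (-1, False),   # negative input
]

for value, expected in cases:
    actual = validate_age(value)  # observed behavior = actual result
    assert actual == expected, f"validate_age({value}) returned {actual}"
```

The comparison of actual against expected results happens in the same loop, which is what makes such tests easy to automate.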

Data Sources

The following are the major sources of test data:

  • Production data
  • Derivatives of production data
  • Simulated data

Production data is rarely used for testing because of data security concerns and regulatory compliance requirements. In scenarios where the use of production data is unavoidable, the test teams should be aware of the sensitivity of the data with which they are working.

Derivatives of production data sets help to maintain production-like characteristics. Proper sanitization needs to be applied to such data to minimize the risk of security breaches.
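Sanitizing a production-derived record might look like the following sketch. The field names and masking rules are illustrative only, not a complete anonymization scheme:

```python
import hashlib

def sanitize(record):
    """Mask direct identifiers while preserving a production-like shape."""
    masked = dict(record)
    # Deterministic pseudonym: the same source email always maps to the same
    # masked value, preserving referential integrity across data sets.
    digest = hashlib.sha256(record["email"].encode()).hexdigest()[:12]
    masked["email"] = digest + "@example.test"
    masked["name"] = "Customer-" + digest[:6]
    # Keep the last four digits so downstream format validations still pass.
    masked["card"] = "************" + record["card"][-4:]
    return masked

# Hypothetical production-derived row.
prod_row = {"name": "Jane Doe", "email": "jane@corp.com", "card": "4111111111111111"}
print(sanitize(prod_row))
```

The choice of deterministic hashing over random replacement is deliberate: cross-system data relationships survive masking, which matters for integration testing.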

Simulated data is useful when there is limited or no production data available. Simulated data is well suited to unit testing, where less effort is required to create the test data.
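Simulated data can be generated with just the standard library, as in this sketch; the field names and value ranges are hypothetical, and dedicated libraries offer richer generators. Seeding the generator gives the stable, predictable data sets that automation needs:

```python
import random
import string

def simulate_customers(n, seed=42):
    """Generate n synthetic customer records with plausible shapes."""
    rng = random.Random(seed)  # fixed seed => reproducible test data
    regions = ["EU", "US", "APAC"]
    rows = []
    for i in range(1, n + 1):
        name = "".join(rng.choices(string.ascii_uppercase, k=6)).title()
        rows.append({
            "id": i,
            "name": name,
            "region": rng.choice(regions),
            "balance": round(rng.uniform(0, 10000), 2),
        })
    return rows

customers = simulate_customers(3)
print(len(customers))  # 3
```

Because the seed is fixed, every run of the suite sees identical records, so failures are reproducible.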

Data Selection

All possible scenarios need to be considered in the preparation of the test data. Any scenario not considered in data preparation will impact the quality of the test data used for testing. Sample scenarios that will affect the data selection criteria are:

  • Positive and negative testing scenarios
  • Possible overlaps and redundancies
  • Statistical characteristics of the data
  • Default conditions
  • Cross-project dependencies
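The statistical-characteristics criterion above can be checked mechanically when sampling: compare the category distribution of the selected data against its source. The field, population, and sample size here are hypothetical:

```python
from collections import Counter
import random

def distribution(values):
    """Relative frequency of each category in a list of values."""
    counts = Counter(values)
    return {k: v / len(values) for k, v in counts.items()}

# Hypothetical source population and a sampled subset for the test bed.
random.seed(7)
source = [random.choice(["EU", "US", "APAC"]) for _ in range(1000)]
sample = random.sample(source, 200)

src_dist, smp_dist = distribution(source), distribution(sample)
# Per-category drift between the source data and the sampled test data.
drift = {k: abs(src_dist[k] - smp_dist.get(k, 0.0)) for k in src_dist}
print(drift)
```

If any category drifts beyond an agreed tolerance, the sample is re-drawn, keeping the test data statistically representative of production.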

If you would like more information on Agile, DevOps or Software Testing, please visit my Software Testing Blog or my Software Testing YouTube Channel.