Creating test data in a database [closed]





I'm aware of some of the test data generators out there, but most seem to just fill name and address style databases [feel free to correct me].

We have a large, integrated, normalised application: for example, invoices have part numbers linked to stocking tables, customer numbers linked to customer tables, change logs linked to audit information, and so on, all of which are obviously difficult to fill randomly. Currently we obfuscate real-life data to get test data (but not very well).

What tools or methods do you use to create large volumes of data to test with?





1:



Where I work, we use RedGate Data Generator to generate test data.
Since we work in the banking domain, we often have to work with nominative data (credit card numbers, personal IDs, phone numbers). For those cases we developed an application that masks these database fields, so we can work with them as if they were real data.
I can say that with RedGate you can get close to what your real data looks like on a production server, since you can customize every field of every table in your DB.
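The in-house masking application mentioned above is not described further, so the following is only a minimal sketch of one way such field masking could work, not the tool itself. It assumes a keyed, deterministic digit substitution: every digit is replaced, separators are kept, and the same input always maps to the same output so that joins between tables still line up. The function name `mask_digits` and the key `SECRET_KEY` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real setup would keep it out of source control.
SECRET_KEY = b"test-env-only-key"


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every digit in `value` with a pseudo-random digit.

    Non-digit characters (dashes, spaces, letters) are kept, so the
    masked value has the same shape as the original. The mapping is
    deterministic for a given key, so equal inputs give equal outputs.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    position = 0
    for ch in value:
        if ch.isdigit():
            # Take one digest byte per digit; modulo 10 is slightly biased,
            # which is acceptable for test data but not for real anonymisation.
            masked.append(str(digest[position % len(digest)] % 10))
            position += 1
        else:
            masked.append(ch)
    return "".join(masked)
```

For example, `mask_digits("4111-1111-1111-1111")` returns a string of the same length with the dashes in the same places. Note that this sketch does not preserve check digits (such as Luhn), so masked card numbers may fail validation rules that check them.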





2:


You can generate data plans with VSTS Database Edition (with the latest 2008 Power Tools). It includes a Data Generation Wizard, which automates data generation by pointing at an existing database, so you get something that is realistic but contains entirely different data.


3:


I've rolled my own data generator that generates random data conforming to regular expressions.

The basic idea is to use validation rules twice.

First you use them to generate valid random data and then you use them to validate new input in production.

I've started a rewrite of the utility, as it seems like a nice learning project.

It's available on Google Code.
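The idea of using validation rules twice can be sketched roughly as follows. This is not the utility mentioned above: instead of parsing arbitrary regular expressions, it assumes a simplified, hypothetical rule format (a list of `(alphabet, min_len, max_len)` segments) from which both a regular expression and a random generator are derived. The names `PART_NUMBER_RULE`, `rule_to_regex`, `generate` and `is_valid` are made up for illustration.

```python
import random
import re
import string

# Hypothetical rule: two uppercase letters, a dash, then four to six digits.
PART_NUMBER_RULE = [
    (string.ascii_uppercase, 2, 2),
    ("-", 1, 1),
    (string.digits, 4, 6),
]


def rule_to_regex(rule):
    """Build a compiled regex from the rule, for validating input."""
    parts = []
    for alphabet, lo, hi in rule:
        char_class = "[" + re.escape(alphabet) + "]"
        parts.append(f"{char_class}{{{lo},{hi}}}")
    return re.compile("".join(parts))


def generate(rule, rng):
    """Build a random string that conforms to the same rule."""
    return "".join(
        "".join(rng.choice(alphabet) for _ in range(rng.randint(lo, hi)))
        for alphabet, lo, hi in rule
    )


def is_valid(rule, value):
    """Validate a value against the regex derived from the rule."""
    return rule_to_regex(rule).fullmatch(value) is not None
```

A usage sketch: `generate(PART_NUMBER_RULE, random.Random(0))` yields a test part number, and `is_valid(PART_NUMBER_RULE, user_input)` applies the same rule to new input. Because both come from one definition, generated data cannot drift away from what validation accepts.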


4:


I just completed a project creating 3,500,000+ health insurance claim lines.

Due to HIPAA and PHI restrictions, using even scrubbed real data is a PITA.

I used a tool called Datatect for this (http://www.datatect.com/). Some of the things I like about this tool:
  1. Uses ODBC so you can generate data into any ODBC data source.

    I've used this for Oracle, SQL and MS Access databases, flat files, and Excel spreadsheets.

  2. Extensible via VBScript.

    You can write hooks at various parts of the data generation workflow to extend the abilities of the tool.

    I used this feature to "sync up" dependent columns in the database, and to control the frequency distribution of values to align with real world observed frequencies.
  3. Referentially aware.

    When populating foreign key columns, it pulls valid keys from the parent table.
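The last two points (controlling the frequency distribution and pulling valid keys from the parent table) can be illustrated without the tool itself. The following is a minimal sketch using Python's built-in `sqlite3` module rather than Datatect or VBScript; the `customer` and `invoice` tables, the status values and their weights are all assumptions made up for the example.

```python
import random
import sqlite3


def build_test_db(n_customers=50, n_invoices=500, seed=0):
    """Create an in-memory database with a parent and a child table."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FK
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            status TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO customer (name) VALUES (?)",
        [(f"Customer {i}",) for i in range(n_customers)],
    )

    # Referential awareness: read the valid keys back from the parent table.
    customer_ids = [row[0] for row in conn.execute("SELECT id FROM customer")]

    # Assumed real-world frequencies; replace with observed ones.
    statuses = ["paid", "open", "overdue"]
    weights = [0.70, 0.25, 0.05]

    rows = [
        (rng.choice(customer_ids), rng.choices(statuses, weights)[0])
        for _ in range(n_invoices)
    ]
    conn.executemany(
        "INSERT INTO invoice (customer_id, status) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn
```

Because the child rows only ever use keys read from `customer`, the foreign key constraint is satisfied by construction, and the weighted choice keeps rare values such as `overdue` rare in the generated data.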



5:


The Red Gate product is good, but not perfect. I found that I did better when I wrote my own tools to generate the data.

I use it when I want to generate, say, Customers, but it's not great if you want to simulate the varied behaviour that customers might engage in, like creating orders, some with one item and some with multiple items. Homegrown tools will provide the most 'realistic' data, I think.
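As a rough illustration of what such a homegrown tool might do for the orders case, here is a small sketch that gives each order a varying number of line items. The `Order` class, the `generate_orders` function and the item-count weights are hypothetical and chosen only for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    customer_id: int
    # Each item is a (part_number, quantity) pair.
    items: list = field(default_factory=list)


def generate_orders(customer_ids, part_numbers, n_orders, seed=0):
    """Generate orders where most have one item and a few have several."""
    rng = random.Random(seed)
    orders = []
    for order_id in range(1, n_orders + 1):
        # Assumed distribution: single-item orders are the most common.
        n_items = rng.choices([1, 2, 3, 4, 5], weights=[50, 25, 13, 8, 4])[0]
        n_items = min(n_items, len(part_numbers))
        # sample() picks distinct parts, so no part repeats within an order.
        parts = rng.sample(part_numbers, n_items)
        items = [(part, rng.randint(1, 10)) for part in parts]
        orders.append(Order(order_id, rng.choice(customer_ids), items))
    return orders
```

Calling `generate_orders([1, 2, 3], ["P1", "P2", "P3", "P4", "P5", "P6"], 200)` returns 200 orders that reference only the given customers and parts; the weights would need tuning against whatever order sizes are seen in production.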


6:


Joel also mentioned RedGate in podcast #11.


