Why QA Must Have Its Own Databases

  August 15, 2011

Too many test departments rely on the kindness of outsiders for one essential artifact of a test cycle: databases. It's time to have your own databases, ones that you control. It's right for the organization, right for quality assurance, and a boost to your career. QA-owned databases are also more affordable than in the past, and the pay-offs are bigger. Here's what you need to know as a test specialist or developer.

Starting a software QA cycle? Standard references tell you that you need to control the test plan, hardware, operating system, application, test scripts, and result reports.

As widely accepted as this checklist is, it leaves "data" invisible. That's a mistake, but one you can correct.

Three reasons data emerge from the QA shadows

First, let's be clear about the nature of the mistake. "Data" have only been hidden from most test plans, not entirely missing. They're often tucked away inside individual test scripts, or implicit in the description of the operating system. Data show up in clauses such as "create two different accounts with identical names but different user IDs" or "At the start of the test, open a connection to DB1/test_user7."
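
To make the pattern concrete, here's a minimal sketch of such a script. The schema, account names, and in-memory sqlite3 connection are hypothetical stand-ins for whatever "DB1/test_user7" actually is; notice that the test's data requirements live only in the code, not in any plan:

    import sqlite3

    def test_duplicate_account_names():
        # In real life this would be the connection to DB1/test_user7;
        # an in-memory sqlite3 database stands in for it here.
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, name TEXT)")

        # "Two different accounts with identical names but different
        # user IDs" -- a data requirement buried in the script,
        # invisible to the test plan.
        conn.execute("INSERT INTO accounts VALUES (1001, 'pat')")
        conn.execute("INSERT INTO accounts VALUES (1002, 'pat')")

        duplicates = conn.execute(
            "SELECT COUNT(*) FROM accounts WHERE name = 'pat'").fetchone()[0]
        assert duplicates == 2
        conn.close()

    test_duplicate_account_names()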

For legitimate historical reasons, this was probably appropriate. Most organizations have been better off with data maintained outside the development and testing hierarchies. Nowadays, though:

    • Application functionality is more data-driven,

    • It's far more feasible for QA to take responsibility for its own data sources, and

    • It's increasingly less practical to rely on a data division of your own organization to supply you with data views.

While data teams continue to think in terms of "views" from live databases with appropriately restricted security, those same data teams simply don't have the slack time to construct views for testing, except on a chargeback basis.

"Data-driven functionality" recognizes that interesting applications today exhibit behavior that emerges from subtle, large-scale data combinations. QA's responsibility is no longer limited to validating an algorithm at 0, 1, and a few interior points, or to checking that application response looks smooth when one particular data dimension varies.

In an era of Big Data, we must verify highly non-linear and subtle responses. Casual descriptions that fit in the footnotes of test scripts are simply no longer adequate. Neither is it enough to assume that a backup database (last quarter's image, for instance) is a good model for a rigorous testing program.
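
To make the contrast concrete, here's a small sketch. The discount() function and its thresholds are hypothetical stand-ins for data-driven application behavior; the point is the shape of the test, which sweeps combinations of several data dimensions rather than checking a few points along one:

    from itertools import product

    def discount(order_total, item_count, is_member):
        # Hypothetical stand-in for emergent, data-driven behavior.
        rate = 0.10 if is_member else 0.0
        if order_total > 100 and item_count >= 5:
            rate += 0.05
        return round(order_total * (1 - rate), 2)

    # The old style: a few points along one dimension.
    assert discount(0, 0, False) == 0
    assert discount(1, 1, False) == 1

    # The data-driven style: sweep combinations of several dimensions
    # and verify an invariant across all of them.
    for total, count, member in product([0, 1, 99.99, 100.01, 10000],
                                        [0, 1, 4, 5, 500],
                                        [False, True]):
        result = discount(total, count, member)
        assert 0 <= result <= total, (total, count, member, result)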

Now is the time for data to be fully visible and explicitly specified. Just as we carefully describe that a test plan will be executed on specific hardware, with a specific memory configuration, loaded with a specific operating system release and service-pack update, we need our data source to be fully replicable. This is likely to mean loading a specific database schema with a precise image of hundreds of megabytes, or perhaps gigabytes, of test data.
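
One way to get that replicability, sketched below with Python's built-in sqlite3 module and hypothetical fixture files; a real cycle would point at your enterprise RDBMS and a far larger image, but the shape is the same:

    import hashlib
    import sqlite3

    # Hypothetical fixture files, versioned alongside the test plan.
    SCHEMA_FILE = "schema.sql"
    DATA_FILE = "test_data.sql"
    # Digest recorded in the test plan so every run loads the same bytes.
    EXPECTED_SHA256 = "replace-with-the-recorded-digest"

    def load_test_database(path="qa_cycle.db"):
        with open(DATA_FILE, "rb") as f:
            data_sql = f.read()
        digest = hashlib.sha256(data_sql).hexdigest()
        if digest != EXPECTED_SHA256:
            raise RuntimeError("test data image changed: " + digest)

        conn = sqlite3.connect(path)
        with open(SCHEMA_FILE) as f:
            conn.executescript(f.read())              # exact schema
        conn.executescript(data_sql.decode("utf-8"))  # exact data image
        conn.commit()
        return conn

The recorded digest plays the same role as the service-pack level in the hardware section of the plan: it pins the data source to one exact, auditable state.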

There's good news, though: QA can take on this responsibility. In the past, it was prohibitively expensive. The cost of entry to a commercial relational database management system often started at several tens of thousands of dollars in licensing fees, out of balance with a test cycle that might span only a few days a year of direct use. The expenses of training and dedicated hardware only amplified this mismatch. Now a testing department can generally license an enterprise-class database for little or no cash outlay.

Compliance and accountability

Even if organizations still managed themselves by the old rules, and the data department or database administrators swiftly responded to QA's requests for the database views and updates QA needs, the data department probably couldn't operate as inexpensively as testing can on its own behalf.

That's probably just as well, too, because the data team no longer operates under the old rules. The complexities of commercial licensing, combined with progressively more finely parsed accounting rules, could raise the price the data department charges for use of the information it maintains.

Even more momentously, the focus of the data department now needs to be on compliance. Out-of-pocket charges for database instances or views are only a fraction of the expense involved in managing and monitoring those views. Contemporary data managers are more likely to be reading legal statutes than SQL standards. At a time when what we see as "syntax errors" can result in felony charges, casual co-operation between departments becomes impossible.

As a tester, you have an ideal occasion here to bring data "into the light." Make data sources explicit in your plans. Account for their costs. Improve your testing, as the data sources you and your team craft will fit the test plans better than the "hand-me-down" tables and views cobbled together for QA by the data department or software development team. Make your results more replicable and efficient with explicit control over your data sources. And stay in touch with me as I write about other aspects of database testing over the coming months.