Grey Geek's Eyes: How to “dig into a large database”?

This posting was prompted by a question on one of the BA forums on Linked In. The original question was:

“So as a business analyst, whats your procedure to begin investigating a very large, unfamiliar database and then draw your understanding?”

I gave an answer in the forum, but here is an expansion, and some ponderings.

First of all, let’s set out the terms of reference. The question asks about a “large unfamiliar” database. I think we can assume that “unfamiliar” is “one that we haven’t encountered before”, but what is “LARGE”? To me “large” could be:

Lots of tables
Many terror-bytes ;-)
Lots of transactions
Lots of users
There may be other interpretations

I’m going to go with “Lots of tables” with the definition or “lots of” being:

“more than I can conveniently hold in my head at one time”

I've also assumed that we are working with a "transactional database" rather than a "data warehouse".

Preparation

Gilian, the questioner was given some good suggestions, which I summarised as "Collecting Information" or perhaps “Preparation”:

Understand objectives of "The Business"
Understand the objectives of "This Project" (Digging into the Database)
Collect relevant Organisation charts and find out who is responsible for doing what
Collect relevant Process Models for the business processes which use the database
Get hold of, generate, or otherwise create a data model (Entity Relationship Diagram or similar)

Of these, the one which is specific to working with a Database is the ERD. Having a diagram is an enormous help in visualising how the bits of the database interact.

Chunking

For me, the next step is to divide the model into "chunks" containing groups of entities (or tables). This allows you to:

Focus - on one chunk
Prioritise - one chunk is more important, interesting or will be done before, another
Estimate - chunks are different sizes
Delegate - you do that chunk, I'll do this one
And generally "Manage" the work; do whatever are the project objectives.

I would use several techniques to divide the database or model up into chunks. These techniques work equally well with logical and physical data models. It can be quite a lot of work if you have a large model. None of the techniques are particularly complicated, but they are a little tricky to explain in words.

Here is a list of techniques:

Layering
Group around Focal Entities
Process Impact Groups
Realigning

Organise the Data Model

I cannot over-emphasis how important it is to have a well-laid out diagram. Some tools do it well, some do it less well. My preference is to have “independent things” at the top.

I’ve invented a business.

We take ORDERs from CUSTOMERs.
Each ORDER consists of one or more ORDER_LINES and each line is for a PRODUCT.
We Deliver what the customer wants as DELIVERY CONSIGNMENTS.
Each CONSIGNMENT contains one or more Batches of product (I’ve haven’t got a snappy name for that).
We know where to take the consignment by magic, because we don’t have an Address for the Customer!
We reconcile quantities delivered against quantities ordered, because we sometimes have to split an order across several deliveries.
That’s it!

Layering

"Layering" involves classifying the entities or groups of entities as being about:

Classifications
Things
Transactions
Reconciliations

Things

Let’s start with “Things”. Things are can be concrete or they can be abstract. We usually record a “Thing” because it is useful in doing our business. Examples of Things are:

People
Organisations
Products
Places
Organisation Units (within our organisation, or somebody elses)

Classifications

Every business has endless ways of classifying “Things” or organising them into hierarchies. I just think of them as fancy attributes of the “Things” unless I’m studying them in their own right.

Note: “Transactions” can have classifications too (in fact almost anything can and does), I’ve just omitted them from the diagram!

Note: The same structure of “Classification” can apply to more than one thing. This makes sense if, for example, the classification is a hierarchy of “geographic area”. Put it in an arbitrary place, note that it belongs in other places as well, and move on!

Transactions

Transactions are what the business is really interested in. They are often the focus of Business Processes.

Order
Delivery
Booking

Where there are parts of Transactions (eg Order_Line) keep the child with the parent.

Reconciliations

Reconciliations" (between Transactions) occur when something is “checked against something else”. In this case we are recording that “6 widgits have been ordered” and that “3 (or 6) have been delivered”.

If you use these “layers”, arranged as in the diagram, you will very likely find that the "One-to-manys" point from the top (one) down (many) the page.

Groups around Focal Entities

To do this, pick an entity which is a “Thing” or a “Transaction” then bring together the entities which describe it, or give more detail about it. Draw a line round it, give it a name, even if only in your head!

"Customer and associated classifications" and
"Order and Order_line" are candidate groups.

Process Impact Groups

To create a "Process Impact Group"

Select a business process
Draw lines around the entities which it: creates, updates and refers to as part of doing its work.
You should get a sort of contour map on the data model.

In my example the processes are:

Place Order
Assemble Delivery Consignment
Confirm Delivery (has taken place)

It is normal for there to be similarities between “Process Impact Groups” and “Focal Entity Groups”. In fact, it would be unusual if there were not similarities!

Realigning

Try moving parts under headers (so, Order_line under Order) and reconciliations under the transaction which causes them. In the diagram, I’ve moved “Delivered Order Line” under “Delivery”, because it’s created by “Delivery related processes” rather than when the Order is created.

Finally, “Chunking”

Based on the insights you have gained from the above, draw a boundary around your "chunks".

The various techniques are mutually supportive, not mutually exclusive. The chunks are of arbitrary size. If it is useful, you can:

combine neighbouring chunks together or
you can use the techniques (especially "Focal entities" and "Process Entity Groups") to break them down until you reach a single table/entity.

Tools

My preferred tools for doing this are: a quiet conference room, late at night; the largest whiteboard I can find; lots of sticky Post-its or file cards (several colours); a pack of whiteboard pens; black coffee on tap and the prospect of beer when I’ve finished. For a large database (several hundred tables) it can take several days!

Once you've done all this, then all you have to do, is do the work!

I hope you have as much fun doing it as I had writing about it! J

Thursday, 26 September 2013

How to “dig into a large database”?