Data Model and Structure

Understanding Data Models and what models are useful is important.

Knowing where they are not useful is also Important.

The Enterprise Data Architect is always - rightly - associated with data models.

People know they should have a data model and that they can solve all the problems of the world, but often don't know what to do with them when they have them.

I love data models, I've been doing them for years, and I've got a good sense of what they are, and how to apply them.

As with all things data this is a big topic so I will keep returning to it and adding more content over time.

What is a Data Model?

A data model is a way of describing and organising data so that you understand what the data actually is and how it relates to other pieces of data.

I love data models and I'm planning to provide a lot of insights into data modelling, however, I won't be going into the detail of normalisation or data mapping. There's material elsewhere on the web that can help you take the insights here and develop them further.

Data Objects or Entities

You are typically modelling 'things'/nouns from the area you are focussing on. Verbs are typically from processes, though there are exceptions.

So a Customer or an Account or a Product or an Employee are typical examples of data objects that you might want to model.

These objects (or entities) need to have a definition; 'An account is a financial relationship between the organisation and a customer' for example.

They have relationships too.

The Account object sits between a customer object and an organisation object and has a relationship between the two of them.

They can have attributes too; depending on the level of model you are creating each of the data objects will have one or more attributes e.g. the customer has a name, and age, the account a balance and a credit limit etc

Levelling and Granularity

The level of detail you go to is a key consideration when constructing, reading or using a data model.

Typically people describe there as being 3 levels of data model, they have quite formal sounding names but the ideas behind them are quite simple and, in reality, there's a lot of flex within and across each category. They are:

Conceptual Data Model

Logical Data Model

Physical Data Model

Conceptual Data Model

A conceptual data model is a high-level overview of the data an organization needs to run its business. It shows the different types of data, their relationships to each other, and what they're used for.

They can be really high level, 4 or 5 of the most important data objects and roughly how they relate, or they can be quite expansive and contain many data objects with some detail on how they relate.

They can be used to communicate data needs to others, make better decisions about data, and ensure that data is used consistently.

They are really useful to start to define the scope of an initiative, and to start to hone the requirements.

Logical Data Model

The logical model is one of the most useful items in an Enterprise Data Architect's armoury.

They come in different levels and granularities depending on what you are trying to achieve and your understanding of what it is you are trying to model.

Two notions begin to become important when you start modelling logically;

Uniqueness - where each instance of your data objects has to be able to be identified uniquely e.g. Customer Name (there are obvious flaw with using this) or Account Number.
The relations between the objects starts to be analysed. The aim is to gradually get them to a point where everything is in the model only once and has 1-to-many relationships with the other data objects with which it interacts. 1-to-many? One bank can issue many accounts. A customer can have more than one account but (in this example) an account is only owned by one customer and is issued by one bank.

I'll explain why this is important shortly

Logical Data Model with Attributes

As you understand the domain better it's possible to add 'attributes' to your data objects (or entities); these are features of the entity that add more information about that entity- so a customer has a name attribute, a bank has a sort code etc.

Adding the attributes really helps to understand if you have the right entities and the right relationships. Remember the aim is for instances of each entity to be uniquely identifiable. So in the example above Product Name is found in both the Product and in the Account. This doesn't look right. The Product Name belongs in the Product and should be accessible to the account via the relationship between the two entities; e.g. account 1234 has only one product; product id 4566, what is the name of the product for the account, well it's the name on the Product identified with the ID 4566.

Why 1-Many ... Normalisation

It's not obvious why 1-to-many should be so important, but it is.

Holding a data item once and relating it to other items that are also held just once mean you have the data in a structure where you can ask the structure any question and get the right answer. Because everything is held just once, you don't get duplication that you can get when data is held in more than one place....I know this sounds unlikely but it works.

If you have £100 in a bank account, and that bank account exists across three different data objects then if you ask how much money do I have across all 3 of the accounts (all pointing to the £100) then you get £300.

If account is there just the once, you get the right answer.

Another reason for normalising is managing updates.

In the above example, if the balance of the account is held in 3 places, then if you want to update the balance and add £100, you've got to do it in 3 places.

That's extra processing and time.

What happens if your programming misses one of the locations?

You now have 2 places with a balance of £200 and one place with £100.

When you come to query the balance, you have 2 different numbers, which is the real balance?

If it was held in one place, it would only need to be updated once in one place, there wouldn't be a question about what the value of the balance was, it would be £200 just held once.

There are other reasons for normalising (Database Systems by C.J. Date is genuinely great and authoritative) but for Enterprise, modelling the data so you get the right answer from any angle is the most important. If you understand the data like that, then you can choose how to implement it in your BI and your integrations, you can bring it to bear on your Master Data Management, your data governance and your data quality.

Physical Data Model

The physical data model takes the logical model into something that can be implemented in a system. In a database system it will include denormailisations (delberate duplications to improve performance), it will include data types, it will include physical implementation like tablespaces (places where similar tables are held together) and indexes (things that improve query performance) and other detailed implementation information. A good physical data model can often be directly turned into an implementation.