What is Data Modeling?

Reading Time: 6 minutes

What is Data Modeling?

The method of creating a data store model is called data processing in a database. Data modeling is the process of identifying the entities in our domain, the relationships between these entities and how they will be stored in the database. This introduces theoretical data objects and connections between different data objects. Data modelling is a data formulation process in a standardized format in an information system. It helps to quickly analyze data, which helps to meet business needs. The data modelling process requires data modelers who work correctly with stakeholders and prospective IT users. Data modelling ends with the development of a model of data supporting the infrastructure for the business information system

Understanding Data Modeling / Scope

It occurs at three different layers:

  • Physical model: It is a schema which says how data is stored physically in the database
  • Conceptual model: It is the user view of the data i.e. the high level which the user sees.
  • Logical model: It sits between the Physical model and conceptual model and it represents the data logically, separate from its physical stores.

Hierarchical Data Modeling: These models were used to replace file-based systems. The data was kept in a tree like one too many arrangements.

Relational Data Modeling: It’s true that the hierarchical model helped us to move from file-based systems which reduced complexity but still one had known the specific physical data storage employed. The relational database follows the relational model where data is stored in tables, unlike Hierarchical database where it is stored in a tree-like structure. In short, it reduced the complexity more when compared to the hierarchical model.

How does Data Modeling make work so easy/why should we use it?

It helps us in a visual representation of data and enforces business logic, regulations, policies, etc on data. It is a guide which is used by scientists and analysts in the designing and implementation of a database. So, without data modeling the job of analysts and scientists to implement the business requirements on database becomes difficult.

  • Data modeling is query based, that is, we think of the application workflow and the queries early on in the data model process
  • A Table is how databases stores data and can be thought of as a set of rows and columns
  • After designing the entities (or tables) that we need, we decide how the tables would be related to each other
  • The next thought process is to define the primary and foreign keys for the tables. The primary key uniquely identifies a single record on the table. A foreign key is used to relate to other tables within the data model.
  • For operational databases/transactional systems, the goal of data model is to reduce any duplication so that we don’t store the same data in multiple places.
  • For dimensional models, data warehouses, data marts, etc, the goal is to optimize the select query. Duplication of data is not considered an issue. We do not optimize for writes as writes are considered cheaper than select statements. so, we design the database to solve all the fields that are needed in a select query the BI tools run, the report that gets generated or the model that the data scientists need to work with.
  • One of our goals in data modeling for data warehouses is even data distribution. As the data is dispersed across nodes (servers), the optimal distribution of data so as to query within one node or query disperses equal load across all nodes may be considered based on what database we are working with (Say, it depends on Teradata vs Netezza)
  • Selecting the Primary Key is very important and has a huge impact on query performance

Why do we need Data Modeling? / What can you do with it?

The main goal of using it is:

  • To ensure that all data objects are represented correctly as if it is not done correctly we would get incorrect results.
  • It helps as stated earlier to design database at conceptual, physical and logical levels.
  • It helps to design the relational tables, primary keys, foreign keys, etc.
  • Database developers can create a better physical database with a good model as it becomes a guiding tool for them.
  • It helps to identify missing and redundant data.
  • It helps us to have a better IT infrastructure and to have easy and cheap maintenance when required in the long run though it’s time-consuming initially.

Working

Now let’s create a sample data model to understand how to work with a model. To do this we have to follow certain steps:

  • First we have to understand the requirements, In this case, we will create a model for an online store. So, keeping that in mind we need two tables a) customers b) products
  • Next step is to get the attributes of the tables or entities

a. customer table can have attributes like:

  • Id
  • Name
  • Email
  • Address

b. Product table can have attributes like:

  • Id
  • Name

In the customer table, we can have Id as Primary key and similarly Product Id in Product table will be the primary key.

Now, we will design the relationship between these two tables. So to connect the customer and product table we will create a table called purchase which will be like an order table (i.e. which customer ordered which product).

Who is the right audience for learning this technology?

It is very essential. The right audiences for learning modeling techniques are individuals who are data architects and data analysts. Most individuals start as data analysts and then move up the ladder.

How this technology will help you in career growth?

According to Glassdoor, the average salary in the market for modelers is projected to earn about $78,601 on an average. So you can see that it is a well-paid job. Most big companies invest in modelers as they are very essential for keeping the integrity of data.

Conclusion

In conclusion, we can say that the model created by modelers ensure consistency in naming conventions, integrity, and security of data. because good data will enable the business in the correct efficient utilization of their data.