There are many definitions for Data Architecture and Data Modeling. Data management has evolved considerably over the last few decades, and so have the definitions of these terms.
Some of us started our journey in data management using COBOL, which was created in 1959 and was one of the first computer languages to introduce the concept of rows and columns.
Starting in 1970, the concept of data normalization was introduced and refined over the following years.
There are many articles and books about data normalization, and for an architect, following at least the Third Normal Form (3NF) should be second nature.
Currently, the most common definitions for Normalization, Data Architecture and Data Modeling are (using ChatGPT):
"Normalization is a technique used to reduce data redundancy and improve data integrity by organizing data into tables and establishing relationships between them."
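To make the definition concrete, here is a small illustrative sketch (the table and column names are hypothetical, not from DataWharf) showing how normalization removes redundancy: customer details repeated on every order row are moved to their own table, and orders keep only a foreign key.

```python
# Illustrative only: a denormalized orders table repeats customer data on
# every row, so a change of city must be applied in multiple places.
denormalized = [
    {"order_id": 1, "customer": "Acme", "customer_city": "Boston", "total": 100},
    {"order_id": 2, "customer": "Acme", "customer_city": "Boston", "total": 250},
    {"order_id": 3, "customer": "Globex", "customer_city": "Denver", "total": 75},
]

# Normalizing: customer attributes move to their own table, and orders
# reference customers through a foreign key, removing the redundancy.
customers = {}       # customer name -> generated customer_id
customer_rows = []
order_rows = []
for row in denormalized:
    name = row["customer"]
    if name not in customers:
        customers[name] = len(customers) + 1
        customer_rows.append({"customer_id": customers[name],
                              "name": name,
                              "city": row["customer_city"]})
    order_rows.append({"order_id": row["order_id"],
                       "customer_id": customers[name],
                       "total": row["total"]})

print(customer_rows)  # each customer stored exactly once
print(order_rows)     # orders hold only a foreign key
```

Each customer now exists in exactly one place, which is the data-integrity benefit the definition above describes.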
"Data architecture is a high-level approach that deals with the overall design of data structures, systems, and processes. It defines the standards, policies, and practices that govern the collection, storage, retrieval, and use of data in an organization. Data architecture aims to ensure that data is managed in a consistent and coherent manner across the organization."
"Data modeling, on the other hand, is a more detailed approach that focuses on creating specific representations of data structures and relationships. It involves the creation of data models, which are diagrams or visual representations of the data elements, their attributes, and the relationships between them. Data modeling helps to clarify the meaning and relationships of data elements, and enables the creation of more accurate and efficient databases".
Another set of definitions exists for Conceptual, Logical and Physical data models. Again, we cannot find a single precise definition of these terms, so here is what I would consider the most up-to-date definitions (using ChatGPT):
Conceptual, logical, and physical data models are different levels of abstraction used to represent data and information in a structured way.
Conceptual Data Model: A conceptual data model represents high-level business concepts and their relationships, without getting into the technical details of implementation. It is a conceptualization of the real-world entities and relationships between them. It describes the overall structure of the data, including entities, attributes, and relationships between them, without going into too much detail about the database structure.
Logical Data Model: A logical data model is a detailed representation of the data and how it is organized. It defines the structure of data elements and their relationships, such as entities, attributes, and relationships between them, and also includes data constraints and business rules. It is a representation of data that is independent of any particular database management system or hardware implementation.
Physical Data Model: A physical data model specifies the physical implementation of a database on a particular hardware platform, including storage structures, access methods, and security constraints. It represents the actual data structure and how it is stored in the database management system. It includes details such as table structures, indexes, keys, and data types. The physical data model is optimized for performance and storage efficiency, and is specific to a particular database management system and hardware platform.
Up to this point, it is all theory and definitions. As a long-time architect, I used Microsoft Visio and some desktop database documentation tools to create the Conceptual and Logical/Physical models. It was always very difficult to share diagrams and documentation, and to keep track of which versions of the models were out there. This became even harder in the enterprise world, where several hundred architects, engineers and product managers had to collaborate on multiple projects and applications.
Starting first as an example of a FastCGI web app, then moved out to its own GitHub repo, I created
DataWharf, an enterprise-grade web application that can be used to create Conceptual, Logical and Physical models.
Initially it was a simple data dictionary tool with some visualization, but with help from Thomas Goldschmidt, it became a full modeling tool with even more advanced visualization and Single Sign-On support.
DataWharf has two major sections: Projects and Applications.
A Project in DataWharf is a collection of Models, conceptual and/or logical. Projects can be used before an application even exists. Usually a team of architects will create one or more models to design an entire project.
The Models are made of Entities, Associations, Attributes, Packages, Datatypes, ... and many of those concepts can be renamed at the project level.
Custom access rights can be managed at the level of projects, and most components of a model can have an unlimited number of custom fields as well.
When using DataWharf to document itself, the following is a physical model of the core tables used to support projects:
And the following is a conceptual model of projects:
There is a grey zone between conceptual and logical modeling: where does one end and the other begin?
Some people state that a conceptual model should not have defined attributes.
Others, that logical models should already represent many-to-many relationships as actual entities.
DataWharf will not force a level of detail, or lack thereof, on your models. Models are meant to express the intent of your data structure.
For seasoned architects, models are used to simplify a physical model, to help during training and to explain an application, and therefore are sometimes created after the physical tables already exist.
Sometimes models are used during the exploration of what an application should do, but in that case, maintaining the model as the application grows is recommended.
An Application in DataWharf sits between a logical and a physical model for a single database, represented as an actual Data Dictionary.
But unlike most data dictionaries, the ones defined in DataWharf are backend-independent, hence the logical-model-like behavior.
The data dictionary is defined with its own column types, which can (currently) be converted for PostgreSQL, MySQL, MariaDB and MS SQL.
Most data types will work on any backend. Some data types will be converted to strings if not supported natively, and to avoid restricting an application's implementation, some field types can be PostgreSQL-specific, like arrays.
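The idea of backend-independent column types can be sketched as a lookup table. This is only an illustration of the approach: the logical type names and mappings below are my assumptions, not DataWharf's actual type names, which are described later in the article.

```python
# Hypothetical sketch of backend-independent column-type conversion.
# The logical types and target types here are illustrative assumptions.
TYPE_MAP = {
    # logical type    PostgreSQL     MySQL/MariaDB   MS SQL
    "integer":      ("integer",     "int",          "int"),
    "uuid":         ("uuid",        "char(36)",     "uniqueidentifier"),
    "boolean":      ("boolean",     "tinyint(1)",   "bit"),
    "text":         ("text",        "longtext",     "nvarchar(max)"),
    # PostgreSQL-specific type: falls back to a string encoding elsewhere.
    "integer[]":    ("integer[]",   "text",         "text"),
}

BACKENDS = {"postgresql": 0, "mysql": 1, "mariadb": 1, "mssql": 2}

def physical_type(logical_type: str, backend: str) -> str:
    """Resolve a backend-independent column type to a physical type."""
    return TYPE_MAP[logical_type][BACKENDS[backend]]

print(physical_type("uuid", "postgresql"))   # -> uuid
print(physical_type("integer[]", "mysql"))   # -> text (no native arrays)
```

Note how the array type maps cleanly only on PostgreSQL and degrades to a string elsewhere, matching the fallback behavior described above.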
As an aside, using array column types violates the First Normal Form (1NF).
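To see why arrays break 1NF, here is a quick sketch (hypothetical table names): an array column packs several values into one field, while the 1NF-compliant alternative stores one value per row in a child table.

```python
# A PostgreSQL-style array column packs multiple values into one field,
# which violates 1NF: each column should hold a single atomic value.
orders_with_arrays = [
    {"order_id": 1, "tag_array": ["rush", "gift"]},
    {"order_id": 2, "tag_array": ["rush"]},
]

# 1NF-compliant alternative: one row per value in a child table,
# linked back to the order by a foreign key.
order_tags = [
    {"order_id": o["order_id"], "tag": tag}
    for o in orders_with_arrays
    for tag in o["tag_array"]
]

print(order_tags)  # one atomic value per row
```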
An entire section of this article will describe the mapping for DataWharf column types to all supported backends.
Another concept DataWharf supports is Namespaces, equivalent to a "Schema" in PostgreSQL and MS SQL. MariaDB and MySQL do not support this concept, so if multiple Namespaces exist, their names will be prepended to the table names.
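The namespace behavior described above can be sketched as follows. The function name and the underscore separator are my assumptions for illustration, not DataWharf's actual implementation:

```python
# Sketch of namespace handling: backends with schema support keep the
# namespace separate, while MySQL/MariaDB get it folded into the table name.
SCHEMA_BACKENDS = {"postgresql", "mssql"}

def qualified_table_name(namespace: str, table: str, backend: str) -> str:
    if backend in SCHEMA_BACKENDS:
        return f"{namespace}.{table}"   # real schema support
    # Separator is an assumption; the point is the namespace is prepended.
    return f"{namespace}_{table}"

print(qualified_table_name("billing", "invoice", "postgresql"))  # billing.invoice
print(qualified_table_name("billing", "invoice", "mariadb"))     # billing_invoice
```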
When using DataWharf to document itself, the following is a physical model of the core tables used to support data dictionaries:
Custom access rights can be managed at the level of applications, and most components of a data dictionary can have an unlimited number of custom fields as well.
Deployments can also be recorded for each Data Dictionary (Application), making it easier to compare the specification of a data dictionary with its actual implementations.
For example, an Application could have a Develop, Beta, QA and Production installation.