Author: Joe Reis
(Find the original here: https://joereis.substack.com/p/joes-nerdy-rants-5? We are republishing this post with the permission of Joe Reis.)
“We are not modeling reality, but the way information about reality is processed, by people.” – William Kent (Data and Reality, 1978).
The world of data is so jumbled that you might be forgiven for not being clear on what a “model” is. When you hear the word “model,” what comes to mind? If you work in data, you might be thinking of the canonical stages of data modeling (conceptual, logical, physical), a relational data model, a dimensional model, a machine learning model, a dbt model file, a Django or Rails model file, a Python Pickle model artifact, etc. The word “model” is pervasive in our field, and I think it’s causing data people to cross wires when they talk to each other. A data scientist considers her ML model to be a model. A DBA implements a third-normal-form relational data model in a database. An analytics engineering team works with various dbt models, which might be modeled dimensionally, as just a bunch of tables, or as one big table. You get the idea. We’re all modeling, but very often in different ways.
When I ask data professionals in different roles what they consider a “model,” I get answers like the ones above. Meanwhile, I see the formal practice of data modeling as either unknown or willfully ignored by software and data practitioners across the board. Oftentimes, people complain that formal “data modeling” is too rigid and takes too long. Instead, models are often created ad hoc. We model for what’s in front of us at the expense of the bigger picture. The tradeoff is expediency versus knowledge. I strongly believe we do data modeling whether we’re intentional about it or not. The lack of a data model is still a data model, albeit a potentially crappy one.
Ironically, in a field that advocates for consistent data governance and management, we struggle to use a simple word like “model” consistently or accurately. This has significant consequences. Instead of using modeling to establish a shared understanding of our organization’s vocabulary, rules, and processes, we have extreme fragmentation in our models. Most models don’t align with each other and only focus on tiny aspects of reality, which leads to a lack of understanding of the bigger picture.
The big picture is what matters. We need to revisit higher-level data modeling, namely conceptual and logical. A shared understanding of data at a high level will help pave the way toward the broad use of consistent and believable models (analytical, ML, application, etc.) across and between organizations.
I’m writing a longer article on this theme, to be published soon. It is part of the much broader subject of my in-progress book on resurrecting and revamping data modeling.
Listen to the audio clip above on this topic, which is also my 5 Minute Friday on Spotify.