Have We Been Getting Football Data Wrong?
- Antti Nyman

- Jan 24
- 5 min read

There's an interesting tension in modern football analytics. We've spent the last decade obsessed with building bigger databases, tracking every touch across every league, searching for universal truths about what makes teams successful. Expected goals, pressing intensity, and progressive carries have become the lingua franca of the sport.
But what if this entire approach has been a productive wrong turn?
The premise sounds heretical in 2025, when even modest clubs employ data scientists and xG is discussed in post-match interviews. Yet consider what we've actually achieved. Despite unprecedented data availability, the predictive power of our models remains stubbornly limited. Underdogs still win. Tactics that "shouldn't work" succeed anyway.
Perhaps the problem isn't that we need more data. Maybe it's that we're looking for the wrong kind of patterns.
The Seduction of Scale
Big data's promise in football has always been alluring: collect enough information about enough matches, and universal principles will emerge. Track 10,000 passes and you'll understand passing. Watch 500 teams press and you'll decode pressing. The assumption is that football operates like physics, with discoverable laws that apply consistently across contexts.
But football isn't physics. It's closer to a conversation: deeply contextual, interpersonal, improvisational. When a team's front three press in a certain way, it works not because the geometric angles are optimal in some abstract sense, but because those three players have developed an intuitive understanding of each other's movements, strengths, and limitations. They've learned when one will gamble and when one will hold. This knowledge is hyper-local, non-transferable, and largely invisible to external data collection.
The Local Knowledge Problem
Consider a manager who knows that his left-back plays more conservatively after making a defensive error. Or that his striker's pressing intensity drops in the second half of matches following long travel. Or that two midfielders have a personality clash that subtly degrades their combination play. Or that a particular player performs dramatically better when given freedom rather than rigid instructions.
None of this appears in any dataset. Yet it's precisely this kind of granular, context-specific knowledge that separates good coaching from great coaching. It's the difference between knowing that "high pressing correlates with success" and knowing that your team presses effectively only when these specific conditions are met with these particular players in this tactical setup against these kinds of opponents.

The coach who knows his squad intimately has access to a dataset that is both smaller and vastly richer than anything in Statsbomb, Opta or Wyscout. He knows which tactical instruction will land with which player. He understands the invisible web of relationships, hierarchies, and communication patterns that make the team function. He can read body language in training and adjust accordingly. This is bespoke intelligence that can't be aggregated or generalized.
The Tacit Dimension
There's a concept in the philosophy of knowledge called "tacit knowledge": things we know but cannot articulate or formalize. Football is thick with tacit knowledge. Players develop intuitions about space, timing, and pressure that they can't verbalize. Coaches sense when a team talk needs to be fierce or gentle. These forms of knowing resist quantification not because our measurement tools are crude, but because the knowledge itself is embodied, contextual and relational.
The big data approach necessarily excludes this dimension. It can only work with what can be measured, standardized, and compared. In doing so, it may be systematically missing the phenomena that matter most.
When Big Data Misleads
The danger isn't just that big data approaches are incomplete. It's that they can actively mislead by suggesting false equivalences. When we say "Player A and Player B both have similar progressive passing numbers," we create an illusion of interchangeability that evaporates in practice. Player A might unlock those passes through individual brilliance, while Player B does so through understanding his teammates' runs.
Similarly, identifying that "playing out from the back" correlates with success league-wide tells you nothing about whether your center-backs have the composure and technique to execute it against your upcoming opponent's pressing scheme. The aggregate pattern may be real while being locally inapplicable.
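To make the gap between aggregate and local concrete, here is a minimal sketch in Python. Every number in it is invented for illustration: the team names, the "share of goal kicks played short" measure and the goal differences are assumptions, not real match data. It shows a pooled, league-wide correlation that is clearly positive while the same relationship inside one squad points the other way.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic, made-up matches per team:
# (share of goal kicks played short, goal difference in that match)
matches = {
    "team_a": [(0.7, 1), (0.8, 2), (0.9, 3)],
    "team_b": [(0.5, 0), (0.6, 1), (0.7, 2)],
    "our_team": [(0.2, 0), (0.3, -1), (0.4, -2)],
}

# League-wide view: pool every match from every team.
pooled = [pair for team in matches.values() for pair in team]
league_r = pearson([x for x, _ in pooled], [y for _, y in pooled])

# Local view: only our own matches.
ours = matches["our_team"]
our_r = pearson([x for x, _ in ours], [y for _, y in ours])

print(f"league-wide correlation: {league_r:+.2f}")
print(f"our team's correlation:  {our_r:+.2f}")
```

In this toy league the pooled figure is positive mostly because the teams that play short are also the stronger teams. Within our own (hypothetical) squad, playing shorter goes with worse results, which is the only part a coach preparing for the next match actually cares about.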
This creates a particularly modern failure mode: teams adopting tactics that work "in general" without the specific personalities, relationships, and understanding required to make them work in particular. It's cargo cult analytics, copying the form without the substance.
The Case for Radical Localism
What would football analytics look like if we started from the opposite assumption? Instead of seeking universal patterns, what if we focused on building exquisitely detailed models of individual squads?
Imagine data collection focused entirely on your 25 players. Not just what they do, but how they do it: the qualitative texture of their decision-making, the relationships between them, the contextual factors that enhance or degrade their performance. Combine this with systematic ways of capturing coaching intuition: structured interviews about player characteristics, frameworks for documenting what works in training, methods for making tacit knowledge more explicit.
The goal wouldn't be generalization but particular insight. Not "what do successful teams do?" but "what does success look like for this team, with these players, in these circumstances?"
This approach inverts the normal priority. Instead of data analysts discovering patterns and presenting them to coaches, coaches would be the primary knowledge generators, with data serving to test, refine, and systematize their intuitions. The expertise flows in the opposite direction.
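As a rough sketch of what "data serving to test a coach's intuition" could look like, take the earlier example of a striker whose pressing drops in the second half after long travel. The numbers below are invented, and the choice of an exact permutation test is my assumption about one reasonable way to do this with a handful of matches, not a prescribed method.

```python
from itertools import combinations

# Synthetic second-half pressure counts for one striker (made-up numbers).
after_long_travel = [11, 9, 12, 10]
other_matches = [15, 14, 17, 13, 16, 15]


def mean(values):
    return sum(values) / len(values)


# The coach's claim, expressed as a number: the average drop after travel.
observed_drop = mean(other_matches) - mean(after_long_travel)

# Exact permutation test: if travel made no difference, any 4 of the 10
# matches could just as well have been the "travel" matches. Count how
# often a random relabelling gives a drop at least as large as observed.
pooled = after_long_travel + other_matches
k = len(after_long_travel)
at_least_as_large = 0
total = 0
for travel_idx in combinations(range(len(pooled)), k):
    travel = [pooled[i] for i in travel_idx]
    rest = [v for i, v in enumerate(pooled) if i not in travel_idx]
    total += 1
    if mean(rest) - mean(travel) >= observed_drop:
        at_least_as_large += 1

p_value = at_least_as_large / total
print(f"observed drop: {observed_drop:.1f} pressures")
print(f"relabellings with a drop this large: {at_least_as_large} of {total}")
```

The point of the design is the direction of travel: the hypothesis comes from the coach, the sample is ten matches from one player, and the data's only job is to say whether the pattern is likely to be more than noise.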
Synthesis, Not Substitution
None of this argues for abandoning statistical analysis. There's real value in knowing opponent tendencies, in tracking fitness and workload, in having objective measures of what happened on the pitch. The question is one of philosophy.
Perhaps the future of football analytics isn't bigger databases but better integration of different knowledge types. Statistical analysis that is always and explicitly in dialogue with coaching expertise. Data that serves to enhance local knowledge rather than replace it with universal patterns. Tools designed to help coaches clarify and test their intuitions rather than override them.
The revolution might not be finding the perfect xG model that works everywhere, but building the perfect feedback loops that help each coach understand their unique situation better.
A Different Kind of Sophistication
There's a certain intellectual appeal to the idea that football can be decoded through sufficient data and analysis. It suggests the sport is ultimately rational and comprehensible.
But what if the higher sophistication lies precisely in recognizing what can't be systematized? In building approaches that remain humble about the limits of quantification while using it judiciously? In trusting that the coach who has worked with these players every day for months might actually know things that can't be extracted from tracking data?
The irony is that this "small-data" position might actually be more empirically grounded than big data maximalism. It starts with observation: that great coaches often succeed through deep particular knowledge rather than general principles. That tactical innovations usually come from contextual problem-solving rather than abstract analysis. That championship teams develop an almost telepathic understanding that exists nowhere in the statistics.
If the evidence suggests that football success is irreducibly local and contextual, then perhaps our analytics should be too.
Author: Antti Nyman
*This post is the first part of a three-part article series by Antti Nyman on the boundaries between football analytics and coaching.
**Antti Nyman is a football video and data analyst working with the Finnish youth national teams. You can learn more about his work on LinkedIn: https://www.linkedin.com/in/anttinyman


