Physical AI Needs a System Connector & Regulator, Not Just a Body & a Model

Why the biggest opportunity in industrial AI isn't the robot - it's the operating loop that governs the whole site.

~7 min read

NVIDIA's GTC 2026 is happening this week in San Jose, and the message is unambiguous: physical AI is the next platform.

Jensen Huang declared it at CES in January - "the ChatGPT moment for robotics is here" - and the GTC agenda doubles down with an entire pillar dedicated to physical AI alongside agentic AI, inference, and AI factories. Siemens announced a partnership to build the first fully AI-driven adaptive manufacturing sites. Microsoft and NVIDIA are co-marketing physical AI as the decisive shift for manufacturing. Dassault Systèmes is integrating "virtual twins" with NVIDIA's simulation stack. Humanoid robots from Boston Dynamics, NEURA Robotics, and others featured prominently in the keynote.

The investment community is paying attention too. Physical AI contributed over $6 billion in NVIDIA revenue in fiscal 2026 - still less than 3% of total, but growing fast. The consensus is that this is early and enormous.

Here's the thing, though. Almost all of this energy is directed at individual machines. A humanoid robot that picks things up. An AMR that navigates a warehouse. An autonomous truck that follows a route. A robotic arm that learns new tasks in simulation before deployment.

These are real, and they're impressive.

But they're not the hard problem.

The hard problem isn't the robot — it's the site

A production site isn't a single machine. It's a system of hundreds of machines, dozens of processes, shifting constraints, ageing equipment, experienced people who are retiring, and legacy systems that were never designed to talk to each other.

The hard problem isn't teaching a robot to pick up a part. It's making everything work together - sensors, AI models, autonomous machines, legacy SCADA, human operators, maintenance teams, safety systems, and production targets - in a way that actually improves outcomes without creating new risks, all while the system itself is changing in ways we've never dealt with before.

The Geminum team has over 200 years of combined experience in mining, manufacturing, power, automation, robotics, and now AI. The sites we've worked with don't lack technology ambition. They lack operational coherence. They have more dashboards than they can use, more data than they can interpret, and more AI pilots than they can sustain. What they don't have is a way to turn all of that into faster decisions, governed actions, and continuous learning at the site level.

That's the gap physical AI needs to close. Not at the robot. At the site.

NVIDIA's definition already describes what we build

Here's what we find interesting. NVIDIA defines physical AI as "intelligence that can sense, reason, and act in the real world." That's essentially a description of what an operational digital twin does at the site level.

At Geminum, we structure this as the Sense → Decide → Act → Learn (SDAL) operating loop. It's the core of our Site Twin product, and it maps directly onto the physical AI definition:

Sense: the twin fuses data from OT/IoT systems, SCADA historians, cameras, LiDAR, fleet systems, EAM/CMMS, and AI models into high-context operational signals. It senses where humans can't be: in the dark, across remote work fronts, in dust and poor visibility, and across every connected system at once. It catches the tiny variations across two data streams that indicate something is amiss - variations no human is paid to sit and watch for, especially when most site supervisors spend far more time behind a desk than on the floor. This is physical AI's sensing layer, applied to the entire site, not just one robot's field of view.

Decide: the twin identifies what's limiting outcomes, what's creating risk, and what's trending toward failure. It links local signals to global system impacts. A conveyor fault isn't just a conveyor problem - it's a throughput constraint, a scheduling dependency, a safety consideration, and a maintenance prioritisation call, all at once. The twin holds that context so the right people (or agents) can decide faster with less ambiguity.

Act: we define four action types, from lowest to highest consequence: Communicate, Collaborate, Escalate, and Intervene. Each maps to a different level of disruption and authority. A shift supervisor getting a contextualised alert is a Communication. A cross-functional team spinning up around an emerging issue is a Collaboration. An unresolved risk being pushed up the decision chain is an Escalation. A safety system acting first and informing humans after is an Intervention. These aren't just notification categories - they're the action grammar for physical AI on a production site.

Learn: this is the weakest link in most digital twin deployments, and the one we're most focused on closing. It means connecting outcomes back to predictions, decisions, and actions so that the system actually gets better - not through a quarterly review meeting, but continuously and persistently.
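To make the loop concrete, here is a minimal sketch of how SDAL and the four-level action grammar might be expressed in code. All names here (ActionType, Signal, sdal_cycle, the severity thresholds) are hypothetical illustrations, not Geminum's actual API - a real system would decide from far richer context than a single severity score.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List

class ActionType(IntEnum):
    """The four action types, ordered from lowest to highest consequence."""
    COMMUNICATE = 1   # contextualised alert to the right person
    COLLABORATE = 2   # spin up a cross-functional response
    ESCALATE = 3      # push the decision up the authority chain
    INTERVENE = 4     # system acts first, informs humans after

@dataclass
class Signal:
    source: str       # e.g. "scada/conveyor-3", "vision/zone-7"
    description: str
    severity: float   # 0.0 (routine) .. 1.0 (critical)

@dataclass
class Decision:
    signal: Signal
    action: ActionType
    rationale: str

def decide(signal: Signal) -> Decision:
    """Map a sensed signal to an action type (illustrative thresholds only)."""
    if signal.severity >= 0.9:
        action = ActionType.INTERVENE
    elif signal.severity >= 0.7:
        action = ActionType.ESCALATE
    elif signal.severity >= 0.4:
        action = ActionType.COLLABORATE
    else:
        action = ActionType.COMMUNICATE
    return Decision(signal, action, f"severity={signal.severity:.2f}")

def sdal_cycle(signals: List[Signal], act: Callable[[Decision], None],
               journal: List[Decision]) -> None:
    """One pass of Sense -> Decide -> Act -> Learn.
    'Learn' here is just journaling decisions for later feedback analysis."""
    for signal in signals:            # Sense
        decision = decide(signal)     # Decide
        act(decision)                 # Act
        journal.append(decision)      # Learn (record outcome for review)

journal: List[Decision] = []
sdal_cycle(
    [Signal("vision/zone-7", "person in exclusion zone", 0.95),
     Signal("scada/conveyor-3", "bearing temperature drift", 0.55)],
    act=lambda d: print(d.action.name, "->", d.signal.description),
    journal=journal,
)
```

The point of the sketch is the shape, not the thresholds: every signal flows through the same loop, and every action is journaled so the Learn step has something to close the loop against.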

When NVIDIA talks about physical AI, they're describing intelligence that operates in the real world. We agree. We've just built the operating loop for it at the site scale.

The training ground vs. the operating system

There's a distinction worth making explicit, because it explains where most of the GTC conversation stops and where the real operational value begins.

NVIDIA's Omniverse platform, Isaac Sim, and Cosmos world models are primarily simulation and training environments. They let you build a physically accurate digital twin of a warehouse or factory, train robots in that virtual world, test fleet coordination, and validate behaviours before deploying them to real hardware. This is valuable and necessary work, and we love using this ecosystem every day.

But once those robots are deployed, what governs them?

Once the AMR fleet is running on the warehouse floor, what decides which delivery takes priority when two agents have conflicting objectives? When a safety camera detects a human in an exclusion zone, who (or what) has the authority to stop the nearest machine? When a predictive model says a pump will fail in four hours, who decides whether to pull maintenance forward or push production to hit a shift target first?

These aren't simulation questions. They're operational governance questions. And they can't be answered by each individual AI agent in isolation. They need a site-level system that holds the full context - what's happening now, what's predicted, what's allowed, and what's been learned from similar situations in the past.
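As a hedged illustration of what "holding the full context" could look like - every name below (SiteContext, may, the actor and permission strings) is hypothetical, not a real product API - a site-level system might keep live state, predictions, and permissions in one structure, so authority questions have a single place to be answered:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SiteContext:
    """Holds what's happening now, what's predicted, and what's allowed -
    the shared context no individual AI agent has on its own."""
    live_state: Dict[str, str] = field(default_factory=dict)
    predictions: Dict[str, str] = field(default_factory=dict)
    permissions: Dict[str, List[str]] = field(default_factory=dict)

    def may(self, actor: str, action: str) -> bool:
        """Is this actor currently authorised to take this action?"""
        return action in self.permissions.get(actor, [])

ctx = SiteContext(
    live_state={"zone-7": "person detected"},
    predictions={"pump-12": "failure predicted in ~4h"},
    permissions={
        "safety-system": ["stop-machine"],   # may intervene directly
        "amr-17": ["move", "deliver"],       # may not stop other machines
    },
)

# The safety system holds stop authority; an individual AMR does not.
print(ctx.may("safety-system", "stop-machine"))  # True
print(ctx.may("amr-17", "stop-machine"))         # False
```

The design choice the sketch gestures at: authority lives in the shared context, not in the agents, so the answer to "who may stop the nearest machine?" is the same no matter which agent asks.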

That's the difference between a digital twin as a training ground and a digital twin as an operating system.

The training ground builds the agents. The operating system governs them.

Progressive autonomy: why governance is the bottleneck, not AI capability

The reason physical AI hasn't yet transformed industrial operations isn't that the AI doesn't work. The models are good. Computer vision is good. Predictive algorithms are good. Edge inference is getting faster and cheaper. None of them are perfect, and five nines of reliability is still a journey we all have to walk.

The bottleneck is governance - specifically, the gap between what an AI agent can do and what the organisation allows it to do in a given context, closed within a time frame where acting still makes a difference.

Most industrial sites still govern operations through static policies and procedures - PDFs written by someone who isn't on site anymore, for conditions that may no longer exist. These documents are necessary (and required by law), but they're low-bandwidth governance. They cover the scenarios the authors imagined, not the situations the operation actually produces.

The gap between static governance and operational reality is currently bridged by experienced humans. Shift supervisors interpret rigid rules in fluid contexts. Maintenance planners juggle competing priorities based on decades of pattern recognition. Safety officers make judgment calls that aren't in any manual.

This works right up until those people retire. The silver tsunami isn't a warning anymore; it's happening. And when an experienced operator leaves, their regulatory capacity - their ability to handle the variety of situations the operation throws at them - walks out with them.

Physical AI needs a governance layer that's alive, not static. Rules that flex with context. Boundaries that respond to what's actually happening on site right now. Playbooks that emerge from data about what's worked and what hasn't, not from someone's memory of what happened five years ago.

We call this progressive autonomy, and we stage it deliberately:

Mode 1: System prompts a human. The twin surfaces evidence, drafts recommendations, requests a decision. The human decides and acts.

Mode 2: System acts with humans involved. Bounded actions with acknowledgement and shared control. The twin does the routine work; humans validate.

Mode 3: System acts then informs humans. Reserved for high-severity, well-understood scenarios where the system can respond faster than a human can arrive. Every Mode 3 action is investigated.

This ladder isn't just a trust-building exercise. It's the mechanism by which sites can safely reduce the number of people required to operate, without increasing risk. You can't do that by only deploying robots. You can only do that by deploying robots within a governance system that knows what they're allowed to do, right now, in this specific context.

What the NVIDIA ecosystem is missing (and where the opportunity is)

We want to be clear: we’re not criticising NVIDIA's physical AI stack. We're an NVIDIA Inception member. Omniverse, Isaac Sim, Jetson, Metropolis - these are powerful building blocks and sit at the heart of our products. The Mega blueprint for testing robot fleets in digital twins is excellent work. The partnerships with Siemens, Dassault, and others are moving the industry forward.

But the conversation at GTC, and in the broader market, is overwhelmingly about the components of physical AI: better chips, better models, better simulation, better robots. What's underrepresented is the orchestration & governance layer - the system that sits above all these components and makes them work together coherently across an entire production site.

Consider what any medium-sized mining operation or steel processing plant actually needs:

  • Dozens of AI vision models running on edge compute for safety and process monitoring

  • Multiple OEM and plant-specific predictive maintenance models consuming vibration, temperature, and load data from hundreds of assets

  • A constraint model tracking where throughput is being lost across the value chain

  • Fleet management systems coordinating autonomous and human-operated vehicles

  • Work order systems managing thousands of maintenance activities

  • Planning tools balancing short-interval production targets against maintenance windows

  • Safety systems with escalation rules and intervention authorities

  • Human operators across production, maintenance, and safety who need to share context, coordinate action, and take responsibility for outcomes

No single robot, no single AI model, and no single simulation environment can hold all of this together. You need an operating loop at the site level - something that senses across all these domains, decides what matters, acts through the right channels, and learns from outcomes.

That's what a Site Twin is, and that's why we think the physical AI conversation, as exciting as it is, is still incomplete.

Where this goes from here

The next 2–3 years will see physical AI move from demos to operational reality in industrial settings. When it does, the sites that succeed won't be the ones with the best individual robots or the most GPU compute. They'll be the ones that solve the orchestration & governance problem - the ones that have an operational twin governing how all the pieces work together.

The operating loop - Sense, Decide, Act, Learn - isn't new. It's how good operators have always worked. What's new is the ability to run that loop at machine speed, across the whole site, with governance that adapts to context rather than defaulting to the lowest common denominator of a policy document.

Physical AI gives us better reach. The Site Twin connects and regulates this reach within the ecosystem of assets, systems and people.

---

Geminum is an NVIDIA Inception member. We design and build operational digital twins for production-intensive industries: mining, steel, recycling, and heavy manufacturing. Our Site Twin product unifies production, safety, and maintenance into a single operating loop, enabling teams to make faster decisions with better context, and progressively reduce the human effort required to run safe, high-performing operations.

If you're exploring how physical AI fits into your site operations, we'd like to talk.
