BS2025 / Program

From correlation to causality: Heat pump control with field data using Double Machine Learning (DML)

Location
Room 5
Time
August 25, 11:00 am-11:15 am

Recent advancements in data availability and computational resources have driven the widespread adoption of machine learning (ML) in control-oriented building simulations. Many ML models, including artificial neural networks (ANNs), rely on observational correlations between input and output variables to model dynamic relationships in building systems. However, correlation does not imply causation (Pearl & Mackenzie, 2018). Real-world building operational data, shaped by latent factors such as occupant comfort constraints, uncontrollable weather conditions, and pre-existing control strategies, may introduce spurious correlations between control mechanisms and external factors. Consequently, correlation-driven models risk overfitting to historically observed patterns, which limits their scalability.
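The confounding problem described above can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration (the variable names, the coefficients, and the linear data-generating process); none of it comes from the study's field data. A legacy controller that cools harder on hot days makes cooling and indoor temperature rise together, so a naive fit gets the sign of the cooling effect wrong.

```python
import numpy as np

# Synthetic, illustrative data only (not the study's dataset).
rng = np.random.default_rng(0)
n = 5000

# Confounder: standardized outdoor temperature.
outdoor = rng.normal(size=n)
# Hypothetical legacy controller: cools harder when it is hot outside.
cooling = 1.5 * outdoor + 0.5 * rng.normal(size=n)
# Assumed true mechanism: each unit of cooling lowers IAT by 1.0.
iat = -1.0 * cooling + 2.0 * outdoor + 0.3 * rng.normal(size=n)

# Correlation-driven estimate: regress IAT on cooling alone.
naive_slope = np.polyfit(cooling, iat, 1)[0]

# Confounder-adjusted estimate: regress IAT on cooling and outdoor temperature.
X = np.column_stack([cooling, outdoor, np.ones(n)])
adjusted_slope = np.linalg.lstsq(X, iat, rcond=None)[0][0]

print(f"naive slope:    {naive_slope:+.2f}")
print(f"adjusted slope: {adjusted_slope:+.2f}")
```

In this toy setup the naive slope comes out positive, as if cooling warmed the zone, while the adjusted slope recovers a value close to the assumed effect of -1.0.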

To address this, this study proposes a novel causality-driven modeling approach for the cooling behavior of a heat pump (HP) in a 39 m² zone, leveraging Double Machine Learning (DML). Unlike correlation-driven methods, DML employs two nuisance functions (one modeling the outcome and one modeling the control input as functions of the confounders) to systematically de-bias the training data. The proposed approach achieves two key objectives: (1) quantifying the HP control effect on indoor air temperature (IAT), accounting for external heterogeneity, and (2) generating counterfactual IAT outcomes to evaluate alternative HP control strategies.
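A minimal sketch of the general DML recipe (partialling out with cross-fitting) may help make the two nuisance functions concrete. It is not the study's implementation: the synthetic data, the helper names `fit_predict` and `dml_effect`, the least-squares nuisance learners, and the assumption of a single constant control effect are all illustrative choices. In particular, the constant-effect assumption ignores the heterogeneity that the study accounts for.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Least-squares nuisance learner (a stand-in for any ML regressor)."""
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_test @ coef

def dml_effect(y, t, X, n_folds=5, seed=0):
    """Partialling-out DML with cross-fitting for a constant effect of t on y."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    y_res = np.empty(n)
    t_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance 1: predict the outcome from confounders, keep the residual.
        y_res[test_idx] = y[test_idx] - fit_predict(X[train], y[train], X[test_idx])
        # Nuisance 2: predict the control input from confounders, keep the residual.
        t_res[test_idx] = t[test_idx] - fit_predict(X[train], t[train], X[test_idx])
    # Final stage: regress outcome residuals on control residuals.
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Asymptotic standard error from the orthogonal score.
    psi = t_res * (y_res - theta * t_res)
    se = np.sqrt(np.mean(psi**2) / n) / np.mean(t_res**2)
    return theta, se

# Synthetic, illustrative data: outdoor temperature confounds control and IAT.
rng = np.random.default_rng(1)
n = 4000
w = rng.normal(size=n)
X = np.column_stack([np.ones(n), w, w**2])            # confounder features
t = 1.5 * w + 0.5 * w**2 + 0.5 * rng.normal(size=n)   # hypothetical HP control input
y = -1.0 * t + 2.0 * w + 0.3 * rng.normal(size=n)     # assumed true effect: -1.0

theta, se = dml_effect(y, t, X)
print(f"estimated control effect: {theta:.3f} (se {se:.3f})")

# Counterfactual IAT under an alternative control input, given a constant effect.
t_alt = t + 1.0
y_counterfactual = y + theta * (t_alt - t)
```

Under these assumptions the estimate lands close to the assumed effect of -1.0, and the last two lines show how an estimated effect can be turned into counterfactual outcomes for a control strategy that was never observed.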

The proposed DML model, which learns causal effects, outperformed an ANN model, which learns correlations, even though both were developed using the same dataset. The DML model extrapolated temperature dynamics to unseen setpoints in a physically consistent manner under varying control conditions. Moreover, although the true control effect could not be directly measured, the inferred causal effect on IAT was statistically significant (p = 0.048), supporting the model’s validity.

In contrast, while the ANN model achieved a low mean squared error (MSE = 0.19 °C²) in IAT prediction, it often failed to maintain physical consistency under unseen control options. This highlights how correlation-driven approaches can misinterpret confounding factors and operational biases in real-world datasets, ultimately misrepresenting the underlying mechanisms of building systems. Consequently, a fundamental shift in the modeling lens, from correlation to causation, is essential to ensure reliable and scalable control decision-making.
