Enhancing transfer learning for building energy data time series using pattern-based hidden Markov models
Room 7
August 26, 4:15 pm-4:30 pm
Collecting and preparing large amounts of data to train machine learning algorithms is time-consuming and not always feasible because of issues such as sensor malfunctions and privacy concerns. Transfer learning offers a promising solution to these problems. By leveraging prior knowledge gained from related tasks (i.e., the source domain), it is possible to enhance the performance of a model on a new task (i.e., the target domain) without extensive new data collection. However, the similarity between the source and target domains is crucial in transfer learning methods. Dissimilarity may result in negative transfer, degrading the model’s performance and hindering the energy analysis.
Traditional clustering and similarity approaches are limited in their ability to capture a system’s internal dynamics and temporal dependencies. To better understand the underlying data-generation mechanism, it is therefore crucial to uncover latent states that reveal hidden patterns and transitions not observable with conventional methods. To that end, this paper focuses on the optimal problem formulation for transfer learning in the context of building energy analysis. Specifically, the research steps include the implementation of an existing pattern-based hidden Markov model (pHMM) to find pattern-based correlations between energy use time series collected in different buildings. The model is developed using a two-phase approach. In the first phase, the time series is segmented and clustered to provide a robust initial parameter estimate for the pHMM. In the second phase, the model is iteratively refined by re-segmenting and re-clustering the time series based on the learned states. Finally, the correlation between the various domains is quantified by computing the similarities between the observed data patterns.
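The first phase described above (segment, cluster, derive initial parameters) can be illustrated with a minimal sketch. It is not the pHMM implementation used in the study: fixed-width windows, segment means as features, a tiny 1-D k-means, and the helper names `segment`, `kmeans_1d` and `initial_transitions` are all assumptions made for illustration, and the toy `load` values are invented.

```python
from collections import Counter

def segment(series, width):
    """Split a series into consecutive non-overlapping windows (assumed fixed width)."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, width)]

def kmeans_1d(values, k, iters=50):
    """Tiny deterministic 1-D k-means; centres start evenly spread over the value range."""
    lo, hi = min(values), max(values)
    centres = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        # Move each centre to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centres[j] = sum(members) / len(members)
    return labels, centres

def initial_transitions(labels, k):
    """Row-normalised counts of transitions between consecutive segment labels."""
    counts = Counter(zip(labels[:-1], labels[1:]))
    matrix = []
    for a in range(k):
        total = sum(counts[(a, b)] for b in range(k))
        # Fall back to a uniform row when a state is never left.
        matrix.append([counts[(a, b)] / total if total else 1.0 / k for b in range(k)])
    return matrix

# Toy load profile alternating between a low and a high consumption pattern.
load = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 1.0, 1.2, 0.8, 5.1, 5.0, 4.8]
segments = segment(load, 3)
means = [sum(s) / len(s) for s in segments]
labels, centres = kmeans_1d(means, k=2)
transitions = initial_transitions(labels, k=2)
```

In this sketch the cluster labels play the role of initial hidden states, and `transitions` is a starting guess for the transition matrix that the second phase would then refine by re-segmenting and re-clustering.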
To assess the effectiveness of the proposed approach, the same predictive model is evaluated across the given time series. The defined correlation metric is validated by examining whether drops in predictive performance align with the similarity it indicates. Additionally, commonly used correlation metrics, such as Dynamic Time Warping and the Pearson correlation coefficient, are used as baselines. The results show that the proposed pHMM can effectively identify and correlate patterns between time series, enhancing the accuracy of pattern detection and similarity assessment. The presented method might therefore significantly reduce data collection costs while enabling accurate building energy analysis. To foster reproducibility and replicability of the results, the analysed energy data are retrieved from the Building Data Genome Project 2 dataset. Additionally, all code developed during this study will be made available as an open-source repository.
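For readers unfamiliar with the two baseline metrics named above, the following is a minimal pure-Python sketch of each. It uses the textbook definitions (absolute difference as the local DTW cost, no warping window), which are assumptions and not necessarily the settings used in the study; the series `a`, `b` and `c` are invented toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series (undefined for constant series)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    std_x = math.sqrt(sum((p - mean_x) ** 2 for p in x))
    std_y = math.sqrt(sum((q - mean_y) ** 2 for q in y))
    return cov / (std_x * std_y)

def dtw_distance(x, y):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same shape as a, delayed by one step
c = [0.0, 2.0, 4.0, 2.0, 0.0]       # same shape as a, doubled in amplitude

shift_cost = dtw_distance(a, b)  # DTW absorbs the time shift
scale_corr = pearson(a, c)       # Pearson ignores the amplitude scaling
scale_cost = dtw_distance(a, c)  # DTW on raw values penalises the scaling
```

The toy series show why the two baselines can disagree: DTW tolerates temporal misalignment but is sensitive to amplitude, whereas the Pearson coefficient is scale-invariant but requires equal-length, aligned series.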
Presenters
Antonio Liguori
RWTH Aachen University