Enhancing urban building energy models with Vision Transformers: A Case study in material classification from Google street view

Liu, Y., & Abbasabadi, N. (2025). Enhancing urban building energy models with Vision Transformers: A Case study in material classification from Google street view. Energy and Buildings, 333, Article 115457. https://doi.org/10.1016/j.enbuild.2025.115457.

Abstract

Growing urbanization and rising urban energy consumption highlight the need for strategies that reduce energy use and greenhouse gas emissions. Urban Building Energy Modeling (UBEM) has emerged as a valuable tool for managing and optimizing energy consumption at the neighborhood and city scales in support of carbon reduction goals. However, the accuracy of UBEM is often limited by the lack of large-scale building façade material datasets. This study introduces a new approach that enhances UBEM by integrating an automatic deep learning material classification pipeline. The pipeline leverages multiple views of Google Street View Images (SVIs) to extract building façade material information, using two Swin Vision Transformer (ViT) models to capture both global and local features from the SVIs. In a multi-class classification task, the pipeline reached an accuracy of 97.08% for main material classification and 91.56% for sub-category classification. This work is the first study to apply a deep learning model for material classification to enhance the UBEM framework, and it was tested on the University of Washington campus, which features diverse façade materials. The model demonstrated its effectiveness by increasing overall accuracy by 11.4% in year-round total operational energy simulations. The scalability of this material classification pipeline enables a more accurate and cost-effective application of UBEM at broader urban scales.
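The abstract does not say how predictions from multiple views and from the two Swin models are combined. As a rough illustration only, the sketch below assumes simple probability averaging across views followed by a weighted blend of the two models. The label set `MATERIALS`, the function names, and the `weight_global` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical label set; the abstract does not list the actual material classes.
MATERIALS = ["brick", "concrete", "glass"]


def fuse_views(view_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-class probabilities over several street-view images of one building."""
    if not view_probs:
        raise ValueError("need at least one view")
    return {m: sum(p[m] for p in view_probs) / len(view_probs) for m in MATERIALS}


def fuse_models(
    global_probs: Dict[str, float],
    local_probs: Dict[str, float],
    weight_global: float = 0.5,
) -> Dict[str, float]:
    """Blend a global-feature model and a local-feature model (assumed weighting)."""
    return {
        m: weight_global * global_probs[m] + (1.0 - weight_global) * local_probs[m]
        for m in MATERIALS
    }


def predict_material(
    global_views: List[Dict[str, float]],
    local_views: List[Dict[str, float]],
) -> str:
    """Return the class with the highest fused probability."""
    fused = fuse_models(fuse_views(global_views), fuse_views(local_views))
    return max(fused, key=fused.get)


# Example: two views of one building, scored by each (hypothetical) model.
global_views = [
    {"brick": 0.7, "concrete": 0.2, "glass": 0.1},
    {"brick": 0.4, "concrete": 0.5, "glass": 0.1},
]
local_views = [
    {"brick": 0.6, "concrete": 0.3, "glass": 0.1},
    {"brick": 0.5, "concrete": 0.4, "glass": 0.1},
]
label = predict_material(global_views, local_views)
```

Averaging is only one plausible fusion rule. Voting across views or a learned fusion layer would fit the same description, so the published pipeline may differ.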