Semantic Segmentation Using the U-Net Architecture on Monocular Datasets

Authors

  • Ahmad Fikri Hanafi, Universitas Negeri Surabaya
  • Ervin Yohannes, Universitas Negeri Surabaya

Abstract

This study implements a deep learning model based on the U-Net architecture, with a ResNet50 backbone pre-trained on ImageNet, for semantic segmentation of monocular images. The Cityscapes dataset is used as the main benchmark because it provides high-quality, high-resolution data and is widely recognized in urban image segmentation research. Experiments were conducted with varying learning rates to assess the model's sensitivity to this training hyperparameter. The results show that a learning rate of 1e-4 gives the best performance among the values tested, achieving a Mean Intersection over Union (Mean IoU) of 86.59% and a pixel accuracy of 97.63%. Visualizations of the segmentation predictions show that the model accurately recognizes urban objects and structures, including under varying lighting conditions and background complexity. These findings confirm the effectiveness of U-Net for image segmentation and underline the importance of hyperparameter selection and dataset quality for achieving high performance in the monocular image domain.

 

Keywords: Convolutional Neural Network, Deep Learning, U-Net, Encoder-Decoder, Semantic Segmentation

Published

2025-07-14

Section

Articles