Abstract:
To address the challenge of detecting over-limit (oversize) vehicles in traffic, a vehicle 3D dimension estimation method integrating binocular vision and multi-stage features was proposed. First, the YOLOv8 network was adopted to detect vehicles and extract regions of interest, and the ResNet18 network was used to extract deep semantic features; the HRNet network was then employed to detect predefined vehicle key points. Constrained by the vehicle's physical dimensions, the intrinsic and extrinsic parameters of the binocular camera were iteratively optimized via gradient descent, and the initial 3D dimensions of the vehicle were computed from the detected key points using the triangulation principle. Finally, the deep features and the initial dimensions were concatenated into multi-modal features and fed into the designed binocular feature fusion multi-layer perceptron (BFMLP) for regression, yielding accurate vehicle length, width, and height. Comparative experiments, ablation studies, and typical case analyses were conducted in real traffic scenarios. The results show that the mean relative error (MRE) of the estimated 3D dimensions is 0.04, significantly outperforming traditional end-to-end regression methods and demonstrating the effectiveness of jointly optimizing multi-modal feature fusion and geometric constraints. The method exhibits stable measurement accuracy and engineering feasibility in controlled traffic scenes, achieving high-precision, real-time vehicle 3D dimension estimation based on deep learning and providing effective technical support for intelligent transportation management.
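The triangulation step mentioned above can be illustrated with a minimal sketch for a rectified stereo pair: depth follows from disparity as Z = f·B/d, and a vehicle dimension is the distance between two triangulated key points. The function names and calibration values below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def triangulate(pts_left, pts_right, f, cx, cy, baseline):
    """Recover 3D points (camera frame, metres) from matched key points
    in a rectified stereo pair, using the standard relations
        Z = f * B / d,  X = (u - cx) * Z / f,  Y = (v - cy) * Z / f.
    pts_left / pts_right: (N, 2) arrays of (u, v) pixel coordinates.
    """
    pts_left = np.asarray(pts_left, dtype=float)
    pts_right = np.asarray(pts_right, dtype=float)
    d = pts_left[:, 0] - pts_right[:, 0]      # disparity in pixels
    Z = f * baseline / d                      # depth from disparity
    X = (pts_left[:, 0] - cx) * Z / f
    Y = (pts_left[:, 1] - cy) * Z / f
    return np.stack([X, Y, Z], axis=1)

def dimension(p3d, i, j):
    """Euclidean distance between two triangulated key points,
    e.g. front and rear bumper points for vehicle length."""
    return float(np.linalg.norm(p3d[i] - p3d[j]))

# Synthetic example (assumed calibration: f = 700 px, baseline = 0.5 m):
# two points 4 m apart at 10 m depth project to these pixel coordinates.
p3d = triangulate([[180, 240], [460, 240]],
                  [[145, 240], [425, 240]],
                  f=700, cx=320, cy=240, baseline=0.5)
print(dimension(p3d, 0, 1))   # recovered separation in metres
```

In the paper's pipeline these initial geometric estimates are not the final output; they are concatenated with the deep features and refined by the BFMLP regressor.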