SPLiDeR
Joint Depth and Reflectivity Estimation using Single Photon LiDAR



Abstract

Single-Photon Light Detection and Ranging (SP-LiDAR) is emerging as a leading technology for long-range, high-precision 3D vision tasks. In SP-LiDAR, timestamps encode two complementary pieces of information: the pulse travel time (depth) and the number of photons reflected by the object (reflectivity). Existing SP-LiDAR reconstruction methods typically recover depth and reflectivity separately, or sequentially use one modality to estimate the other. Moreover, conventional 3D histogram construction is effective mainly for slow-moving or stationary scenes; in dynamic scenes, it is more efficient and effective to process the timestamps directly. In this paper, we introduce an estimation method that simultaneously recovers both depth and reflectivity in fast-moving scenes. We offer two contributions: (1) a theoretical analysis demonstrating the mutual correlation between depth and reflectivity and the conditions under which joint estimation becomes beneficial; (2) a novel reconstruction method, "SPLiDeR", which exploits the shared information to enhance signal recovery.

Different Processing Methods of Timestamp Data and the Corresponding Results: (a) A SPAD sensor array captures noisy timestamp data from a dynamic scene at high speed. (b) To mitigate noise in the raw data, conventional approaches pool many detections into a 3D histogram cube; the object's reflectivity and depth are then estimated from the peak's height and position using algorithms such as maximum-likelihood estimation. (c) We propose SPLiDeR, a deep learning framework that directly leverages individual timestamp frames. (d) Conventional algorithms often produce blurry results because of their long integration times. (e) In contrast, our proposed method achieves more accurate and sharper reconstructions of dynamic scenes.

Goals

The goal of this paper is to explore the benefit of simultaneous estimation of depth and reflectivity. To this end, we aim to answer two questions:

  1. Does recovering depth help recover reflectivity, and vice versa?
    To address this, we derive the Maximum-Likelihood Estimator (MLE) for the joint recovery problem and analyze the Cramér–Rao Lower Bound (CRLB) to theoretically establish the conditions under which information sharing is beneficial. This forms the foundation for joint estimation; a sketch of the underlying timestamp model follows this list.
  2. Can we build an efficient neural network to jointly recover depth and reflectivity for fast-moving scenes?
    Building on our theoretical and experimental insights, we propose SPLiDeR, a dual-channel joint estimation network with a feature-sharing mechanism. To address information loss from scarce photon detections, we align features from neighboring timestamp frames using optical flow. Our method extracts multiscale features for depth and reflectivity through two parallel branches, enhancing each modality and capturing both fine and coarse details for improved performance.
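As a sketch of the model behind question 1: under the common Gaussian-pulse-plus-uniform-background timestamp model (the notation below is generic and illustrative, not necessarily the paper's), reflectivity enters the density of every detection, signal or background alike, which is what couples the two estimation problems.

```latex
% Per-detection timestamp density (illustrative notation):
%   alpha   : reflectivity        tau = 2d/c : pulse round-trip time
%   sigma_p : pulse width (std)   b : background rate, T_r : repetition period
\[
  p(t \mid \alpha, \tau)
    = \frac{\alpha}{\alpha + b T_r}\,
      \mathcal{N}\!\left(t;\, \tau,\, \sigma_p^2\right)
    + \frac{b}{\alpha + b T_r},
  \qquad t \in [0, T_r).
\]
% Joint MLE over K detections, and the CRLB on any unbiased estimator:
\[
  (\hat{\alpha}, \hat{\tau})
    = \arg\max_{\alpha, \tau} \sum_{k=1}^{K} \log p(t_k \mid \alpha, \tau),
  \qquad
  \operatorname{Cov}(\hat{\alpha}, \hat{\tau}) \succeq \mathbf{I}(\alpha, \tau)^{-1}.
\]
```

Because the reflectivity scales both mixture weights, the Fisher information matrix I(alpha, tau) is generally non-diagonal whenever background is present, so the attainable accuracy of each estimate depends on the other quantity; the paper's CRLB analysis makes precise when this coupling helps.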

What is SP-LiDAR?

SP-LiDAR stands for Single-Photon Light Detection and Ranging. Unlike conventional LiDAR, it uses single-photon avalanche diodes (SPADs) to measure distance and reflectivity from the time of flight of individual photons. SP-LiDAR systems are particularly advantageous because of their extreme sensitivity and ability to operate at very low light levels, making them well suited to long-range sensing and high-speed dynamic scenes.
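To make this concrete, below is a minimal simulation sketch of per-pixel timestamp data under a Gaussian-pulse-plus-background assumption. All constants and function names are illustrative, not the paper's, and SPAD dead time and pile-up are ignored for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative constants (assumptions, not from the paper)
C = 3e8            # speed of light, m/s
T_REP = 100e-9     # laser repetition period, s
SIGMA_P = 0.5e-9   # pulse width (std), s

def simulate_timestamps(depth_m, reflectivity, bg_rate, n_pulses):
    """Simulate per-pixel SP-LiDAR timestamps for n_pulses laser shots.

    Signal photons arrive around the round-trip time 2*depth/c with
    Gaussian jitter; background photons are uniform over the period.
    Dead time and pile-up are deliberately ignored in this sketch.
    """
    tau = 2.0 * depth_m / C                      # round-trip time, s
    n_sig = rng.poisson(reflectivity, n_pulses)  # signal photons per pulse
    n_bg = rng.poisson(bg_rate * T_REP, n_pulses)
    frames = []
    for ns, nb in zip(n_sig, n_bg):
        sig = rng.normal(tau, SIGMA_P, ns)
        bg = rng.uniform(0.0, T_REP, nb)
        frames.append(np.sort(np.concatenate([sig, bg])))
    return frames

# Example: a pixel 10 m away, modest reflectivity, strong background
ts = simulate_timestamps(depth_m=10.0, reflectivity=0.2,
                         bg_rate=2e6, n_pulses=1000)
print(sum(len(f) for f in ts), "detections over 1000 pulses")
```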

Typical Arrangement of a LiDAR Setup.
SP-LiDAR Optical Setup.

Method

SPLiDeR (Single-Photon LiDAR Restoration) builds on SP-LiDAR with a dual-channel joint estimation scheme that recovers the depth and reflectivity of dynamic scenes simultaneously. By integrating motion field estimation, quanta data restoration, and feature sharing, SPLiDeR remains effective in dynamic and noisy environments, making it suitable for fast-moving scenes. By shifting from histogram-based processing to timestamp-based estimation, our method provides a more efficient and accurate solution for simultaneous depth and reflectivity estimation; the sketch after this paragraph contrasts the two pipelines.
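To illustrate that shift, the hypothetical snippet below contrasts the conventional histogram pipeline with a crude per-frame, timestamp-based reduction. The per-frame median is only a stand-in for the learned spatio-temporal features SPLiDeR extracts; nothing here is the paper's actual algorithm.

```python
import numpy as np

C = 3e8  # speed of light, m/s

def histogram_depth(frames, t_rep, n_bins=1024):
    """Conventional pipeline: pool timestamps from many pulses into one
    histogram and read depth off the peak bin. Accurate for static
    scenes, but the long integration time blurs moving objects."""
    pooled = np.concatenate(frames)
    hist, edges = np.histogram(pooled, bins=n_bins, range=(0.0, t_rep))
    peak = int(np.argmax(hist))
    tau_hat = 0.5 * (edges[peak] + edges[peak + 1])
    return 0.5 * C * tau_hat  # depth = c * tau / 2

def per_frame_depths(frames):
    """Timestamp-based alternative: one (noisy) estimate per frame,
    preserving motion at the cost of per-frame noise -- the gap a
    learned network is meant to close."""
    return [0.5 * C * float(np.median(f)) for f in frames if len(f) > 0]

# Usage with any list of per-frame timestamp arrays, e.g. the simulator
# sketched earlier:
#   d_static = histogram_depth(ts, t_rep=100e-9)
#   d_moving = per_frame_depths(ts)
```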

The SPLiDeR Network Architecture. The proposed SPLiDeR network consists of four main components and produces multiscale reconstructions for both depth and reflectivity simultaneously. PMSR: Progressive Multi-Scale Reconstruction; STAR: Spatio-Temporal Alignment with Residual Refinement; IBFE: Iterative Multi-Scale Bi-directional Flow Estimation; CCAM: Convolutional Cross Attention Module.
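The figure names the four modules, but this page does not spell out their internals. The following is therefore only a hypothetical PyTorch skeleton of the dual-branch, feature-sharing idea, with a cross-attention exchange loosely in the spirit of CCAM; every class, layer, and size below is an assumption, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionShare(nn.Module):
    """Hypothetical cross-modal sharing block: one branch's features
    attend to the other branch's features (illustrative only)."""
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch, 1)
        self.kv = nn.Conv2d(ch, 2 * ch, 1)

    def forward(self, x, other):
        q = self.q(x).flatten(2)                      # B, C, HW
        k, v = self.kv(other).flatten(2).chunk(2, 1)  # B, C, HW each
        attn = torch.softmax(
            q.transpose(1, 2) @ k / q.shape[1] ** 0.5, dim=-1)  # B, HW, HW
        out = v @ attn.transpose(1, 2)                # B, C, HW
        return x + out.reshape(x.shape)

class DualBranchSketch(nn.Module):
    """Two parallel branches (depth / reflectivity) that exchange
    features once before decoding each modality."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc_d = nn.Conv2d(1, ch, 3, padding=1)
        self.enc_r = nn.Conv2d(1, ch, 3, padding=1)
        self.share = CrossAttentionShare(ch)
        self.dec_d = nn.Conv2d(ch, 1, 3, padding=1)
        self.dec_r = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, frames):        # frames: B x 1 x H x W timestamp map
        fd, fr = self.enc_d(frames), self.enc_r(frames)
        fd, fr = self.share(fd, fr), self.share(fr, fd)
        return self.dec_d(fd), self.dec_r(fr)

depth, refl = DualBranchSketch()(torch.rand(1, 1, 64, 64))
```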

Results

Comparison with other SP-LiDAR Algorithms

Qualitative comparison on timestamp data: BPRNN, DDFN, PEINN, and Ours, alongside ground truth for Scenes 02–06.

Comparison with other Video Reconstruction Algorithms

Qualitative comparison on binary data: RVRT, FloRNN, QUIVER, and Ours, alongside ground truth.
  • J. Liang, Y. Fan, X. Xiang, R. Ranjan, E. Ilg, S. Green, J. Cao, K. Zhang, R. Timofte, and L. Van Gool, "Recurrent Video Restoration Transformer with Guided Deformable Attention (RVRT)," NeurIPS, 2022.
  • J. Li, X. Wu, Z. Niu, and W. Zuo, "Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-Ahead Forward Ones (FloRNN)," ECCV, 2022.
  • P. Chennuri, Y. Chi, E. Jiang, G. M. D. Godaliyadda, A. Gnanasambandam, H. R. Sheikh, I. Gyongy, and S. H. Chan, "Quanta Video Restoration (QUIVER)," ECCV, 2024.
  • J. Peng, Z. Xiong, H. Tan, X. Huang, Z.-P. Li, and F. Xu, "Boosting Photon-Efficient Image Reconstruction with a Unified Deep Neural Network (BPRNN)," IEEE TPAMI, 2023.
  • D. B. Lindell, M. O'Toole, and G. Wetzstein, "Single-Photon 3D Imaging with Deep Sensor Fusion (SPSDF)," ACM Trans. Graph., 2018.
  • J. Peng, Z. Xiong, X. Huang, Z.-P. Li, D. Liu, and F. Xu, "Photon-Efficient 3D Imaging with a Non-local Neural Network (PEINN)," ECCV, 2020.