Development of 3D Endoscope System Based on Active Stereo

Overview

Measuring tumor size is important for deciding on a course of treatment. In many cases, however, the measurement relies on a physician's visual estimate, which varies from one physician to another and is therefore prone to error. Objective measurement of tumor shape with an endoscopic device would be a significant step toward solving this problem. In addition, 3D measurement with endoscopes is becoming increasingly important for surgical robots and as a source of training data for medical AI applications.

We are developing a 3D endoscope system based on active stereo. A compact pattern projector is inserted into the instrument channel of the endoscope, and images are captured by the endoscopic camera while a structured-light pattern is projected onto the target surface. Correspondences between the captured image and the projected pattern are estimated through image processing, and 3D measurement is performed by triangulation.
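The triangulation step can be illustrated with a minimal sketch. It assumes pinhole models for both the camera and the projector, with intrinsic matrices `K_cam` and `K_proj` and a camera-to-projector pose `R`, `t`; these names and the midpoint formulation are illustrative assumptions, not details given in the text above.

```python
import numpy as np

def pixel_to_ray(K, uv):
    """Back-project a pixel to a unit direction in the device's own frame."""
    d = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    return d / np.linalg.norm(d)

def triangulate_midpoint(K_cam, K_proj, R, t, uv_cam, uv_proj):
    """Midpoint of the shortest segment between a camera ray and a projector ray.

    R, t map camera coordinates to projector coordinates: X_proj = R @ X_cam + t
    (an assumed convention). The result is expressed in camera coordinates.
    """
    d_c = pixel_to_ray(K_cam, uv_cam)            # camera ray through the origin
    o_p = -R.T @ t                               # projector center, camera frame
    d_p = R.T @ pixel_to_ray(K_proj, uv_proj)    # projector ray, camera frame
    # Find s, r that bring s*d_c and o_p + r*d_p as close together as possible.
    A = np.stack([d_c, -d_p], axis=1)            # 3x2 system: s*d_c - r*d_p = o_p
    (s, r), *_ = np.linalg.lstsq(A, o_p, rcond=None)
    return 0.5 * (s * d_c + (o_p + r * d_p))

# Usage with made-up calibration values (illustrative only).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([-5.0, 0.0, 0.0])
point = triangulate_midpoint(K, K, R, t, (357.5, 215.0), (295.0, 215.0))
```

With noise-free correspondences the two rays intersect and the midpoint is the intersection itself; with noisy correspondences it is a simple compromise between the two rays.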

Pattern Projector: Wide-Angle Micro Projector Using DOE

Because the measurement targets are biological tissue, projected patterns are blurred by subsurface scattering within the tissue. Noise and disturbances are also more severe in the endoscopic environment than in ordinary camera setups. To address these issues, we developed a gap-coded grid pattern, which encodes information at grid points through the gaps between grid lines, and a compact pattern projector that projects the pattern sharply using a Diffractive Optical Element (DOE). The projector has a wide angle of view of approximately 90 degrees, so a wide area can be covered. Because the projector is inserted into the instrument channel, endoscopic images can be captured while the pattern is projected.

(Left) Projection pattern by the wide-angle micro projector (approx. 90-degree angle of view) [From Furukawa et al., Healthcare Technology Letters, 2025]. (Right) Endoscopic image with pattern projected (inner wall of pig stomach)
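As a rough guide to what a 90-degree angle of view means for coverage: on a flat surface facing the projector, the illuminated width is 2 d tan(θ/2), which at 90 degrees is about twice the working distance d. The sketch below assumes an ideal projection center and a flat, perpendicular target, neither of which is stated above, and the 30 mm distance is only an illustrative value.

```python
import math

def footprint_width(distance, fov_deg=90.0):
    """Width of the projected pattern on a flat surface facing the projector.

    Assumes an ideal projection center and a perpendicular planar target.
    The result is in the same unit as `distance`.
    """
    return 2.0 * distance * math.tan(math.radians(fov_deg) / 2.0)

print(round(footprint_width(30.0), 1))  # 60.0
```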

Decoding of Projected Patterns by Deep Learning

System configuration of the 3D endoscope

A deep learning model "decodes" the projected pattern observed in the captured image. This yields pixel-wise correspondences from the camera to the projector.

Measurement example of pig stomach: (from left) original image, decoded result [x coordinate], decoded result [y coordinate], single-frame shape
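One way to see how decoded x and y maps lead to a single-frame shape is the sketch below, which turns two dense correspondence maps into a per-pixel 3D point map. It is an illustration under assumptions: pinhole intrinsics `K_cam` and `K_proj`, a known camera-to-projector pose `R`, `t`, undecoded pixels marked as NaN, and closest-point triangulation between rays. None of these conventions is specified in the text above.

```python
import numpy as np

def correspondence_maps_to_points(map_x, map_y, K_cam, K_proj, R, t):
    """Convert decoded projector-coordinate maps into a 3D point map.

    map_x, map_y: (H, W) projector coordinates per camera pixel, NaN = undecoded.
    R, t: assumed camera-to-projector pose, X_proj = R @ X_cam + t.
    Returns (H, W, 3) points in camera coordinates, NaN where undecoded.
    """
    h, w = map_x.shape
    v, u = np.mgrid[0:h, 0:w].astype(float)
    valid = np.isfinite(map_x) & np.isfinite(map_y)
    ones = np.ones((h, w))

    # Back-project camera pixels and their decoded projector pixels to unit rays,
    # both expressed in the camera frame (row vector times R applies R.T).
    d_c = np.stack([u, v, ones], axis=-1)[valid] @ np.linalg.inv(K_cam).T
    d_p = np.stack([map_x, map_y, ones], axis=-1)[valid] @ np.linalg.inv(K_proj).T @ R
    d_c /= np.linalg.norm(d_c, axis=1, keepdims=True)
    d_p /= np.linalg.norm(d_p, axis=1, keepdims=True)

    # Closest points between ray (0, d_c) and ray (o_p, d_p), in closed form.
    # Nearly parallel rays make denom close to zero and the result unreliable.
    o_p = -R.T @ t
    w0 = -o_p
    b = np.sum(d_c * d_p, axis=1)
    d = d_c @ w0
    e = d_p @ w0
    denom = 1.0 - b * b
    s = (b * e - d) / denom
    r = (e - b * d) / denom

    points = np.full((h, w, 3), np.nan)
    points[valid] = 0.5 * (s[:, None] * d_c + o_p + r[:, None] * d_p)
    return points
```

The z component of the returned array can be read directly as a depth image for the frame.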

Multi-Frame Optimization

For 3D reconstruction from sequential images, we propose a method that integrates a Neural Signed Distance Field (Neural-SDF) representation with structured-light (SL) projection to improve geometric consistency and accuracy across frames.

System combining an endoscope, a wide-angle micro pattern projector (inserted via instrument channel), and an EM sensor probe. The EM sensor outputs poses with respect to a magnetic field generator. [From Furukawa et al., Healthcare Technology Letters, 2025]

The processing flow is shown below. For each frame, the gap-coded pattern is decoded to obtain a pixel-wise correspondence map, and per-frame calibration (estimation of camera and projector poses) is performed. Then, a joint optimization over all frames is carried out to simultaneously refine camera poses, projector poses, and surface geometry.

Processing flow: gap-coded pattern decoding to obtain correspondence maps (visualized modulo 256), per-frame calibration, and all-frame joint shape optimization. [From Furukawa et al., Healthcare Technology Letters, 2025]
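The "modulo 256" visualization mentioned in the caption can be reproduced with a few lines. The sketch assumes that the correspondence map stores projector coordinates as floating-point values with NaN for undecoded pixels, and that undecoded pixels are drawn as 0; both conventions are assumptions made for illustration.

```python
import numpy as np

def wrap_for_display(coord_map):
    """Wrap projector coordinates into 0..255 so they fit an 8-bit image.

    Undecoded pixels (NaN, an assumed convention) are shown as 0.
    """
    valid = np.isfinite(coord_map)
    wrapped = np.zeros(coord_map.shape, dtype=np.uint8)
    wrapped[valid] = np.mod(np.floor(coord_map[valid]), 256).astype(np.uint8)
    return wrapped

example = np.array([[10.0, 255.9, 256.0, 700.2, np.nan]])
print(wrap_for_display(example).tolist())  # [[10, 255, 0, 188, 0]]
```

Wrapping makes the image repeat every 256 projector pixels, so smooth regions appear as bands and decoding errors show up as breaks in those bands.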

Per-frame reconstruction alone suffers from gaps in the shape caused by decoding errors and from scale or geometric inconsistencies across frames. In the all-frame joint optimization, a pattern projection image and a camera-projector correspondence map are rendered on the neural surface and compared with the captured image and the decoded result. The overall shape, camera positions, and projector positions are optimized so that these rendered outputs match the observations, yielding geometrically consistent depth images and 3D shapes across all frames.

All-frame joint optimization: the pattern projection image and camera-projector correspondence map are rendered on the neural surface and compared with captured images and decoded results. Overall shape, camera positions, and projector positions are optimized to minimize the discrepancy. [From Furukawa et al., Healthcare Technology Letters, 2025]
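The comparison between a rendered correspondence map and a decoded one can be sketched in a much simplified form. In the code below an analytic sphere stands in for the neural SDF, the surface is found by sphere tracing, and only the correspondence residual for one frame is evaluated. The optimizer, the photometric comparison with the captured image, and projector-side occlusion are all omitted, and every function name and parameter is hypothetical rather than taken from the published method.

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """Stand-in for a neural SDF: signed distance to a sphere."""
    return np.linalg.norm(p - center, axis=-1) - radius

def sphere_trace(sdf, origins, dirs, n_steps=64, t_max=200.0):
    """March each ray forward by the SDF value until it reaches the surface."""
    t = np.zeros(len(dirs))
    for _ in range(n_steps):
        t = np.minimum(t + sdf(origins + t[:, None] * dirs), t_max)
    pts = origins + t[:, None] * dirs
    return pts, np.abs(sdf(pts)) < 1e-4

def render_correspondence(sdf, K_cam, K_proj, R, t, pixels):
    """Predict, for each camera pixel, the projector pixel lighting the surface.

    R, t: assumed camera-to-projector pose, X_proj = R @ X_cam + t.
    """
    pix_h = np.column_stack([pixels, np.ones(len(pixels))])
    dirs = pix_h @ np.linalg.inv(K_cam).T
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    pts, hit = sphere_trace(sdf, np.zeros_like(dirs), dirs)
    proj = (pts @ R.T + t) @ K_proj.T
    return proj[:, :2] / proj[:, 2:3], hit

def correspondence_loss(rendered, decoded, hit):
    """Mean squared pixel distance over rays that hit and were decoded."""
    valid = hit & np.all(np.isfinite(decoded), axis=1)
    return np.mean(np.sum((rendered[valid] - decoded[valid]) ** 2, axis=1))
```

In a joint optimization of the kind described above, a residual like this would be summed over all frames and minimized with respect to the surface parameters and the per-frame camera and projector poses, typically with automatic differentiation rather than the plain NumPy used here.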

Examples of Multi-Frame Optimization Results

The following videos show geometrically consistent depth images and 3D shape reconstruction results obtained by all-frame joint optimization for multiple in-vivo samples.

[Videos: depth image and 3D shape for each of Samples A, B, C, D, and E]

Publications

Furukawa, R., Nagamatsu, G., Oka, S., Kotachi, T., Okamoto, Y., Tanaka, S., and Kawasaki, H., Simultaneous Shape and Camera-Projector Parameter Estimation for 3D Endoscopic System Using CNN-Based Grid-Oneshot Scan, Healthcare Technology Letters, 6(6):249-254, 2019
Furukawa, R., Oka, S., Kotachi, T., Okamoto, Y., Tanaka, S., Sagawa, R., and Kawasaki, H., Fully Auto-calibrated Active-Stereo-Based 3D Endoscopic System Using Correspondence Estimation with Graph Convolutional Network, Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp.4357-4360, 2020
Mikamo, M., Kawasaki, H., Sagawa, R., and Furukawa, R., GCN-Calculated Graph-Feature Embedding for 3D Endoscopic System Based on Active Stereo, Communications in Computer and Information Science, pp.253-266, 2021
Mikamo, M., Furukawa, R., Oka, S., Kotachi, T., Okamoto, Y., Tanaka, S., Sagawa, R., and Kawasaki, H., Active Stereo Method for 3D Endoscopes Using Deep-Layer GCN and Graph Representation with Proximity Information, Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp.7551-7555, 2021
Furukawa, R., Mikamo, M., Kawasaki, H., Sagawa, R., Oka, S., Kotachi, T., Okamoto, Y., and Tanaka, S., Simultaneous Estimation of Projector and Camera Poses for Multiple Oneshot Scan Using Pixel-Wise Correspondences Estimated by U-Nets and GCN, Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 10(5):540-548, 2021
Furukawa, R., Mikamo, M., Sagawa, R., and Kawasaki, H., Single-Shot Dense Active Stereo with Pixel-Wise Phase Estimation Based on Grid-Structure Using CNN and Correspondence Estimation Using GCN, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp.245-255, 2022
Mikamo, M., Furukawa, R., Oka, S., Kotachi, T., Okamoto, Y., Tanaka, S., Sagawa, R., and Kawasaki, H., 3D Endoscope System with AR Display Superimposing Dense and Wide-Angle-of-View 3D Points Obtained by Using Micro Pattern Projector, Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp.881-885, 2022
Furukawa, R., Mikamo, M., Sagawa, R., Okamoto, Y., Oka, S., Tanaka, S., and Kawasaki, H., Multi-Frame Optimisation for Active Stereo with Inverse Rendering to Obtain Consistent Shape and Projector-Camera Poses for 3D Endoscopic System, Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 11(4):1178-1186, 2022
Furukawa, R., Sagawa, R., Oka, S., Tanaka, S., and Kawasaki, H., Single and Multi-Frame Auto-Calibration for 3D Endoscopy with Differential Rendering, Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp.1-5, 2023
Furukawa, R., Chen, E., Sagawa, R., Oka, S., and Kawasaki, H., Calibration-Free Structured-Light-Based 3D Scanning System in Laparoscope for Robotic Surgery, Healthcare Technology Letters, 11(2-3):196-205, 2024
Furukawa, R., Sagawa, R., Oka, S., and Kawasaki, H., NeRF-Based Multi-Frame 3D Integration for 3D Endoscopy Using Active Stereo, Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp.1-5, 2024
Furukawa, R., Kawasaki, H., and Sagawa, R., Incremental Shape Integration with Inter-Frame Shape Consistency Using Neural SDF for a 3D Endoscopic System, Healthcare Technology Letters, 12(1):e70001, 2025
Furukawa, R., Sagawa, R., and Kawasaki, H., Sequential Endoscopic-Image 3D Reconstruction Using Structured-Light and Neural Signed Distance Field with Photometric Loss, Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp.1-6, 2025
Furukawa, R., Inui, T., Sagawa, R., and Kawasaki, H., Acquiring Aligned Endoscopic and Depth Image Pairs Using Structured-Light Projection, Neural Surfaces and an Electromagnetic Positional Sensor, Healthcare Technology Letters, 12(1), 2025

Computer Vision and Graphics Laboratory