Phuc Nguyen Duc Anh | PhD in Computer Science at UMD

Phuc Nguyen Duc Anh

Ph.D. Student @ University of Maryland, College Park

I’m a PhD student at the University of Maryland, College Park, advised by Prof. Ming C. Lin!

My research focuses on 3D/4D reconstruction and understanding, particularly on developing scalable methods that jointly recover geometry, motion, and semantics from multi-view images and videos.

Email GitHub Scholar LinkedIn X Codeforces YouTube

News

Jun 2026 🏆 OpenVO received the Compute Champion Award at CVPR 2026!

May 2026 🏠 We release Scale3D, a scalable approach to 3D reconstruction and scene understanding.

Feb 2026 🚗 OpenVO has been accepted to CVPR 2026. We have also released the code.

Nov 2025 🚀 We release OpenVO, an open-world visual odometry framework that works on any video captured by any camera!

Aug 2025 🎓 I started my PhD in Computer Science at the University of Maryland (UMD)!

Jul 2025 📢 HA-RDet and OE-3DIS have been accepted to ICCV 2025.

Feb 2025 📌 Any3DIS has been accepted to CVPR 2025. Cheerz!

Jul 2024 🚀 VinMap has been accepted to MAPR 2024. We have also released the dataset.

Jun 2024 🏆 The VinAI-3DIS team ranked 1st in the OpenSUN3D challenge at CVPR 2024.

Feb 2024 💡 Open3DIS has been accepted to CVPR 2024. We have also released the code.

Jun 2024 🏆 The VinAI-3DIS team ranked 2nd in the OpenSUN3D challenge at ICCV 2023. Technical report.

Nov 2023 🎓 I earned my B.S. in Computer Science from VNUHCM-UIT.

Feb 2023 🔬 I joined VinAI Research as a Research Resident.

Experience

Research Intern • Stack AV

Working on 4D Reconstruction and Motion-controllable 4D novel view synthesis for autonomous driving.

Mentor: Dr. Sudipta N. Sinha

May 2026 – Present

AI Research Resident • Movian AI

Worked on 3D scene understanding, focusing on class-agnostic 3D instance segmentation and vocabulary-free 3D point cloud understanding.

Mentors: Dr. Anh Tran, Prof. Cuong Pham

Feb 2025 – Mar 2025

AI Research Resident • VinAI

Focused on 3D scene reconstruction and understanding, and on vision-language models.

Mentors: Dr. Anh Tran, Prof. Cuong Pham, and Prof. Minh Hoai

Feb 2023 – Feb 2025

Highlighted Research · See Full List At Scholar

Scalable 3D Reconstruction and Understanding

Phuc Nguyen, Xiyi Chen, Dongki Jung, Anshul Rai, Guan-Ming Su, Dinesh Manocha, Ming C. Lin

Under Review

We introduce Scale3D, a novel framework for scalable 3D reconstruction and understanding.

Project Code

We present Scale3D, a unified framework for scalable 3D reconstruction and scene understanding from long, complex image sequences. Existing methods typically emphasize either geometric reconstruction or object-level understanding, but struggle to maintain both global geometric consistency and coherent instance identities over hundreds to thousands of views. Our key insight is to exploit their mutual synergy: geometry provides a robust basis for cross-view object association, while perception regularizes and refines geometry. Scale3D decomposes a long video into overlapping clusters, reconstructs cluster-wise geometry and 2D segmentation masks, and introduces a 3D-Aware Alignment module to align local predictions into a global proxy geometry while recovering temporally coherent, globally ID-consistent video object segmentation. We further propose Instance-Aware Bundle Adjustment, which leverages dense instance-consistent correspondences to refine the camera poses and geometry. We evaluate Scale3D on ScanNet200 and ScanNet++v2 across three benchmarking tasks: 3D reconstruction, class-agnostic 3D instance segmentation, and panoptic lifting for novel-view rendering. It achieves state-of-the-art results, with improvements of 5% on AUC@30, 11% on AP, and 10% on Panoptic Quality. Overall, our results highlight the importance of jointly modeling geometry and perception for scalable scene reconstruction and understanding over long image sequences with hundreds to thousands of views.
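To make the "overlapping clusters" step concrete, here is a minimal sketch of one way a long frame sequence could be split into overlapping windows. The fixed window size, the fixed overlap, and the function name `overlapping_clusters` are assumptions for illustration; the abstract does not say how Scale3D actually forms its clusters.

```python
def overlapping_clusters(num_frames, cluster_size, overlap):
    """Split frame indices 0..num_frames-1 into consecutive overlapping windows.

    Hypothetical helper: a fixed-size sliding window is assumed here,
    which may differ from the clustering used in Scale3D.
    """
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    if not 0 <= overlap < cluster_size:
        raise ValueError("overlap must satisfy 0 <= overlap < cluster_size")
    step = cluster_size - overlap
    clusters = []
    start = 0
    while start < num_frames:
        end = min(start + cluster_size, num_frames)
        clusters.append(list(range(start, end)))
        if end == num_frames:
            break  # the last window already reaches the final frame
        start += step
    return clusters


clusters = overlapping_clusters(num_frames=10, cluster_size=4, overlap=2)
for cluster in clusters:
    print(cluster)
```

The frames shared by neighbouring windows are what would let per-cluster predictions be aligned into one global frame afterwards.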

OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness

Phuc Nguyen*, Anh N. Nhu*, Ming C. Lin

IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR 2026 Compute Champion Award

We introduce OpenVO, a novel framework for Open-world Visual Odometry (VO) with temporal awareness under limited input conditions.

Project Paper Code

OpenVO effectively estimates real-world–scale ego-motion from monocular dashcam footage with varying observation rates and uncalibrated cameras, enabling robust trajectory dataset construction from rare driving events recorded by dashcams. Existing VO methods are trained at a fixed observation frequency (e.g., 10Hz or 12Hz), completely overlooking temporal dynamics information. Many prior methods also require calibrated cameras with known intrinsic parameters. Consequently, their performance degrades when (1) deployed under unseen observation frequencies or (2) applied to uncalibrated cameras. These limitations significantly restrict their generalizability to many downstream tasks, such as extracting trajectories from dashcam footage. To address these challenges, OpenVO (1) explicitly encodes temporal dynamics information within a two-frame pose regression framework and (2) leverages 3D geometric priors derived from foundation models. We validate our method on three major autonomous-driving benchmarks (KITTI, nuScenes, and Argoverse 2), achieving more than 20% performance improvement over state-of-the-art approaches. Under varying observation rate settings, our method is significantly more robust, achieving 46%–92% lower errors across all metrics. These results demonstrate the versatility of OpenVO for real-world 3D reconstruction and diverse downstream applications.
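A small arithmetic illustration of why the observation frequency matters for two-frame pose regression: at a constant speed, the distance travelled between consecutive frames depends on the frame interval, so the same vehicle motion yields very different per-pair translations at different rates. The constant-speed assumption, the example speed, and the function name are illustrative and not taken from OpenVO.

```python
def inter_frame_translation(speed_mps, frame_rate_hz):
    """Distance in metres travelled between two consecutive frames.

    Assumes constant speed over the frame interval (illustrative only).
    """
    if frame_rate_hz <= 0:
        raise ValueError("frame_rate_hz must be positive")
    return speed_mps / frame_rate_hz


speed = 15.0  # metres per second, a hypothetical driving speed
for hz in (2.0, 10.0, 12.0):
    metres = inter_frame_translation(speed, hz)
    print(f"{hz:4.1f} Hz: {metres:.2f} m between consecutive frames")
```

A regressor that only ever saw 10Hz pairs has no way to tell these cases apart from the images alone, which is one motivation for passing the time interval to the model explicitly.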

Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking

Phuc Nguyen, Minh Luu, Anh Tran, Cuong Pham, Khoi Nguyen

IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR 2025

A novel class-agnostic approach for 3D instance segmentation that leverages 2D mask tracking to segment 3D objects in point cloud scenes.

Project Paper Code

Existing 3D instance segmentation methods frequently encounter issues with over-segmentation, leading to redundant and inaccurate 3D proposals that complicate downstream tasks. This challenge arises from their unsupervised merging approach, where dense 2D instance masks are lifted across frames into point clouds to form 3D candidate proposals without direct supervision. These candidates are then hierarchically merged based on heuristic criteria, often resulting in numerous redundant segments that fail to combine into precise 3D proposals. To overcome these limitations, we propose a 3D-Aware 2D Mask Tracking module that uses robust 3D priors from a 2D mask segmentation and tracking foundation model (SAM-2) to ensure consistent object masks across video frames. Rather than merging all visible superpoints across views to create a 3D mask, our 3D Mask Optimization module leverages a dynamic programming algorithm to select an optimal set of views, refining the superpoints to produce a final 3D proposal for each object. Our approach achieves comprehensive object coverage within the scene while reducing unnecessary proposals, which could otherwise impair downstream applications. Evaluations on ScanNet200 and ScanNet++ confirm the effectiveness of our method, with improvements across Class-Agnostic, Open-Vocabulary, and Open-Ended 3D Instance Segmentation tasks.

Open-Ended 3D Point Cloud Instance Segmentation

Phuc Nguyen*, Minh Luu*, Anh Tran, Cuong Pham, Khoi Nguyen

IEEE/CVF International Conference on Computer Vision ICCV 2025

Introducing vocabulary-free 3D point cloud instance segmentation, with several strong baselines and a novel pointwise method using a multimodal LLM.

Project Paper Code

Open-vocabulary 3D Instance Segmentation methods (OV-3DIS) have recently demonstrated their generalization ability to unseen objects. However, these methods still depend on predefined class names during inference, restricting agents' autonomy. To mitigate this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the necessity for predefined class names during testing. We present a comprehensive set of strong baselines inspired by OV-3DIS methodologies, utilizing 2D Multimodal Large Language Models. In addition, we introduce a novel token aggregation strategy that effectively fuses information from multiview images. To evaluate the performance of our OE-3DIS system, we benchmark both the proposed baselines and our method on two widely used indoor datasets: ScanNet200 and ScanNet++. Our approach achieves substantial performance gains over the baselines on both datasets. Notably, even without access to ground-truth object class names during inference, our method outperforms Open3DIS, the current state-of-the-art in OV-3DIS.

HA-RDet: Hybrid Anchor Rotation Detector for Oriented Object Detection

Phuc Nguyen

IEEE/CVF International Conference on Computer Vision ICCV 2025

Bachelor's Thesis

The Hybrid-Anchor Rotation Detector (HA-RDet) combines the advantages of both anchor-based and anchor-free schemes for oriented object detection.

Project Paper Code

Oriented object detection in aerial images poses a significant challenge due to the varying sizes and orientations of objects. Current state-of-the-art detectors typically rely on either two-stage or one-stage approaches, often employing anchor-based strategies, which can be computationally expensive because of the large number of redundant anchors generated during training. In contrast, anchor-free mechanisms offer faster processing but suffer from a reduction in the number of training samples, potentially impacting detection accuracy. To address these limitations, we propose the Hybrid-Anchor Rotation Detector (HA-RDet), which combines the advantages of both anchor-based and anchor-free schemes for oriented object detection. By utilizing only one preset anchor for each location on the feature maps and refining these anchors with our Orientation-Aware Convolution technique, HA-RDet achieves competitive accuracies, including 75.41 mAP on DOTA-v1, 65.3 mAP on DIOR-R, and 90.2 mAP on HRSC2016, against current anchor-based state-of-the-art methods, while significantly reducing computational resources.
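As a rough illustration of the anchor budget at stake, the sketch below counts anchors over a feature pyramid for a dense anchor-based setup versus one preset anchor per location. The pyramid sizes and the figure of 9 anchors per location are hypothetical values chosen for the example, not numbers from the paper.

```python
def count_anchors(feature_map_sizes, anchors_per_location):
    """Total anchors over all pyramid levels, given (height, width) per level."""
    return sum(h * w * anchors_per_location for h, w in feature_map_sizes)


# Hypothetical five-level pyramid for a 1024x1024 input with strides 8 to 128.
levels = [(128, 128), (64, 64), (32, 32), (16, 16), (8, 8)]

dense = count_anchors(levels, anchors_per_location=9)   # assumed: 3 scales x 3 ratios
single = count_anchors(levels, anchors_per_location=1)  # one preset anchor per location

print(f"dense anchors:  {dense}")
print(f"single anchors: {single}")
print(f"reduction:      {dense // single}x")
```

Under these assumptions the anchor count scales linearly with the number of presets per location, so a single-anchor design shrinks the set of candidates to assign and refine by that same factor.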

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

Phuc Nguyen*, Tuan Ngo*, Chuang Gan, Evangelos Kalogeraki, Anh Tran, Cuong Pham, Khoi Nguyen

IEEE/CVF Conference on Computer Vision and Pattern Recognition CVPR 2024

Tackling open-vocabulary 3D point cloud instance segmentation using 2D priors.

Project Paper Code

We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes. Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task. Recent advancements in Open-Vocabulary scene understanding have made significant strides in this area by employing class-agnostic 3D instance proposal networks for object localization and learning queryable features for each 3D mask. While these methods produce high-quality instance proposals, they struggle with identifying small-scale and geometrically ambiguous objects. The key idea of our method is a new module that aggregates 2D instance masks across frames and maps them to geometrically coherent point cloud regions as high-quality object proposals, addressing the above limitations. These are then combined with 3D class-agnostic instance proposals to include a wide range of objects in the real world. To validate our approach, we conducted experiments on three prominent datasets, including ScanNet200, S3DIS, and Replica, demonstrating significant performance gains in segmenting objects with diverse categories over the state-of-the-art approaches.
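The basic operation behind mapping a 2D instance mask to point cloud regions can be sketched as a pinhole projection test: a 3D point belongs to the lifted mask if it projects onto a masked pixel. This is a simplified single-frame sketch under assumed conventions (points already in the camera frame, no occlusion check); it is not the Open3DIS module itself, and all names and values are hypothetical.

```python
def lift_mask_to_points(points_cam, mask, fx, fy, cx, cy):
    """Return indices of 3D points whose projection falls inside a 2D mask.

    points_cam: list of (x, y, z) in the camera frame, z pointing forward.
    mask: 2D list of booleans indexed as mask[row][col].
    fx, fy, cx, cy: pinhole intrinsics in pixels.
    Occlusion is ignored here; a real pipeline would compare against depth.
    """
    height, width = len(mask), len(mask[0])
    selected = []
    for idx, (x, y, z) in enumerate(points_cam):
        if z <= 0:  # point is behind the camera
            continue
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height and mask[v][u]:
            selected.append(idx)
    return selected


# Toy 5x5 mask that is True on the central 3x3 block.
mask = [[1 <= r <= 3 and 1 <= c <= 3 for c in range(5)] for r in range(5)]
points = [
    (0.0, 0.0, 1.0),   # projects to the image centre
    (1.0, 0.0, 1.0),   # projects to the right border, outside the mask
    (0.5, 0.5, 1.0),   # projects inside the mask
    (0.0, 0.0, -1.0),  # behind the camera
    (5.0, 0.0, 1.0),   # projects outside the image
    (1.0, 1.0, 2.0),   # farther away, still inside the mask
]
lifted = lift_mask_to_points(points, mask, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
print(lifted)
```

Repeating this per frame and merging the point sets that overlap is one plausible way to aggregate masks across views into 3D proposals.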

Awards and Achievements

Compute Champion Award CVPR: Highest recognition in methodology and reproducibility. (2026)
UMD Dean’s Fellowship: Awarded to candidates with exceptional academic records. (2025-2027)
1st Prize CVPR Workshop: VinAI-3DIS ranked top-1 in OpenSUN3D CVPR workshop. (2024)
2nd Prize ICCV Workshop: VinAI-3DIS ranked top-2 in OpenSUN3D ICCV workshop. (2023)
Best Thesis Award: Awarded to thesis with the highest grade. (2023)
3rd Prize UIT AI Challenge: The team ranked top-3 in Scene Text recognition challenge. (2023)
2nd Prize UCPC: Ranked top-2 in UIT Collegiate Programming Contest. (2022)
Expert Codeforces: Reached the Expert title on Codeforces, a competitive programming platform. (2022)
1st Prize UIT-AlgoBootcamp: Winning Competitive Programming Competition at UIT. (2021)
Outstanding Student Scholarship: Awarded to students with the best academic performance. (2021)
Outstanding Student in Physics: Awarded to students with the highest GPA in Physics. (2020)