OE3DIS

Open-Ended 3D Point Cloud Instance Segmentation

ICCV 2025 - OpenSUN3D

¹Movian AI ²Posts & Telecommunications Inst. of Tech.

Abstract

Open-Vocabulary 3D Instance Segmentation (OV-3DIS) methods have recently demonstrated their ability to generalize to unseen objects. However, these methods still depend on predefined class names at test time, which restricts the autonomy of agents. To remove this constraint, we propose a novel problem termed Open-Ended 3D Instance Segmentation (OE-3DIS), which eliminates the need for predefined class names during testing. Moreover, we contribute a comprehensive set of strong baselines derived from OV-3DIS approaches and leveraging 2D Multimodal Large Language Models (MLLMs). To assess the performance of our OE-3DIS system, we introduce a novel Open-Ended score that evaluates both the semantic and geometric quality of predicted masks and their associated class names, alongside the standard AP score. Our approach demonstrates significant performance improvements over the baselines on the ScanNet200 and ScanNet++ datasets. Remarkably, our method surpasses Open3DIS, the current state-of-the-art OV-3DIS method, even without access to ground-truth object class names.



Method

Overview of our approach. First, we generate class-agnostic 2D instance segmentation masks for all views using segmenters such as Detic and SAM, and lift these 2D masks into 3D proposal masks using Open3DIS. In parallel, the 2D masks and their corresponding RGB images are passed to an MLLM such as OSM to extract 2D visual tokens, which are then lifted into pointwise 3D visual tokens. Finally, for each 3D proposal mask, we aggregate the pointwise 3D visual tokens of its points into the final tokens that are fed to the LLM, which predicts the final class name.
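The token-lifting and per-proposal aggregation steps described above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a pinhole camera with a 3x4 world-to-camera pose, one precomputed visual token per 2D mask (standing in for the MLLM features), and mean pooling both across views and within a proposal. It also skips the depth-based occlusion check that a real system would need. The names `project`, `lift_tokens`, `aggregate`, and the view dictionary keys are illustrative only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Token = List[float]


def project(point: Vec3, pose: Sequence[Sequence[float]],
            fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Project a world-space point to an integer pixel (u, v) with a pinhole model.

    `pose` is an assumed 3x4 world-to-camera matrix [R | t].
    Returns None for points at or behind the camera plane.
    """
    xc, yc, zc = (
        sum(pose[r][c] * point[c] for c in range(3)) + pose[r][3] for r in range(3)
    )
    if zc <= 0:
        return None
    return int(round(fx * xc / zc + cx)), int(round(fy * yc / zc + cy))


def lift_tokens(points: List[Vec3], views: List[Dict], dim: int) -> List[Optional[Token]]:
    """Give each 3D point the mean token of the 2D masks it projects into.

    Each view dict (hypothetical layout) holds:
      "pose":       3x4 world-to-camera matrix
      "intrinsics": (fx, fy, cx, cy)
      "mask_ids":   H x W grid of 2D mask ids, -1 for background
      "tokens":     dict mapping 2D mask id -> token vector of length `dim`
    No occlusion test is performed (simplifying assumption).
    """
    sums = [[0.0] * dim for _ in points]
    counts = [0] * len(points)
    for view in views:
        mask_ids = view["mask_ids"]
        h, w = len(mask_ids), len(mask_ids[0])
        for i, p in enumerate(points):
            pix = project(p, view["pose"], *view["intrinsics"])
            if pix is None:
                continue
            u, v = pix
            if not (0 <= u < w and 0 <= v < h):
                continue  # point falls outside this image
            mid = mask_ids[v][u]
            if mid < 0:
                continue  # pixel is not covered by any 2D mask
            tok = view["tokens"][mid]
            for k in range(dim):
                sums[i][k] += tok[k]
            counts[i] += 1
    # Points never seen inside a mask get None instead of a token.
    return [[s / counts[i] for s in sums[i]] if counts[i] else None
            for i in range(len(points))]


def aggregate(point_tokens: List[Optional[Token]], proposal: List[int]) -> Optional[Token]:
    """Mean-pool the pointwise tokens of one 3D proposal (a list of point indices)."""
    hits = [point_tokens[i] for i in proposal if point_tokens[i] is not None]
    if not hits:
        return None
    return [sum(t[k] for t in hits) / len(hits) for k in range(len(hits[0]))]


# Toy usage: one 3x3 view with an identity pose and two 2D masks.
identity_pose = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
view = {
    "pose": identity_pose,
    "intrinsics": (1.0, 1.0, 1.0, 1.0),
    "mask_ids": [[-1, -1, -1], [-1, 0, 1], [-1, -1, -1]],
    "tokens": {0: [1.0, 0.0], 1: [0.0, 1.0]},
}
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
point_tokens = lift_tokens(points, [view], dim=2)
print(aggregate(point_tokens, [0, 1, 2]))
```

In this toy scene the first two points land in different 2D masks, and the third lies behind the camera and receives no token. The proposal token is therefore the mean of the two observed tokens. Mean pooling is only one plausible choice here. A learned or attention-based pooling over the proposal's tokens would be a natural alternative.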

Qualitative examples are provided for ScanNet200 (first two columns) and ScanNet++ (last two columns). Both our baselines and our approach yield notably good results, particularly in accurately identifying class names, even when the predicted names do not exactly match the ground-truth (GT) classes.





BibTeX


@article{nguyen2024openend,
  title={Open-ended 3d point cloud instance segmentation},
  author={Nguyen, Phuc DA and Luu, Minh and Tran, Anh and Pham, Cuong and Nguyen, Khoi},
  journal={arXiv preprint arXiv:2408.11747},
  year={2024}
}