SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding

In order for a robot to operate in human environments, its semantic perception has to overcome open-world challenges such as novel objects and domain gaps. Autonomous deployment in such environments therefore requires robots to update their knowledge and learn without supervision. We investigate how a robot can autonomously discover novel semantic classes and improve accuracy on known classes when exploring an unknown environment. To this end, we develop a general framework for mapping and clustering that we then use to generate a self-supervised learning signal to update a semantic segmentation model. In particular, we show how clustering parameters can be optimized during deployment and that fusion of multiple observation modalities improves novel object discovery compared to prior work.
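
The framework can be read as a loop: fuse observations into a map, cluster the fused features to discover candidate classes, and use the clusters as a self-supervised signal to update the segmentation model. The sketch below is a purely illustrative toy version of that loop on synthetic data; DBSCAN and a logistic-regression classifier are our own stand-ins for the clustering step and the segmentation model, and none of the names or numbers come from the paper.

```python
# Toy sketch of a SCIM-style loop on synthetic data. All components, names,
# and numbers here are illustrative assumptions, not the paper's implementation.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for a fused map: 3D points with noisy descriptors from three
# objects, e.g. two known classes and one novel object seen during deployment.
points = np.repeat(np.eye(3) * 8.0, 100, axis=0) + rng.normal(size=(300, 3))
features = points + rng.normal(scale=0.3, size=points.shape)

# 1) Cluster the fused map features to obtain candidate object classes.
clusters = DBSCAN(eps=1.5, min_samples=5).fit_predict(features)

# 2) Treat the clusters as pseudo-labels (DBSCAN marks noise as -1) and use
#    them as a self-supervised signal to update a simple classifier, which
#    here stands in for the semantic segmentation model.
mask = clusters >= 0
model = LogisticRegression(max_iter=500)
model.fit(features[mask], clusters[mask])

print("discovered clusters:", sorted(set(clusters[mask])))
print("agreement with pseudo-labels:", model.score(features[mask], clusters[mask]))
```

In the real system, the features would come from fusing multiple observation modalities in the map, and the update step would fine-tune the semantic segmentation network on the generated pseudo-labels.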

We propose an autonomous optimisation scheme for the clustering parameters.

Many existing works use k-means clustering, but the assumption that the number of clusters is known is inadequate for open-world perception. Without this number, however, clustering algorithms often get stuck in local minima of over- or under-clustering. We therefore propose an automatic optimisation that yields clustering parameters aligned with the segmentation model.

Examples of the optimisation, starting from overclustering and starting from underclustering.
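
As a concrete illustration of what such an optimisation could look like, the sketch below tunes a single clustering parameter (a DBSCAN radius, chosen here purely as an assumption) so that the resulting clusters agree with the model's predictions on the known classes, measured by V-measure, which penalises both over- and under-clustering; the selected parameter can then be reused to cluster the unknown regions. Data and names are illustrative, not the paper's implementation.

```python
# Hedged sketch with hypothetical names: tune a clustering parameter so that
# the clusters agree with the segmentation model's predictions on the known
# classes; the same parameter is then reused to cluster unknown regions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import v_measure_score

def tune_eps(known_features, known_predictions, eps_grid):
    """Pick the DBSCAN radius whose clusters best match the predicted labels
    on known-class regions (V-measure penalises over- and under-clustering)."""
    best_eps, best_score = None, -1.0
    for eps in eps_grid:
        clusters = DBSCAN(eps=eps, min_samples=10).fit_predict(known_features)
        mask = clusters >= 0  # drop DBSCAN noise points
        if mask.sum() < 2:
            continue
        score = v_measure_score(known_predictions[mask], clusters[mask])
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps, best_score

# Toy data: two known classes, well separated in a 4-D feature space.
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(0.0, 1.0, (200, 4)),
                        rng.normal(6.0, 1.0, (200, 4))])
preds = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

eps, score = tune_eps(feats, preds, eps_grid=np.linspace(0.5, 4.0, 8))
print(f"selected eps={eps:.2f} with V-measure {score:.2f}")
```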

Anomaly detection + mapping find unknown parts of a scene.

By integrating single-frame anomaly detection and mapping, we can reliably find the unknown parts of a scene.

Example scenes with unknown objects: TV, bin, projection screen; towel, ladder, shower cabin.
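
A minimal sketch of such a fusion step, under assumptions of our own: a plain voxel grid, single-frame anomaly scores averaged per voxel over all observations, and a fixed threshold for marking a voxel as unknown. The single-frame anomaly detector itself is only stubbed out here.

```python
# Hedged sketch with a hypothetical map structure: average single-frame
# anomaly scores per voxel across all observations, then threshold the
# fused score to mark the unknown parts of the scene.
from collections import defaultdict
import numpy as np

def fuse_anomaly_scores(frames, voxel_size=0.1):
    """frames: iterable of (points Nx3 in map coordinates, anomaly scores N)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for points, scores in frames:
        voxels = np.floor(points / voxel_size).astype(int)
        for voxel, score in zip(map(tuple, voxels), scores):
            sums[voxel] += float(score)
            counts[voxel] += 1
    return {voxel: sums[voxel] / counts[voxel] for voxel in sums}

def unknown_voxels(fused_scores, threshold=0.5):
    return [voxel for voxel, score in fused_scores.items() if score > threshold]

def toy_anomaly_score(points):
    # Stand-in for a single-frame anomaly detector: the slab around x = 1 m
    # consistently looks anomalous, the rest of the scene does not.
    return np.where(np.abs(points[:, 0] - 1.0) < 0.2, 0.9, 0.05)

rng = np.random.default_rng(2)
scene = rng.uniform(0.0, 2.0, size=(500, 3))
frames = [(scene + rng.normal(scale=0.005, size=scene.shape),
           toy_anomaly_score(scene)) for _ in range(3)]

fused = fuse_anomaly_scores(frames)
print(f"{len(unknown_voxels(fused))} of {len(fused)} voxels flagged as unknown")
```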