Publications - Shiu-Hong Kao

2026

CoT-RVS: Zero-Shot Chain-of-Thought Reasoning Segmentation for Videos

Shiu-Hong Kao, Yu-Wing Tai, Chi-Keung Tang

International Conference on Learning Representations (ICLR), 2026

CoT-RVS uses chain-of-thought reasoning to extract temporal-semantic correlations in videos and achieves state-of-the-art performance on reasoning video segmentation.

arXiv Project Code Data

CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction

Shiu-Hong Kao*, Chak-Ho Huang*, Huaiqian Liu*, Yu-Wing Tai, Chi-Keung Tang

ICLR 2026 Workshop on AI with Recursive Self-Improvement (RSI 2026)

*Equal contribution. CoT-Seg is a modular reasoning segmentation framework that improves segmentation masks with chain-of-thought reasoning and a self-correction loop.

arXiv Project Code Data

2025

StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams

Yang Li, Jinglu Wang, Lei Chu, Xiao Li, Shiu-Hong Kao, Yingcong Chen, Yan Lu

International Conference on Computer Vision (ICCV), 2025

StreamGS is an online generalizable 3DGS reconstruction method for unposed image streams that progressively transforms image streams into 3D Gaussian streams.

arXiv

Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts

Shiu-Hong Kao, Yu-Wing Tai, Chi-Keung Tang

arXiv preprint, 2025

ThinkFirst is a chain-of-thought reasoning segmentation framework that generates accurate object masks from text prompts and handles difficult query scenarios.

arXiv Project

Beyond and Free from Diffusion: Invertible Guided Consistency Training

Chia-Hong Hsu, Shiu-Hong Kao, Randall Balestriero

arXiv preprint, 2025

Invertible Guided Consistency Training is a data-driven framework for guided consistency models, supporting fast image generation and editing without diffusion-model distillation.

arXiv Project

UVRM: A Scalable 3D Reconstruction Model from Unposed Videos

Shiu-Hong Kao, Xiao Li, Jinglu Wang, Yang Li, Chi-Keung Tang, Yu-Wing Tai, Yan Lu

arXiv preprint, 2025

UVRM is a 3D reconstruction model trained and evaluated on 360-degree monocular videos without requiring pose information.

arXiv Demo

2024

InceptionHuman: Controllable Prompt-to-NeRF for Photorealistic 3D Human Generation

Shiu-Hong Kao, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

arXiv preprint, 2024

InceptionHuman is a NeRF-based generative framework using diffusion models and flexible prompts such as text, pose, and style to generate realistic 3D humans.

arXiv

Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction

Xinhang Liu, Jiaben Chen, Shiu-Hong Kao, Yu-Wing Tai, Chi-Keung Tang

European Conference on Computer Vision (ECCV), 2024

Deceptive-NeRF/3DGS uses diffusion-generated pseudo-observations to reduce artifacts and improve the quality of sparse-view reconstruction.

arXiv Project

2023

StableKD: Breaking Inter-block Optimization Entanglement for Stable Knowledge Distillation

Shiu-Hong Kao*, Jierun Chen*, S.-H. Gary Chan

arXiv preprint, 2023

*Equal contribution. StableKD addresses Inter-block Optimization Entanglement in end-to-end knowledge distillation and improves stability, convergence, and data efficiency.

arXiv Code

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

Jierun Chen, Shiu-Hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S.-H. Gary Chan

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

This work proposes partial convolution (PConv) and FasterNet, a latency-efficient family of neural network architectures.

Paper Code