Abstract: In video-based emotion recognition, effective multi-modal fusion techniques are essential to leverage the complementary relationship between audio and visual modalities. Recent ...
Abstract: Audio–visual event localization (AVEL) aims to recognize events in videos by associating audio–visual information. However, events involved in existing AVEL tasks are usually coarse-grained ...