Abstract: Audio–visual event localization (AVEL) aims to recognize events in videos by associating audio–visual information. However, events involved in existing AVEL tasks are usually coarse-grained ...
Abstract: Currently, audio-visual speech separation methods utilize the speaker's audio and visual correlation information to help separate the speech of the target speaker. However, these methods ...