Denso Encoders - Search News

Encoder–Decoder Calibration for Multimodal Machine Translation

Abstract: The main purpose of multimodal machine translation (MMT) is to improve the quality of translation results by taking the corresponding visual context as an additional input. Recently many ...

IEEE

Do Vision and Language Encoders Represent the World Similarly?

Abstract: Aligned text-image encoders such as CLIP have become the de facto model for vision-language tasks. Furthermore, modality-specific encoders achieve impressive performances in their ...

GitHub

VideoPrism: A Foundational Visual Encoder for Video Understanding

VideoPrism is a general-purpose video encoder designed to handle a wide spectrum of video understanding tasks, including classification, retrieval, localization, captioning, and question answering. It ...
