Cross-modal Generation (2024-now)
Cross-modal generation is a growing research area that aims to combine and synthesize different data modalities into coherent outputs. MAGNET introduces a method for audio-visual retrieval-augmented generation (RAG), improving how machines process multimodal information. MeLFusion (CVPR 2024) synthesizes music conditioned on image and text cues, opening new avenues for creative and applied uses. AdVerb (ICCV 2023) addresses visually guided audio dereverberation, using visual cues from the environment to reduce reverberation and recover cleaner audio. Together, these works move toward more seamless and intuitive interaction across sensory modalities.