Sanjoy's webpage

Sanjoy Chowdhury

I am a second year CS PhD student at University of Maryland, College Park advised by Prof. Dinesh Manocha. I am broadly interested in multi-modal learning and its different applications. My research primarily involves studying the interplay between the vision and audio modalities and developing systems equipped with their comprehensive understanding.

I am currently working as a research scientist intern at Meta Reality Labs. Before this, I was a student researcher at Google Research with Avisek Lahiri and Vivek Kwatra in the Talking heads team on speech driven facial synthesis. Previously, I spent a wonderful summer with Adobe Research working with Joseph K J in the Multi-modal AI team as a research PhD intern on multi-modal audio generation. I am also fortunate to have had the chance to work with Prof. Kristen Grauman , Prof. Mohamed Elhoseiny and Ruohan Gao among other wonderful mentors and collaborators.

Before joining for PhD, I was working as a Machine Learning Scientist with the Camera and Video AI team at ShareChat, India. I was also a visiting researcher at the Computer Vision and Pattern Recognition Unit at Indian Statistical Institute Kolkata under Prof. Ujjwal Bhattacharya. Even before, I was a Senior Research Engineer with the Vision Intelligence Group at Samsung R&D Institute Bangalore. I primarily worked on developing novel AI-powered solutions for different smart devices of Samsung.

I received my MTech in Computer Science & Engineering from IIIT Hyderabad where I was fortunate to be advised by Prof. C V Jawahar. During my undergrad, I worked as a research intern under Prof. Pabitra Mitra at IIT Kharagpur and the CVPR Unit at ISI Kolkata.

Updates

July 2024 - Work on Audio-Visual LLM got accepted to ECCV 2024
June 2024 - Invited talk at the Sight and Sound workshop at CVPR 2024
May 2024 - Joined Meta Reality Labs as a Research Scientist intern.
May 2024 - Paper on Improving Robustness Against Spurious Correlations got accepted to ACL 2024 Findings
May 2024 - Our paper on determining perceived audience intent from multi-modal social media posts got accepted to Nature Scientific Reports
Mar 2024 - Paper on LLM guided navigational instruction generation got accepted to NAACL 2024
Feb 2024 - MeLFusion ( Highlight, Top 2.8% ) got accepted to CVPR 2024
Feb 2024 - Joined Google Research as a student researcher.
Oct 2023 - APoLLo gets accepted to EMNLP 2023
Oct 2023 - Invited talk on AdVerb at AV4D Workshop, ICCV 2023
July 2023 - AdVerb got accepted to ICCV 2023
May 2023 - Joined Adobe Research as a research intern.
Aug 2022 - Joined as a CS PhD student at University of Maryland College Park . Awarded Dean's fellowship.
Oct 2021 - Paper on audio-visual summarization accepted in BMVC 2021.
Sep 2021 - Blog on Video Quality Enhancement released at Tech @ ShareChat.
July 2021 - Paper on reflection removal got accepted in ICCV 2021.
June 2021 - Joined ShareChat Data Science team.
May 2021 - Paper on audio-visual joint segmentation accepted in ICIP 2021.
Dec 2018 - Accepted Samsung Research offer. Will be joining in June'19.
Sep 2018 - Received Dean's Merit List Award for academic excellence at IIIT Hyderabad.
Oct 2017 - Our work on a multi-scale, low-latency face detection framework received Best Paper Award at NGCT-2017.

Selected publications

I am interested in solving computer vision, computer audition, and machine learning problems and applying them to broad AI applications. My research focuses on applying multi-modal learning (Vision + X) for generative modeling and holistic cross-modal understanding with minimal supervision. Representative papers are highlighted.