I am a third-year CS PhD student at the University of Maryland, College Park, advised by Prof. Dinesh Manocha. I am broadly interested in multi-modal learning and its applications. My research primarily studies the interplay between the vision and audio modalities and develops systems with a comprehensive understanding of both.

I am currently an ML Research intern at Apple MLR, hosted by Chun-Liang Li and Karren Yang. I spent the summer of '24 at Meta Reality Labs as a research scientist intern, hosted by Ruohan Gao. Before this, I was a student researcher at Google Research with Avisek Lahiri and Vivek Kwatra in the Talking Heads team, working on speech-driven facial synthesis. Previously, I spent a wonderful summer at Adobe Research as a research PhD intern in the Multi-modal AI team, working with Joseph K J on multi-modal audio generation. I have also been fortunate to work with Prof. Kristen Grauman, Prof. Salman Khan, and Prof. Mohamed Elhoseiny, among other wonderful mentors and collaborators.

Before starting my PhD, I was a Machine Learning Scientist with the Camera and Video AI team at ShareChat, India. I was also a visiting researcher at the Computer Vision and Pattern Recognition Unit at the Indian Statistical Institute, Kolkata, under Prof. Ujjwal Bhattacharya. Before that, I was a Senior Research Engineer with the Vision Intelligence Group at Samsung R&D Institute Bangalore, where I primarily developed novel AI-powered solutions for various Samsung smart devices.

I received my MTech in Computer Science & Engineering from IIIT Hyderabad, where I was fortunate to be advised by Prof. C V Jawahar. During my undergrad, I worked as a research intern under Prof. Pabitra Mitra at IIT Kharagpur and at the CVPR Unit at ISI Kolkata.

Feel free to contact me if you're interested in research collaboration!

Email  /  GitHub  /  Google Scholar  /  LinkedIn  /  Twitter


Sanjoy's Research Garden

Updates

  • Mar 2025 - Joined Apple MLR as an ML Research intern.
  • Feb 2025 - Invited talk at NYC Computer Vision Day 2025 organised by New York University.
  • Oct 2024 - Invited talk on assessing and addressing the gaps in existing Audio-Visual LLMs at the AIR Lab, University of Rochester.
  • July 2024 - Our work on Audio-Visual LLMs got accepted to ECCV 2024.
  • June 2024 - Invited talk at the Sight and Sound workshop at CVPR 2024
  • May 2024 - Joined Meta Reality Labs as a Research Scientist intern.
  • May 2024 - Paper on Improving Robustness Against Spurious Correlations got accepted to ACL 2024 Findings
  • May 2024 - Our paper on determining perceived audience intent from multi-modal social media posts got accepted to Nature Scientific Reports
  • Mar 2024 - Paper on LLM guided navigational instruction generation got accepted to NAACL 2024
  • Feb 2024 - MeLFusion (Highlight, Top 2.8%) got accepted to CVPR 2024.
  • Feb 2024 - Joined Google Research as a student researcher.
  • Oct 2023 - APoLLo got accepted to EMNLP 2023.
  • Oct 2023 - Invited talk on AdVerb at AV4D Workshop, ICCV 2023
  • July 2023 - AdVerb got accepted to ICCV 2023
  • May 2023 - Joined Adobe Research as a research intern.
  • Aug 2022 - Joined the University of Maryland, College Park as a CS PhD student. Awarded the Dean's Fellowship.
  • Oct 2021 - Paper on audio-visual summarization accepted in BMVC 2021.
  • Sep 2021 - Blog on Video Quality Enhancement released at Tech @ ShareChat.
  • July 2021 - Paper on reflection removal got accepted in ICCV 2021.
  • June 2021 - Joined ShareChat Data Science team.
  • May 2021 - Paper on audio-visual joint segmentation accepted in ICIP 2021.
  • Dec 2018 - Accepted Samsung Research offer. Will be joining in June'19.
  • Sep 2018 - Received Dean's Merit List Award for academic excellence at IIIT Hyderabad.
  • Oct 2017 - Our work on a multi-scale, low-latency face detection framework received Best Paper Award at NGCT-2017.






Blog(s)

I have tried my hand at writing technical blogs.


The devil is in the details: Video Quality Enhancement Approaches


Link

The blog contextualizes the problem of video enhancement in present-day scenarios and talks about a couple of interesting approaches to handle this challenging task.

Academic services

I have served as a reviewer for the following venues:

CVPR, ICCV, ECCV, NeurIPS, AAAI, ICLR, WACV, ACL, ACM MM




Affiliations




IIT Kharagpur
Apr-Sep 2016

ISI Kolkata
Feb-July 2017

IIIT Hyderabad
Aug 2017 - May 2019

Mentor Graphics Hyderabad
May - July 2018

Samsung Research Bangalore
June 2019 - June 2021

ShareChat Bangalore
June 2021 - May 2022

UMD College Park
Aug 2022 - Present

Adobe Research
May 2023 - Aug 2023

KAUST
Jan 2024 - Present

Google Research
Feb 2024 - May 2024

Meta AI
May 2024 - Nov 2024

Apple MLR
Mar 2025 - Aug 2025

Template credits: Jon Barron, Philippe Laban and thanks to Richa for making this.