Sanjoy Chowdhury

I am a second-year CS PhD student at the University of Maryland, College Park, advised by Prof. Dinesh Manocha. Prior to this, I worked as a Machine Learning Scientist with the Camera and Video AI team at ShareChat, India. I was also a visiting researcher at the Computer Vision and Pattern Recognition Unit at the Indian Statistical Institute, Kolkata, under Prof. Ujjwal Bhattacharya.

Previously, I was a Senior Research Engineer with the Vision Intelligence Group at Samsung R&D Institute Bangalore, where I primarily worked on developing novel AI-powered solutions for Samsung's smart devices.

I received my MTech in Computer Science & Engineering from IIIT Hyderabad, where I was fortunate to be advised by Prof. C V Jawahar. During my undergrad, I worked as a research intern under Prof. Pabitra Mitra at IIT Kharagpur and at the CVPR Unit at ISI Kolkata under Prof. Ujjwal Bhattacharya.

Email  /  GitHub  /  Google Scholar  /  LinkedIn  /  Twitter

Iribe #5116, 8125 Paint Branch Dr
College Park, MD 20742


[Oct 2023] APoLLo accepted to EMNLP 2023.
[Oct 2023] Invited talk on AdVerb at the AV4D Workshop, ICCV 2023.
[July 2023] AdVerb accepted to ICCV 2023.
[May 2023] Joined Adobe Research as a research intern.
[Aug 2022] Joined the University of Maryland, College Park as a CS PhD student. Awarded the Dean's Fellowship.
[Oct 2021] Paper on audio-visual summarization accepted at BMVC 2021.
[Sep 2021] Blog on Video Quality Enhancement released at Tech @ ShareChat.
[July 2021] Paper on reflection removal accepted at ICCV 2021.
[June 2021] Joined the ShareChat Data Science team.
[May 2021] Paper on audio-visual joint segmentation accepted at ICIP 2021.
[Dec 2018] Accepted an offer from Samsung Research. Will be joining in June 2019.
[Sep 2018] Received the Dean's merit list award for academic excellence at IIIT Hyderabad.
[Oct 2017] Our work on a multi-scale, low-latency face detection framework received the Best Paper Award at NGCT-2017.

Selected publications

My research is at the intersection of computer vision and deep learning, with a focus on multi-modal learning (Vision + X), generative modeling, visual understanding, and their applications. I'm broadly interested in studying the interplay between different modalities with minimal supervision.

APoLLo: Unified Adapter and Prompt Learning for Vision Language Models

Sanjoy Chowdhury*, Sayan Nag*, Dinesh Manocha
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Paper / Project Page (Coming soon)

Our method is designed to substantially improve the generalization capabilities of VLP models when they are fine-tuned in a few-shot setting. We introduce trainable cross-attention-based adapter layers in conjunction with vision and language encoders to strengthen the alignment between the two modalities.

AdVerb: Visually Guided Audio Dereverberation

Sanjoy Chowdhury*, Sreyan Ghosh*, Subhrajyoti Dasgupta, Anton Ratnarajah, Utkarsh Tyagi, Dinesh Manocha
International Conference on Computer Vision (ICCV), 2023
Paper / Project Page / Video / Poster / Code

We present a novel audio-visual dereverberation framework that uses visual cues in addition to the reverberant sound to estimate clean audio.

Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation

Jiaye Wu, Sanjoy Chowdhury, Hariharmano Shanmugaraja, David Jacobs, Soumyadip Sengupta
International Conference on Computational Photography (ICCP), 2023
Paper / Project Page / Dataset (coming soon)

To evaluate albedo comprehensively, we collect a new dataset, Measured Albedo in the Wild (MAW), and propose three new metrics that complement WHDR.

AudViSum: Self-Supervised Deep Reinforcement Learning for Diverse Audio-Visual Summary Generation

Sanjoy Chowdhury*, Aditya P. Patra*, Subhrajyoti Dasgupta, Ujjwal Bhattacharya
British Machine Vision Conference (BMVC), 2021
Paper / Code / Presentation

We introduce a novel self-supervised audio-visual summarization model based on deep reinforcement learning that leverages both audio and visual information to generate diverse yet semantically meaningful summaries.

V-DESIRR: Very Fast Deep Embedded Single Image Reflection Removal

B H Pawan Prasad, Green Rosh K S, Lokesh R B, Kaushik Mitra, Sanjoy Chowdhury
International Conference on Computer Vision (ICCV), 2021
Paper / Code

We propose a multi-scale end-to-end architecture for detecting and removing weak, medium, and strong reflections from natural images.

Listen to the Pixels

Sanjoy Chowdhury, Subhrajyoti Dasgupta, Sudip Das, Ujjwal Bhattacharya
International Conference on Image Processing (ICIP), 2021
Paper / Code / Presentation

In this study, we exploit the concurrency between the audio and visual modalities to address the joint audio-visual segmentation problem in a self-supervised manner.


I have also tried my hand at writing technical blogs.

The devil is in the details: Video Quality Enhancement Approaches


The blog puts the problem of video enhancement in a present-day context and discusses a couple of interesting approaches to this challenging task.

Academic services

I have served as a reviewer for the following conferences:

CVPR: 2023

ICCV: 2023

WACV: 2022, 2023, 2024

ACMMM: 2023


IIT Kharagpur
Apr-Sep 2016

ISI Kolkata
Feb-July 2017

IIIT Hyderabad
Aug 2017 - May 2019

Mentor Graphics Hyderabad
May - July 2018

Samsung Research Bangalore
June 2019 - June 2021

ShareChat Bangalore
June 2021 - May 2022

UMD College Park
Aug 2022 - Present

Adobe Research
May 2023 - Aug 2023

Template credits: Jon Barron and thanks to Richa for making this.