Romit Barua

Senior Computer Vision/ML Engineer

What was your path to joining GetReal Labs? Did anything surprise you along the way?

I began working with Dr. Farid in 2023, during my master’s program at UC Berkeley, on a paper focused on audio deepfake detection. At the time, audio deepfakes were relatively new and just gaining traction. In fact, Hany initially advised us not to focus on them, only to reverse course when ElevenLabs was released. Close to graduation, Hany told us about an opportunity to join the smart and experienced team at GetReal, and the rest is history. I continue to be most surprised by the speed of innovation in the field of generation. In such a short time, we have seen audio deepfake generation go from “barely human” to high-quality real-time deepfake speech that can fool the public.

What are you currently working on, and why is it important?

Currently, I am focused on deepfake audio detection with explainability. As audio generators improve in speed and quality, the risk of such technologies being leveraged for fraud, identity theft and the dissemination of misinformation drastically increases. Explainability is critical for trust in the systems that we deploy, both for us as researchers and for the customer. It allows us to understand the strengths and weaknesses of the various methods and provides a good pathway to improve performance.

What industries or real-world problems could your research help solve?

For decades, communication via technology has been critical for business operations. In finance alone, wire transfers, low-liquidity instrument trades and customer account access use traditional phone calls. In recent years, especially since the pandemic, many other industries have leveraged the power of Zoom and Microsoft Teams to conduct company meetings and candidate interviews. We currently live in a world where we cannot confirm that the person on the other side of a phone call or video call is who they say they are. That uncertainty can bring business operations to a halt and result in significant financial losses. The research and development we are conducting provides a mechanism by which companies can have confidence in knowing who they are talking to and conducting business with.

What’s the biggest lesson you’ve learned as a researcher?

A lesson that all researchers learn early on is that attention to detail is everything. Especially in the field of digital forensics, it is critical to fully understand how media is generated, processed and stored. This is particularly important in today’s world of computational photography and social media. Each platform has its own processing pipeline, and each leaves different traces and artifacts that we need to understand.

What do you do outside of research? Any surprising hobbies or passions?

I spend a lot of time listening to and playing music. I played piano for many years growing up and joined my dad’s Indian band as a bassist when I was 12. Since then, it has continued to be a passion and a big part of my life.

What trends in your field excite you the most?

I am most excited about the focus on explainability. There are very few ways to understand why many of the common models, especially neural network-based models, make the decisions that they do. This is a growing concern as models like ChatGPT or Claude become more and more deeply ingrained in our everyday lives. Over the past decade, work such as LIME and integrated gradients has given us more insight into how models make predictions, but there is still a lot of work to be done in this area.