IIT Madras & US University Develop AI-Powered Algorithms to Enhance 3D Effects in Phone Videos

New Delhi: Researchers at IIT Madras and Northwestern University in the United States have created deep learning algorithms that can significantly improve depth perception and 3D effects in videos captured with smartphone cameras.

Such algorithms, according to authorities, would prevent mobile phone photographs from looking "flat" and give them a true 3D feel. One significant advantage of the algorithm is that it eliminates the need for expensive equipment or dozens of lenses to shoot videos with depth.

Kaushik Mitra, assistant professor, Department of Electrical Engineering, IIT Madras, told PTI, "It is a common complaint, especially among amateur and professional photographers, that photographs and videos shot using smartphone cameras have a flat, two-dimensional look. Apart from the flat look, some 3D features such as the 'Bokeh Effect' – the aesthetic blurring of the background – that are easy with the DSLR camera, are challenging in smartphone cameras".

He added, "While a few mid and high-end smartphone cameras are now programmed to incorporate such effects in still photographs, especially in portrait mode, it is not yet possible to render them in videos captured using smartphones".

"Light field (LF) capture is done using an array of microlenses placed between the primary lens of the camera and the camera sensor. Due to space limits, many microlenses cannot be put on mobile phones. Instead, methods for post-processing images acquired by existing smartphone cameras are being developed.

"Artificial Intelligence and machine learning techniques are used for such image manipulation. Our team looked into this issue and has built a deep learning algorithm that converts the stereo images captured using a smartphone into LF images," Mitra said.

The research has been published in the 'Proceedings of International Conference on Computer Vision (ICCV), 2021'. Mitra also explained, "The algorithm first captures two videos (called a stereo pair) simultaneously using the two adjacent cameras that are present in many smartphones these days. These stereo pairs go through a sequence of steps involving deep learning models. The stereo pairs are converted into a 7x7 grid of images, mimicking a 7x7 array of cameras, thereby producing the LF image".

"A crucial advantage of the algorithm developed by our team is that it eliminates the need for fancy equipment or an array of lenses to capture videos with depth. The Bokeh and other such aesthetic 3D effects can be achieved with a smartphone that is equipped with a dual-camera system. In addition to providing depth, our algorithm enables us to view the same video not just from one point of view but from any of the 7x7 grid of viewpoints," he added.
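
For readers curious about what a 7x7 grid of viewpoints makes possible, the following is a minimal Python sketch. It does not reproduce the team's deep learning model, which is not described in enough detail here. Instead it builds a toy 7x7 light field by hand and applies the classic "shift-and-add" refocusing technique, a standard way to obtain Bokeh-like blur from a light field. All names in it (`make_toy_light_field`, `select_view`, `refocus`) and the toy scene of a single bright point are assumptions made for illustration.

```python
GRID = 7  # 7x7 viewpoints, as described in the article


def make_toy_light_field(height, width, disparity):
    """Build a toy light field: one bright point seen from 7x7 viewpoints.

    Hypothetical stand-in for the model's output. In view (u, v) the point
    appears shifted by `disparity` pixels per step away from the centre view.
    """
    center = GRID // 2
    light_field = {}
    for u in range(GRID):
        for v in range(GRID):
            image = [[0.0] * width for _ in range(height)]
            y = height // 2 + disparity * (u - center)
            x = width // 2 + disparity * (v - center)
            image[y][x] = 1.0
            light_field[(u, v)] = image
    return light_field


def select_view(light_field, u, v):
    """Return the image seen from grid position (u, v)."""
    return light_field[(u, v)]


def refocus(light_field, disparity, height, width):
    """Shift-and-add refocusing: align all views at one disparity, then average.

    Points whose true disparity matches line up and stay sharp;
    everything else is spread out, which gives the blurred look.
    """
    center = GRID // 2
    out = [[0.0] * width for _ in range(height)]
    for (u, v), image in light_field.items():
        dy = disparity * (u - center)
        dx = disparity * (v - center)
        for y in range(height):
            sy = y + dy
            if not 0 <= sy < height:
                continue
            for x in range(width):
                sx = x + dx
                if 0 <= sx < width:
                    out[y][x] += image[sy][sx]
    n_views = GRID * GRID
    return [[pixel / n_views for pixel in row] for row in out]


if __name__ == "__main__":
    lf = make_toy_light_field(21, 21, disparity=2)
    in_focus = refocus(lf, 2, 21, 21)   # focused at the point's depth
    blurred = refocus(lf, 0, 21, 21)    # focused elsewhere
    print(in_focus[10][10], blurred[10][10])
```

When the refocusing disparity matches the point's own disparity, all 49 views align and the point stays sharp. At any other setting its energy is spread across 49 positions, which is the blur a photographer would recognise as Bokeh.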