Useful Links: Github / Blog / LinkedIn / Google Scholar
I'm currently a Master's student at New York University studying Informatics. I am also a Research Assistant jointly affiliated with the RiskEcon Lab for Decision Metrics and the Agile Robotics & Perception Lab. Previously, I was a researcher at IIT Madras and IIT Bombay.
Master of Science in Informatics
Bachelor of Engineering in Computer Science
We propose a large-scale Isolated Indian Sign Language Recognition dataset and evaluate several deep neural networks that combine different methods for augmentation, feature extraction, encoding and decoding. We also identify an XGBoost model that achieves performance competitive with the deep neural networks.
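As a rough illustration of the XGBoost baseline mentioned above, here is a minimal sketch of a classifier over flattened pose-keypoint features; the array shapes, label layout and hyperparameters are placeholders, not the published configuration.

```python
# Minimal sketch: XGBoost over flattened pose-keypoint features (placeholder data).
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

n_videos, n_frames, n_keypoints = 1000, 32, 137            # assumed dimensions
X = np.random.rand(n_videos, n_frames * n_keypoints * 2)   # (x, y) per keypoint, flattened
y = np.arange(n_videos) % 264                               # placeholder labels for 264 sign classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = XGBClassifier(n_estimators=100, max_depth=6, learning_rate=0.1)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```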
We evaluate different neural networks for task assignment based on efficiency and effectiveness. We also compare global and local planning algorithms and propose a modification to the Potential Field planner that adaptively scales the drone's velocity to perform collision avoidance.
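A minimal sketch of the idea behind the modified planner, assuming point obstacles and 2D positions: the commanded speed shrinks as the nearest obstacle gets closer. Gains, radii and coordinates below are illustrative, not the paper's parameters.

```python
# Potential-field step with adaptive velocity scaling near obstacles.
import numpy as np

def potential_field_step(pos, goal, obstacles, v_max=2.0, k_att=1.0,
                         k_rep=0.5, d_influence=3.0):
    """Return a velocity command for one drone (illustrative units)."""
    force = k_att * (goal - pos)                        # attractive force toward the goal
    d_min = np.inf
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        d_min = min(d_min, d)
        if 1e-6 < d < d_influence:                      # repulsive force inside influence radius
            force += k_rep * (1.0 / d - 1.0 / d_influence) / d**2 * (diff / d)
    speed = v_max * min(1.0, d_min / d_influence)       # adaptively scale speed near obstacles
    direction = force / (np.linalg.norm(force) + 1e-9)
    return speed * direction

cmd = potential_field_step(np.array([0.0, 0.0]), np.array([10.0, 5.0]),
                           [np.array([2.0, 1.0]), np.array([6.0, 4.0])])
print(cmd)
```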
- Developed a 2D and 3D simulation for testing path planning and task assignment algorithms for autonomous drone swarms.
- Reduced mapping coverage time by 45% by using transformers and Graph Neural Nets as policy networks.
- Performed asynchronous multi-process training of the policy network with the Actor-Critic algorithm.
- Utilized the Wavefront, Potential Field & Velocity Obstacle methods to perform motion planning and obstacle avoidance in 2D and 3D (a minimal Wavefront sketch follows this list).
- Improved the accuracy of object recognition models for drone swarms by 10% by sharing sparsely encoded multi-view information.
- Increased the spectral & spatial resolution of satellite images by 2% using CycleGAN & Pix2Pix with custom encoder models.
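The Wavefront planner referenced above can be sketched as a breadth-first expansion from the goal on an occupancy grid, followed by greedy descent of the resulting distance map; the grid, start and goal below are toy placeholders.

```python
# Minimal Wavefront planner sketch on a 2D occupancy grid (0 = free, 1 = obstacle).
from collections import deque

def wavefront(grid, goal):
    """BFS from the goal; each free cell gets its grid distance to the goal."""
    rows, cols = len(grid), len(grid[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def extract_path(dist, start):
    """Greedily descend the wavefront from the start cell to the goal."""
    path, (r, c) = [start], start
    while dist[r][c] not in (None, 0):
        neighbours = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= r + dr < len(dist) and 0 <= c + dc < len(dist[0])
                      and dist[r + dr][c + dc] is not None]
        r, c = min(neighbours, key=lambda rc: dist[rc[0]][rc[1]])
        path.append((r, c))
    return path

grid = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
d = wavefront(grid, goal=(2, 3))
print(extract_path(d, start=(0, 0)))
```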
- Primary designer & developer of a deep learning pipeline that converts Indian Sign Language videos to words.
- Created and publicly released a large-scale Indian Sign Language dataset (55 GB) of high-resolution videos spanning 264 classes.
- Evaluated several deep neural networks combining different methods for augmentation, feature extraction, encoding, and decoding.
- Built a pipeline that uses a pose estimation model, CNN video feature encoders, and bidirectional LSTMs to classify signs.
- Observed through rigorous experiments that the combination of OpenPose for pose estimation, MobileNet for video feature extraction & a Bidirectional LSTM for sequence modeling works best (sketched after this list).
- Achieved state-of-the-art accuracy of 92.1% with this architecture on the American Sign Language Lexicon Video Dataset (ASLLVD).
- Increased throughput of the model by 15% by performing post-training quantization and pruning.
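A minimal Keras sketch of the frame-level-CNN + BiLSTM classifier described above; the sequence length, input resolution and layer sizes are assumptions, and the OpenPose keypoint branch of the actual pipeline is omitted here.

```python
# Frame-level MobileNetV2 features + Bidirectional LSTM sign classifier (sketch).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES, SEQ_LEN, H, W = 264, 32, 224, 224           # assumed dimensions

# Per-frame feature extractor (ImageNet-pretrained MobileNetV2, frozen).
backbone = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                              input_shape=(H, W, 3))
backbone.trainable = False

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, 3)),
    layers.TimeDistributed(backbone),             # (batch, SEQ_LEN, 1280) frame embeddings
    layers.Bidirectional(layers.LSTM(256)),       # temporal encoder over the frame sequence
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```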
- Developed an interactive OCR framework for low-resource languages including Sanskrit, Hindi and Gujarati.
- Built a cross-platform desktop GUI application in C++ using Qt Creator that converts documents into an editable format.
- Reduced OCR conversion errors by 5% using LSTMs, n-gram-based edit-distance methods & updating the LSTMs on the fly.
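The edit-distance part of the correction step can be sketched as dictionary lookup of the nearest lexicon entry; the lexicon, the garbled token and the distance threshold below are illustrative, and the actual system also weighs LSTM language-model scores.

```python
# Dictionary-based OCR post-correction with Levenshtein edit distance (sketch).
def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def correct(token, lexicon, max_dist=2):
    """Replace an OCR token with the closest lexicon entry within max_dist edits."""
    best = min(lexicon, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= max_dist else token

lexicon = ["नमस्ते", "धर्म", "योग"]        # illustrative Hindi/Sanskrit lexicon
print(correct("धर्न", lexicon))             # a garbled OCR token corrected to "धर्म"
```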
- Created adversarial examples for the sentiment classification task by perturbing the input words based on attention.
- Reduced training time from 12 hrs to 3 hrs by utilizing distributed data parallelism.
- Improved adversarial accuracy from 13% to 66% on selected GLUE and SuperGLUE tasks by performing adversarial MLM pre-training.
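The MLM pre-training step in the last bullet can be sketched roughly as follows with Hugging Face transformers; the model name, example sentences and masking rate are placeholders, and the adversarial variant would feed attention-perturbed sentences rather than clean ones.

```python
# One masked-language-model (MLM) pre-training step (sketch, placeholder inputs).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

batch = tokenizer(["the movie was surprisingly good", "a dull and lifeless film"],
                  return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()

# Randomly mask ~15% of non-special tokens; unmasked positions are ignored in the loss.
mask = (torch.rand(labels.shape) < 0.15) & (labels != tokenizer.pad_token_id) \
       & (labels != tokenizer.cls_token_id) & (labels != tokenizer.sep_token_id)
batch["input_ids"][mask] = tokenizer.mask_token_id
labels[~mask] = -100

loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()
print(float(loss))
```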
- Performed feature selection for housing price prediction through data wrangling and exploratory data analysis.
- Reduced data pipeline processing time by a factor of 1.5 using Dask and PySpark.
- Created a dashboard to visualize the features that influence house prices for each ZIP code in the NYC boroughs.
- Built Linear Regression, Decision Tree and Ensemble models to accurately predict the price of a house in NYC boroughs.
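A rough scikit-learn sketch of the model comparison described above; the column names and values are illustrative placeholders, not the actual NYC sales data, and a real run would use the cleaned feature matrix.

```python
# Comparing linear, tree and ensemble regressors on placeholder housing features.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

df = pd.DataFrame({                      # stand-in for the cleaned NYC sales data
    "gross_sqft": [750, 1200, 980, 1600, 640],
    "year_built": [1930, 1987, 1965, 2005, 1910],
    "zipcode":    [11201, 11215, 10002, 11101, 10451],
    "price":      [650_000, 980_000, 720_000, 1_150_000, 410_000],
})
X, y = df.drop(columns="price"), df["price"]

for name, model in [("linear", LinearRegression()),
                    ("tree", DecisionTreeRegressor(max_depth=4)),
                    ("forest", RandomForestRegressor(n_estimators=200))]:
    score = cross_val_score(model, X, y, cv=2, scoring="r2").mean()
    print(f"{name:>6}: mean R^2 = {score:.3f}")
```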
- Utilized building meta-data and weather data to predict a building's water, electricity and gas meter readings.
- Performed data cleaning and exploratory data analysis to identify outliers, impute missing data and identify correlations in data.
- Improved model predictive power through feature engineering and trained a LightGBM model on the data.
- Utilized cross-validation to train and evaluate the model and visualized the results by performing PCA on the data.
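A minimal sketch of the LightGBM-with-cross-validation setup described above; the feature matrix is a random stand-in for the engineered building and weather features, and the hyperparameters are placeholders.

```python
# LightGBM regressor evaluated with k-fold cross-validation (placeholder data).
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))           # stand-in for building meta-data + weather features
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=500)   # synthetic target

model = LGBMRegressor(n_estimators=300, learning_rate=0.05, num_leaves=31)
scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print("RMSE per fold:", -scores)
```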
- SimpleGAN is a Python framework built on top of TensorFlow that aims to facilitate the training of Autoencoders and GANs by providing high-level APIs.
- Primary designer & developer of the framework.
- The open-source project achieved over 5000 downloads.
- Featured in Made With ML's Hacktoberfest.
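For context, this is the kind of Keras boilerplate that a high-level API like SimpleGAN's is meant to hide; the snippet is plain TensorFlow/Keras, not SimpleGAN's actual interface, and the architecture and dataset are illustrative.

```python
# A small convolutional autoencoder in plain Keras (the boilerplate SimpleGAN abstracts).
import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

autoencoder = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),            # encoder
    layers.Conv2D(8, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(8, 3, strides=2, padding="same", activation="relu"),    # decoder
    layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
autoencoder.fit(x_train, x_train, epochs=1, batch_size=128)
```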
- Gathered the latitude and longitude values of objects present at the intersection of Jay St and Myrtle Ave and modelled them in ArcMap.
- Studied the movement of people crossing Jay St and modelled their interaction with the objects in the environment
- Studied how obstructions on the footpath affect pedestrians' mood by using a Convolutional Neural Network to identify their emotions, and used Moran's I statistic to measure the spatial autocorrelation of those emotions.
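A rough sketch of the Moran's I computation over per-location emotion scores; the coordinates, scores and binary contiguity weights below are illustrative placeholders, not the study's data.

```python
# Moran's I for spatial autocorrelation of per-location scores (placeholder data).
import numpy as np

def morans_i(values, weights):
    """values: (n,) attribute per location; weights: (n, n) spatial weight matrix."""
    n = len(values)
    z = values - values.mean()
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)

scores = np.array([0.8, 0.7, 0.75, 0.2, 0.15, 0.1])        # e.g. CNN-predicted mood per point
coords = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
weights = (dist < 2.0).astype(float) - np.eye(len(scores))  # binary contiguity, no self-links
print("Moran's I:", morans_i(scores, weights))
```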
- Built a personal voice assistant based on an automatic speech recognition model for automating tasks on a desktop.
- Created a custom dataset, built a CNN to classify audio spectrograms into task categories, and used shell scripts to execute the tasks.
- Performed model compression using TensorFlow Lite and deployed the model in the browser using TensorFlow.js.
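A minimal sketch of the TensorFlow Lite post-training quantization step; the spectrogram CNN here is a stand-in with an assumed input shape and command count, not the assistant's actual model.

```python
# Post-training (dynamic-range) quantization with TensorFlow Lite (sketch).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([                        # placeholder spectrogram classifier
    layers.Input(shape=(128, 64, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),         # e.g. 5 desktop-automation commands
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # enables dynamic-range quantization
tflite_model = converter.convert()
with open("assistant.tflite", "wb") as f:
    f.write(tflite_model)
```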
- A real-time model that identifies the artist of a song from a short recorded sample.
- The recorded audio sample is denoised, a short-time Fourier transform (STFT) is applied to the signal, and the result is converted to the mel scale.
- A Convolutional Neural Network is used as a feature extractor to create a fingerprint of the audio sample, which is matched against fingerprints in a database using the L2 norm.
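A minimal sketch of the matching stage, assuming a mel-scaled STFT fingerprint and nearest-neighbour lookup in L2 distance; the `embed` function below is a simple stand-in for the CNN feature extractor, and the database entries are synthetic signals rather than real songs.

```python
# Mel-spectrogram fingerprinting and L2 nearest-neighbour artist lookup (sketch).
import numpy as np
import librosa

def embed(audio, sr=22050):
    """Mel-scaled STFT summarised into a fixed-length vector (CNN stand-in)."""
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=2048, hop_length=512)
    return librosa.power_to_db(mel).mean(axis=1)        # (n_mels,) fingerprint

def identify(query, database, sr=22050):
    """Return the artist whose stored fingerprint is nearest in L2 distance."""
    q = embed(query, sr)
    dists = {artist: np.linalg.norm(q - fp) for artist, fp in database.items()}
    return min(dists, key=dists.get)

sr = 22050
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)       # synthetic "song" snippets
database = {"artist_a": embed(tone, sr),
            "artist_b": embed(np.random.default_rng(0).normal(size=sr), sr)}
query = tone + 0.01 * np.random.randn(sr)                 # noisy recording of artist_a
print(identify(query, database, sr))
```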