" I think; therefore I am. " — René Descartes.
I work in reinforcement learning and artificial intelligence. I completed my MSc in Computing Science at the University of Alberta, where I was co-supervised by Adam White and Marlos Machado and affiliated with the RLAI Lab and the Alberta Machine Intelligence Institute (Amii).
My research interests lie broadly in reinforcement learning, representation learning, and continual learning. In my MSc thesis, I proposed a recurrent alternative to the transformer's self-attention mechanism that offers context-independent inference cost while remaining parallelizable over an input sequence. The proposed approach, called the Recurrent Linear Transformer, was shown to outperform state-of-the-art transformers and recurrent neural networks on partially observable reinforcement learning problems, in terms of both computational efficiency and performance. (Thesis URL, ICLR Submitted Paper)
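To give a feel for why a recurrent formulation can have context-independent inference cost, here is a minimal sketch of the general linear-attention recurrence that this line of work builds on: the attention state is folded into a fixed-size running summary, so each new step costs the same regardless of how long the context is. The feature map and all names below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def feature_map(x):
    # ELU + 1 keeps features positive (a common choice in linear attention).
    return np.where(x > 0, x + 1.0, np.exp(x))

def recurrent_attention(queries, keys, values):
    d = keys.shape[1]
    S = np.zeros((d, values.shape[1]))   # running sum of phi(k) v^T
    z = np.zeros(d)                       # running sum of phi(k), for normalization
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        S += np.outer(phi_k, v)
        z += phi_k
        phi_q = feature_map(q)
        outputs.append(phi_q @ S / (phi_q @ z + 1e-6))
    return np.stack(outputs)

# Toy usage: 5 timesteps, model dim 4; per-step cost does not grow with t.
T, d = 5, 4
rng = np.random.default_rng(0)
out = recurrent_attention(rng.normal(size=(T, d)),
                          rng.normal(size=(T, d)),
                          rng.normal(size=(T, d)))
print(out.shape)  # (5, 4)
```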
I have held several industry positions in machine learning. During my MSc, I interned at Huawei Research Edmonton, applying reinforcement learning to neural network operator fusion. Before that, I worked at IBM Cloud as an ML Engineer for about two years and collaborated with IBM Research on research projects in representation learning and deep learning. I also helped deploy several machine learning systems at scale at IBM and Kone.
Contact: spramanik [at] ualberta [dot] ca, email [at] subho [dot] in
MSc in Computing Science (thesis-based, fully funded), 2021 - 2023
University of Alberta
B.Tech in Computer Science and Engineering, 2015 - 2019
Vellore Institute of Technology
In this paper, we propose a multi-task learning-based framework that utilizes a combination of self-supervised and supervised pre-training tasks to learn a generic document representation. We design the network architecture and the pre-training tasks to incorporate multi-modal document information across the text, layout, and image dimensions and to allow the network to work with multi-page documents. We showcase the applicability of our pre-training framework on a variety of real-world document tasks such as document classification, document information extraction, and document retrieval. We conduct exhaustive experiments to compare performance against different ablations of our framework and state-of-the-art baselines. We discuss the current limitations and next steps for our work.
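A minimal sketch of the multi-task pre-training setup described above: a shared document encoder feeds several task heads, and self-supervised and supervised losses are combined into a single objective. The encoder, head shapes, and loss weights below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiTaskPretrainer(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, num_classes=10):
        super().__init__()
        # Shared encoder over fused text/layout/image features (assumed pre-extracted).
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.recon_head = nn.Linear(hidden, feat_dim)    # self-supervised: reconstruct masked features
        self.cls_head = nn.Linear(hidden, num_classes)   # supervised: document classification

    def forward(self, feats, masked_feats, labels):
        h = self.encoder(masked_feats)
        recon_loss = nn.functional.mse_loss(self.recon_head(h), feats)
        cls_loss = nn.functional.cross_entropy(self.cls_head(h.mean(dim=1)), labels)
        return recon_loss + 0.5 * cls_loss   # weighted sum of pre-training losses

# Toy batch: 2 documents, 16 tokens each, 128-dim fused features.
model = MultiTaskPretrainer()
feats = torch.randn(2, 16, 128)
mask = (torch.rand(2, 16, 1) > 0.15)      # randomly mask ~15% of token features
loss = model(feats, feats * mask, torch.tensor([1, 3]))
loss.backward()
```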
The Transformer is a widely used neural network architecture, especially for language understanding. We introduce an extended and unified architecture that can be used for tasks involving a variety of modalities such as images, text, and videos. We propose a spatio-temporal cache mechanism that enables learning the spatial dimensions of the input in addition to the hidden states corresponding to the temporal input sequence. The proposed architecture further enables a single model to support tasks with multiple input modalities as well as asynchronous multi-task learning; we therefore refer to it as OmniNet. For example, a single instance of OmniNet can concurrently learn to perform part-of-speech tagging, image captioning, visual question answering, and video activity recognition. We demonstrate that training these four tasks together yields a model roughly three times smaller while retaining the performance of training them individually. We also show that this network, pre-trained on some modalities, assists in learning unseen tasks such as video captioning and video question answering. This illustrates the generalization capacity of the self-attention mechanism over the spatio-temporal cache in OmniNet.
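A minimal sketch of the spatio-temporal cache idea mentioned in the abstract: temporal inputs (e.g. tokens) contribute one vector per step to a temporal cache, spatial inputs (e.g. image patch grids) also contribute their per-location vectors to a spatial cache, and a decoder can attend over both. The layout and pooling below are illustrative assumptions, not OmniNet's exact design.

```python
import numpy as np

class SpatioTemporalCache:
    def __init__(self, dim):
        self.dim = dim
        self.temporal = []   # one pooled vector per time step
        self.spatial = []    # one vector per spatial location, across all inputs

    def add_step(self, encoded):
        # encoded: (num_locations, dim) for an image/video frame, or (1, dim) for a token.
        self.temporal.append(encoded.mean(axis=0))   # pooled summary enters the temporal cache
        if encoded.shape[0] > 1:
            self.spatial.extend(encoded)              # raw locations enter the spatial cache

    def attend(self, query):
        # Simple softmax attention of a decoder query over the concatenated caches.
        memory = np.stack(self.temporal + self.spatial)
        scores = memory @ query / np.sqrt(self.dim)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ memory

cache = SpatioTemporalCache(dim=8)
cache.add_step(np.random.randn(1, 8))    # a text token
cache.add_step(np.random.randn(49, 8))   # a 7x7 grid of image patch features
print(cache.attend(np.random.randn(8)).shape)  # (8,)
```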
We perform text normalization, i.e. the transformation of words from the written to the spoken form, using a memory-augmented neural network. With the addition of a dynamic memory access and storage mechanism, we present a neural architecture that can serve as a language-agnostic text normalization system while avoiding the kind of unacceptable errors made by LSTM-based recurrent neural networks. By successfully reducing the frequency of such mistakes, we show that this novel architecture is indeed a better alternative. Our proposed system requires significantly less data, training time, and compute. Additionally, we perform data up-sampling, circumventing the data sparsity problem in some semiotic classes, to show that sufficient examples in any particular class can improve the performance of our text normalization system. Although a few occurrences of these errors still remain in certain semiotic classes, we demonstrate that memory-augmented networks with meta-learning capabilities can open many doors to a superior text normalization system.
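A minimal sketch of the task framing and the semiotic-class up-sampling mentioned above: written tokens are mapped to their spoken forms, and sparse classes (e.g. dates, currency) are over-sampled so the model sees enough examples of each. The toy data and sampling target are illustrative assumptions, not the paper's setup.

```python
import random

# (written form, spoken form, semiotic class)
examples = [
    ("2019", "twenty nineteen", "DATE"),
    ("$5", "five dollars", "MONEY"),
    ("Dr.", "doctor", "ABBREV"),
    ("hello", "hello", "PLAIN"),
    ("hello", "hello", "PLAIN"),   # common class, already better represented
]

def upsample(data, target_per_class):
    by_class = {}
    for ex in data:
        by_class.setdefault(ex[2], []).append(ex)
    balanced = []
    for cls, items in by_class.items():
        # Sample with replacement until each semiotic class reaches the target count.
        balanced.extend(random.choices(items, k=max(target_per_class, len(items))))
    return balanced

train_set = upsample(examples, target_per_class=4)
print(len(train_set))  # 16: every class brought up to at least 4 examples
```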
Primarily assigned as an AI/ML Developer for IBM App Connect:
Actively collaborating with IBM Research:
Intern at the IBM Watson TRIRIGA Building Insights team.
Selected from hundreds of competitors in the Kone-IBM hackathon for a two-month sponsored stay at Kone in Finland as a visiting researcher.