Low-complexity reinforcement learning decoders for autonomous, scalable, neuromorphic intra-cortical brain machine interfaces
1 Department of Electronics and Telecommunication Engineering, Indian Institute of Engineering Science and Technology, Shibpur, India
2 Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois 61801, USA
3 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
4 Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR, China
5 Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A∗STAR), Singapore 138673, Singapore
6 Department of Psychology, National University of Singapore, Singapore 117570, Singapore
7 Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A∗STAR), Singapore 138632, Singapore
8 Department of Biomedical Engineering, National University of Singapore, Singapore 117583, Singapore
Abstract

Intra-cortical brain machine interfaces (iBMIs) with wireless capability could scale the number of recording channels by integrating an intention decoder to reduce data rates. However, the need for frequent retraining due to neural signal non-stationarity remains a major impediment. This paper presents an alternative neuromorphic paradigm of online reinforcement learning (RL) with binary evaluative feedback in iBMIs to tackle this issue. This paradigm eliminates time-consuming calibration procedures. Instead, it relies on updating the model on a sequential sample-by-sample basis driven by an instantaneous evaluative binary feedback signal. Such online learning is a hallmark of neuromorphic systems and differs from the batch weight updates of popular deep networks, which are resource-intensive and incompatible with the constraints of an implant. In this work, using open-loop analysis on pre-recorded data, we demonstrate the application of a simple RL algorithm, Banditron, in discrete-state iBMIs and compare it against previously reported state-of-the-art RL algorithms: Hebbian RL (HRL), Attention-Gated RL (AGREL), and deep Q-learning. Owing to its simple single-layer architecture, Banditron is estimated to dissipate at least two orders of magnitude less power than state-of-the-art RL algorithms. At the same time, offline analysis performed on four pre-recorded experimental datasets procured from the motor cortex of two non-human primates performing joystick-based movement-related tasks indicates that Banditron performs significantly better than state-of-the-art RL algorithms, by at least ∼5%, 10%, 7% and 7% in experiments 1, 2, 3 and 4, respectively. Furthermore, we propose a non-linear variant of Banditron, "Banditron-RP", which gives an average improvement of 6% and 2% in decoding accuracy in experiments 2 and 4, respectively, with only a moderate increase in computation (and, concomitantly, power consumption).
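The sample-by-sample, binary-feedback update described above can be illustrated with a minimal sketch of the generic Banditron algorithm (Kakade et al., 2008) for a discrete-state decoder. This is an illustrative reconstruction, not the authors' implementation; all variable names, the exploration rate, and the synthetic firing-rate features are assumptions.

```python
import numpy as np

class Banditron:
    """Linear multiclass decoder trained online from binary (correct/incorrect)
    feedback only. One weight row per discrete output state (e.g. movement
    direction); no stored calibration batch is required."""

    def __init__(self, n_features, n_classes, gamma=0.05, rng=None):
        self.W = np.zeros((n_classes, n_features))  # single-layer weights
        self.gamma = gamma                          # exploration rate
        self.K = n_classes
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def step(self, x, intended_label):
        scores = self.W @ x
        y_hat = int(np.argmax(scores))              # greedy prediction
        # explore: emit y_hat with prob. 1 - gamma, else a uniform random class
        p = np.full(self.K, self.gamma / self.K)
        p[y_hat] += 1.0 - self.gamma
        y_tilde = int(self.rng.choice(self.K, p=p)) # emitted action
        # binary evaluative feedback: did the action match the user's intent?
        fb = 1.0 if y_tilde == intended_label else 0.0
        # unbiased rank-1 update built only from that single feedback bit
        coeff = np.zeros(self.K)
        coeff[y_tilde] += fb / p[y_tilde]
        coeff[y_hat] -= 1.0
        self.W += np.outer(coeff, x)                # per-sample weight update
        return y_tilde
```

A usage sketch on synthetic class-clustered features: draw a feature vector near one of K class centers each step, call `step(x, y)`, and the decoder's emitted actions track the intended class increasingly often, with no separate calibration phase. The per-sample cost is one matrix-vector product and one rank-1 update, which is what makes a single-layer scheme like this attractive for implant power budgets.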

Keywords

brain-machine interface; neuromorphic; reinforcement learning; hardware-friendly
