Small Language Models: opportunities and obstacles
Department of Electrical and Computer Engineering, University of Thessaly, Volos, Greece
Abstract

This paper investigates the challenges and opportunities of Small Language Models as efficient, privacy-aware alternatives to Large Language Models in resource-constrained and real-time environments. It introduces the underlying concepts and combines a compact yet comprehensive and up-to-date literature review of architectural and optimization techniques for Small Language Models with a systematic experimental evaluation of selected prototype models that integrate fine-tuning, Retrieval-Augmented Generation, and model quantization for multi-platform deployment. The aim is to pave the way towards practical implementations that offer measurable improvements over existing methods and can be readily adopted in applied settings.

Keywords

Small Language Models; fine-tuning; LoRA; quantization; Retrieval-Augmented Generation; on-device deployment; Gemma
