The use of Large Language Models (LLMs) in Semantic Navigation (SN) is a novel approach that has so far centered on high-parameter-count, cloud-hosted LLMs. In this study we explore the capabilities of low-parameter-count (8.54 × 10⁹ parameters or fewer) quantized local LLMs that can be hosted on a consumer-grade laptop. We demonstrate the use of such LLMs in extracting semantic knowledge from the environment, which is then used, in a simulated environment, to create an executable path that leads the agent to a goal set by a human operator. Toward this, we explore two main tasks for which LLMs can be employed: room classification and goal selection. We compare the performance of several LLMs and an appropriately constructed Support Vector Classifier (SVC) on these tasks. We then present a simple framework for evaluating Visual Language Models (VLMs) and LLMs in SN. We compare four LLMs: Gemma, Llama3, and two quantizations of Mistral. Among the LLMs we tested, Llama3 is the most accurate room classifier, whereas goal-selection performance varies between rooms and between LLMs. Finally, we evaluate three VLMs: a 4-bit quantized Chameleon, an un-quantized Chameleon, and an un-quantized Moondream2. We find that VLMs are best used for hard-to-classify scenes from which a list of objects cannot be extracted for classification with an LLM. Our combined approach takes advantage of the best aspects of both model types. The source code and test data for this project are available at: https://github.com/archie1983/cvm_semantic_navigation.
semantic navigation; semantic reasoning; large language model; quantized LLM; visual language model; Gemma; Llama3; Mistral; Chameleon; Moondream2