The use of Large Language Models (LLMs) in Semantic Navigation (SN) is a novel approach that has so far centered on high-parameter-count, cloud-hosted LLMs. In this study we explore the capabilities of low-parameter-count (8.54 × 10⁹ parameters or fewer) quantized local LLMs that can be hosted on a consumer-grade laptop. We demonstrate the use of such LLMs in extracting semantic knowledge from the environment, which is then used, in a simulated environment, to create an executable path that leads the agent to a goal set by a human operator. Toward this, we explore two main tasks for which LLMs can be employed: room classification and goal selection. We compare the performance of several LLMs and an appropriately constructed Support Vector Classifier (SVC) on these tasks. We then present a simple framework for evaluating Visual Language Models (VLMs) and LLMs in SN. We compare four LLMs: Gemma, Llama3, and two quantizations of Mistral. Among the LLMs we tested, Llama3 is the most accurate room classifier, whereas goal-selection performance varies between rooms and between LLMs. Finally, we evaluate three VLMs: a 4-bit quantized Chameleon, an un-quantized Chameleon, and an un-quantized Moondream2. We find that VLMs are best used for hard-to-classify scenes from which a list of objects cannot be extracted for classification with an LLM. Our combined approach takes advantage of the best aspects of both model types. The source code and test data for this project are available at: https://github.com/archie1983/cvm_semantic_navigation.
semantic navigation; semantic reasoning; large language model; quantized LLM; visual language model; Gemma; Llama3; Mistral; Chameleon; Moondream2