
Running LLMs locally with Ollama

Introduction

Ollama lets you run open-source large language models (LLMs) such as Llama 2 and Mistral locally. It's an LLM serving platform written in Go.

It's currently only available for macOS and Linux.

Setting Up

Simply download Ollama from ollama.ai/download

Or run:

curl https://ollama.ai/install.sh | sh
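Once the install completes, you can confirm the CLI is available, for example:

# check that the ollama binary is installed and on your PATH
ollama --version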

Once installed, pick your LLM of choice from Ollama's model library. For example, let's set up llama2:

ollama run llama2

This downloads the default version of the model and starts a REPL where you can converse with it.
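Beyond run, the ollama CLI has commands for managing models; the tag used below (llama2:13b) is just an example of a specific variant:

# pull a model variant without starting a REPL
ollama pull llama2:13b

# list models downloaded locally
ollama list

# remove a model you no longer need
ollama rm llama2:13b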

Ollama also exposes a REST API for running and managing models. Start the server with:

ollama serve

Generate a response

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
}'
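By default the API streams the response back as a series of JSON objects. If you'd rather receive a single JSON reply, you can set the stream parameter:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'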

Chat with a model

curl http://localhost:11434/api/chat -d '{
  "model": "mistral",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
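The same API can be used to manage models. For example, to list the models available locally:

curl http://localhost:11434/api/tags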

Review the API documentation for all endpoints.

Abhishek
Author