Ollama - Practical Use of Local AI Models Exemplified by the OMNIMES System
Author: Martin Szerment
Let's start from the beginning: what is the Ollama project? It is open, free software for running local AI models, specifically large language models (LLMs), on your own computer or server with relatively low hardware requirements. One of the models it runs, Llama3, comes from Meta (formerly Facebook); Ollama itself is an independent open-source project.
At the time of writing, the software gives small and medium-sized businesses access to 93 available AI models with few or no restrictions.
The software is available on all major platforms (Windows, Linux, and macOS) and is also distributed as a Docker image.
Launching
Launching the software is incredibly simple. Below, I'll briefly summarize the process using a Windows example. For detailed installation instructions, please refer to the project's website: https://ollama.com
From the project website, simply download the installer. One click and the software is installed. After installation, type in your browser:
localhost:11434
You should see that the server has started.
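If you prefer the command line, the same check can be done with curl (assuming the default port 11434 has not been changed):
curl http://localhost:11434
The server should reply with a short status message such as "Ollama is running".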
The available models along with their descriptions are on the project's website https://ollama.com/library
After installation, simply download a model by typing the following in cmd, for example:
ollama pull llama3
And from now on, we have a ready server with the Llama3 model. There are several ways to communicate with the model. The simplest one, if we want to check the model's operation, is through cmd. Simply enter:
ollama run llama3
This opens a ready-to-use interactive prompt: we enter a question and wait for the answer.
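For illustration only, a session in the terminal might look roughly like this (the question and the answer are, of course, just an example):
ollama run llama3
>>> What is a manufacturing execution system?
A manufacturing execution system (MES) is software that tracks and controls production processes on the shop floor in real time.
>>> /bye
Typing /bye ends the interactive session.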
Of course, not everyone will prefer this type of communication interface. Another way is through a web interface, but this requires Docker to be installed. If we have Docker installed, then from the website:
https://github.com/open-webui/open-webui
We can download the version that interests us. I choose the standard version, and all I need to do is paste the following into cmd:
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
We open a web browser and type in:
localhost:3000
and an interface very similar to the one well known from OpenAI will appear before your eyes.
From now on, we can fully utilize our local model through the same interface as ChatGPT.
For configuring this interface and complete server installation of Ollama, I invite you to watch this video:
https://www.youtube.com/watch?v=Wjrdr0NU4Sk
We've covered a brief overview of the software and its installation. In another post, I'll present a complete installation of the Ollama server as a remote AI server, along with a comparison of the best available models.
Technical Requirements
Of course, it's worth noting that the Ollama server also offers an API that we can use for our own applications, eliminating the need to install a web interface.
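As a minimal sketch of such a call, assuming the server runs locally on the default port and the llama3 model has already been pulled (the prompt is only illustrative, and the quoting shown is for a Unix-style shell):
# Single request to the local model; "stream": false returns the whole answer as one JSON response
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'
The generated text is returned in the "response" field of the JSON reply.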
Another interesting feature is the ability to add custom models outside of the Ollama library. This means we can create our own AI model and add it to the server in a format compatible with Ollama.
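As a rough sketch of that workflow, a custom variant can be described in a Modelfile and registered on the server with ollama create (the model name and system prompt below are purely hypothetical):
# Modelfile - FROM can point to a library model or to a local GGUF weights file
FROM llama3
# Hypothetical system prompt tailoring the assistant to shop-floor use
SYSTEM "You are a production assistant. Answer briefly and only on the basis of the data provided."
# Lower temperature for more deterministic answers
PARAMETER temperature 0.2
Then the model is built and run like any other:
ollama create omnimes-assistant -f Modelfile
ollama run omnimes-assistant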
As for technical requirements - everything depends on the model used, the number of queries to the model, and the size of the AI model itself.
For personal use or as a small AI server, a single graphics card is sufficient; this has been tested on a server with an RTX 3070 Ti (350 W) and on a laptop RTX 4060 (140 W), both running the Llama3 AI model.
The performance is surprisingly good, with responses taking between 5 and 10 seconds, which is very good for a local AI model.
The quality of responses is also high, especially in terms of logical syntax. A more detailed comparison of models will be covered in another post. For now, let's focus on the Llama3 model, an AI model developed by Meta (formerly Facebook).
Comparison with Other Models
Arguably the best way to showcase the capabilities of the Llama3 model is to compare it with models such as GPT-2, GPT-3.5, and GPT-4, as well as with Google's Gemma and Mistral AI's Mistral (released under the Apache 2.0 license). Such a comparison can be made by benchmarking these models.
Significance of Benchmarks:
- MMLU is an indicator of overall performance and versatility of the language model across a wide range of language tasks.
- GPQA measures the ability to answer difficult, graduate-level knowledge questions.
- HumanEval assesses the ability to generate correct programming code.
- GSM-8K evaluates skills in solving grade-school mathematical word problems.
- MATH tests more advanced mathematical abilities.
The most significant result comes from the MMLU test, where we see that the Llama3 model outperforms GPT-3.5 by 8.5 points.
In practice, this translates into a positive experience when using the model as our local AI.
Now that we have described the Ollama software and the Llama3 model, including a partial comparison with other models, let's proceed to demonstrate its functionality in action using the example of the Omnimes production execution system.
Utilizing Ollama: The Omnimes Example
Using the Ollama server API, we have implemented the use of local AI within the Omnimes system. Its operation is based on a straightforward data setup, similar to using the OpenAI API, as discussed in my earlier post (linked below).
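I won't reproduce the Omnimes implementation here, but conceptually the request sent to the server looks roughly like the sketch below, using Ollama's /api/chat endpoint (the role description and the data in the user message are purely illustrative):
# Chat-style request: the system message sets the analyst role, the user message carries the data to analyse
curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3",
    "messages": [
      {"role": "system", "content": "You are a production data analyst."},
      {"role": "user", "content": "Summarize the following downtime records: ..."}
    ],
    "stream": false
  }'
The message structure mirrors the OpenAI chat format, which keeps switching between the external and the local AI straightforward.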
Currently, within the Omnimes system, we have the option to choose whether to use external AI, specifically the API from OpenAI, or internal AI, which is Ollama.
Once the AI type is selected, in the case of Ollama AI, we can choose the specific AI model.
The number of available models depends on the models downloaded onto the Ollama server. Additionally, we have the option to add our own model, ensuring it is saved in a format compatible with Ollama software.
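Which models are currently present on the server can be checked at any time, for example:
# List the models pulled onto the Ollama server
ollama list
# The same list is available over the API
curl http://localhost:11434/api/tags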
Below is an overview of data analysis from the summary located on the main panel of the Omnimes system.
Analysis result by GPT-3.5
Analysis result by Llama3
As we can see, the logical syntax of the local Llama3 model is very good and comparable to GPT-3.5. It's worth noting that the Llama3 model itself weighs only 4.5 GB, whereas the GPT-2 XL model weighs 6 GB. The size of the GPT-3.5 model has not been publicly disclosed.
Summary
As we can see, using the local AI provided by the Ollama software does not lag in any way behind using external OpenAI servers through their API.
This gives us significant capabilities, although hardware (and therefore financial) constraints apply, since the performance of the AI depends entirely on the hardware we use. The Llama3 model also comes in a larger version, Llama3:70b, but that model weighs 40 GB and requires more than a single graphics card to operate.
However, utilizing Ollama for enterprise needs in systems like Omnimes, which focuses on production execution, is a sound approach and addresses strict enterprise security policies, such as a ban on connecting production data systems to external servers, or even the lack of external internet access in the production facility.
Additionally, because the model runs locally on data we control, it is less exposed to misinformation from the outside world, which reduces the risk of hallucinations or false outputs.
If you are interested in our solution for your machine park or production floor, enhancing production processes with modern data collection and analysis methods, I invite you to explore the offerings of our Omnimes system.