Experimenting with GenAI, LLMs & RAG - Part 1: Building a RAG Application

18 Oct, 2025

Introduction

After a frustrating experience experimenting with Google GenAI/Vertex, I wanted to tackle something simpler and, hopefully, something useful and practical. My previous employer, Ocado Group, is publicly seeking to reach a "cashflow positive" situation, and this made me curious:

What exactly is the company's financial situation? How has the level of debt fluctuated? Is the level of debt versus turnover significantly different from other FTSE 250 companies?
And so, I wondered if it would be possible to build something that could glean this information from a sequence of company accounts - which in turn might make it possible to compare the analysis across similar companies?
I also wondered - having established this for FTSE 250 companies, could the approach then be applied to, say, UK universities (which are businesses in reality) or maybe NHS trusts?

Leaving the obvious scope creep behind and returning to the idea of something simple - 😸 - the first set of questions: could one get an LLM to give an accurate analysis of a company's accounts, and explain why it gave that analysis?

Working With and Around LLMs and their Limitations:

So LLMs have three significant issues:

Accuracy: They hallucinate, i.e. make stuff up. Yes, the latest and most expensive models do this less, but there's always that risk. I would argue that's part of their inherent make-up.
File/token limit: They have file upload limits (unless you pay a small fortune), e.g. a limit of 100–200 files or 512 MB per file (as of mid-2025).
Provenance: When they give an answer, it's not always clear what they base their answers on (although Google's Gemini seems reasonable at this).

Is it possible to work around these limitations? And if it is, is the effort worth it - i.e. do you have to expend so much effort (and money) constraining and focusing LLMs that, whether for financial or risk reasons, it's simply not worth it?

Which Takes Us to "Retrieval-Augmented Generation (RAG)":

RAG, in essence, is:

Upload your set of documents to a "vector database" such as https://www.pinecone.io/
Fire your query at the vector database and get back the matching "chunks of content" drawn from the full set of documents it holds.
Craft a query to an LLM that:
- Includes all the "chunks of content" from the vector database relevant to your query.
- Gives strict instructions to the LLM to just use those "chunks of content" and nothing else and to not hallucinate.
- And instructs the LLM to give an answer that explicitly references the chunks of content.
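
As a rough illustration of the third step, here is a minimal sketch of how such a prompt might be assembled. It is not the code behind the application: the `Chunk` class, its `chunk_id` and `source` fields, the `build_prompt` function and the exact wording of the instructions are all assumptions made for this example, and it stops short of calling any vector database or LLM.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrieved piece of text plus the metadata needed for provenance.

    All field names are hypothetical, chosen only for this sketch.
    """
    chunk_id: str  # e.g. an id assigned when the document was split up
    source: str    # e.g. report title and page number
    text: str      # the chunk content returned by the vector database


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a prompt that restricts the LLM to the supplied chunks."""
    # Label every chunk with its id and source so the answer can cite it.
    context = "\n\n".join(
        f"[{c.chunk_id}] ({c.source})\n{c.text}" for c in chunks
    )
    return (
        "Answer the question using ONLY the excerpts below.\n"
        "If the excerpts do not contain the answer, say so; do not guess.\n"
        "After each claim, cite the id of the excerpt it is based on.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\n"
    )
```

The string returned by `build_prompt` is what would be sent to the LLM; the ids in the excerpts are what make it possible to ask for explicit references in the answer.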

Thus, hopefully:

Accuracy: The LLM doesn't hallucinate: it's been told to use only specific content and not to hallucinate.
File/token limit: By sending the LLM only the relevant subset of content, you hopefully work around the file limitations.
Provenance: By giving specific chunks of content tagged with helpful metadata, it's hopefully possible to get clear provenance from the LLM about what it has based its answers on.

If you'd like a more detailed explanation of "Retrieval-Augmented Generation (RAG)", check out this from Pinecone (a provider of an online vector database): Pinecone blog explaining RAG in detail

Small Tangent: What is a Vector Database?

Unlike normal databases that compare bits of text, e.g. "is there a student name that begins with Chris?", a vector database looks at semantic meaning, e.g. "find me all chunks of content that are meaningfully related to Ocado's level of debt". It does this by:

First, converting each chunk of text in the database into a point in hyper-dimensional space. Think 3D trigonometry, except with thousands of dimensions.
Second, it does the same for a query (i.e. converts it to a point in hyper-dimensional space), and then finds the chunks of text in the database nearest to the query. Amazingly, it does this using a "nearest neighbour algorithm" which, at its heart (this was my final-year project, ironically), is pure Pythagorean theorem: maths that is more than two thousand years old and still going strong.
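
To make the Pythagoras point concrete, here is a small sketch of a brute-force nearest-neighbour search using plain Euclidean distance. The three-dimensional vectors and chunk names are made up for illustration (real embeddings come from an embedding model and have hundreds or thousands of dimensions), and the function names are hypothetical. Production vector databases generally use approximate search and often a different measure such as cosine similarity, so treat this as the idea rather than how any particular product works.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Pythagoras generalised to any number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    k: int = 2,
) -> list[str]:
    """Return the ids of the k chunks closest to the query vector."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: euclidean_distance(query_vec, chunk_vecs[cid]),
    )
    return ranked[:k]


# Toy vectors, invented for this example only.
chunks = {
    "debt-note": [0.9, 0.1, 0.0],
    "board-bios": [0.0, 0.2, 0.9],
    "cashflow": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

print(nearest_chunks(query, chunks, k=2))  # ['debt-note', 'cashflow']
```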

Risks Associated with RAG:

One of the key risks associated with a RAG-based approach is the quality of the matching of your query to the chunks of content in the vector database. If this is poor, then you provide poor content to the LLM and you get poor results; if the quality of matching is accurate and relevant, you increase the chances of better results. So the quality of your matching between query and data in the vector database is key.
The other risk is, of course, the LLM itself: you're still at its mercy. If your prompt isn't directive enough, the LLM might use sources in addition to those you've provided and give unexpected results. So the quality and precision of your LLM prompts are absolutely crucial.

So What Have I Built?

A small application that takes an organisation's annual report (e.g. company accounts) and generates a short summary - see below:

And:

Answers questions based solely on the organisation's annual reports that the application holds, and for each answer provides an explanation of what the answer is based on (provenance, in other words)
Records questions and answers for quick retrieval (avoids querying the LLM if the question has already been answered)
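
The second point, recording answers so the LLM is not queried twice, could be sketched roughly as below. This is an assumption about one way to do it with Python's built-in `sqlite3` module, not the application's actual code: the `qa_cache` table, the `normalise` helper and the `ask_llm` callable are all hypothetical names introduced for this example.

```python
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny question/answer cache table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_cache ("
        "question TEXT PRIMARY KEY, answer TEXT NOT NULL)"
    )
    return conn


def normalise(question: str) -> str:
    """Lower-case and collapse whitespace so trivial variants share a key."""
    return " ".join(question.lower().split())


def answer_with_cache(
    conn: sqlite3.Connection,
    question: str,
    ask_llm: Callable[[str], str],
) -> str:
    """Return a stored answer if there is one; otherwise ask and store it."""
    key = normalise(question)
    row = conn.execute(
        "SELECT answer FROM qa_cache WHERE question = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: no LLM call needed
    answer = ask_llm(question)  # ask_llm stands in for the full RAG pipeline
    conn.execute(
        "INSERT INTO qa_cache (question, answer) VALUES (?, ?)", (key, answer)
    )
    conn.commit()
    return answer
```

Matching on the normalised question text only catches near-identical wording; two differently phrased versions of the same question would still each trigger an LLM call.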

Currently it contains:

Feel free to try it and ask questions! - see https://dnalorroland.pythonanywhere.com/ns/ftse?tab=qa

Tech / Nerdy Stuff

Built using:

https://www.pinecone.io/
SQLite FTS
Python
And hosted on www.pythonanywhere.com

So Does RAG Help When Working with LLMs?

That, along with further discussion of LLMs and more, is all in Part Two: Experimenting with GenAI, LLMs & RAG - Part 2: When Are LLMs Useful?
