Experimenting with GenAI, LLMs & RAG - Part 2: When Are LLMs Useful?

22 Oct, 2025

Introduction

In Part 1 of this series (Experimenting with GenAI, LLMs & RAG - Part 1: Building a RAG Application), I touched on what Retrieval-Augmented Generation (RAG) is, and discussed a small RAG application that I'd built: see https://dnalorroland.pythonanywhere.com/ns/ftse?tab=qa

Does RAG Help Overcome the Limitations of LLMs?

In short:

Accuracy: Could we get an LLM to be accurate, and to be accurate all the time? Almost, but ultimately not reliably. In testing I managed to get the LLM to generate some really inaccurate or incomplete answers to questions that it should have been able to answer from the data provided, e.g. "Who is Oxfam's current chief executive?" (see screengrab above)
File/token limit: Maybe. I didn't have problems with file upload limits, but I didn't really test it at scale.
Provenance: Yes - this did seem viable, though the UI I created could be better.

Was There More I Could Do?

Well, it seems there was probably more I could try - see "DiamantAI: How to Stop AI Hallucinations: 10 Proven Techniques That Actually Work". That's at least ten techniques to choose from.

Interestingly, I was already doing some of these, namely:

Choose Advanced Models
Write Clear Instructions (at least I thought so)
Use Step-by-Step Reasoning - yep, in my LLM prompt
Lower the Temperature: check
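
To make the temperature point a bit more concrete, here is a minimal sketch of how temperature reshapes a next-token probability distribution. It is a generic illustration, not the code from my application: the function name softmax_with_temperature and the three example scores are assumptions made up for this sketch.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.5]

for temperature in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lowering the temperature concentrates probability on the top-scoring option, which makes output more repeatable. It does not make the top-scoring option correct.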

The Real Problem: This Is Not a Good Use Case for LLMs

I think the reason I haven't persisted with this any further is that being precise fundamentally isn't what LLMs are good at. In reality, when you ask an LLM a question, the core engine can give you thousands of possible answers in decreasing order of relevance to your query, but that doesn't make a good product.

If we ask a chatbot for a summary, we want certainty: "hey, no problem, here's your summary". We don't want "here's what I think is the most relevant summary, but you should check it" - particularly if we're summarising a 200-page PDF. So this certainty is what the platform providers have built for us, but that presentation does seem somewhat disingenuous.

LLMs are by their nature probabilistic, so any answer is only ever one of an effectively unlimited number of possible answers, picked because the LLM scored it as among the most relevant to your query.
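
As a toy illustration of that point, the sketch below samples repeatedly from a fixed set of candidate answers. It is not how any particular LLM is implemented: the candidate labels, the weights and the helper name sample_answers are all hypothetical, chosen only to show that a likely answer is not a guaranteed one.

```python
import random
from collections import Counter


def sample_answers(candidates, weights, n, seed=None):
    """Draw n answers at random, in proportion to their weights."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return Counter(rng.choices(candidates, weights=weights, k=n))


# Hypothetical candidates and probabilities (illustrative only).
candidates = ["answer A", "answer B", "answer C"]
weights = [0.7, 0.2, 0.1]

counts = sample_answers(candidates, weights, n=1000, seed=42)
print(counts.most_common())
```

Even with one answer weighted at 70%, a meaningful share of the draws come back as something else, which is the behaviour a confident-sounding chat interface tends to hide.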

Which does make me a little nervous about this:

Still, I'm sure they'll work it out in high-pressure military situations where certainty is rather important 🫣

What Are Good Use Cases for LLMs?

So this raises the question - what are LLMs useful for?

Where the risk associated with the answer is low
Where the effort required to assess accuracy of an answer is low
Or, where having an uncertain outcome is part of the question, e.g. "what might a UK MP say in response to a particular question?"

Returning to My Original Intention:

"Could one get an LLM to give an accurate analysis of a company account, and explain why it gave that analysis?"

Give an accurate analysis of a company account: given that I pretty easily got it to give me an inaccurate answer to a query, I have to say no.
Explain why it gave that analysis: yes, pretty much.

Given this, I've decided not to take this project much further.

What's Next?

Thinking about using technology to build something useful and/or interesting. At this point, possibly not using an LLM.

