By Miguel Carreira Neves

Prompt Eng vs RAG vs Fine-Tuning - What do you need?

You might have heard of these methods, but do you know which to focus on to improve your LLM Application? What are the pros and cons of each method and how do they compare to each other?


In this blog post we take an in-depth look at each technique and the consequences of using it, and share some advice based on our experience with many LLM projects.


Index

  • TLDR: When to use each method?

  • Quick Summary of Concepts

  • In-Depth Comparison - Requirements

  • Still not sure?

  • Use Case Comparison

  • Recommendations from our Experience

  • Wrapping things up


TLDR: When to use each method?

Use Prompt Engineering to deal with a lack of accuracy. It is great for prototyping, and it also enhances the other methods.
Use RAG when you lack knowledge and need to connect to a Knowledge Base. It also lets you secure information based on the user's permissions.
Use Fine-Tuning to control formatting or tone. It saves on the cost of larger prompts by baking knowledge and instructions into the model.

Basically, Prompt Engineering + RAG covers 70-80% of cases, but some business cases can leverage Fine-Tuning to maximize performance when self-hosting.

[Figure: high-level heuristic of when to use each method. Prompt Engineering for lack of accuracy; RAG for lack of knowledge; Fine-Tuning for formatting or tone.]

Of course, this is a very simplistic way of looking at the problem and deciding how to proceed. It is the executive summary you might give someone, but it is not enough to decide what your project actually requires.


If you want to explore the differences in more depth, you need to look at the problem from several directions, which we do in the next sections. But before that, let's go over the base concepts.



Quick Summary of Concepts


Prompt Engineering

Prompt engineering involves crafting inputs (prompts) to an LLM in a way that guides the model to generate the desired output. This method does not alter the underlying model but relies on the user's skill in formulating questions or statements that lead to optimal results.

[Figure: example of a prompt refined with several techniques: Generated Knowledge, Few-Shot, Emotion Prompt, Chain-of-Thought, Self-Reflection.]
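
To make this concrete, here is a minimal Python sketch of how such a prompt can be composed programmatically, combining Few-Shot examples with a Chain-of-Thought instruction; the task and examples are purely illustrative:

```python
# Minimal sketch: composing a Few-Shot + Chain-of-Thought prompt.
# The task and examples below are illustrative, not from a real system.

FEW_SHOT_EXAMPLES = """\
Q: A store sells pens at 2 EUR each. How much do 4 pens cost?
A: Each pen costs 2 EUR, so 4 pens cost 4 * 2 = 8 EUR. Answer: 8 EUR.

Q: A train travels 60 km in one hour. How far does it go in 3 hours?
A: It covers 60 km per hour, so in 3 hours it covers 3 * 60 = 180 km. Answer: 180 km.
"""

def build_prompt(question: str) -> str:
    """Compose a few-shot, chain-of-thought prompt for a reasoning task."""
    return (
        "You are a careful assistant. Think step by step before answering.\n\n"
        + FEW_SHOT_EXAMPLES
        + f"\nQ: {question}\nA:"
    )

print(build_prompt("A box holds 12 eggs. How many eggs are in 5 boxes?"))
```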

For more details, we recommend watching our webinar or reading our blog post.


RAG - Retrieval Augmented Generation

RAG combines the generative capabilities of LLMs with the retrieval of information from external databases or documents. This approach enables the model to incorporate more factual and up-to-date information in its responses.

The quality of the application's outputs hinges on the retrieval system's effectiveness. Therefore, optimizing this component is essential, as it directly influences the relevance and accuracy of the model's answers based on the given context.


[Figure: example of a RAG application built with Langchain. 1. Input: query; 2. search the knowledge source for documents relevant to the query; 3. return the relevant documents; 4. send prompt + relevant documents + query to the LLM.]
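
To make the four steps above concrete, here is a framework-agnostic sketch of the same loop in Python; `search_documents` and `call_llm` are placeholders you would replace with your vector store query and LLM client of choice:

```python
# Framework-agnostic sketch of the RAG loop from the diagram above.
# `search_documents` and `call_llm` are placeholders, not a real API.

def search_documents(query: str, top_k: int = 3) -> list[str]:
    """Steps 2-3: search the knowledge source and return relevant chunks."""
    raise NotImplementedError("Replace with your vector store query")

def call_llm(prompt: str) -> str:
    """Send the assembled prompt to the model and return its answer."""
    raise NotImplementedError("Replace with your LLM client")

def answer_with_rag(query: str) -> str:
    """Step 1: take the user query; step 4: send prompt + documents + query."""
    documents = search_documents(query)
    context = "\n\n".join(documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return call_llm(prompt)
```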

For more details on some of its use cases, we recommend reading our blog post.


Fine-Tuning

Fine-tuning involves retraining a pre-trained LLM on a specific dataset to adapt its responses more closely to the desired outcomes. This method modifies the model’s weights through additional training, allowing it to better reflect specific nuances of the new data.


It is usually used to make small adjustments so the LLM fits your use case better. As such, you should feed it only high-quality, curated data that is highly relevant to what you want to teach it, with examples of inputs and desired outputs.

[Figure: data for training (left) is whatever you can find, dumped in bulk; data for fine-tuning (right) is a handful of high-quality samples.]
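
As an illustration, a curated fine-tuning set often looks like the sketch below, here in the JSONL chat format accepted by OpenAI-style fine-tuning APIs; the samples themselves are invented:

```python
import json

# Sketch of a small, curated fine-tuning dataset in the JSONL chat
# format accepted by OpenAI-style fine-tuning APIs. Samples are invented:
# each one pairs an input with the exact output style you want to teach.
samples = [
    {
        "messages": [
            {"role": "system", "content": "Answer formally, in two sentences at most."},
            {"role": "user", "content": "Can you explain what RAG is?"},
            {
                "role": "assistant",
                "content": "Retrieval Augmented Generation retrieves relevant "
                "documents and adds them to the prompt. This grounds the "
                "model's answer in factual, up-to-date context.",
            },
        ]
    },
    # ...a few hundred high-quality samples, not a raw data dump
]

with open("finetune.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```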

For more details, we recommend watching our webinar or reading our blog post.


In-Depth Comparison - Requirements

When looking at your current product or future prototype from a high-level project perspective there are a couple of things you need to consider. These can be certain limitations or requirements that should help guide you to the best solution.

These can be derived from the project's goals or hard constraints like latency, costs, time until deployment, etc.


[Table: why use each method, depending on limitations and requirements. Prompt Engineering for prototyping or an MVP; RAG for prototypes that must connect to databases; Fine-Tuning for production-grade projects that need better formatting and tone.]


As seen in the table above, Fine-Tuning is a process that requires significant resources and expertise, so it should only be employed when necessary. Being able to identify the correct use cases for it is therefore key! More on this further ahead.


After discussing with all the stakeholders, you should be ready to weigh these requirements and see which method suits you best. Of course, this also depends on the stage and nature of your project, so it can get a bit tricky to pinpoint exactly what you need.


Still not sure?

Calling in experts can help ensure that you go down the optimal path and avoid major costs or delays later.

Schedule a free 30-minute test with us to get recommendations on your LLM project.

If you want to learn more, we recommend contacting us to find out where your project stands on the AI adoption curve and what your possible next steps are. To do so, choose the test option in the topic of the contact form.



Use Case Comparison

While the phase your project is in and its requirements are key to this decision, identifying your use case, and which method handles it best, is paramount.

Each technique targets very specific problems and improves specific areas, though each also has its drawbacks.


[Table: comparison of methods by use case. Prompt Engineering to increase accuracy; RAG to connect with databases; Fine-Tuning for formatting and tone.]

Looking at the table above it becomes clear why you should use each of the techniques presented.


Prompt Engineering

Prompt engineering is fast to implement and requires little technical expertise, so it is an easy solution. However, it has its limitations, especially in more complex scenarios, which is where the other two methods come in handy.

Nevertheless, prompt engineering should be used throughout all parts of an LLM application.


[Figure: diagram of a typical LLM application. Every step between the user and the final answer involves prompts that can be refined through prompt engineering.]

RAG

RAG is key for some products, e.g. a chatbot that queries your documents.

  • However, keep in mind that RAG is built on search techniques, and that to build a truly good RAG system you may need in-house knowledge of the search area and how to optimize it. A good RAG product is only as good as the retrieval system it uses; see the evaluation sketch below.
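
A simple way to keep the retriever honest is to measure it directly. Below is a minimal hit-rate (recall@k) evaluation sketch, assuming you have a small labeled set mapping each query to the ids of documents a good retriever should return; `search_ids` is a placeholder for your own retriever:

```python
# Minimal retrieval evaluation: hit rate (recall@k) over a labeled set.
# `search_ids` is a placeholder for your retriever, returning document ids.

def search_ids(query: str, top_k: int) -> list[str]:
    raise NotImplementedError("Replace with your retriever")

def hit_rate_at_k(labeled_set: dict[str, set[str]], k: int = 3) -> float:
    """Fraction of queries for which at least one relevant doc is in the top k."""
    hits = 0
    for query, relevant_ids in labeled_set.items():
        retrieved = set(search_ids(query, top_k=k))
        if retrieved & relevant_ids:
            hits += 1
    return hits / len(labeled_set)
```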


Fine-Tuning

It may look scary, and you may wonder why you would use it at all, but it is very good in some particular scenarios:

  • When you want your answers to be in a certain format: e.g. JSON, XML, or even free text that follows certain guidelines.

  • If you need the tone to be adjusted, say to talk more formally or informally, or in a specific manner.

  • When you need it not to be aggressive, lash out, or address topics outside your scope. In short, when you need better guardrails: instead of putting them into the prompts, you can bake them in with fine-tuning.

  • All in all, Fine-Tuning represents an initial investment that gives you a more robust model, better aligned with your goals. It saves on day-to-day costs and latency by requiring fewer tokens in the prompt, since these behaviors have been baked into the model.
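
To illustrate the token savings, compare the prompt a base model needs with the one a fine-tuned model needs once the format and tone rules are baked into its weights; both prompts are illustrative:

```python
# Illustrative comparison of per-request prompts. With a base model, the
# format and guardrail instructions must be repeated on every call:
BASE_MODEL_PROMPT = """Reply ONLY with valid JSON of the form
{"sentiment": "positive" | "neutral" | "negative", "confidence": float}.
Keep a neutral, professional tone and never discuss unrelated topics.
Review: great battery life, but the screen scratches easily."""

# After fine-tuning on examples that already follow those rules, the
# per-request prompt shrinks to just the input:
FINE_TUNED_PROMPT = "Review: great battery life, but the screen scratches easily."
```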


Fine-Tuning may also require running your model locally, or at least managing it on a cloud-computing platform. It requires more in-house knowledge to set up, but after doing it once, the next time is much the same. So if a new model comes out, or you just want to fine-tune an older model again, you can do it with much more confidence and efficiency.


Recommendations from our Experience

In our experience, there isn't a single optimal approach to developing applications with LLMs; it greatly depends on your specific project needs and priorities.

The following image was inspired by and adapted from a well-known OpenAI talk, with some modifications to better reflect our experience.

[Figure: example of an LLM app development cycle. Start with Prompt Engineering using Few-Shot, then Chain-of-Thought, then instruction tuning, then leverage Agents; next connect a simple RAG; fine-tune for formatting and tone; then build a complex RAG and finally add RAG to the training examples.]

Typically, development starts with Prompt Engineering to refine how you interact with the model and get the best possible responses for your use case. If you're using a self-hosted model, the next step might involve fine-tuning to better align the model with your specific requirements.

Alternatively, you might incorporate Agents or leverage RAG to enhance the model’s ability to pull in relevant external data.

Ultimately, a combination of these techniques—prompt engineering, fine-tuning, and RAG—is often employed to create a robust application. This integrated approach ensures that the application not only performs well but also remains adaptable and efficient across various use cases.



Wrapping things up

From what we have seen, it is clear that these techniques are quite distinct; however, they can also complement each other quite well. A successful LLM developer should always be on the lookout for different ways to improve their project, so keeping an eye out for new discoveries, papers, and models is essential!


Some methods, like Fine-Tuning, take more effort and knowledge than others. Still, I usually argue that obtaining this knowledge is invaluable while the LLM area is evolving so rapidly: not being dependent on third parties, and being able to move fast when new models come out, is key.

[Illustration: a robot sailing the tides of AI while reading a book on how to sail properly. The tides keep changing.]

If you want to stay on top of everything, you should aim to be knowledgeable in many areas, such as LLM Cost Management, Quantization, Mixture-of-Experts, and Tracking & Monitoring.


Feel free to contact us for any additional information!

