You might have heard of these methods, but do you know which to focus on to improve your LLM Application? What are the pros and cons of each method and how do they compare to each other?
In this blogpost we make an in-depth analysis of each technique and the consequences of using them, along with giving some advice based on our previous experience with many LLM Projects.
Index
TLDR: When to use each method?
Quick Summary of Concepts
In-Depth Comparison - Requirements
Still not sure?
Use Case Comparison
Recommendations from our Experience
Wrapping things up
TLDR: When to use each method?
Use Prompt Engineering for dealing with lack of Accuracy. It is great for prototyping. It also helps enhance the other methods.
Use RAGÂ when you have lack of knowledge and need to connect to a Knowledge Base. Secure your information based on the user's permissions.
Use Fine-Tuning for specifying Formatting or Tone. Save in costs of larger prompts by inserting knowledge and instructions into the model.
Basically, you are good with Prompt Engineering + RAG for 70-80% of the cases, but some business cases can leverage Fine-Tuning to maximize performance when self-hosting.
Of course, this is a very simplistic way of looking at the problem and deciding on how to proceed. This is the executive summary you might give to someone but it is not sufficient to decide on what is required.
If you want to explore the differences more in-depth you need to look at the problem from several directions, which we will explore in the next sections. But before that, let's understand the base concepts.
Quick Summary of Concepts
Prompt Engineering
Prompt engineering involves crafting inputs (prompts) to an LLM in a way that guides the model to generate the desired output. This method does not alter the underlying model but relies on the user's skill in formulating questions or statements that lead to optimal results.
We recommend watching our webinar for more details:
Or you can read our blogpost:
RAG - Retrieval Augmented Generation
RAG combines the generative capabilities of LLMs with the retrieval of information from external databases or documents. This approach enables the model to incorporate more factual and up-to-date information in its responses.
The quality of the application's outputs hinges on the retrieval system's effectiveness. Therefore, optimizing this component is essential, as it directly influences the relevance and accuracy of the model's answers based on the given context.
We recommend reading this blogpost for more details on some of its use cases:
Fine-Tuning
Fine-tuning involves retraining a pre-trained LLM on a specific dataset to adapt its responses more closely to the desired outcomes. This method modifies the model’s weights through additional training, allowing it to better reflect specific nuances of the new data.
It is usually used to make small adjustments to the LLM to make it adjust better to your use case. As such, you should feed it only high-quality curated data that is highly relevant to what you want to teach it with examples of inputs and desired outputs.
We recommend watching our webinar for more details:
Or you can read this blogpost:
In-Depth Comparison - Requirements
When looking at your current product or future prototype from a high-level project perspective there are a couple of things you need to consider. These can be certain limitations or requirements that should help guide you to the best solution.
These can be derived from the project's goals or hard constraints like latency, costs, time until deployment, etc.
As seen in the table above, Fine-Tuning is a process that requires many resources and expertise, so it should only be employed when necessary. Therefore, being able to identify the correct use cases for it is key! More on this will be discussed further ahead.
Upon discussion with all the Stakeholders, you should be ready to look at all these requirements and see which method suits you better. Of course, this also depends on the stage that your project is at and the nature of it, it can get a bit tricky to pinpoint exactly what you need.
Still not sure?
Calling for the help of experts can help ensure that you go down the optimal path and avoid major costs or delays later.
If you want to learn more, I recommend contacting us to undertake our:
Find out where your project stands in terms of the AI adoption curve and possible next steps.
To do so, choose the test option in the topic of the contact form.
Use Case Comparison
Despite the phase in which your project is at and the requirements being key to this problem, identifying your use case and which method handles your problems better is paramount.
Each technique focuses on very specific problems and improves specific areas despite each also having their drawbacks.
Looking at the table above it becomes clear why you should use each of the techniques presented.
Prompt Engineering
Prompt engineering is fast to implement and requires little technical expertise, so it is an easy solution. However, it does have its limitations, especially in more complex scenarios, where the other two methods come in handy.
Nevertheless, prompt engineering should be used throughout all parts of a LLM application.
RAG
RAG is key for some products: i.e. a chatbot to query your documents.
However, keep in mind that RAG is based on search techniques and that to build a truly good RAG system you may need in-house knowledge of the search area and how to optimize it. A good RAG Product is only as good as the retrieval system it is using.
Fine-Tuning
It may look scary and questionable as to why use it, although it is very good for some particular scenarios:
When you want your answers to be in a certain format: i.e. JSON, XML, or even just free text but following certain guidelines.
If you need the tone to be adjusted, say to talk more formally or informally, or in a specific manner.
Need it to not be aggressive, lash out, or address topics outside your interest. In short, you need better guardrails and instead of putting them into the prompts, you can use fine-tuning for this.
All in all, Fine-Tuning represents an initial investment that then enables you to get a more robust model, better aligned with your goals. It saves on costs & latency in the day-to-day by requiring fewer tokens in the prompt since these behaviors have been hardcoded onto the model.
Fine-Tuning may also require running your model locally or at least managing it in a cloud-computing platform. It requires more in-house knowledge to set up, but after doing it once it is all the same for the next time. Therefore if a new model comes out or you just want to fine-tune again an older model you can do it with much more confidence and efficiency.
Recommendations from our Experience
In our experience, there isn't a single optimal approach to developing applications with LLMs; it greatly depends on your specific project needs and priorities.
The following image was inspired and adapted from a famous OpenAI Talk, with some modifications to represent our experience better.
Typically, development starts with Prompt Engineering to refine how you interact with the model to get the best possible responses for your use case. If you're using a self-hosted model, the next step might involve fine-tuning to align the model with your specific requirements better.
Alternatively, you might incorporate Agents or leverage RAG to enhance the model’s ability to pull in relevant external data.
Ultimately, a combination of these techniques—prompt engineering, fine-tuning, and RAG—is often employed to create a robust application. This integrated approach ensures that the application not only performs well but also remains adaptable and efficient across various use cases.
Wrapping things up
From what we saw it is clear that these techniques are quite distinct, however, they can also complement each other quite well. A successful LLM developer should always be on the lookout for different ways to improve it's project so keeping an eye out for any new discoveries, papers and models is essential!
Some methods may take more effort and knowledge than others, like Fine-Tuning. Although, I usually argue that obtaining this knowledge is invaluable when the LLM area is evolving so rapidly, so not being dependent on third-parties and being able to move fast when new models come out is key.
If you want to remain on top of everything you should aim to be knowledgeable on many areas such as LLM Cost Management, Quantization, Mixture-of-Experts, Tracking & Monitoring.
Feel free to contact us for any additional information!