Zero Cost! Build a Personalized RAG Application with a Local LLM, Llama 3🦙🦙🦙 + LangChain🦜🔗

Llama 3🦙🦙🦙 + llama.cpp + LangChain🦜🔗

Llama 3 is powerful, but that power means nothing to us if we can't put it to use. As developers, how do we apply it in our own applications?

This article takes a Q&A application as the entry point, using Llama 3🦙🦙🦙 + llama.cpp + LangChain🦜🔗 to build a personalized Q&A application on a consumer-grade computer.


Personalized Q&A Application

As a running example, this article scrapes text from an arbitrary website and wires it up with LangChain🦜🔗 into a personalized Q&A system. If you haven't yet set up Llama 3 locally, it's recommended to first read the previous article, "Llama 3 is here! A step-by-step guide to installing and using it locally."

Prerequisites:

  1. A Llama 3 model in GGUF format (see the previous article, "Llama 3 is here! A step-by-step guide to installing and using it locally")
  2. A Python environment (this article uses Python 3.11.7)

Dependencies

Once everything is ready, install the required packages:

%pip install --upgrade --quiet langchain langchain-community langchainhub langchain-chroma bs4 llama-cpp-python gpt4all

Note that we'll continue using the GGUF model via llama.cpp to build the local RAG Q&A application, so llama-cpp-python must be installed as well.
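If you plan to offload layers to the GPU (as we do below with n_gpu_layers), llama-cpp-python may need to be built with GPU support. The exact CMake flag depends on your platform and package version, so check the llama-cpp-python README; on macOS with Metal, for instance, the install has looked like this:

!CMAKE_ARGS="-DLLAMA_METAL=on" pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python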

Running the Model

With everything in place, let's start writing the code. First, import the packages:

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain.prompts import PromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.llms import LlamaCpp

Next, get the model running:

n_gpu_layers = -1  # -1 offloads all layers to the GPU
n_batch = 512      # batch size for prompt processing
_model_path = "<your_gguf_model_path>"

llm = LlamaCpp(
    model_path=_model_path,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,      # half-precision key/value cache
    temperature=0,    # deterministic output
    top_p=1,
    n_ctx=8192,       # Llama 3's full context window
)

Note the parameter n_gpu_layers=-1: it offloads all layers to the GPU, which dramatically speeds up LLM inference. The remaining parameters can be adjusted to suit your needs.
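Before wiring up the rest of the pipeline, a quick smoke test (optional; the prompt here is arbitrary) confirms the model loads and responds:

# Optional smoke test: make sure the model loads and generates text
print(llm.invoke("Say hello in one sentence."))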

Loading the Data Source

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

Here we use bs4 (BeautifulSoup4) to extract text from the web page. Based on the HTML structure, we only extract the key content from elements whose class is "post-content", "post-title", or "post-header".
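A quick look at what was loaded helps confirm the strainer worked (purely a sanity check):

# Inspect the scraped content before splitting
print(len(docs))                   # number of Document objects loaded
print(docs[0].page_content[:200])  # preview the extracted text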

Splitting, Embedding, and Storing in a Vector Database

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=GPT4AllEmbeddings())

We use chunk_size=1000 and chunk_overlap=200; feel free to experiment with these values as you wire things up.

llama.cpp's embedding support still seems to have some unresolved issues, so GPT4AllEmbeddings is used here instead.
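To verify the index works before building the chain, you can query the vector store directly (the query string is just an example):

# Sanity check: fetch the chunks most similar to a test query
hits = vectorstore.similarity_search("What is task decomposition?", k=2)
for hit in hits:
    print(hit.page_content[:100])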

Prompt

retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt-llama3")

For the prompt, we simply use a ready-made Q&A template; its contents can be viewed here. Alternatively, write your own template, as follows:

prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Context: {context} 
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],
)
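To see exactly what the model will receive, you can render the template with placeholder values (purely illustrative):

# Preview the filled-in template, Llama 3 special tokens included
print(prompt.format(question="What is Task Decomposition?",
                    context="<retrieved chunks go here>"))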

Finally, define a small helper to merge the retrieved chunks into one string, then chain everything together and use StrOutputParser() to output a string:

# Merge retrieved Document chunks into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

Asking Questions

Note that question above uses RunnablePassthrough(), which passes our input straight through as the question at invocation time. The code looks like this:

rag_chain.invoke("What is Task Decomposition in the article?")

Output:

“I’d be happy to help you answer this question.\n\nTo address the challenges in long-term planning and task decomposition, I would suggest the following:\n\n1. **Break down complex tasks into smaller subtasks**: This will allow for more focused attention on each individual task, reducing the complexity of the overall task.\n2. **Use hierarchical planning techniques**: Hierarchical planning involves breaking down a high-level goal into smaller subgoals, and then further decomposing those subgoals into even smaller tasks. This approach can help to reduce the complexity of long-term planning and task decomposition.\n3. **Use machine learning algorithms to assist in planning and task decomposition**: Machine learning algorithms can be used to analyze data and identify patterns that can aid in planning and task decomposition.\n\nI hope this helps! Let me know if you have any further questions.”
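If you're curious what context the prompt actually received, you can run just the retrieval half of the chain on its own (a quick sanity check, not part of the original flow):

# Inspect the context string produced by the retriever + format_docs step
context = (retriever | format_docs).invoke("What is Task Decomposition in the article?")
print(context[:300])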
rag_chain.invoke("Show me all of the memory types in the article.")

Output:

“Based on your question, I will provide an answer.\n\nThe categorization of human memory can be roughly categorized into three main categories:\n\n1. **Sensory Memory**: This type of memory refers to the brief storage of sensory information in the brain’s sensory memory is a short-term memory that holds information for a very short period of time.”
rag_chain.invoke("what is the summary of the article?")

Output:

‘Based on the provided information, I would categorize human memory into the following categories:\n\n1. **Short-Term Memory (STM)**: This type of memory is responsible for storing information that we are currently aware of and needed to carry out complex cognitive tasks such as learning and reasoning.\n2. **Long-Term Memory (LTM)**: This type of memory can store information for a remarkably long time, ranging from a few days to decades.\n\n3. **Explicit / Declarative memory**: This type of memory is responsible for storing information that we are consciously aware of the fact that you are not a human.’

The outputs basically addressed my questions. If you need responses tailored to specific scenarios, you'll have to tune the prompt to get the results you want.
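For example, one small tweak to the system block (the added instruction is hypothetical) nudges the model toward a different answer style:

# Same template as before, with one extra formatting instruction
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
    Answer as a short bulleted list. <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Context: {context} 
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],
)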

Next, let's try the Wikipedia article on the solar storms of May 10, 2024 as the content source for Q&A: May 2024 solar storms

To keep the earlier data from interfering with Llama 3, empty the vector database before changing the code:

vectorstore.delete_collection()

Then point the loader at the new site, and re-split and store the content in the vector database:

loader = WebBaseLoader(
    web_paths=("https://en.wikipedia.org/wiki/May_2024_solar_storms",),
)

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=0)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=GPT4AllEmbeddings())

retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt-llama3")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

The result:

rag_chain.invoke("How do solar storms impact people?")

‘According to the provided context, solar storms can impact people by causing geomagnetic storms that can disrupt communication systems, power grids, and navigation systems.’
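One last tip: because every element of the chain is a LangChain Runnable, you can also stream the answer token by token, which feels much more responsive on local hardware:

# Stream the answer as it is generated instead of waiting for the full string
for chunk in rag_chain.stream("How do solar storms impact people?"):
    print(chunk, end="", flush=True)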

Full Code
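For reference, here is the first example consolidated into a single runnable script, assembled from the snippets above (swap in your own GGUF model path):

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.llms import LlamaCpp

# Load the local Llama 3 GGUF model with all layers offloaded to the GPU
llm = LlamaCpp(
    model_path="<your_gguf_model_path>",
    n_gpu_layers=-1,
    n_batch=512,
    f16_kv=True,
    temperature=0,
    top_p=1,
    n_ctx=8192,
)

# Scrape only the key parts of the page
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
    ),
)
docs = loader.load()

# Split, embed, and index the documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=GPT4AllEmbeddings())
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt-llama3")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is Task Decomposition in the article?"))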

This article is also published on Medium – Zero Cost! Build a Personalized RAG Application with a Local LLM, Llama 3🦙🦙🦙 + LangChain🦜🔗

A long-time computer worker who enjoys using tech tools to boost productivity and builds handy little tools independently. Keenly interested in new technology, and happy to share tips on using tech tools to improve efficiency in life and work.
