2026/4/6 9:15:51
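A colleague working on an LLM project at my company recently asked me about a problem: the document mentions the same proper noun many times, yet RAG keeps missing key information. After some digging, the cause turned out to be the traditional chunking method: sentences that sit several pages apart but are closely related were being split up. I offered some general advice, such as using hybrid retrieval instead of semantic retrieval alone and generating QA pairs from each chunk. He then asked a follow-up question: is there a chunking technique that reduces this kind of problem? I suggested he try a recently proposed chunking strategy, Agentic Chunking.

Why chunking matters

In a RAG pipeline, text chunking is the first step and also the most critical one. Traditional methods such as recursive character splitting are simple and easy to use, but they have an obvious drawback: they split on a fixed token length, so a single topic can be spread across different chunks, which breaks the coherence of the context.

Another common method is semantic splitting, which splits wherever it detects a semantic shift between sentences. It is smarter than recursive character splitting but has its own limitation: when a document switches back and forth between topics, semantic splitting may place related content in different chunks, leaving the information disconnected.

Both methods fail on a passage like this one:

“Xiao Ming introduced the Transformer architecture ... (five paragraphs of other content) ... Finally, he stressed that the core of the Transformer is the self-attention mechanism.”

Traditional methods either put these two sentences in different chunks or are thrown off by the content in between, so the meaning is broken. A person chunking by hand would naturally file both sentences under “model principles”. This kind of association across textual distance is exactly what Agentic Chunking is meant to handle.

How Agentic Chunking works

The core idea of Agentic Chunking is to let a large language model (LLM) evaluate every sentence and assign it to the most suitable chunk. Unlike traditional methods, Agentic Chunking does not rely on a fixed token length or on semantic shifts. Instead, the LLM's judgment is used to group sentences that are far apart in the document but related in topic.

For example, suppose we have the following text:

On July 20, 1969, astronaut Neil Armstrong walked on the moon. He was leading the NASA’s Apollo 11 mission. Armstrong famously said, “That’s one small step for man, one giant leap for mankind” as he stepped onto the lunar surface.

In Agentic Chunking, the LLM first applies propositioning to these sentences: each sentence is made self-contained, so that every sentence has its own explicit subject. The processed text reads:

On July 20, 1969, astronaut Neil Armstrong walked on the moon. Neil Armstrong was leading the NASA’s Apollo 11 mission.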
Neil Armstrong famously said, “That’s one small step for man, one giant leap for mankind” as he stepped onto the lunar surface.

The LLM can now examine each sentence on its own and assign it to the most suitable chunk. Propositioning can be seen as a sentence-level makeover of the document that leaves every sentence independent and complete.

How to implement Agentic Chunking

The key parts of an implementation are propositioning and the dynamic creation and updating of chunks. Tools such as LangChain and Pydantic can be used for this.

[Figure: flowchart of the process]

Propositioning the text

First, every sentence in the text needs to be propositioned. We can use a prompt template from the LangChain hub to have the LLM do this automatically. A simple code example:

    from typing import List

    from langchain import hub
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI
    from pydantic import BaseModel, Field

    # Prompt from the LangChain hub that rewrites text into standalone propositions.
    obj = hub.pull("wfh/proposal-indexing")
    llm = ChatOpenAI(model="gpt-4o")

    class Sentences(BaseModel):
        sentences: List[str]

    # Force the model to answer in the Sentences schema.
    extraction_llm = llm.with_structured_output(Sentences)
    extraction_chain = obj | extraction_llm

    sentences = extraction_chain.invoke(
        "On July 20, 1969, astronaut Neil Armstrong walked on the moon. "
        "He was leading the NASA's Apollo 11 mission. "
        "Armstrong famously said, \"That's one small step for man, one giant leap "
        "for mankind\" as he stepped onto the lunar surface."
    )

Creating and updating chunks

Next, we need a function that creates and updates chunks dynamically. Each chunk holds propositions on a similar topic, and its title and summary are refreshed as new propositions are added.

    from pydantic import BaseModel, Field

    # Registry of all chunks, keyed by chunk id.
    chunks = {}

    class ChunkMeta(BaseModel):
        title: str = Field(description="The title of the chunk.")
        summary: str = Field(description="The summary of the chunk.")

    def create_new_chunk(chunk_id, proposition):
        # Ask the LLM for a title and summary, then register the new chunk.
        summary_llm = llm.with_structured_output(ChunkMeta)
        summary_prompt_template = ChatPromptTemplate.from_messages([
            ("system", "Generate a new summary and a title based on the propositions."),
            ("user", "propositions:{propositions}"),
        ])
        summary_chain = summary_prompt_template | summary_llm
        chunk_meta = summary_chain.invoke({"propositions": [proposition]})
        chunks[chunk_id] = {
            "summary": chunk_meta.summary,
            "title": chunk_meta.title,
            "propositions": [proposition],
        }

Pushing a proposition to the right chunk

Finally, we need an AI agent that decides which chunk a new proposition belongs to. If no chunk is suitable, the agent creates a new one.

    def find_chunk_and_push_proposition(proposition):
        class ChunkID(BaseModel):
            chunk_id: int = Field(description="The chunk id.")

        allocation_llm = llm.with_structured_output(ChunkID)
        allocation_prompt = ChatPromptTemplate.from_messages([
            ("system", "Find the chunk that best matches the proposition. "
                       "If no chunk matches, return a new chunk id."),
            ("user", "proposition:{proposition} chunks_summaries:{chunks_summaries}"),
        ])
        allocation_chain = allocation_prompt | allocation_llm
        # The agent only sees each chunk's summary, not its full contents.
        chunks_summaries = {chunk_id: chunk["summary"] for chunk_id, chunk in chunks.items()}
        best_chunk_id = allocation_chain.invoke(
            {"proposition": proposition, "chunks_summaries": chunks_summaries}
        ).chunk_id
        if best_chunk_id not in chunks:
            create_new_chunk(best_chunk_id, proposition)
        else:
            # add_proposition is not shown in this article; presumably it appends
            # the proposition and regenerates the chunk's title and summary.
            add_proposition(best_chunk_id, proposition)

How well does it work in practice?

As a test, I took the introduction text for Wings of Time, a well-known attraction on Sentosa in Singapore, and processed it with a GPT-4 model. The text covers the attraction itself, ticketing, opening hours and more, which makes it a good test sample.

Product Name: Wings of Time
Product Description: Wings of Time is one of Sentosa's most breathtaking attractions, combining water, laser, fire, and music to create a mesmerizing night show about friendship and courage. Situated on the scenic (https://www.sentosa.com.sg/en/things-to-do/attractions/siloso-beach/) Siloso Beach, this award-winning spectacle is staged nightly, promising an unforgettable experience for visitors of all ages.
Be wowed by spellbinding laser, fire, and water effects set to a majestic soundtrack, complete with a jaw-dropping fireworks display. A fitting end to your day out at Sentosa, it’s possibly the only place in Singapore where you can witness such an awe-inspiring performance. Get ready for an even better experience starting 1 February 2025! Wings of Time Fireworks Symphony, Singapore’s only daily fireworks show, now features a fireworks display that is four times longer! Important Note: Please visit (https://www.sentosa.com.sg/sentosa-reservation) here if you need to change your visit date. All changes must be made at least 1 day prior to the visit date.
Product Category: Shows
Product Type: Attraction
Keywords: Wings of Time, Sentosa night show, Sentosa attractions, laser show Sentosa, water show Singapore, Sentosa events, family activities Sentosa, Singapore night shows, outdoor night show Sentosa, book Wings of Time tickets
Meta Description: Experience Wings of Time at Sentosa! A breathtaking night show featuring water, laser, and fire effects. Perfect for a memorable evening.
Product Tags: Family Fun, Popular experiences, Frequently Bought
Locations: Beach Station

[Tickets]
Name: Wings of Time (Std)
Terms:
• All Wings of Time (WOT) Open-Dated tickets require prior redemption at Singapore Cable Car Ticketing counters and are subjected to seats availability on a first come first serve basis.
• This is a rain or shine event. Tickets are non-exchangeable or nonrefundable under any circumstances.
• Once timeslot is confirmed, no further amendments are allowed. Please proceed to WOT admission gates to scan your issued QR code via mobile or physical printout for admission.
• Gates will open 15 minutes prior to the start of the show.
• Show Duration: 20 minutes per show.
• Please be punctual for your booked time slot.
• Admission will be on a first come first serve basis within the allocated timeslot or at the discretion of the attraction host.
• Standard seats are applicable to guest aged 4 years and above.
• No outside Food & Drinks are allowed.
• Refer to (https://www.mountfaberleisure.com/attraction/wings-of-time/) https://www.mountfaberleisure.com/attraction/wings-of-time/ for more information on Wings of Time.
Pax Type: Standard
Promotion A: Enjoy $1.90 off when you purchase online! Discount will automatically be applied upon checkout.
Price: 19
Opening Hours: Daily
Show 1: 7.40pm
Show 2: 8.40pm
Accessibilities: Wheelchair

[Information]
Title: Terms & Conditions
Description: For more information, click (https://www.sentosa.com.sg/en/promotional-general-store-terms-and-conditions) here for Terms & Conditions
Title: Getting Here
Description:
By Sentosa Express: Alight at Beach Station
By Public Bus: Board Bus 123 and alight at Beach Station
By Intra-Island Bus: Board Sentosa Bus A or B and alight at Beach Station
Nearest Car Park Beach Station Car Park
Title: Contact Us
Description: Beach Station 65 6361 0088 (mailto:guestrelations@mflg.com.sg) guestrelations@mflg.com.sg

The system first converted the source text into more than 50 standalone statements (propositions). Interestingly, it automatically unified the subject of each sentence as Wings of Time, which shows how accurately the model grasped the topic of the text.

[ Wings of Time is one of Sentosa's most breathtaking attractions., Wings of Time combines water, laser, fire, and music to create a mesmerizing night show., The night show of Wings of Time is about friendship and courage., Wings of Time is situated on the scenic Siloso Beach., Wings of Time is an award-winning spectacle staged nightly., Wings of Time promises an unforgettable experience for visitors of all ages., Wings of Time features spellbinding laser, fire, and water effects set to a majestic soundtrack., Wings of Time includes a jaw-dropping fireworks display., Wings of Time is a fitting end to a day out at Sentosa., Wings of Time is possibly the only place in Singapore where such an awe-inspiring performance can be witnessed., Wings of Time will offer an even better experience starting 1 February 2025., Wings of Time Fireworks Symphony is
Singapore’s only daily fireworks show., Wings of Time Fireworks Symphony now features a fireworks display that is four times longer., Visitors should visit the provided link if they need to change their visit date to Wings of Time., All changes to the visit date must be made at least 1 day prior to the visit date., Wings of Time is categorized as a show., Wings of Time is a type of attraction., Keywords for Wings of Time include: Wings of Time, Sentosa night show, Sentosa attractions, laser show Sentosa, water show Singapore, Sentosa events, family activities Sentosa, Singapore night shows, outdoor night show Sentosa, book Wings of Time tickets., The meta description for Wings of Time is: Experience Wings of Time at Sentosa! A breathtaking night show featuring water, laser, and fire effects. Perfect for a memorable evening., Product tags for Wings of Time include: Family Fun, Popular experiences, Frequently Bought., Wings of Time is located at Beach Station., Wings of Time (Std) tickets require prior redemption at Singapore Cable Car Ticketing counters., Wings of Time (Std) tickets are subjected to seats availability on a first come first serve basis., Wings of Time is a rain or shine event., Tickets for Wings of Time are non-exchangeable or nonrefundable under any circumstances., Once the timeslot for Wings of Time is confirmed, no further amendments are allowed., Visitors should proceed to Wings of Time admission gates to scan their issued QR code via mobile or physical printout for admission., Gates for Wings of Time will open 15 minutes prior to the start of the show., The show duration for Wings of Time is 20 minutes per show., Visitors should be punctual for their booked time slot for Wings of Time., Admission to Wings of Time will be on a first come first serve basis within the allocated timeslot or at the discretion of the attraction host., Standard seats for Wings of Time are applicable to guests aged 4 years and above., No outside food and drinks are 
allowed at Wings of Time., More information on Wings of Time can be found at the provided link., The pax type for Wings of Time is Standard., Promotion A for Wings of Time offers $1.90 off when purchased online., The discount for Promotion A will automatically be applied upon checkout., The price for Wings of Time is 19., Wings of Time has opening hours daily with Show 1 at 7.40pm and Show 2 at 8.40pm., Wings of Time is accessible by wheelchair., The title for terms and conditions is Terms & Conditions., More information on terms and conditions can be found at the provided link., The title for getting to Wings of Time is Getting Here., Visitors can get to Wings of Time by Sentosa Express by alighting at Beach Station., Visitors can get to Wings of Time by Public Bus by boarding Bus 123 and alighting at Beach Station., Visitors can get to Wings of Time by Intra-Island Bus by boarding Sentosa Bus A or B and alighting at Beach Station., The nearest car park to Wings of Time is Beach Station Car Park., The title for contacting Wings of Time is Contact Us., The contact location for Wings of Time is Beach Station., The contact phone number for Wings of Time is 65 6361 0088., The contact email for Wings of Time is guestrelations@mflg.com.sg.]

After agentic chunking, the text was divided naturally into four main parts:
• Main information chunk: the core introduction to Wings of Time, its features, location and other general information
• Scheduling policy chunk: information about changing a booking
• Pricing and discount chunk: discounts and payment
• Legal terms chunk: the various terms and conditions

Chunk (a641f): Sentosa's Wings of Time Show Visitor Information
Summary: This chunk contains comprehensive details about the Wings of Time attraction in Sentosa, including its features, themes, location, visitor experience, ticketing and admission procedures, future enhancements, promotions, classification as a show and attraction, unique fireworks display, daily show schedule, accessibility options, importance of punctuality and ticket redemption, extended fireworks display in the Fireworks Symphony, transportation options to reach the venue, and the necessity of adhering to non-exchangeable ticket
policies, with a focus on the standard ticketing process and visitor guidelines, and the recent update on the extended fireworks display, as well as the contact information and accessibility details, and the new experience starting February 2025.

Chunk (ae2b8): Scheduling Policies
Summary: This chunk contains information about policies regarding changes to scheduled dates and times.

Chunk (dadbb): Retail Discounts
Summary: This chunk contains information about the application of discounts during the checkout process.

Chunk (3347c): Legal Terms & Conditions
Summary: This chunk contains information about terms and conditions, including their titles and where to find more information.

After chunking this way, each chunk has a clear topic with no overlap; the important information comes first and the auxiliary information is filed by category. Keeping related information together also helps raise recall from the vector store and, in turn, RAG accuracy.

Summary

Agentic Chunking is a powerful text chunking technique. It groups sentences that are far apart in a document but related in topic, which improves RAG results, but it is comparatively expensive in both cost and latency. After trying Agentic Chunking, my colleague reported that accuracy improved by 40%, while cost also went up by 3x.

So when should we use Agentic Chunking? In my project experience, it is particularly well suited to:
• Unstructured text, such as customer service conversation logs
• Content whose topic jumps back and forth, such as transcripts of technical meetups
• QA systems that need to link information across paragraphs

For clearly structured material such as papers and manuals, traditional chunking and semantic chunking remain the more cost-effective choice.
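The article's snippets call `add_proposition` without showing it, and they need a live model to run. As a rough offline sketch of the same control flow (allocate, then either open a new chunk or append and refresh the summary), the example below replaces both LLM calls with simple stubs. Everything named here (`AgenticChunker`, `Chunk`, `stub_allocate`, `stub_summarize`, `content_words`) is hypothetical and not from the article. The keyword-overlap allocator only stands in for the model's judgment; it is not how the real method decides.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Chunk:
    title: str
    summary: str
    propositions: List[str] = field(default_factory=list)


def content_words(text: str) -> set:
    """Lowercased words longer than four letters; a crude proxy for topic."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 4}


def stub_allocate(proposition: str, chunks: Dict[int, Chunk]) -> Optional[int]:
    """Assumed stand-in for the LLM allocation call: most shared words wins."""
    words = content_words(proposition)
    best_id, best_overlap = None, 0
    for chunk_id, chunk in chunks.items():
        overlap = len(words & content_words(chunk.summary))
        if overlap > best_overlap:
            best_id, best_overlap = chunk_id, overlap
    return best_id  # None means "no chunk matches"


def stub_summarize(propositions: List[str]) -> Tuple[str, str]:
    """Assumed stand-in for the LLM summary call."""
    title = " ".join(propositions[0].split()[:3])
    return title, " ".join(propositions)


class AgenticChunker:
    def __init__(self, allocate=stub_allocate, summarize=stub_summarize):
        self.chunks: Dict[int, Chunk] = {}
        self.allocate = allocate
        self.summarize = summarize

    def push(self, proposition: str) -> int:
        chunk_id = self.allocate(proposition, self.chunks)
        if chunk_id is None or chunk_id not in self.chunks:
            # No suitable chunk: open a new one seeded with this proposition.
            chunk_id = len(self.chunks)
            title, summary = self.summarize([proposition])
            self.chunks[chunk_id] = Chunk(title, summary, [proposition])
        else:
            # Existing chunk: append, then refresh its title and summary.
            chunk = self.chunks[chunk_id]
            chunk.propositions.append(proposition)
            chunk.title, chunk.summary = self.summarize(chunk.propositions)
        return chunk_id


chunker = AgenticChunker()
for p in [
    "Neil Armstrong walked on the moon.",
    "Tickets are nonrefundable under any circumstances.",
    "Neil Armstrong led the Apollo 11 mission.",
    "Tickets require prior redemption.",
]:
    chunker.push(p)

for chunk_id, chunk in chunker.chunks.items():
    print(chunk_id, chunk.title, chunk.propositions)
```

The two Armstrong sentences end up in one chunk even though a ticketing sentence sits between them, which is the behavior the article is after. To use a real model, pass LLM-backed callables with the same signatures as `allocate` and `summarize`.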

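To see where the extra cost mentioned in the summary comes from, it helps to count model calls at indexing time. The sketch below assumes one propositioning call per document, plus one allocation call and one title/summary call for every proposition. The second of these is an assumption, since the article does not show how `add_proposition` works. The function name `estimate_llm_calls` is hypothetical, and 50 is only a round figure for the "more than 50" propositions in the Wings of Time test.

```python
def estimate_llm_calls(num_propositions: int, propositioning_calls: int = 1) -> int:
    """Rough count of model calls needed to index one document."""
    if num_propositions < 0:
        raise ValueError("num_propositions must be non-negative")
    allocation_calls = num_propositions  # one "which chunk?" call per proposition
    summary_calls = num_propositions     # one title/summary refresh per proposition
    return propositioning_calls + allocation_calls + summary_calls


calls = estimate_llm_calls(50)
print(calls)  # prints 101
```

Recursive character splitting needs no model calls at indexing time, so the call count grows from zero to roughly twice the number of propositions. Latency grows with it, because each allocation depends on the chunk summaries left by the previous one.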