2026/5/5 0:39:36
Continuing from the previous post: for a regular convolution, CUDA rearranges memory so that the data becomes contiguous, then runs a series of high-performance multiply-accumulate operations on CUDA cores or Tensor Cores. Pinwheel convolution is not a regular convolution. Its parameter count is also small, but Jetson has no matching high-performance operator for it, so memory access is non-contiguous and inference slows down. This post removes the pinwheel convolution and goes back to SPDConv, removes the DFL structure (which is unfriendly to edge devices), changes the activation function from SiLU to ReLU, and retrains the model to improve inference performance on edge devices.

I. Model information

[Figure: model structure diagram]

YAML file

    nc: 1 # number of classes
    scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.5, 0.50, 1024]
      # s: [1.0, 1.00, 1024]
      # m: [1.00, 2.00, 512]

    backbone:
      # [from, repeats, module, args]
      - [-1, 1, SPDConv, [32]]
      - [-1, 1, SPDConv, [64]]
      - [-1, 2, C3k2, [64, True, 0.25]] # 2 P2
      - [-1, 1, Conv, [64, 3, 2]]
      - [-1, 2, C3k2, [128, True, 0.25]] # 4 P3
      - [-1, 1, Conv, [128, 3, 2]]
      - [-1, 2, C3k2, [256, False]] # 6 P4
      - [-1, 1, SPPF, [256, 5]]
      - [-1, 2, C2PSA, [256]] # 8

    head:
      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 4], 1, Concat, [1]] # cat backbone P3
      - [-1, 2, C3k2, [128, False]] # 11
      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 2], 1, Concat, [1]] # cat backbone P2
      - [-1, 2, C3k2, [64, False]] # 14
      - [-1, 1, Conv, [64, 3, 2]]
      - [[-1, 11], 1, Concat, [1]]
      - [-1, 2, C3k2, [128, False]] # 17

      # upward branch: fuse the original features
      - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
      - [[-1, 2], 1, Concat, [1]] # cat backbone P2
      - [-1, 2, MicroC3, [64]] # 20
      - [-1, 1, HDC, [64]]
      - [-1, 1, ART, [64]] # 22
      - [17, 1, Conv, [128, 3, 2]]
      - [[-1, 8], 1, Concat, [1]] # 24
      - [-1, 2, C3k2, [256, True]] # 25
      - [[22, 17, 25], 1, Detect, [nc]] # Detect(P2, P3, P4)
      # - [[21, 17, 24], 1, Detect, [nc]] # one fewer concat

Model parameter analysis

n-model: overall FLOPs are very small, only 4.78 G, with a little over 500 K parameters.

s-model: FLOPs are also only 21.554 G.

II. Detailed changes

1. Disable DFL

ultralytics/nn/modules/head.py

    class Detect(nn.Module):
        ...
        def __init__(self, nc: int = 80, ch: tuple = ()):
            """
            Initialize the YOLO detection layer with specified number of classes and channels.

            Args:
                nc (int): Number of classes.
                ch (tuple): Tuple of channel sizes from backbone feature maps.
            """
            super().__init__()
            self.nc = nc  # number of classes
            self.nl = len(ch)  # number of detection layers
            # self.reg_max = 16  # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)
            self.reg_max = 1  # !!! comment out the line above and use this one instead

2. Change the module activation function

ultralytics/nn/modules/conv.py

    class Conv(nn.Module):
        """
        Standard convolution module with batch normalization and activation.

        Attributes:
            conv (nn.Conv2d): Convolutional layer.
            bn (nn.BatchNorm2d): Batch normalization layer.
            act (nn.Module): Activation function layer.
            default_act (nn.Module): Default activation function (SiLU).
        """

        # default_act = nn.SiLU()  # default activation
        default_act = nn.ReLU()  # !!! changed here

Every other module used by the model also needs to be checked to confirm that its activation function is ReLU.

III. Experimental results

[Figure: confusion matrix on the test set]

The network achieves high recall and high precision on the self-built test set.

Inference performance

The n-model reaches 90 FPS on a Jetson NX board.

IV. Next steps

Inference code will be shared in a follow-up post.
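
To make the effect of setting `reg_max` to 1 more concrete, here is a minimal pure-Python sketch. It assumes the usual Ultralytics convention that each detection layer emits `nc + 4 * reg_max` channels per location, and that DFL decodes each box side as the softmax-weighted mean of the bin indices. The helper names `head_channels` and `dfl_decode` are hypothetical and are not part of the library.

```python
import math
from typing import Sequence


def head_channels(nc: int, reg_max: int) -> int:
    """Output channels per location, assuming the layout nc + 4 * reg_max."""
    return nc + 4 * reg_max


def dfl_decode(logits: Sequence[float]) -> float:
    """DFL-style decode of one box side: softmax over bins, then expected bin index."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))


if __name__ == "__main__":
    print(head_channels(nc=1, reg_max=16))  # 65
    print(head_channels(nc=1, reg_max=1))   # 5
    print(dfl_decode([0.0] * 16))           # 7.5 (uniform distribution over bins 0..15)
    print(dfl_decode([3.2]))                # 0.0 (a single bin always decodes to index 0)
```

Under these assumptions, the single-class head shrinks from 65 to 5 channels per location, and the softmax plus weighted sum disappears from the graph. A one-bin distribution carries no information, so with `reg_max = 1` the four box channels have to be read as direct distance regressions rather than decoded through DFL, which is presumably why the model must be retrained after the change.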
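
The `scales` entry in the YAML file determines how the nominal channel counts and repeat counts are shrunk for the n-model. The sketch below mirrors, in simplified form, the scaling rule I understand Ultralytics' model parser to apply to its built-in modules (round channels up to a multiple of 8, scale repeats only when they exceed 1). The function `scale_layer` is a hypothetical helper, and whether custom modules such as SPDConv, MicroC3, HDC, or ART are scaled the same way depends on how they were registered, which the post does not show.

```python
import math


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(repeats: int, channels: int, depth: float, width: float, max_channels: int):
    """Apply depth/width multipliers to one YAML layer entry (simplified assumption)."""
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    c = make_divisible(min(channels, max_channels) * width, 8)
    return n, c


if __name__ == "__main__":
    depth, width, max_channels = 0.5, 0.50, 1024  # the 'n' scale from the YAML file
    print(scale_layer(1, 32, depth, width, max_channels))   # (1, 16)
    print(scale_layer(2, 64, depth, width, max_channels))   # (1, 32)
    print(scale_layer(2, 256, depth, width, max_channels))  # (1, 128)
```

If this rule applies, every block listed with 2 repeats runs once in the n-model and all channel widths are halved, which is consistent with the small parameter count reported above.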