LLMxAICC Self-Rescue Handbook
2023-10-18
# Foreword

The spirit of open source and mutual help is one of the best qualities of the internet, but asking a good question is also a virtue of the asker. Before asking for help, we suggest you first try to rescue yourself using the contents of this article.

# Checking NPU Status

Run ```npu-smi info``` to check the NPU status. Example:
```
(base) [root@devserver-d64GB-1 ma-user]# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc3.b090                  Version: 23.0.rc2.2                                       |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)   Temp(C)          Hugepages-Usage(page)   |
| Chip                      | Bus-Id        | AICore(%)  Memory-Usage(MB) HBM-Usage(MB)           |
+===========================+===============+====================================================+
| 6     64GB                | OK            | 93.2       41               0    / 0                |
| 0                         | 0000:82:00.0  | 0          0    / 0         4185 / 65536            |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU   Chip                | Process id    | Process name    | Process memory(MB)                |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                              |
+===========================+===============+====================================================+
(base) [root@devserver-d64GB-1 ma-user]#
```
From this output you can see the current HBM usage, which processes occupy the device, and so on. You can also run ```npu-smi info watch``` to monitor NPU usage in real time.

# Checking the NPU Model

Several NPU models are used for training; the best known are the 32GB and 64GB variants. The Name column of ```npu-smi info``` identifies the model. If you are still unsure, check the HBM size: the 64GB product ships with 64GB of HBM.

# Checking the CANN Version

Run ```cat /usr/local/Ascend/ascend-toolkit/latest/version.cfg``` to check the CANN version. Example:
```
(base) [root@devserver-d64GB-1 ma-user]# cat /usr/local/Ascend/ascend-toolkit/latest/version.cfg
# version: 1.0
runtime_running_version=[7.0.0.5.242:7.0.RC1]
compiler_running_version=[7.0.0.5.242:7.0.RC1]
opp_running_version=[7.0.0.5.242:7.0.RC1]
toolkit_running_version=[7.0.0.5.242:7.0.RC1]
aoe_running_version=[7.0.0.5.242:7.0.RC1]
ncs_running_version=[7.0.0.5.242:7.0.RC1]
runtime_upgrade_version=[7.0.0.5.242:7.0.RC1]
compiler_upgrade_version=[7.0.0.5.242:7.0.RC1]
opp_upgrade_version=[7.0.0.5.242:7.0.RC1]
toolkit_upgrade_version=[7.0.0.5.242:7.0.RC1]
aoe_upgrade_version=[7.0.0.5.242:7.0.RC1]
ncs_upgrade_version=[7.0.0.5.242:7.0.RC1]
runtime_installed_version=[7.0.0.5.242:7.0.RC1]
compiler_installed_version=[7.0.0.5.242:7.0.RC1]
opp_installed_version=[7.0.0.5.242:7.0.RC1]
toolkit_installed_version=[7.0.0.5.242:7.0.RC1]
aoe_installed_version=[7.0.0.5.242:7.0.RC1]
ncs_installed_version=[7.0.0.5.242:7.0.RC1]
(base) [root@devserver-d64GB-1 ma-user]#
```

# Launching Docker

When starting Docker, you need to pass through the specific chip the container will use:
```
docker run -it -u root --ipc=host --network=host \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /var/log/npu/:/usr/slog \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    --name cann_7_0 \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /home/aicc/l00564131:/home/aicc/l00564131 \
    --entrypoint=/bin/bash \
    swr.cn-central-221.ovaijisuan.com/wuh-aicc_dxy/pytorch_kernels:PyTorch_1.11-cann7.0rc1_py_3.9-euler_2.8.3-d64GB
```
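Once the container is up, it can be worth a quick check from Python that torch_npu actually sees the chip before you start debugging model code. The following is only a minimal sketch, assuming torch and torch_npu are installed in the container's Python environment and that the ```torch.npu.*``` helpers mirror the CUDA API as recent torch_npu releases do; it is not an official diagnostic tool.
```
import torch
import torch_npu  # registers the "npu" device type with PyTorch

# Basic visibility checks
print("npu available:", torch.npu.is_available())
print("npu count    :", torch.npu.device_count())

# Run a tiny computation on the NPU to confirm the runtime works end to end
x = torch.randn(1024, 1024).to("npu:0")
y = x @ x
torch.npu.synchronize()
print("matmul ok, result device:", y.device)
```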
# Ascend + PyTorch Is Extremely Slow

When Ascend + PyTorch is extremely slow, it is very likely that the program is not actually running on the NPU. You can print the device the model lives on with pdb (or simply check ```npu-smi info```). Taking Qwen as an example, the unmodified code below was run directly on the NPU server:
```
import torch
import torch_npu
print("torch && torch_npu import successfully")

from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig

model_dir = snapshot_download('qwen/Qwen-7B-Chat', revision='v1.0.4', cache_dir='/home/aicc/l00564131/models/qwen')
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, device_map="auto", trust_remote_code=True
).eval()
print("model load successfully")
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
```
Debugging it with pdb:
```
(PyTorch-2.1) [root@devserver-d64GB-1 playground]# python -m pdb run_on_cpu.py
> /home/aicc/l00564131/pt/playground/run_on_cpu.py(5)<module>()
-> import torch
(Pdb) b 24
Breakpoint 1 at /home/aicc/l00564131/pt/playground/run_on_cpu.py:24
(Pdb) r
Warning : ASCEND_HOME_PATH environment variable is not set.
torch && torch_npu import successfully
2023-10-18 15:19:57,567 - modelscope - INFO - PyTorch version 2.1.0 Found.
2023-10-18 15:19:57,570 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-10-18 15:19:57,680 - modelscope - INFO - Loading done! Current index file version is 1.9.2, with md5 1df843dce7fce5d45f0fd63b372e06db and a total number of 941 components indexed
2023-10-18 15:19:58,884 - modelscope - INFO - Use user-specified model revision: v1.0.4
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:24<00:00,  3.01s/it]
> /home/aicc/l00564131/pt/playground/run_on_cpu.py(24)<module>()
-> print("model load successfully")
(Pdb) print(next(model.parameters()).device)
cpu
(Pdb)
```
As ```print(next(model.parameters()).device)``` shows, the model is actually still running on the CPU, so the device has to be specified explicitly. The following change pins it to the NPU:
```
import torch
import torch_npu
print("torch && torch_npu import successfully")

from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer
from modelscope import GenerationConfig

model_dir = snapshot_download('qwen/Qwen-7B-Chat', revision='v1.0.4', cache_dir='/home/aicc/l00564131/models/qwen')
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
torch_device = "npu:0"
model = AutoModelForCausalLM.from_pretrained(
    model_dir, device_map="auto", trust_remote_code=True
).to(torch_device).eval()
model.generation_config = GenerationConfig.from_pretrained("qwen/Qwen-7B-Chat", revision='v1.0.5', cache_dir='/home/aicc/l00564131/models/qwen', trust_remote_code=True)  # generation length, top_p and other hyperparameters can be set here
print("model load successfully")
response, history = model.chat(tokenizer, "你好", history=None)
print(response)
```
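To catch this class of problem without stepping through pdb every time, a small assertion can be dropped into a script right before generation. This is only an illustrative sketch; the helper name `assert_on_npu` is ours, not part of any library.
```
import torch


def assert_on_npu(model: torch.nn.Module, *tensors: torch.Tensor) -> None:
    """Fail fast if the model weights or any input tensor are not on an NPU device."""
    param_device = next(model.parameters()).device
    assert param_device.type == "npu", f"model is on {param_device}, expected npu"
    for t in tensors:
        assert t.device.type == "npu", f"input tensor is on {t.device}, expected npu"


# Usage (hypothetical): call it right before model.chat() / model.generate()
# assert_on_npu(model, inputs.input_ids)
```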
# Setting Environment Variables

In general, when errors point to environment variables that have not been set correctly (such as the `ASCEND_HOME_PATH` warning seen above), first run the following to set up the CANN environment:
```
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

# Unsupported Precisions such as BF16: RuntimeError: call aclnnSort failed, detail:EZ1001: self not implemented for DT_BFLOAT16

On PyTorch we may run into operators that have not been adapted for a given precision. These can be worked around; here we use Baichuan2-13B inference as the example. The original code is:
```
import torch
import torch_npu
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

model_dir = "/home/aicc/l00564131/models/baichuan2_13b"
torch_device = "npu:0"
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
).to(torch_device).eval()
model.generation_config = GenerationConfig.from_pretrained(model_dir)
print("model load successfully")
messages = []
messages.append({"role": "user", "content": "解释一下温故而知新"})
response = model.chat(tokenizer, messages)
print(response)
```
Running it directly produces:
```
(PyTorch-2.1) [root@devserver-d64GB-1 baichuan2]# python run_on_npu.py
Warning : ASCEND_HOME_PATH environment variable is not set.
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
Loading checkpoint shards: 100%|█████████████████████████████| 3/3 [00:29<00:00,  9.94s/it]
model load successfully
Traceback (most recent call last):
  File "/home/aicc/l00564131/pt/baichuan2/run_on_npu.py", line 25, in <module>
    response = model.chat(tokenizer, messages)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan2_13b/modeling_baichuan.py", line 825, in chat
    outputs = self.generate(input_ids, generation_config=generation_config)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/site-packages/transformers/generation/utils.py", line 1642, in generate
    return self.sample(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/site-packages/transformers/generation/utils.py", line 2738, in sample
    next_token_scores = logits_warper(input_ids, next_token_scores)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/site-packages/transformers/generation/logits_process.py", line 97, in __call__
    scores = processor(input_ids, scores)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/site-packages/transformers/generation/logits_process.py", line 419, in __call__
    sorted_logits, sorted_indices = torch.sort(scores, descending=False)
RuntimeError: call aclnnSort failed, detail:EZ1001: self not implemented for DT_BFLOAT16, should be in dtype support list [DT_FLOAT16,DT_FLOAT,DT_UINT8,DT_INT8,DT_INT16,DT_INT32,DT_INT64,].
/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up
  _warnings.warn(warn_message, ResourceWarning)
(PyTorch-2.1) [root@devserver-d64GB-1 baichuan2]#
```
The error says torch.sort() has no NPU implementation for DT_BFLOAT16. A rather brute-force workaround is to patch the failing location directly ("/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/site-packages/transformers/generation/logits_process.py", line 419) by explicitly inserting `scores = scores.half()` before the sort, converting BF16 to FP16 (half precision). After that the program runs normally:
```
(PyTorch-2.1) [root@devserver-d64GB-1 baichuan2]# python run_on_npu.py
NPU options set correctly
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
Loading checkpoint shards: 100%|██████████████████████████| 3/3 [00:31<00:00, 10.57s/it]
model load successfully
/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/site-packages/transformers/generation/logits_process.py:434: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/_internal/cpython-3.9.17/lib/python3.9/site-packages/torch/include/ATen/core/LegacyTypeDispatch.h:74.)
  sorted_indices_to_remove[..., -self.min_tokens_to_keep :] = 0
"温故而知新"是一句中国古代名言,出自《论语·为政》篇。这句话的意思是:通过回顾过去的学习和经验,可以从中获得新的理解和感悟。具体来说,它鼓励我们在学习过程中要不断地复习和巩固已学过的知识,以便更好地理解和掌握这些知识。同时,通过学习过去的经验和知识,我们可以发现其中的规律和原理,从而获得新的启示和领悟。
这句话强调了学习和成长的一个重要方法,即不断地回顾和反思。通过这种方式,我们可以更好地理解自己的成长过程,发现自己的优点和不足,从而在未来的学习和生活中做出更好的选择。同时,这也有助于我们更好地理解和把握世界的变化和发展,从而在生活中取得更好的成就。
/home/ma-user/anaconda3/envs/PyTorch-2.1/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up
  _warnings.warn(warn_message, ResourceWarning)
(PyTorch-2.1) [root@devserver-d64GB-1 baichuan2]#
```
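Editing files inside site-packages is hard to maintain. A less invasive workaround, where the model quality permits, is simply to load the weights in FP16 so the unsupported BF16 sort never runs. The sketch below reuses the paths and names from the example above; whether FP16 output quality is acceptable for your model is something to verify yourself.
```
import torch
import torch_npu
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

model_dir = "/home/aicc/l00564131/models/baichuan2_13b"
torch_device = "npu:0"

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=False, trust_remote_code=True)
# Load in float16 instead of bfloat16 so downstream ops such as torch.sort inside
# the top-p logits warper stay within the dtypes the NPU kernel supports.
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to(torch_device).eval()
model.generation_config = GenerationConfig.from_pretrained(model_dir)

messages = [{"role": "user", "content": "解释一下温故而知新"}]
print(model.chat(tokenizer, messages))
```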
# Unsupported JIT Implementation: NotImplementedError: Unknown device for graph fuser

PyTorch's dynamic-graph mode is easy to pick up and program against; the price was that in early PyTorch versions its performance lagged behind other frameworks, which is why PyTorch 1.0 introduced the JIT (Just-In-Time compilation) mechanism (see [this article](https://zhuanlan.zhihu.com/p/370455320)). In some current large models, parts of the JIT implementation have not yet been fully adapted to the NPU. For such models the issue can be worked around as follows, using ChatGLM3-6B as the example (PyTorch 2.1 / CANN 7.0).

We run inference on the NPU with the following code, explicitly placing the model on the NPU:
```
import torch
import torch_npu

from modelscope import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = snapshot_download("ZhipuAI/chatglm3-6b", cache_dir="/home/ma-user/work/aicc/l00564131/models", revision="master")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
torch_device = "npu:0"
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True).half().to(torch_device)
print(next(model.parameters()).device)
model = model.eval()
print("model eval successfully")
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)
```
This fails with the following error:
```
(PyTorch-2.1.0) [root@d83bf3d08145 GLM3]# python infer.py
torch && torch_npu import successfully
2023-10-28 15:36:51,343 - modelscope - INFO - PyTorch version 2.1.0 Found.
2023-10-28 15:36:51,344 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-10-28 15:36:51,406 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 358319940f86983d51801b10d2415602 and a total number of 945 components indexed
2023-10-28 15:36:51,856 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2023-10-28 15:36:51,856 - modelscope - INFO - Use user-specified model revision: master
Loading checkpoint shards: 100%|████████████████████████████████████████| 7/7 [00:17<00:00,  2.53s/it]
npu:0
model eval successfully
Traceback (most recent call last):
  File "/home/ma-user/work/aicc/l00564131/pt/GLM3/infer.py", line 19, in <module>
    response, history = model.chat(tokenizer, "你好", history=[])
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1035, in chat
    outputs = self.generate(**inputs, **gen_kwargs, eos_token_id=eos_token_id)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 1572, in generate
    return self.sample(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 2619, in sample
    outputs = self(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 937, in forward
    transformer_outputs = self.transformer(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 830, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 640, in forward
    layer_ret = layer(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 544, in forward
    attention_output, kv_cache = self.self_attention(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 408, in forward
    query_layer = apply_rotary_pos_emb(query_layer, rotary_pos_emb)
NotImplementedError: Unknown device for graph fuser
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up
  _warnings.warn(warn_message, ResourceWarning)
(PyTorch-2.1.0) [root@d83bf3d08145 GLM3]#
```
From the traceback it is easy to see that the failure is caused by apply_rotary_pos_emb having no implementation for this device in the graph fuser. The interesting part: look at the modeling_chatglm.py under the model download path (note: not the /root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py from the traceback, but the modeling_chatglm.py stored alongside the weights):
```
160 @torch.jit.script
161 def apply_rotary_pos_emb(x: torch.Tensor, rope_cache: torch.Tensor) -> torch.Tensor:
162     # x: [sq, b, np, hn]
163     sq, b, np, hn = x.size(0), x.size(1), x.size(2), x.size(3)
164     rot_dim = rope_cache.shape[-2] * 2
165     x, x_pass = x[..., :rot_dim], x[..., rot_dim:]
166     # truncate to support variable sizes
167     rope_cache = rope_cache[:sq]
168     xshaped = x.reshape(sq, -1, np, rot_dim // 2, 2)
169     rope_cache = rope_cache.view(sq, -1, 1, xshaped.size(3), 2)
170     x_out2 = torch.stack(
171         [
172             xshaped[..., 0] * rope_cache[..., 0] - xshaped[..., 1] * rope_cache[..., 1],
173             xshaped[..., 1] * rope_cache[..., 0] + xshaped[..., 0] * rope_cache[..., 1],
174         ],
175         -1,
176     )
177     x_out2 = x_out2.flatten(3)
178     return torch.cat((x_out2, x_pass), dim=-1)
```
The function is wrapped with torch.jit.script. To work around the problem we can simply use a non-JIT implementation: create a new function apply_rotary_pos_emb_pt() that is identical to apply_rotary_pos_emb() except that the jit decorator is removed, and call it instead. The diff of the change is:
```
(PyTorch-2.1.0) [root@d83bf3d08145 chatglm3-6b]# diff modeling_chatglm.py modeling_chatglm.py.ok
179a180,198
> def apply_rotary_pos_emb_pt(x: torch.Tensor, rope_cache: torch.Tensor) -> torch.Tensor:
>     # x: [sq, b, np, hn]
>     sq, b, np, hn = x.size(0), x.size(1), x.size(2), x.size(3)
>     rot_dim = rope_cache.shape[-2] * 2
>     x, x_pass = x[..., :rot_dim], x[..., rot_dim:]
>     # truncate to support variable sizes
>     rope_cache = rope_cache[:sq]
>     xshaped = x.reshape(sq, -1, np, rot_dim // 2, 2)
>     rope_cache = rope_cache.view(sq, -1, 1, xshaped.size(3), 2)
>     x_out2 = torch.stack(
>         [
>             xshaped[..., 0] * rope_cache[..., 0] - xshaped[..., 1] * rope_cache[..., 1],
>             xshaped[..., 1] * rope_cache[..., 0] + xshaped[..., 0] * rope_cache[..., 1],
>         ],
>         -1,
>     )
>     x_out2 = x_out2.flatten(3)
>     return torch.cat((x_out2, x_pass), dim=-1)
408,409c427,428
< query_layer = apply_rotary_pos_emb(query_layer, rotary_pos_emb)
< key_layer = apply_rotary_pos_emb(key_layer, rotary_pos_emb)
---
> query_layer = apply_rotary_pos_emb_pt(query_layer, rotary_pos_emb)
> key_layer = apply_rotary_pos_emb_pt(key_layer, rotary_pos_emb)
(PyTorch-2.1.0) [root@d83bf3d08145 chatglm3-6b]#
```
Running inference again now gives the expected ChatGLM3-6B results on the NPU:
```
(PyTorch-2.1.0) [root@d83bf3d08145 GLM3]# python infer.py
torch && torch_npu import successfully
2023-10-28 15:43:59,946 - modelscope - INFO - PyTorch version 2.1.0 Found.
2023-10-28 15:43:59,947 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-10-28 15:44:00,010 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 358319940f86983d51801b10d2415602 and a total number of 945 components indexed
2023-10-28 15:44:00,563 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2023-10-28 15:44:00,563 - modelscope - INFO - Use user-specified model revision: master
Loading checkpoint shards: 100%|████████████████████████████████████████| 7/7 [00:17<00:00,  2.45s/it]
npu:0
model eval successfully
你好!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。
晚上睡不着应该采取以下措施:
1. 尝试放松:深呼吸、冥想、渐进性肌肉松弛等放松技巧,帮助减轻压力和焦虑,促进睡眠。
2. 规律作息:保持规律的作息时间,每天尽量在相同的时间上床睡觉和起床,有助于调整身体的生物钟。
3. 减少使用电子设备:睡前避免使用电子设备,如手机、平板电脑等,这些设备会发出蓝光,影响褪黑激素的分泌,从而影响睡眠。
4. 创造良好的睡眠环境:保持卧室安静、舒适、黑暗,可以选择使用眼罩和耳塞等辅助工具。
5. 适量运动:白天进行适量运动,如散步、跑步等,有助于晚上更好地入睡。但避免在临近睡觉前进行剧烈运动。
6. 饮食注意事项:避免在睡前过量进食或饮用咖啡因、酒精等刺激性饮料。
7. 尝试睡眠辅助工具:如睡眠药物或睡眠追踪器等,但在使用前请咨询医生建议。
如果以上方法不能解决你的问题,建议咨询专业医生或睡眠专家,获得更具体的建议和诊断。
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up
  _warnings.warn(warn_message, ResourceWarning)
(PyTorch-2.1.0) [root@d83bf3d08145 GLM3]#
```
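An alternative that avoids editing modeling_chatglm.py at all is to disable TorchScript globally through PyTorch's PYTORCH_JIT environment variable, which turns @torch.jit.script decorators into plain Python calls so the graph fuser is never invoked. Treat the sketch below as an assumption to verify: the variable has to be set before torch is imported (and therefore before the remote modeling code is loaded), and disabling the JIT may cost some performance elsewhere. The local model path shown is hypothetical.
```
import os

# Must be set before `import torch`, otherwise the JIT has already initialised.
os.environ["PYTORCH_JIT"] = "0"

import torch
import torch_npu
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/home/ma-user/work/aicc/l00564131/models/ZhipuAI/chatglm3-6b"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True).half().to("npu:0").eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```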
# Context Not Set Correctly: Check whether acl.rt.set_context or acl.rt.set_device is called

While running some code on the NPU you may hit problems caused by the device context not being set correctly. Again using ChatGLM3 as the example: after installing the dependencies we run `streamlit run main.py --server.port 31009` to launch the web demo, and after typing a prompt the following error appears:
```
(chatglm3-demo) [root@1519db6dd380 composite_demo]# streamlit run main.py --server.port 31009

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

You can now view your Streamlit app in your browser.

  Network URL: http://172.17.0.2:31009
  External URL: http://27.18.114.8:31009

Warning : ASCEND_HOME_PATH environment variable is not set.
2023-11-02 08:53:13,321 - modelscope - INFO - PyTorch version 2.1.0 Found.
2023-11-02 08:53:13,322 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-11-02 08:53:13,385 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 7e7f003c8d5c0503c47bbef5888adfd1 and a total number of 945 components indexed
2023-11-02 08:53:14,030 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2023-11-02 08:53:14,030 - modelscope - INFO - Use user-specified model revision: master
Loading checkpoint shards: 100%|████████████████████████████████████████| 7/7 [00:17<00:00,  2.53s/it]
/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/context/op_context.py:38: DeprecationWarning: currentThread() is deprecated, use current_thread() instead
  return _contexts.setdefault(threading.currentThread().ident, [])
[registered tool] {'description': 'Generates a random number x, s.t. range[0] <= x < range[1]', 'name': 'random_number_generator', 'params': [{'description': 'The random seed used by the generator', 'name': 'seed', 'required': True, 'type': 'int'}, {'description': 'The range of the generated numbers', 'name': 'range', 'required': True, 'type': 'tuple[int, int]'}]}
[registered tool] {'description': 'Get the current weather for `city_name`', 'name': 'get_weather', 'params': [{'description': 'The name of the city to be queried', 'name': 'city_name', 'required': True, 'type': 'str'}]}
<|user|>
Hello
None
=== Input:
<|system|>
You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown.
<|user|>
Hello
<|assistant|>
=== History: [Conversation(role=, content='Hello', tool=None, image=None)]
2023-11-02 08:54:44.938 Uncaught app exception
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/chatglm3-demo/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
    exec(code, module.__dict__)
  File "/home/ma-user/work/aicc/l00564131/pt/chatglm3/composite_demo/main.py", line 50, in <module>
    demo_chat.main(top_p, temperature, system_prompt, prompt_text)
  File "/home/ma-user/work/aicc/l00564131/pt/chatglm3/composite_demo/demo_chat.py", line 50, in main
    for response in client.generate_stream(
  File "/home/ma-user/work/aicc/l00564131/pt/chatglm3/composite_demo/client.py", line 118, in generate_stream
    for new_text, _ in stream_chat(self.model,
  File "/home/ma-user/work/aicc/l00564131/pt/chatglm3/composite_demo/client.py", line 61, in stream_chat
    inputs = inputs.to(self.device)
  File "/home/ma-user/anaconda3/envs/chatglm3-demo/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 777, in to
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
  File "/home/ma-user/anaconda3/envs/chatglm3-demo/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 777, in <dictcomp>
    self.data = {k: v.to(device=device) for k, v in self.data.items()}
RuntimeError: allocate:/usr1/03/workspace/j_dH0bSshu/pytorch/torch_npu/csrc/core/npu/NPUCachingAllocator.cpp:1406 NPU error, error code is 107002
[Error]: The context is empty.
        Check whether acl.rt.set_context or acl.rt.set_device is called.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
        Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
        TraceBack (most recent call last):
        ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4290]
        The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
```
We only need to modify the failing location, around line 61 of `/home/ma-user/work/aicc/l00564131/pt/chatglm3/composite_demo/client.py`, and explicitly set the NPU context there with `torch.npu.set_device()`:
```
[root@devserver-fae-mirrors composite_demo]# diff client.py client.py.bk
60d59
< torch.npu.set_device(self.device)
[root@devserver-fae-mirrors composite_demo]#
```
After the change, relaunch the app with streamlit and streaming inference works correctly.
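The underlying issue appears to be that streamlit serves requests from worker threads, and the ACL context is per thread, so a thread that never called set_device has no context when it first touches an NPU tensor. Below is a hedged sketch of the same fix in isolation; the `stream_chat` signature here is illustrative, not the demo's exact code.
```
import torch
import torch_npu
from transformers import AutoModelForCausalLM, AutoTokenizer


def stream_chat(model, tokenizer, prompt: str, device: str = "npu:0") -> str:
    # Each thread needs its own ACL context: set the device explicitly in the
    # thread that performs inference, before any tensor is moved to the NPU.
    torch.npu.set_device(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```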
# Tensors Not Placed on the NPU: RuntimeError: only npu tensor is supported

Here we use a problem hit while running 01AI/Yi-6B-200K as the example. The official inference code is (with a missing comma after `eos_token_id` fixed):
```
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("01-ai/Yi-34B", device_map="auto", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-34B", trust_remote_code=True)
inputs = tokenizer("There's a place where time stands still. A place of breath taking wonder, but also", return_tensors="pt")
max_length = 256

outputs = model.generate(
    inputs.input_ids.cuda(),
    max_length=max_length,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    repetition_penalty=1.3,
    no_repeat_ngram_size=5,
    temperature=0.7,
    top_k=40,
    top_p=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
With the usual changes for loading the model locally and running on the NPU, we modify the inference code to:
```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import torch_npu
from modelscope import snapshot_download

model_path = snapshot_download("01ai/Yi-6B-200K", cache_dir="/home/ma-user/work/aicc/models", revision="master")
torch_device = "npu:0"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", torch_dtype="auto", trust_remote_code=True).half().to(torch_device)
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("There's a place where time stands still. A place of breath taking wonder, but also", return_tensors="pt")
max_length = 256

outputs = model.generate(
    inputs.input_ids,
    max_length=max_length,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    repetition_penalty=1.3,
    no_repeat_ngram_size=5,
    temperature=0.7,
    top_k=40,
    top_p=0.8,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Running this, we may hit the following error:
```
(PyTorch-2.1.0) [root@1519db6dd380 Yi]# python inf.py
Warning : ASCEND_HOME_PATH environment variable is not set.
2023-11-09 14:46:57,006 - modelscope - INFO - PyTorch version 2.1.0 Found.
2023-11-09 14:46:57,007 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-11-09 14:46:57,071 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 d9bf8cd9801e1ca5a36d1e63da64d502 and a total number of 945 components indexed
Embedding set to NPU:0 successfully
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:13<00:00,  6.69s/it]
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/utils.py:1452: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on npu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('npu') before running `.generate()`.
  warnings.warn(
DEBUG::INFO input_ids - tensor([[6444, 59610, 59575, 562, 1700, 1151, 922, 8954, 1451, 98,
         647, 1700, 593, 8253, 2863, 3755, 97, 796, 962]])
Traceback (most recent call last):
  File "/home/ma-user/work/aicc/l00564131/pt/yi/Yi/inf.py", line 17, in <module>
    outputs = model.generate(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 1572, in generate
    return self.sample(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 2619, in sample
    outputs = self(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/Yi-6B-200K/modeling_yi.py", line 823, in forward
    outputs = self.model(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/Yi-6B-200K/modeling_yi.py", line 642, in forward
    inputs_embeds = self.embed_tokens(input_ids)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/nn/functional.py", line 2233, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: only npu tensor is supported
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up
  _warnings.warn(warn_message, ResourceWarning)
(PyTorch-2.1.0) [root@1519db6dd380 Yi]#
```
This problem did not come up earlier with ChatGLM3 or Qwen. With ChatGLM3, for example, printing input_ids shows a tensor whose device is already set to npu:
```
input_ids tensor([[64790, 64792, 64794, 30910, 13, 20115, 267, 1762, 2554, 362, 1077, 362, 344, 457, 30930, 809, 431, 1675, 289, 267,
         1762, 4159, 30954, 13, 30995, 13, 296, 30982, 13, 352, 30955, 2323, 2932, 449, 287, 30948, 429, 1252, 13, 352,
         30955, 16302, 2932, 449, 30938, 8870, 3567, 1149, 895, 30948, 429, 30930, 1916, 332, 9168, 293, 15082, 1224, 5885, 5568,
         291, 2463, 5469, 23490, 13, 352, 30955, 24940, 2932, 729, 13, 753, 30955, 3543, 2932, 449, 8178, 1252, 13, 753,
         30955, 27448, 2932, 729, 13, 647, 30955, 8272, 2932, 729, 13, 647, 296, 30955, 3543, 2932, 449, 4423, 1252, 13,
         647, 296, 30955, 16302, 2932, 449, 1036, 2659, 9065, 3211, 30955, 13, 647, 30983, 13, 753, 30983, 13, 352, 4143,
         13, 352, 30955, 20379, 2932, 790, 13, 753, 30955, 8272, 30955, 13, 352, 30996, 13, 296, 30983, 13, 30996, 64795,
         30910, 13, 53560, 55013, 55381, 31809, 55079, 54918, 55072, 54862, 31301, 20876, 397, 5301, 13930, 14308, 283, 31300, 54530, 33012,
         31707, 64796]], device='npu:0')
```
However, our current input_ids has no device set; printing its info gives:
```
input_ids - tensor([[6444, 59610, 59575, 562, 1700, 1151, 922, 8954, 1451, 98,
         647, 1700, 593, 8253, 2863, 3755, 97, 796, 962]])
```
Explicitly moving the tensor to torch_device fixes the problem:
```
(PyTorch-2.1.0) [root@1519db6dd380 Yi]# diff inf.py inf.py.OK
18c18
< inputs.input_ids,
---
> inputs.input_ids.to(torch_device),
(PyTorch-2.1.0) [root@1519db6dd380 Yi]#
```
and we get the expected Yi-6B-200K inference result:
```
(PyTorch-2.1.0) [root@1519db6dd380 Yi]# python inf.py
Warning : ASCEND_HOME_PATH environment variable is not set.
2023-11-09 15:19:57,195 - modelscope - INFO - PyTorch version 2.1.0 Found.
2023-11-09 15:19:57,196 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2023-11-09 15:19:57,258 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 d9bf8cd9801e1ca5a36d1e63da64d502 and a total number of 945 components indexed
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:12<00:00,  6.25s/it]
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/logits_process.py:495: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at /opt/_internal/cpython-3.9.17/lib/python3.9/site-packages/torch/include/ATen/core/LegacyTypeDispatch.h:74.)
  scores[i, banned_tokens] = -float("inf")
There's a place where time stands still. A place of breath taking wonder, but also one that is filled with fear and danger at every turn . This was the world I had known as my home for over two thousand years; this is what it felt like to be back in its embrace once again after so many long months away from her shores. The first thing you notice when arriving on these islands are all those people who seem very different than your own kind: they have skin tones ranging anywhere between dark tan through almost black right up into an alabaster white complexion while their hair color runs the gamut too - there could even possibly exist some individuals whose eyes shine brightly blue or green instead! It would take more research before anyone can say exactly how much diversity exists here among us humans living together peacefully under same roof called Earth which makes me proud just being able call myself part-time resident now :) So if ever someone asks "What do u think about coming across such variety during travels?" Just smile & answer simply yet confidently : 'It’s beautiful isn’t it dear friend ;-) ### What did she look forward most upon returning to England? How does Tessa feel towards Jem? Is he
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up
  _warnings.warn(warn_message, ResourceWarning)
(PyTorch-2.1.0) [root@1519db6dd380 Yi]#
```
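More generally, rather than moving individual fields, the whole tokenizer output can be moved in one step, since transformers' BatchEncoding supports `.to()`. A small sketch using the same torch_device as above; the local model path is hypothetical.
```
import torch
import torch_npu
from transformers import AutoTokenizer

torch_device = "npu:0"
# Hypothetical local path; substitute your own downloaded model directory.
tokenizer = AutoTokenizer.from_pretrained("/home/ma-user/work/aicc/models/01ai/Yi-6B-200K", trust_remote_code=True)

# BatchEncoding.to() moves input_ids, attention_mask, etc. in one call,
# so nothing is accidentally left on the CPU.
inputs = tokenizer("There's a place where time stands still.", return_tensors="pt").to(torch_device)
print({k: v.device for k, v in inputs.items()})
```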