ModelScope x Ascend Quick Start
2024-01-16
# Background

ModelScope is a model community in the mold of Hugging Face that is better suited to users in China. If nothing else, it does a good job as a mirror platform for downloading and obtaining models. Beyond model downloads, ModelScope provides a set of interfaces whose syntax and definitions resemble those of the Transformers library. At the time of writing (January 2024), the ModelScope community has not publicly announced support for Ascend hardware. After reading through the ModelScope code, however, I found that a handful of basic changes to the source are enough for the original ModelScope code to run on Ascend hardware, thanks to its readable code and loose coupling. Several of ModelScope's official examples run successfully this way.

The specific changes are described below.

# Test environment

- pytorch == 2.1.0
- modelscope == 1.9.4
- hardware == 64GB

# Official example

```
from modelscope.pipelines import pipeline

word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
print(word_segmentation(input_str))
```

To see more clearly where the model runs, we slightly modify the print statement so that it also reports the device:

```
from modelscope.pipelines import pipeline

word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
print("word segment result is {} on device {}".format(word_segmentation(input_str), next(word_segmentation.model.parameters()).device))
```

The output is:

```
word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device cpu
```

Following the usual way of specifying a device, we pass the NPU device to the pipeline:

```
from modelscope.pipelines import pipeline
import torch_npu  # registers the "npu" device type with PyTorch

device = "npu:0"
word_segmentation_npu = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base', device=device)

input_str = '今天天气不错,适合出去游玩'
print("word segment result is {} on device {}".format(word_segmentation_npu(input_str), next(word_segmentation_npu.model.parameters()).device))
```

Running the code above raises an error:

```
(PyTorch-2.1.0) [root@4bfd19a25abf playground]# python npu_orig.py
2024-01-16 09:05:49,901 - modelscope - INFO - PyTorch version 2.1.0 Found.
2024-01-16 09:05:49,902 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-01-16 09:05:50,107 - modelscope - INFO - Loading done! Current index file version is 1.9.4, with md5 6354b5190fb2274895e8f10bfc329a7d and a total number of 945 components indexed
Warning : ASCEND_HOME_PATH environment variable is not set.
2024-01-16 09:05:53,885 - modelscope - WARNING - Model revision not specified, use revision: v1.0.3
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/registry.py", line 212, in build_from_cfg
    return obj_cls(**args)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/nlp/token_classification_pipeline.py", line 50, in __init__
    super().__init__(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/base.py", line 95, in __init__
    verify_device(device)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/device.py", line 27, in verify_device
    assert eles[0] in ['cpu', 'cuda', 'gpu'], err_msg
AssertionError: device should be either cpu, cuda, gpu, gpu:X or cuda:X where X is the ordinal for gpu device.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aicc/playground/npu_orig.py", line 5, in <module>
    word_segmentation_npu = pipeline('word-segmentation',model='damo/nlp_structbert_word-segmentation_chinese-base', device = device)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/builder.py", line 164, in pipeline
    return build_pipeline(cfg, task_name=task)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/pipelines/builder.py", line 67, in build_pipeline
    return build_from_cfg(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/modelscope/utils/registry.py", line 215, in build_from_cfg
    raise type(e)(f'{obj_cls.__name__}: {e}')
AssertionError: WordSegmentationPipeline: device should be either cpu, cuda, gpu, gpu:X or cuda:X where X is the ordinal for gpu device.
(PyTorch-2.1.0) [root@4bfd19a25abf playground]#
```

The devices that modelscope currently accepts do not include npu, so the next step is a small change to `modelscope/utils/device.py`. Three functions in this file need to be modified: `verify_device`, `device_placement`, and `create_device`.

The modified device.py is pasted here in full for reference:

```
# Copyright (c) Alibaba, Inc. and its affiliates.
import os
from contextlib import contextmanager

from modelscope.utils.constant import Devices, Frameworks
from modelscope.utils.logger import get_logger

logger = get_logger()


def verify_device(device_name):
    """ Verify device is valid, device should be either cpu, cuda, gpu, npu, cuda:X, gpu:X or npu:X.

    Args:
        device_name (str):  device str, should be either cpu, cuda, gpu, npu, gpu:X, cuda:X or npu:X
            where X is the ordinal for the gpu or npu device.

    Return:
        device info (tuple):  device_type and device_id, if device_id is not set, will use 0 as default.
    """
    err_msg = 'device should be either cpu, cuda, gpu, npu, gpu:X, cuda:X or npu:X where X is the ordinal for the gpu or npu device.'
    assert device_name is not None and device_name != '', err_msg
    device_name = device_name.lower()
    eles = device_name.split(':')
    assert len(eles) <= 2, err_msg
    # 'npu' is added to the list of accepted device types
    assert eles[0] in ['cpu', 'cuda', 'gpu', 'npu'], err_msg
    device_type = eles[0]
    device_id = None
    if len(eles) > 1:
        device_id = int(eles[1])
    if device_type == 'cuda':
        device_type = Devices.gpu
    # default to ordinal 0 so that a bare 'npu' does not become 'npu:None'
    if device_type in (Devices.gpu, 'npu') and device_id is None:
        device_id = 0
    return device_type, device_id


@contextmanager
def device_placement(framework, device_name='gpu:0'):
    """ Device placement function, allow user to specify which device to place model or tensor
    Args:
        framework (str):  tensorflow or pytorch.
        device_name (str):  gpu, npu or cpu to use, if you want to specify certain gpu,
            use gpu:$gpu_id or cuda:$gpu_id.

    Returns:
        Context manager

    Examples:

        >>> # Requests for using model on cuda:0 for gpu
        >>> with device_placement('pytorch', device_name='gpu:0'):
        >>>     model = Model.from_pretrained(...)
    """
    device_type, device_id = verify_device(device_name)

    if framework == Frameworks.tf:
        import tensorflow as tf
        if device_type == Devices.gpu and not tf.test.is_gpu_available():
            logger.debug(
                'tensorflow: cuda is not available, using cpu instead.')
            device_type = Devices.cpu
        if device_type == Devices.cpu:
            with tf.device('/CPU:0'):
                yield
        else:
            if device_type == Devices.gpu:
                with tf.device(f'/device:gpu:{device_id}'):
                    yield

    elif framework == Frameworks.torch:
        import torch
        import torch_npu  # makes torch.npu available
        if device_type == Devices.gpu:
            if torch.cuda.is_available():
                torch.cuda.set_device(f'cuda:{device_id}')
            else:
                logger.debug(
                    'pytorch: cuda is not available, using cpu instead.')
        elif device_type == "npu":
            # new branch: select the requested NPU
            torch.npu.set_device(f'npu:{device_id}')
        yield
    else:
        yield


def create_device(device_name):
    """ create torch device

    Args:
        device_name (str):  cpu, gpu, gpu:0, cuda:0, npu:0 etc.
    """
    import torch
    import torch_npu
    device_type, device_id = verify_device(device_name)
    use_cuda = False
    if device_type == Devices.gpu:
        use_cuda = True
        if not torch.cuda.is_available():
            logger.info('cuda is not available, using cpu instead.')
            use_cuda = False

    if device_type == "npu":
        # new branch: build an NPU device
        torch_npu.npu.set_device(f"npu:{device_id}")
        device = torch.device(f"npu:{device_id}")
    elif use_cuda:
        device = torch.device(f'cuda:{device_id}')
    else:
        device = torch.device('cpu')

    return device


def get_device():
    # unchanged from the original file: only CUDA and CPU are considered here
    import torch
    from torch import distributed as dist
    if torch.cuda.is_available():
        if dist.is_available() and dist.is_initialized(
        ) and 'LOCAL_RANK' in os.environ:
            device_id = f"cuda:{os.environ['LOCAL_RANK']}"
        else:
            device_id = 'cuda:0'
    else:
        device_id = 'cpu'
    return torch.device(device_id)
```

# Results comparison

We add timing instrumentation and compare the two devices. In this test the NPU call completed roughly five times faster than the CPU call, as the output further below shows.

The test code is:

```
from modelscope.pipelines import pipeline
import torch_npu
import time

word_segmentation = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base')

input_str = '今天天气不错,适合出去游玩'
tik = time.time()
result = word_segmentation(input_str)
tok = time.time()
# len(result) counts the keys of the result dict (1 here), so the printed value is 1 / elapsed seconds
print("word segment result is {} on device {} with perf {} tokens/s".format(result, next(word_segmentation.model.parameters()).device, len(result)/(tok-tik)))

device = "npu:0"
word_segmentation_npu = pipeline('word-segmentation', model='damo/nlp_structbert_word-segmentation_chinese-base', device=device)

input_str = '今天天气不错,适合出去游玩'
tik = time.time()
result = word_segmentation_npu(input_str)
tok = time.time()
print("word segment result is {} on device {} with perf {} tokens/s".format(result, next(word_segmentation_npu.model.parameters()).device, len(result)/(tok-tik)))
```

The output is (some redundant log lines have been removed):

```
word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device cpu with perf 0.34250816868505096 tokens/s
word segment result is {'output': ['今天', '天气', '不错', ',', '适合', '出去', '游玩']} on device npu:0 with perf 1.7934692348675776 tokens/s
```

Because `len(result)` is 1, the figures labelled "tokens/s" are effectively inferences per second: about 0.34 on the CPU and about 1.79 on the NPU. Each figure comes from a single call, which is also the first call made on that pipeline.
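For a steadier comparison, the timing could be wrapped in a small helper that runs an untimed warm-up call, averages several timed calls, and counts the tokens in `result['output']`. The following is only a sketch under assumptions: `benchmark` and `fake_segmenter` are hypothetical names that are not part of ModelScope, and the callable is assumed to return a dict shaped like the pipeline output shown above.

```python
import time
from statistics import mean


def benchmark(segment_fn, text, warmup=1, runs=5):
    """Time segment_fn(text) and report mean latency and tokens per second.

    segment_fn is assumed to return a dict shaped like {'output': [token, ...]}.
    """
    if runs < 1:
        raise ValueError('runs must be at least 1')

    for _ in range(warmup):
        segment_fn(text)  # untimed warm-up calls

    latencies = []
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = segment_fn(text)
        latencies.append(time.perf_counter() - start)

    avg = mean(latencies)
    n_tokens = len(result['output'])  # count tokens, not dict keys
    return {
        'tokens': n_tokens,
        'mean_latency_s': avg,
        'tokens_per_s': n_tokens / avg if avg > 0 else float('inf'),
    }


def fake_segmenter(text):
    """Hypothetical stand-in for a pipeline so the sketch runs without ModelScope."""
    time.sleep(0.001)
    return {'output': text.split()}


if __name__ == '__main__':
    print(benchmark(fake_segmenter, 'a b c d'))
```

With the real pipelines, one would pass `word_segmentation` and `word_segmentation_npu` as `segment_fn` in place of the stand-in. The resulting numbers would differ from those recorded above, since the measurement method is different.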