ImportError: dlopen(/Library/Python/3.9/site-packages/_itree.cpython-39-darwin.so, 0x0002):
tried: '/Library/Python/3.9/site-packages/_itree.cpython-39-darwin.so'(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')),
'/System/Volumes/Preboot/Cryptexes/OS/Library/Python/3.9/site-packages/_itree.cpython-39-darwin.so'(no such file),
'/Library/Python/3.9/site-packages/_itree.cpython-39-darwin.so'(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
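The error above says the compiled extension was built for x86_64 while the running interpreter needs arm64. A quick way to confirm which architecture the interpreter itself reports is sketched below; the helper names are made up for illustration, and only standard-library calls are used.

```python
import platform
import struct


def interpreter_arch() -> str:
    """Return the CPU architecture reported for the running Python process."""
    return platform.machine()


def describe_interpreter() -> str:
    """Summarise OS, architecture, pointer width and Python version."""
    bits = struct.calcsize("P") * 8  # pointer size in bits
    return (
        f"{platform.system()} {interpreter_arch()} "
        f"({bits}-bit Python {platform.python_version()})"
    )


print(describe_interpreter())
```

If this prints arm64 while the traceback reports an x86_64 `.so`, the package was most likely installed by a different (x86_64) Python than the one now running it.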
python3 -m llama.download --model_size 7B
❤️ Resume download is supported. You can ctrl-c and rerun the program to resume the downloading
Downloading tokenizer...
✅ pyllama_data/tokenizer.model
✅ pyllama_data/tokenizer_checklist.chk
tokenizer.model: OK
Downloading 7B
downloading file to pyllama_data/7B/consolidated.00.pth ...please wait for a few minutes ...
✅ pyllama_data/7B/consolidated.00.pth
✅ pyllama_data/7B/params.json
✅ pyllama_data/7B/checklist.chk
Checking checksums for the 7B model
consolidated.00.pth: OK
params.json: OK
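The checksum step in the log above can also be repeated by hand after a download. The sketch below assumes that `checklist.chk` uses the usual `md5sum` layout (one `<md5 hex digest>  <file name>` pair per line); that format is an assumption, not something the log confirms, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weights never sit fully in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(checklist: Path) -> dict:
    """Map each file listed in the checklist to True (hash matches) or False.

    Assumes md5sum-style lines; files that are missing count as False.
    """
    results = {}
    for line in checklist.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
        target = checklist.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

For the layout shown in the log, a call would look like `verify_checklist(Path("pyllama_data/7B/checklist.chk"))`.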
2.1.3 Script download (the method the author used)
#!/bin/bash

# Function to handle stopping the script
function stop_script() {
  echo "Stopping the script."
  exit 0
}

# Register the signal handler
trap stop_script SIGINT

while true; do
  # Run the command with a timeout of 2000 seconds
  timeout 2000 python -m llama.download --model_size "$1" --folder model

  echo "restart download"
  sleep 1  # Wait for 1 second before starting the next iteration

  # Wait for any key to be pressed within a 1-second timeout
  read -t 1 -n 1 -s key
  if [[ $key ]]; then
    stop_script
  fi
done
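The same restart idea can be sketched in Python for readers who prefer not to rely on `timeout` and `read` in the shell. This is not a port of the script above: it stops as soon as the command exits with status 0 and gives up after a fixed number of attempts, whereas the shell loop runs until a key is pressed. The function name and its parameters are hypothetical.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, timeout_s, max_attempts, pause_s=1.0):
    """Run cmd, restarting it when it times out or exits non-zero.

    Returns the 1-based number of the attempt that succeeded,
    or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            completed = subprocess.run(cmd, timeout=timeout_s)
            if completed.returncode == 0:
                return attempt
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before raising
            pass
        if attempt < max_attempts:
            time.sleep(pause_s)  # short pause before the next attempt
    return None


# Example call mirroring the shell script (not executed here):
# run_with_restarts(
#     [sys.executable, "-m", "llama.download", "--model_size", "7B", "--folder", "model"],
#     timeout_s=2000,
#     max_attempts=50,
# )
```

Because the downloader supports resuming, each restart should continue from where the previous attempt stopped rather than starting over.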
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: Tesla V100S-PCIE-32GB, compute capability 7.0, VMM: yes
llm_load_tensors: ggml ctx size = 0.30 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 1002.00 MiB
llm_load_tensors: CUDA0 buffer size = 14315.02 MiB
.........................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
This starts a process that behaves like a web server, listening on port 8080 by default. With that, an API service is running, and you can test it with curl.
curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "What color is the sun?","n_predict": 512}'

{"content":".....","generation_settings":{"frequency_penalty":0.0,"grammar":"","ignore_eos":false,"logit_bias":[],"mirostat":0,"mirostat_eta":0.10000000149011612,"mirostat_tau":5.0,......}}
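The curl call above can be reproduced from Python with only the standard library. This is a minimal sketch: the URL, the `prompt` and `n_predict` fields, and the `content` key in the response are taken from the example above, while the function names and the timeout value are assumptions made for illustration.

```python
import json
import urllib.request


def build_completion_request(prompt, n_predict=512, base_url="http://localhost:8080"):
    """Build the same POST request that the curl example sends."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, n_predict=512, base_url="http://localhost:8080", timeout_s=120):
    """Send the request and return the generated text from the "content" field."""
    request = build_completion_request(prompt, n_predict, base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    return body["content"]
```

With the server running, `print(complete("What color is the sun?"))` should print the generated text; building the request separately makes it easy to inspect the payload without a live server.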
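You can also reach the service through the web page, the OpenAI API, and other clients. To use the OpenAI API, first install the openai dependency: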
pip install openai
Access the service through the OpenAI API:
import openai

client = openai.OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="sk-no-key-required",
)

completion = client.chat.completions.create(
    model="qwen",  # model name can be chosen arbitrarily
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "tell me something about michael jordan"},
    ],
)
print(completion.choices[0].message.content)