Errors Related to GPU
ubuntu-drivers
UnboundLocalError: local variable 'version' referenced before assignment
This error was resolved by upgrading Ubuntu from 22.04.4 to 24.04.2.
nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
My computer has an NVIDIA RTX 2060, so driver version 550 should be installed.
$ sudo ubuntu-drivers install nvidia:550
ref. NVIDIA drivers installation | Ubuntu
I tried to install it by following the official documentation, but the following error occurred.
The following packages have unmet dependencies:
nvidia-kernel-common-550 : Conflicts: nvidia-kernel-common
nvidia-kernel-common-570 : Conflicts: nvidia-kernel-common
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.
Installing the driver package directly with apt worked:
$ sudo apt install nvidia-driver-550
The package installed successfully, but nvidia-smi still did not work. It worked after checking the DKMS status and rebooting.
$ dkms status
nvidia/550.144.03, 6.8.0-53-generic, x86_64: installed
$ sudo shutdown -r now
$ nvidia-smi
Tue Feb 25 02:28:11 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03             Driver Version: 550.144.03     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060        Off |   00000000:01:00.0  On |                  N/A |
| 37%   50C    P8              8W /  160W |     117MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1596      G   /usr/lib/xorg/Xorg                            105MiB |
|    0   N/A  N/A      2162      G   /usr/bin/gnome-shell                            9MiB |
+-----------------------------------------------------------------------------------------+
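The DKMS status check can also be scripted. The following is a minimal Python sketch, assuming the `module/version, kernel, arch: status` line format shown in the `dkms status` output above (other dkms releases may print a different layout). The function names `parse_dkms_status` and `nvidia_module_installed` are hypothetical helpers, not part of any tool.

```python
import re

# Matches lines such as:
#   nvidia/550.144.03, 6.8.0-53-generic, x86_64: installed
# Assumption: this is the layout printed by the dkms release used above.
DKMS_LINE = re.compile(
    r"^(?P<module>[^/,\s]+)/(?P<version>[^,\s]+),\s*"
    r"(?P<kernel>[^,\s]+),\s*(?P<arch>[^:\s]+):\s*(?P<status>\w+)"
)


def parse_dkms_status(output):
    """Return one dict per recognised line of `dkms status` output."""
    entries = []
    for line in output.splitlines():
        match = DKMS_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def nvidia_module_installed(output, kernel):
    """True if an nvidia module is reported as installed for `kernel`."""
    return any(
        entry["module"] == "nvidia"
        and entry["kernel"] == kernel
        and entry["status"] == "installed"
        for entry in parse_dkms_status(output)
    )


sample = "nvidia/550.144.03, 6.8.0-53-generic, x86_64: installed"
print(parse_dkms_status(sample))
print(nvidia_module_installed(sample, "6.8.0-53-generic"))
```

On a real machine, the text could come from `subprocess.run(["dkms", "status"], capture_output=True, text=True).stdout` and the kernel string from `platform.release()`. If the module is listed for a different kernel than the one currently running, that may explain why nvidia-smi keeps failing until a reboot.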
Required Packages Are Missing
In these cases, you must install a version of the required package that provides the missing name.
'torch' has no attribute 'float8_e4m3fn'
ref. python - 'torch' has no attribute 'float8_e4m3fn' - Stack Overflow
ImportError: cannot import name 'cached_download' from 'huggingface_hub'
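Before reinstalling anything, it can help to see which version is installed and whether the missing name is really absent. The following is a minimal sketch using only the standard library; `check_name` is a hypothetical helper. It assumes that `torch.float8_e4m3fn` only exists in newer PyTorch releases and that `cached_download` is no longer exported by recent `huggingface_hub` releases, so the fix is usually a version change rather than a new package.

```python
import importlib
from importlib import metadata


def check_name(package, module, name):
    """Return (installed_version, has_name).

    `package` is the distribution name used by pip, `module` is the import
    name. The version is None when pip does not know the package.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None
    try:
        mod = importlib.import_module(module)
    except ImportError:
        return version, False
    return version, hasattr(mod, name)


for package, module, name in [
    ("torch", "torch", "float8_e4m3fn"),
    ("huggingface-hub", "huggingface_hub", "cached_download"),
]:
    version, present = check_name(package, module, name)
    state = "found" if present else "missing"
    print(f"{package} {version}: {name} {state}")
```

If a name is reported as missing, compare the printed version with the version expected by the library that raised the error, then install a matching one.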
Package Version Differences
This error probably occurs because the installed package version differs from the one the code was written for.
ModuleNotFoundError: No module named 'diffusers.pipeline_utils'
Edit the import as shown below:
- from diffusers.pipeline_utils import DiffusionPipeline
+ from diffusers.pipelines.pipeline_utils import DiffusionPipeline
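When the same code has to run against both old and new package layouts, the import can fall back from one module path to the other instead of being edited by hand. The following is a minimal sketch; `import_first` is a hypothetical helper, and the two diffusers module paths are the ones from the diff above.

```python
import importlib


def import_first(module_paths, name):
    """Return `name` from the first module path that provides it."""
    last_error = None
    for path in module_paths:
        try:
            return getattr(importlib.import_module(path), name)
        except (ImportError, AttributeError) as exc:
            last_error = exc  # remember the failure and try the next path
    raise ImportError(f"{name} not found in any of {module_paths}") from last_error


# Intended use (requires diffusers to be installed):
# DiffusionPipeline = import_first(
#     ["diffusers.pipelines.pipeline_utils", "diffusers.pipeline_utils"],
#     "DiffusionPipeline",
# )

# Demonstration with standard-library modules so the sketch runs anywhere:
OrderedDict = import_first(["no_such_module_xyz", "collections"], "OrderedDict")
print(OrderedDict)
```

The fallback keeps the newer path first, so the old path is only tried on installations that still ship it.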
Fetching Model from HuggingFace
Error while deserializing header: HeaderTooLarge
ref. meta-llama/Meta-Llama-3-8B-Instruct · Error while deserializing header: HeaderTooLarge
You must fetch the repository using your Hugging Face account.
$ sudo apt-get update
$ sudo apt-get install git-lfs
$ git lfs install
$ git clone git@hf.co:<user>/<model>
Password authentication is no longer supported, so you must register your SSH key with Hugging Face and run the commands shown above.
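One possible cause of this error, assumed here rather than confirmed for this case, is that the weights file on disk is a Git LFS pointer (a small text file) instead of the real weights, for example when the repository was cloned before git-lfs was set up. A safetensors file begins with an 8-byte little-endian integer giving the JSON header length, so the text of a pointer file decodes to an enormous value. The following is a minimal sketch for checking a downloaded file; `is_lfs_pointer` and `declared_header_size` are hypothetical helpers.

```python
import struct
import tempfile
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file starts like a Git LFS pointer instead of real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_PREFIX)) == LFS_PREFIX


def declared_header_size(path):
    """Read the first 8 bytes as the little-endian header length."""
    with open(path, "rb") as f:
        raw = f.read(8)
    if len(raw) < 8:
        raise ValueError("file is shorter than 8 bytes")
    return struct.unpack("<Q", raw)[0]


# Demonstration on a fake pointer file (contents are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    pointer = Path(tmp) / "model.safetensors"
    pointer.write_bytes(LFS_PREFIX + b"\nsize 0\n")
    print(is_lfs_pointer(pointer))
    print(declared_header_size(pointer) > pointer.stat().st_size)
```

If `is_lfs_pointer` returns True for a file in the cloned repository, running `git lfs pull` inside it should replace the pointer with the actual weights.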
Invalid Package
RuntimeError: operator torchvision::nms does not exist
The following packages are installed in my environment. The torch and CUDA versions match, but the combination is still invalid. Note that torch carries the +cu126 build tag while torchvision has none.
$ pip list | grep torch
pytorch-lightning 2.5.0.post0
torch 2.6.0+cu126
torchmetrics 1.6.1
torchvision 0.21.0
$ pip uninstall torch
$ pip install torch==2.6.0
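The listing above shows torch with a `+cu126` local build tag and torchvision without one. The following is a minimal sketch that flags this kind of difference from version strings; `split_local_tag` and `same_build` are hypothetical helpers. A differing tag is only a hint that the wheels came from different indexes, not proof that they are incompatible.

```python
def split_local_tag(version):
    """Split '2.6.0+cu126' into ('2.6.0', 'cu126'); the tag is None if absent."""
    public, _, local = version.partition("+")
    return public, (local or None)


def same_build(versions):
    """True if every package carries the same local build tag (or none)."""
    tags = {split_local_tag(v)[1] for v in versions.values()}
    return len(tags) == 1


# Versions copied from the pip list output above.
installed = {"torch": "2.6.0+cu126", "torchvision": "0.21.0"}
for package, version in installed.items():
    print(package, split_local_tag(version))
print("same build tag:", same_build(installed))
```

In a live environment, the version strings could be read with `importlib.metadata.version("torch")` and `importlib.metadata.version("torchvision")` instead of being typed in.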