Deep Learning and Machine Learning with CUDA

A course designed for Deep Learning and Machine Learning with CUDA, specially focusing on the basics.


github Source : https://github.com/kcsanjeeb/GPU-CUDA-Study

Reference

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

Ming Li, Ziqian Bi, Tianyang Wang, Yizhu Wen, Qian Niu, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Keyu Chen, Caitlyn Heqi Yin, Pohsun Feng, Ming Liu

https://arxiv.org/abs/2410.05686

Basic GPU Information

1. NVIDIA System Management Interface (nvidia-smi)

In [2]:
! nvidia-smi
Out [2]:
Fri Sep 26 08:59:44 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-PCIE-32GB            On | 00000000:3B:00.0 Off |                    0 |
| N/A   34C    P0               25W / 250W|      4MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE-32GB            On | 00000000:86:00.0 Off |                    0 |
| N/A   33C    P0               23W / 250W|      4MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-PCIE-32GB            On | 00000000:AF:00.0 Off |                    0 |
| N/A   33C    P0               26W / 250W|      4MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-PCIE-32GB            On | 00000000:D8:00.0 Off |                    0 |
| N/A   33C    P0               24W / 250W|      4MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

2. Simple GPU listing

In [3]:
! nvidia-smi --query-gpu=index,name,memory.total,memory.free --format=csv
Out [3]:
index, name, memory.total [MiB], memory.free [MiB]
0, Tesla V100-PCIE-32GB, 32768 MiB, 32497 MiB
1, Tesla V100-PCIE-32GB, 32768 MiB, 32497 MiB
2, Tesla V100-PCIE-32GB, 32768 MiB, 32497 MiB
3, Tesla V100-PCIE-32GB, 32768 MiB, 32497 MiB

3. Check if NVIDIA drivers are loaded

In [4]:
! lsmod | grep nvidia

Out [4]:
nvidia_uvm           1261568  2
nvidia_drm             61440  0
nvidia_modeset       1265664  1 nvidia_drm
nvidia              55652352  116 nvidia_uvm,nvidia_modeset
drm_kms_helper        172032  2 ast,nvidia_drm
drm                   401408  6 drm_kms_helper,ast,nvidia,nvidia_drm,ttm

Python/PyTorch Specific Checks

4. Check GPU from Python

In [11]:
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU count:", torch.cuda.device_count())
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"  Memory: {torch.cuda.get_device_properties(i).total_memory / 1e9:.1f} GB")
Out [11]:
CUDA available: True
GPU count: 4
GPU 0: Tesla V100-PCIE-32GB
  Memory: 34.1 GB
GPU 1: Tesla V100-PCIE-32GB
  Memory: 34.1 GB
GPU 2: Tesla V100-PCIE-32GB
  Memory: 34.1 GB
GPU 3: Tesla V100-PCIE-32GB
  Memory: 34.1 GB

5. One-line Python check

In [12]:
! python -c "import torch; print(f'GPUs: {torch.cuda.device_count()}, Available: {torch.cuda.is_available()}')"
Out [12]:
GPUs: 4, Available: True

Detailed GPU Information

6. Extended nvidia-smi info

In [5]:
! nvidia-smi -a
Out [5]:
==============NVSMI LOG==============

Timestamp                                 : Fri Sep 26 09:05:05 2025
Driver Version                            : 530.30.02
CUDA Version                              : 12.1

Attached GPUs                             : 4
GPU 00000000:3B:00.0
    Product Name                          : Tesla V100-PCIE-32GB
    Product Brand                         : Tesla
    Product Architecture                  : Volta
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1423519026061
    GPU UUID                              : GPU-9a4abfd2-efbe-a488-ef49-cd66e56b33d3
    Minor Number                          : 0
    VBIOS Version                         : 88.00.7E.00.03
    MultiGPU Board                        : No
    Board ID                              : 0x3b00
    Board Part Number                     : 900-2G500-0110-030
    GPU Part Number                       : 1DB6-897-A1
    FRU Part Number                       : N/A
    Module ID                             : 1
    Inforom Version
        Image Version                     : G500.0202.00.02
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    GPU Reset Status
        Reset Required                    : No
        Drain and Reset Recommended       : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x3B
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1DB610DE
        Bus Id                            : 00000000:3B:00.0
        Sub System Id                     : 0x124A10DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
                Device Current            : 3
                Device Max                : 3
                Host Max                  : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 32768 MiB
        Reserved                          : 266 MiB
        Used                              : 4 MiB
        Free                              : 32497 MiB
    BAR1 Memory Usage
        Total                             : 32768 MiB
        Used                              : 4 MiB
        Free                              : 32764 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
        Aggregate
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 34 C
        GPU Shutdown Temp                 : 90 C
        GPU Slowdown Temp                 : 87 C
        GPU Max Operating Temp            : 83 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : 32 C
        Memory Max Operating Temp         : 85 C
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 25.35 W
        Power Limit                       : 250.00 W
        Default Power Limit               : 250.00 W
        Enforced Power Limit              : 250.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 250.00 W
    Clocks
        Graphics                          : 135 MHz
        SM                                : 135 MHz
        Memory                            : 877 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1230 MHz
        Memory                            : 877 MHz
    Default Applications Clocks
        Graphics                          : 1230 MHz
        Memory                            : 877 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1380 MHz
        SM                                : 1380 MHz
        Memory                            : 877 MHz
        Video                             : 1237 MHz
    Max Customer Boost Clocks
        Graphics                          : 1380 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Fabric
        State                             : N/A
        Status                            : N/A
    Processes                             : None

GPU 00000000:86:00.0
    Product Name                          : Tesla V100-PCIE-32GB
    Product Brand                         : Tesla
    Product Architecture                  : Volta
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1423619043950
    GPU UUID                              : GPU-1fef7b71-9f62-e048-4a10-52654e59bfda
    Minor Number                          : 1
    VBIOS Version                         : 88.00.7E.00.03
    MultiGPU Board                        : No
    Board ID                              : 0x8600
    Board Part Number                     : 900-2G500-0110-030
    GPU Part Number                       : 1DB6-897-A1
    FRU Part Number                       : N/A
    Module ID                             : 1
    Inforom Version
        Image Version                     : G500.0202.00.02
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    GPU Reset Status
        Reset Required                    : No
        Drain and Reset Recommended       : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0x86
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1DB610DE
        Bus Id                            : 00000000:86:00.0
        Sub System Id                     : 0x124A10DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
                Device Current            : 3
                Device Max                : 3
                Host Max                  : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 32768 MiB
        Reserved                          : 266 MiB
        Used                              : 4 MiB
        Free                              : 32497 MiB
    BAR1 Memory Usage
        Total                             : 32768 MiB
        Used                              : 4 MiB
        Free                              : 32764 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
        Aggregate
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 33 C
        GPU Shutdown Temp                 : 90 C
        GPU Slowdown Temp                 : 87 C
        GPU Max Operating Temp            : 83 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : 34 C
        Memory Max Operating Temp         : 85 C
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 24.00 W
        Power Limit                       : 250.00 W
        Default Power Limit               : 250.00 W
        Enforced Power Limit              : 250.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 250.00 W
    Clocks
        Graphics                          : 135 MHz
        SM                                : 135 MHz
        Memory                            : 877 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1230 MHz
        Memory                            : 877 MHz
    Default Applications Clocks
        Graphics                          : 1230 MHz
        Memory                            : 877 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1380 MHz
        SM                                : 1380 MHz
        Memory                            : 877 MHz
        Video                             : 1237 MHz
    Max Customer Boost Clocks
        Graphics                          : 1380 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Fabric
        State                             : N/A
        Status                            : N/A
    Processes                             : None

GPU 00000000:AF:00.0
    Product Name                          : Tesla V100-PCIE-32GB
    Product Brand                         : Tesla
    Product Architecture                  : Volta
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1423619047372
    GPU UUID                              : GPU-8178c4e0-3010-ec7b-473d-de79dac8f629
    Minor Number                          : 2
    VBIOS Version                         : 88.00.7E.00.03
    MultiGPU Board                        : No
    Board ID                              : 0xaf00
    Board Part Number                     : 900-2G500-0110-030
    GPU Part Number                       : 1DB6-897-A1
    FRU Part Number                       : N/A
    Module ID                             : 1
    Inforom Version
        Image Version                     : G500.0202.00.02
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    GPU Reset Status
        Reset Required                    : No
        Drain and Reset Recommended       : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0xAF
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1DB610DE
        Bus Id                            : 00000000:AF:00.0
        Sub System Id                     : 0x124A10DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
                Device Current            : 3
                Device Max                : 3
                Host Max                  : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 32768 MiB
        Reserved                          : 266 MiB
        Used                              : 4 MiB
        Free                              : 32497 MiB
    BAR1 Memory Usage
        Total                             : 32768 MiB
        Used                              : 4 MiB
        Free                              : 32764 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
        Aggregate
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 33 C
        GPU Shutdown Temp                 : 90 C
        GPU Slowdown Temp                 : 87 C
        GPU Max Operating Temp            : 83 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : 32 C
        Memory Max Operating Temp         : 85 C
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 26.12 W
        Power Limit                       : 250.00 W
        Default Power Limit               : 250.00 W
        Enforced Power Limit              : 250.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 250.00 W
    Clocks
        Graphics                          : 135 MHz
        SM                                : 135 MHz
        Memory                            : 877 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1230 MHz
        Memory                            : 877 MHz
    Default Applications Clocks
        Graphics                          : 1230 MHz
        Memory                            : 877 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1380 MHz
        SM                                : 1380 MHz
        Memory                            : 877 MHz
        Video                             : 1237 MHz
    Max Customer Boost Clocks
        Graphics                          : 1380 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Fabric
        State                             : N/A
        Status                            : N/A
    Processes                             : None

GPU 00000000:D8:00.0
    Product Name                          : Tesla V100-PCIE-32GB
    Product Brand                         : Tesla
    Product Architecture                  : Volta
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1423519026168
    GPU UUID                              : GPU-84787d0b-4a24-f198-b7ea-4dfad802af81
    Minor Number                          : 3
    VBIOS Version                         : 88.00.7E.00.03
    MultiGPU Board                        : No
    Board ID                              : 0xd800
    Board Part Number                     : 900-2G500-0110-030
    GPU Part Number                       : 1DB6-897-A1
    FRU Part Number                       : N/A
    Module ID                             : 1
    Inforom Version
        Image Version                     : G500.0202.00.02
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : None
        Host VGPU Mode                    : N/A
    GPU Reset Status
        Reset Required                    : No
        Drain and Reset Recommended       : N/A
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0xD8
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1DB610DE
        Bus Id                            : 00000000:D8:00.0
        Sub System Id                     : 0x124A10DE
        GPU Link Info
            PCIe Generation
                Max                       : 3
                Current                   : 3
                Device Current            : 3
                Device Max                : 3
                Host Max                  : 3
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 0 KB/s
        Rx Throughput                     : 0 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Throttle Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    FB Memory Usage
        Total                             : 32768 MiB
        Reserved                          : 266 MiB
        Used                              : 4 MiB
        Free                              : 32497 MiB
    BAR1 Memory Usage
        Total                             : 32768 MiB
        Used                              : 4 MiB
        Free                              : 32764 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
        Aggregate
            Single Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : N/A
                Total                     : 0
            Double Bit            
                Device Memory             : 0
                Register File             : 0
                L1 Cache                  : 0
                L2 Cache                  : 0
                Texture Memory            : N/A
                Texture Shared            : N/A
                CBU                       : 0
                Total                     : 0
    Retired Pages
        Single Bit ECC                    : 0
        Double Bit ECC                    : 0
        Pending Page Blacklist            : No
    Remapped Rows                         : N/A
    Temperature
        GPU Current Temp                  : 33 C
        GPU Shutdown Temp                 : 90 C
        GPU Slowdown Temp                 : 87 C
        GPU Max Operating Temp            : 83 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : 31 C
        Memory Max Operating Temp         : 85 C
    Power Readings
        Power Management                  : Supported
        Power Draw                        : 24.00 W
        Power Limit                       : 250.00 W
        Default Power Limit               : 250.00 W
        Enforced Power Limit              : 250.00 W
        Min Power Limit                   : 100.00 W
        Max Power Limit                   : 250.00 W
    Clocks
        Graphics                          : 135 MHz
        SM                                : 135 MHz
        Memory                            : 877 MHz
        Video                             : 555 MHz
    Applications Clocks
        Graphics                          : 1230 MHz
        Memory                            : 877 MHz
    Default Applications Clocks
        Graphics                          : 1230 MHz
        Memory                            : 877 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1380 MHz
        SM                                : 1380 MHz
        Memory                            : 877 MHz
        Video                             : 1237 MHz
    Max Customer Boost Clocks
        Graphics                          : 1380 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : N/A
    Fabric
        State                             : N/A
        Status                            : N/A
    Processes                             : None

7. Real-time GPU monitoring

In [7]:
# Watch GPU usage every 2 seconds
! watch -n 2 nvidia-smi
Out [7]:
[?1l>---------------------------------------+----------------------+-------------========|ri Sep 26 09:05:22 2044779931313366[?47l8

8. Check GPU compute capability

In [8]:
! nvidia-smi --query-gpu=compute_cap --format=csv
Out [8]:
compute_cap
7.0
7.0
7.0
7.0

System-Level GPU Checks

9. Check PCIe devices

In [9]:
! lspci | grep -i nvidia
Out [9]:
3b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
86:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
af:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)

10. Check GPU driver version

In [10]:
! cat /proc/driver/nvidia/version
Out [10]:
NVRM version: NVIDIA UNIX x86_64 Kernel Module  530.30.02  Wed Feb 22 04:11:39 UTC 2023
GCC version:  gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04) 
All systems normal

© 2025 2023 Sanjeeb KC. All rights reserved.