Setting Up TensorFlow to Detect an NVIDIA RTX 4090 on Ubuntu 20.04
Buying The Hardware
I’ve been mulling over buying my own personal deep-learning machine for some time now. It’s quite an investment, and I’ve weighed the pros and cons. This past weekend, I finally bit the bullet and bought the most powerful PC that selling 3 out of 4 of my kidneys could buy (LOL). I bought my PC parts from a store named Dynaquest in Mandaluyong City, Philippines.
Here are the complete specs of my PC, as well as some basic UserBenchmarks.
Windows with WSL or Dual Boot OS?
I contemplated whether to use Windows + WSL or a dual-boot Windows and Ubuntu setup, and I decided to go for the latter. I won't go into the details, but I want to avoid any virtualization layer(s) between the OS and the hardware. I also installed a specific Ubuntu version, 20.04, as this is the same version installed on the Azure Machine Learning computes that I'm using. The intention is to make the environments as close as possible so that I can compare performance between my machine and the Azure ML computes when training models and running inference.
I followed these guides to set up a dual-boot Windows 11 and Ubuntu 20.04 machine.
- https://www.xda-developers.com/dual-boot-windows-11-linux/
- https://www.tomshardware.com/how-to/dual-boot-linux-and-windows-11
TensorFlow GPU woes and how I solved them
I’ve set up countless Ubuntu-based Azure ML computes before and never had issues configuring them, so I was pretty confident I would be able to set things up quickly. Unfortunately, the same installation scripts I used when setting up my Azure ML computes didn’t work.
The main issue was that TensorFlow couldn’t detect my installed RTX 4090.
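For reference, the quick check I run is just TensorFlow's device listing, along these lines (a minimal sketch, not the exact cell from my notebook):

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"  # prints [] when no GPU is detected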
After hours of scouring the internet for clues and trying to solve it on my own, I stumbled on this amazing Medium post by Venky. Even though his guide targets Ubuntu 22.04, I had no issues applying the same steps on Ubuntu 20.04.
To summarize the post (a rough sketch of some of these steps follows the list):
- Install the GCC 9 compiler and set it as the preferred GCC version
- Download and install the CUDA and cuDNN libraries from NVIDIA; you may need to create an account to download them
- Create a new Python environment and install dependencies
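The exact commands are in Venky's post, so the snippet below is only my rough sketch of what the first and last steps boil down to; package names and the venv path are illustrative, and the CUDA/cuDNN installers themselves come from NVIDIA's download pages.

# install GCC 9 and register it as the preferred gcc/g++ (illustrative commands)
sudo apt install -y gcc-9 g++-9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 9
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 9
# create a fresh Python virtual environment for the deep-learning stack
python3 -m venv venv/torchgpu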
Venky also has a quick sanity check at the end. Running this check yields the screen below, confirming that we’ve successfully installed CUDA and cuDNN.
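I won't reproduce his exact check here, but a typical terminal-level confirmation looks something like the following (these are my own commands, not necessarily Venky's; the cuDNN header path assumes a standard /usr/local/cuda install):

nvcc --version      # CUDA toolkit version
nvidia-smi          # driver version and visible GPUs
grep -A 2 CUDNN_MAJOR /usr/local/cuda/include/cudnn_version.h  # cuDNN version (cuDNN 8+)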
Important Note
Zhongyiyuan also left a very important comment on Venky’s workflow, about a Linux kernel update breaking GCC functionality. According to this link, the maximum GCC version supported by CUDA 11.2 is GCC 10, not 9, so there may be no need to set GCC 9 as the preferred version. I can’t test this anymore as I’ve already applied the workflow, so just keep it in mind!
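If you want to see which compiler is currently selected before deciding what to do, these stock commands (not part of Venky's workflow) are enough:

gcc --version                      # currently selected GCC version
update-alternatives --display gcc  # registered gcc alternatives, if any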
Creating a New TensorFlow Environment
Venky’s workflow also adds the new environment to the list of usable kernels in JupyterLab. Here’s the set of commands from his Medium article that does that.
source venv/torchgpu/bin/activate  # activate the new environment
(torchgpu) pip install ipykernel  # install ipykernel inside the environment
(torchgpu) python -m ipykernel install --user --name TORCH-GPU --display-name "PyTorch GPU"  # register the environment as a Jupyter kernel
Venky’s workflow also installs PyTorch; in my case, I’ll be using TensorFlow instead. I also didn’t have JupyterLab yet, so I had to install it. The commands below install JupyterLab, TensorFlow, and the rest of my libraries.
(torchgpu) pip3 install jupyterlab  # install JupyterLab
(torchgpu) pip3 install ipywidgets==7.7.2  # I was getting widget issues with other versions
(torchgpu) pip3 install tensorflow==2.9.0  # I use a specific TensorFlow version
(torchgpu) pip3 install pandas  # ...etc., install all your other libraries
Opening up JupyterLab, I could see the new environment in the kernel dropdown.
Rerunning the same notebook, I can now see one GPU! If you scroll back up, this previously showed zero available GPUs!
Proceeding to train the model, I opened up a terminal to monitor the GPU utilization, and I can now confirm that my notebook is utilizing the RTX 4090.
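For anyone curious, monitoring it is as simple as watching nvidia-smi refresh in another terminal while the notebook trains:

watch -n 1 nvidia-smi  # refresh GPU utilization and memory usage every second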
Before configuring it properly, training on the CPU took around 60 minutes per epoch; now, with the GPU, it only needs about 4 minutes per epoch!
Closing Remarks
Bringing my new machine home, I was confident that I’d be able to set things up quickly as I’ve set up countless Ubuntu-based Azure ML computes before. I was wrong.
Medium proved very helpful in this situation, and that’s how I stumbled on Venky’s article. I’d like to thank him for the time and effort he put into writing it.
I haven't stress-tested my new machine yet, so I will be running benchmarks and stress tests to check its overall integrity and stability. Afterward, I will drown myself in more interesting deep-learning work, including benchmarking my machine against Azure GPU VMs to see how it fares against cloud GPU computes.
Thank you for reading my article!