- Successfully installed the Zotac GeForce GTX 1050 Ti graphics card in the Dell Optiplex 3020 desktop.
- Installed the CUDA SDK using the web installer from https://developer.nvidia.com/cuda-downloads
- Tried following https://jamesbowley.co.uk/build-opencv-4-0-0-with-cuda-10-0-and-intel-mkl-tbb-in-windows/ to build OpenCV. The configure step kept failing with an "fseeko not found" error, so decided to use the pre-compiled library from https://jamesbowley.co.uk/downloads/ instead.
- https://jamesbowley.co.uk/opencv-3-4-gpu-cuda-performance-comparison-nvidia-vs-intel/ gives an interesting test comparison.
- To benchmark my code, which reads 50 images from disk and then does computations on them, tried a test with the computation part commented out.
- A simple sample program that performs a 2D DFT and its inverse gave the following results when run more than once:
- samplecudadft, 480x360, FFT and inverse FFT:
  - DFT and inverse, with upload/dl 0.255261 sec. (after running several times)
- 2400x360:
  - DFT and inverse, with upload/dl 0.261917 sec.
- CPU:
  - E:\OCT\opencvcuda\OpencvCuda\x64\Release>samplecudadft
  - DFT and inverse, on CPU 0.0144597 sec.
  - DFT and inverse, with upload/dl 0.260429 sec.
- Bottleneck seems to be FFT "planning" and not upload/download:
  - E:\OCT\opencvcuda\OpencvCuda\x64\Release>samplecudadft
  - DFT and inverse, on CPU 0.0146046 sec.
  - DFT and inverse, with upload/dl 0.263986 sec.
  - DFT and inverse, without upload/dl 0.260521 sec.
  - and running it a 2nd time: DFT and inverse, 2nd time without upload/dl 0.00618453 sec.
- Even when called as a function, quite fast after the first time:
  - DFT and inverse, as a function 0.00678208 sec.
  - DFT and inverse, as a function 0.00605592 sec.
  - DFT and inverse, as a function 0.00654344 sec.
  - DFT and inverse, as a function 0.00608254 sec.
  - DFT and inverse, as a function 0.00651554 sec.
  - DFT and inverse, as a function 0.00623649 sec.
  - DFT and inverse, as a function 0.00607195 sec.
  - DFT and inverse, as a function 0.0063154 sec.
  - DFT and inverse, as a function 0.00597894 sec.
  - DFT and inverse, as a function 0.00685649 sec.
  - DFT and inverse, as a function 0.00608895 sec.
- Will probably need to optimize based on https://docs.opencv.org/master/dd/d3d/tutorial_gpu_basics_similarity.html
- Currently, with upload/download and variable assignment not optimized:
  - on CPU, While loop 9.15065 sec.
  - on GPU, While loop 9.34765 sec.
  - These timings include variable initialization and the 64 to 32 conversion and back.
- More important, the Bscan image shows up a bug.
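
The timings above show a large gap between the first GPU DFT call (about 0.26 sec.) and later calls (about 0.006 sec.). A small harness that reports the first call separately from the repeats makes this kind of one-time setup cost easy to spot. The following is only a minimal sketch using the C++ standard library; `time_once`, `time_with_warmup`, `TimingReport` and `mean_repeat_sec` are hypothetical names made up for illustration, not part of OpenCV or of the sample program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Wall-clock time of a single call to `work`, in seconds.
double time_once(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical report type: first call kept apart from the repeats.
struct TimingReport {
    double first_call_sec = 0.0;     // includes any one-time setup cost
    std::vector<double> repeat_sec;  // calls made after the first one
};

// Hypothetical helper: runs `work` once, then `repeats` more times,
// timing every call individually.
TimingReport time_with_warmup(const std::function<void()>& work,
                              std::size_t repeats) {
    assert(static_cast<bool>(work));  // a callable must be supplied
    TimingReport report;
    report.first_call_sec = time_once(work);
    report.repeat_sec.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i) {
        report.repeat_sec.push_back(time_once(work));
    }
    return report;
}

// Mean of the repeat timings; returns 0 when there are none.
double mean_repeat_sec(const TimingReport& report) {
    if (report.repeat_sec.empty()) return 0.0;
    double sum = 0.0;
    for (double t : report.repeat_sec) sum += t;
    return sum / static_cast<double>(report.repeat_sec.size());
}
```

In use, `work` would be a lambda wrapping the DFT-and-inverse function from the sample program, for example `time_with_warmup([&]() { /* DFT and inverse here */ }, 10)`. Comparing `first_call_sec` against `mean_repeat_sec(report)` then shows how much of a short run is setup rather than steady-state work. For GPU code this assumes the wrapped call blocks until the result is ready; otherwise the timer only measures the launch.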