Thursday, April 09, 2020

More notes on OpenGL programming

Following up on my earlier post about OpenGL programming: GL_warp2mp4 is now functional.

Checking the speeds achieved by pboUnpack on my machine, I get 2.8 fps for a 2048x2048 texture if GL_BGR is used, but this increases to 29 fps if GL_RGBA8 is used. So the 3 fps I get from GL_warp2mp4 with hi-res files seems to be limited by writing to the FBO. If I render directly to the screen instead of to a texture attached to an FBO, I get faster frame rates. But that, of course, has the limitation that the output resolution cannot exceed the screen size.

The pixel format GL_BGRA is unfortunately not supported as an internal format for the FBO (at least on my system). So GL_RGBA8 has to be used, which is apparently the fastest we can get.

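pboPack shows between 6 and 9 Mpixels/s on my machine, both with PBO on and off. In the log below, the rate settles at about 6.5 to 6.8 Mpixels/s after PBO mode is switched off and returns to about 8.4 to 8.9 Mpixels/s after it is switched back on.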
../bin/pboPack
Video card supports GL_ARB_pixel_buffer_object.
Transfer Rate: 0.0 Mpixels/s. (0.0 FPS)
Transfer Rate: 7.7 Mpixels/s. (30.7 FPS)
Transfer Rate: 8.6 Mpixels/s. (34.4 FPS)
Transfer Rate: 9.0 Mpixels/s. (35.8 FPS)
Transfer Rate: 8.4 Mpixels/s. (33.5 FPS)
Transfer Rate: 8.6 Mpixels/s. (34.3 FPS)
Transfer Rate: 8.3 Mpixels/s. (33.2 FPS)
Transfer Rate: 8.8 Mpixels/s. (35.3 FPS)
PBO mode: off
Transfer Rate: 8.5 Mpixels/s. (34.0 FPS)
Transfer Rate: 6.5 Mpixels/s. (25.9 FPS)
Transfer Rate: 6.6 Mpixels/s. (26.5 FPS)
Transfer Rate: 6.7 Mpixels/s. (27.0 FPS)
Transfer Rate: 6.8 Mpixels/s. (27.4 FPS)
Transfer Rate: 6.7 Mpixels/s. (26.6 FPS)
PBO mode: on
Transfer Rate: 7.6 Mpixels/s. (30.4 FPS)
Transfer Rate: 8.4 Mpixels/s. (33.6 FPS)
Transfer Rate: 8.7 Mpixels/s. (34.9 FPS)
Transfer Rate: 8.9 Mpixels/s. (35.5 FPS)
Transfer Rate: 8.8 Mpixels/s. (35.4 FPS)

What I get with GL_warp2mp4 is around 3.5 fps with 1080p output, which comes to around 6.9 Mpixels/s. So that is probably all I can expect.
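
As a quick check of that figure, the arithmetic can be sketched as below. It assumes that 1080p means 1920x1080 and that a "Mpixel" is counted as 1024*1024 pixels. The second assumption is only my guess from the pboPack log above, where the rate divided by the FPS is consistently about 0.25 Mpixels per frame. The function name is purely illustrative.

```python
# Rough throughput check; the names here are illustrative only.
MPIXEL = 1024 * 1024  # assumption: binary megapixel, guessed from the pboPack log


def mpixels_per_second(width, height, fps, mpixel=MPIXEL):
    """Pixels moved per second, expressed in Mpixels/s."""
    return width * height * fps / mpixel


if __name__ == "__main__":
    # 1080p output at about 3.5 fps, as measured with GL_warp2mp4.
    rate = mpixels_per_second(1920, 1080, 3.5)
    print(f"{rate:.1f} Mpixels/s")  # prints "6.9 Mpixels/s"
```

With a decimal megapixel (1,000,000 pixels) the same numbers give about 7.3 Mpixels/s, so either way the result sits inside the 6 to 9 Mpixels/s range that pboPack reports.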

I will probably get better performance by avoiding OpenGL and the graphics card altogether and going the remap way, as in OCVWarp. The reason is that the data transfer to and from the video card seems to be the bottleneck, and CPU-based remap computation is faster.
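
To make "the remap way" concrete, here is a minimal pure-Python sketch of the lookup that a remap performs: each destination pixel is copied from a source position given by two precomputed maps, following the convention of OpenCV's remap, dst(x, y) = src(map_x(x, y), map_y(x, y)). This is not OCVWarp's code. The function name, the nearest-neighbour sampling, and the tiny horizontal-flip maps are all assumptions chosen for illustration.

```python
def remap_nearest(src, map_x, map_y, fill=0):
    """Build dst so that dst[y][x] = src[map_y[y][x]][map_x[y][x]].

    src is a list of rows; map_x and map_y have the shape of the output.
    Coordinates are rounded to the nearest pixel, and lookups that fall
    outside src are replaced by `fill`.
    """
    src_h, src_w = len(src), len(src[0])
    dst = []
    for row_x, row_y in zip(map_x, map_y):
        out_row = []
        for fx, fy in zip(row_x, row_y):
            sx, sy = int(round(fx)), int(round(fy))
            inside = 0 <= sx < src_w and 0 <= sy < src_h
            out_row.append(src[sy][sx] if inside else fill)
        dst.append(out_row)
    return dst


if __name__ == "__main__":
    src = [[1, 2, 3],
           [4, 5, 6]]
    # Hypothetical maps for a horizontal flip:
    # output column x reads from source column (width - 1 - x).
    map_x = [[2, 1, 0], [2, 1, 0]]
    map_y = [[0, 0, 0], [1, 1, 1]]
    print(remap_nearest(src, map_x, map_y))  # prints [[3, 2, 1], [6, 5, 4]]
```

Since the maps depend only on the warp and not on the frame contents, they can be computed once and reused for every frame, so the per-frame work is just the lookup, with no upload to or readback from the video card.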
