We have all had this bad experience. You are watching a video of your favorite show or playing your favorite game, and a jumpy horizontal (and/or diagonal) line breaks your immersion and reminds you that this is all fiction. This effect is usually called tearing, and you can see an example of it in the following video (already visible in the thumbnail):
Another issue that some users have been hitting is not being able to have three 4k displays set horizontally. In this blog post, I will explain how I managed to kill these two birds with my per-CRTC framebuffer stone.
Some historical context
Back in the ‘good’ old days, the Linux kernel was not in charge of anything graphics-related, except VT-switching. Userspace was thus responsible for everything, and the X-server happily provided all the features you may have wanted.
Most of the code of the X-server was common because it started out with CPU-only rendering, but as GPUs were introduced to the market, the X-server learnt how to use them for changing modes, performing 2D acceleration and, later on, 3D acceleration. However, a driver had to be written for every GPU on the market. This is how we came to have over 10 drivers.
In 2008, Kernel ModeSetting (KMS) came to Linux, which took away the responsibility of changing the screen resolution from the X-server, and allowed for cool projects such as Plymouth (splash screen during boot), or fast VT-switching (because the mode did not have to be re-set after every switch). It also introduced a unified interface across GPU vendors for the userspace, allowing applications to interact with the different displays without having to care whether the GPU would be from Intel, AMD/ATI, NVIDIA, or any other vendor.
Later in the same year, the X-server received a new driver (called xf86-video-modesetting), targeting the generic KMS interface. It provided mode setting for every KMS driver (which was admittedly not that many at the time, but which is now over 50). The xf86-video-modesetting driver has however remained a niche because it lacked support for 2D acceleration. Thankfully, 2D acceleration using OpenGL was introduced to the X-server in 2014 under the name glamor, making xf86-video-modesetting a usable and generic driver.
Fast-forward to 2016, Debian- and Fedora-based distributions switched to using the modesetting driver instead of the Intel-only xf86-video-intel driver, often named after one of its backends: SNA. The decision happened because the SNA driver lacked acceleration support for Skylake processors, and because of the limited development the driver was seeing.
As a whole, X-specific development is on the decline, because Wayland is the new display server standard. In a Wayland environment, X11 retro-compatibility is provided by XWayland, which only supports the xf86-video-modesetting driver. So, last year, Intel took the decision to support the -modesetting driver instead of the SNA driver.
This decision however impacted a small percentage of users who relied on features that used to be available on SNA but are missing on xf86-video-modesetting, the most important one being TearFree. This feature allows users of uncomposited environments to experience a tear-free environment for both windowed and fullscreen applications. Luckily, users of modern desktop environments have been unimpacted because they are using OpenGL-based window managers which use DRI2’s and DRI3’s PageFlip capability. This capability allows fullscreen applications (or window managers) to provide the full framebuffer to the X-server and use KMS’s ability to flip to it at the next vblank of all displays used. This results in a tear-free experience, provided the application is double buffered (one buffer is used to render to while another one is scanned out to the display).
Advantages of per-CRTC framebuffers
The X-server has the concept of screens, but it is closer to the concept of a seat than an actual display as the screen’s framebuffer actually contains the content of all displays (every display would use a different x/y offset in this pixmap). In un-composited scenarios, the applications render directly into this buffer, which creates tearing as the rendering is not synchronised with the different displays.
Another issue with this gigantic framebuffer is that it is limited to the maximum width and height supported by the display engines of the GPU. On recent Intel GPUs, the limit is 8k which is sufficient for two 4k displays or three full-HD displays in a row, but some users have wanted more.
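The arithmetic behind this limit can be sketched in a few lines of Python. This is only an illustration: the Display type, the helper names and the exact value 8192 for the "8k" limit are assumptions made for the example, not values taken from the driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Display:
    """Position and size of one display inside the shared screen pixmap."""
    name: str
    x: int
    y: int
    width: int
    height: int


def screen_size(displays):
    """Bounding size of the single framebuffer that covers all displays."""
    width = max(d.x + d.width for d in displays)
    height = max(d.y + d.height for d in displays)
    return width, height


def fits_display_engine(displays, max_width, max_height):
    """True if the shared framebuffer stays within the hardware limits."""
    width, height = screen_size(displays)
    return width <= max_width and height <= max_height


def row_of(count, width, height):
    """Hypothetical helper: `count` identical displays set side by side."""
    return [Display(f"display-{i}", i * width, 0, width, height)
            for i in range(count)]


HW_LIMIT = 8192  # assumption: "8k" taken to mean 8192 pixels per dimension

two_4k = row_of(2, 3840, 2160)
three_4k = row_of(3, 3840, 2160)
three_fhd = row_of(3, 1920, 1080)
```

With these numbers, two 4k displays need a 7680-pixel-wide framebuffer and three full-HD displays need 5760 pixels, both under the limit, while three 4k displays would need 11520 pixels.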
The solution to both problems is to introduce per-display (AKA per-CRTC) framebuffers. Indeed, once the gigantic framebuffer gets split into per-CRTC ones, the maximum width of the framebuffer supported by the HW will only limit the maximum resolution achievable per screen rather than the combined size of all the displays put together. This feature can then be used to achieve a tear-free desktop by making sure that we never copy the changes done on this gigantic framebuffer to the per-CRTC framebuffer while this display is scanning out. Unfortunately, we cannot copy these changes fast enough to fit into the vblank period of the display, so we instead have to use the double buffering technique described earlier to achieve the same effect.
Another issue is that the concept of screen is so central to X11 that allowing fullscreen applications and window managers to provide one framebuffer per CRTC would require a lot of work which these compositors should instead spend on porting their codebase to become Wayland-enabled. Wayland has been designed from the ground up to be efficient and provide a silky-smooth/tear-free experience. In the meantime, we can cater for the users of uncomposited environments by transparently implementing such support in the xf86-video-modesetting driver.
Implementing per-CRTC framebuffers in xf86-video-modesetting
WARNING: Please skip this section if you are not interested in the implementation details
I am not what one would call a regular contributor to the -modesetting driver, or to the X-server’s code base in general. I contributed 3 patches over the last 3 years (2 bug fixes, and 1 feature), and 2 reviews.
Hacking on the X-server has always been a little more complex than hacking on other projects because co-installing the X-server requires quite a lot of fiddling with configuration files, starting scripts and logind integration. However, I must say that recent changes such as moving to Meson, the consolidation of all protocol headers into one repository, and the merge-request workflow provided by GitLab (along with automated testing) have made working on the X-server and the modesetting driver easier. Thanks a lot guys!
Upon looking at the code of the -modesetting driver, I realized that there was already support for per-CRTC framebuffers. They were however only oriented towards supporting rotated displays. Making this code more generic to support the non-rotated case turned out to be quite a challenge as it would have required changing the ABI between the X-server and the device drivers. This proved to be too great of a hassle, and I instead opted to encapsulate all the generic code into functions and a structure to represent the per-CRTC buffers (drmmode_shadow_scanout_rec). This work was mostly done in patches 1 and 2.
I then worked on reducing the amount of pixels that need to be copied for every frame. Instead of copying damages instantly, I wanted to buffer them and perform the copy at the same frequency as the refresh rate of the screen. This reduces the performance impact of this technique for applications with a refresh rate vastly higher than the display’s refresh rate (hello glxgears). The X-server keeps track of which pixels need to be updated (damages) in a RegionRec data structure, which is simply a list of boxes that supports various set operations such as unions or intersections. Damage information is received from the X-server through the BlockHandler function. I thus only have to aggregate all the damaged regions into per-CRTC invalid regions and store them in a new field of drmmode_shadow_scanout_rec (screen_damage). The accumulation of damages is done by the ms_update_scanout_damages() function, and the copy (blitting) of the damages is done by drmmode_update_scanout_buffer(). Both functions can be found in patch 3.
In patch 4, I am finally adding support for per-CRTC framebuffers, albeit disabled because I wanted to make this patch as short as possible so as not to clutter it with future details. The function drmmode_need_shadow_scanout() is introduced in order to be able to enable this feature dynamically based on different conditions. Right now, it always returns FALSE. We also do not limit the blitting to the refresh rate of the screen yet (but this is coming soon). Despite not doing much, this patch is however exercising the X-server in a new way, which led to a crash when the X-server would restart because the CRTC data structures were re-used without being zeroed, leading to a use-after-free bug which this patch already fixes. One oddity about this patch is the call to glamor_finish() after blitting damages. This is because we want to make sure all the operations are done before returning, which reduces stuttering because some draw calls may not have been queued on the GPU yet and will not be until the next damages appear (which may take seconds).
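The accumulate-then-blit idea can be illustrated with a small Python sketch. It is not the driver's code: Box, CrtcShadow, accumulate_damage() and flush_damage() are hypothetical names, and the plain list of boxes is a simplification of RegionRec, which also merges overlapping boxes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; x2 and y2 are exclusive."""
    x1: int
    y1: int
    x2: int
    y2: int

    def intersect(self, other):
        """Return the overlapping box, or None if the boxes do not overlap."""
        x1, y1 = max(self.x1, other.x1), max(self.y1, other.y1)
        x2, y2 = min(self.x2, other.x2), min(self.y2, other.y2)
        if x1 >= x2 or y1 >= y2:
            return None
        return Box(x1, y1, x2, y2)


@dataclass
class CrtcShadow:
    """Hypothetical stand-in for the per-CRTC state: the area of the screen
    pixmap shown by this CRTC, plus the damage not yet copied to it."""
    area: Box
    screen_damage: list = field(default_factory=list)


def accumulate_damage(crtcs, damaged_boxes):
    """Clip each damaged box to every CRTC and remember it for later."""
    for crtc in crtcs:
        for box in damaged_boxes:
            clipped = box.intersect(crtc.area)
            if clipped is not None:
                crtc.screen_damage.append(clipped)


def flush_damage(crtc, copy_box):
    """Copy all pending damage for one CRTC, then forget it."""
    pending, crtc.screen_damage = crtc.screen_damage, []
    for box in pending:
        copy_box(box)
    return len(pending)
```

A damaged box that straddles two displays ends up split in two, one clipped box per CRTC, so each per-CRTC buffer only copies the pixels it actually shows.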
Patch 5 starts making use of all of this new code by enabling per-CRTC framebuffers in case the width or height of the screen’s pixmap is larger than the GPU’s display engines’ capabilities. Because of this, it is now safe to raise the limits for the pixmap to the maximum supported by X11 (64k x 64k). This limit will never be lifted because all the X11 protocols and extensions depend on 16-bit integers to represent positions.
Now that we enabled the feature in at least one case, patch 6 improves its performance by finally limiting the blitting of damages to the refresh rate of one of the displays. Luckily, the -modesetting driver already allows us to request a function to be called at the next vblank event (ms_queue_vblank). We just need to schedule an update whenever we receive damage events, and fall back to a synchronous update if scheduling the call fails.
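The scheduling logic can be sketched as follows. This is a simplified Python model, not the patch itself: ScanoutUpdater is a hypothetical class, and the queue_vblank callable is an assumed stand-in that returns True when the callback was queued and False when queuing failed.

```python
class ScanoutUpdater:
    """Defer blits to the next vblank, or blit right away if that fails."""

    def __init__(self, queue_vblank, blit):
        self.queue_vblank = queue_vblank  # assumed: returns True on success
        self.blit = blit                  # copies one damaged box
        self.pending = []
        self.update_queued = False

    def on_damage(self, box):
        """Record the damage and make sure exactly one update is scheduled."""
        self.pending.append(box)
        if self.update_queued:
            return  # an update is already scheduled for the next vblank
        if self.queue_vblank(self.on_vblank):
            self.update_queued = True
        else:
            self.flush()  # scheduling failed: update synchronously

    def on_vblank(self):
        """Callback run at the next vblank event."""
        self.update_queued = False
        self.flush()

    def flush(self):
        """Blit everything accumulated so far."""
        pending, self.pending = self.pending, []
        for box in pending:
            self.blit(box)
```

However many damage events arrive between two vblanks, only one callback is queued and the accumulated damage is copied once, which is what keeps the cost bounded for applications that render much faster than the display refreshes.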
Patch 7 is a trivial patch that allows users to force-enable the per-CRTC framebuffer. The option is called ShadowPrimary, to mimic the xf86-video-ati driver. It may improve performance in some edge cases, but the primary purpose of this option is to allow testing of this codepath.
Now that we have per-CRTC framebuffers, we need to work towards double buffering them to prevent tearing. This work requires the KMS feature to exchange the scanout buffer during vblank (Page Flipping). Patches 8 and 9 perform preliminary work towards this goal by respectively making some code more generic and removing the assumption that page flipping always happens on the gigantic framebuffer.
Patch 10 finally enables TearFree support! It does so by introducing a shadow back buffer (named shadow_nonrotated_back), adding damage tracking on this buffer, setting up the page flips instead of performing the damage updates at the next vblank, and adding a TearFree option. This patch ends up being relatively small because it re-uses a lot of the infrastructure we set up previously.
Unfortunately, more work is needed to make the TearFree support compatible with the PageFlip feature. This would allow modern desktop environments to skip the extra copy unless it hits the limits of the display engines. In the meantime, trying to use the PageFlip feature will lead to increased latencies, and the kernel complaining that the flipping queue reached its maximum length! I’ll make sure to disable pageflips before the patch series lands.
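To show why the back buffer needs its own damage tracking, here is a minimal Python model of double buffering with page flips. It is a sketch under assumptions, not the patch's implementation: TearFreeCrtc is a hypothetical class, buffers are plain lists of rows, boxes are (x1, y1, x2, y2) tuples with exclusive ends, and the flip is modelled as a simple swap.

```python
class TearFreeCrtc:
    """Two buffers per CRTC: one scanned out (front), one updated (back)."""

    def __init__(self, width, height):
        self.front = [[0] * width for _ in range(height)]
        self.back = [[0] * width for _ in range(height)]
        # Each buffer tracks the damage it has not received yet.
        self.front_damage = []
        self.back_damage = []

    def add_damage(self, box):
        """New damage is eventually needed by both buffers."""
        self.front_damage.append(box)
        self.back_damage.append(box)

    def flip(self, source):
        """Bring the back buffer up to date from `source`, then swap.

        `source` holds this CRTC's part of the screen contents. Only the
        buffer that is not being scanned out is ever written to."""
        for x1, y1, x2, y2 in self.back_damage:
            for y in range(y1, y2):
                self.back[y][x1:x2] = source[y][x1:x2]
        self.back_damage = []
        # Model of the page flip: the updated buffer becomes the front one.
        self.front, self.back = self.back, self.front
        self.front_damage, self.back_damage = (
            self.back_damage, self.front_damage)
```

After a flip, the old front buffer becomes the new back buffer and is one frame behind, so it has to keep the damage it missed and receive it before the following flip.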
Testing the feature
There are a lot of different scenarios that are affected by my patch series (multi-GPU being one of the hairiest ones), and I would appreciate your feedback.
You can find all my patches in my pull request on freedesktop.org. They should apply cleanly on the latest X-server release (1.20.1), in case you are already running this version (looking at you, ArchLinux users). I am quite pleased that the patch series ended up so small:
8 files changed, 605 insertions(+), 72 deletions(-)
Once you have recompiled your X-server with the patches, please set the TearFree option in your xorg.conf like so:
Option "TearFree" "True"
You can use this YouTube video to check for differences with and without TearFree. Make sure to try this video in both fullscreen and windowed mode as the fullscreen mode may utilise the PageFlip feature to provide tear-free rendering.
WARNING: The extra copy incurred by the TearFree option can use up a lot of memory bandwidth when using a lot of 4k monitors, which can lead to up to a 50% performance loss.
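To get a feel for the amount of data involved, here is a rough worst-case estimate in Python. The assumptions are mine and not measurements: 4 bytes per pixel, a 60 Hz refresh rate, and every pixel being damaged on every refresh. The copy_bandwidth() helper is hypothetical.

```python
def copy_bandwidth(width, height, refresh_hz, bytes_per_pixel=4, displays=1):
    """Bytes written per second if every pixel is copied on every refresh."""
    return width * height * bytes_per_pixel * refresh_hz * displays


one_4k = copy_bandwidth(3840, 2160, 60)
three_4k = copy_bandwidth(3840, 2160, 60, displays=3)
```

Under these assumptions, one 4k display needs about 2 GB/s of writes and three need about 6 GB/s, and each copy also has to read the same amount from the screen's framebuffer. Real workloads usually damage only part of the screen, so the actual cost is often lower.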
That’s all, folks!