
Blend2D
2D Vector Graphics Engine
Blend2D is a high-performance 2D vector graphics engine written in C++. It was written from scratch with the goal to achieve the best possible software-based acceleration of 2D rendering. Blend2D provides a high-quality and high-performance analytic rasterizer, a new stroking engine that uses curve offsetting instead of flattening, a 2D pipeline generator that uses JIT compilation to generate optimal 2D pipelines at runtime, and has a capability to use multiple threads to accelerate the performance even further.
The library was also written with zero dependencies in mind. Currently, it only depends on AsmJit, which is used for JIT compilation. All features that Blend2D provides are implemented by Blend2D itself, including image loading capabilities and text rendering (freetype or operating system provided libraries are not used).
Blend2D project is developed by the following people:
There are currently no rules about becoming part of the Blend2D Team. We think that anyone interested in Blend2D development should contribute first by opening issues, pull requests, by joining our channels, and by promoting Blend2D on other sites in a neutral way. Anyone who starts making significant contributions that match our code quality will be offered to join Blend2D project at some point.
The Blend2D project started in 2015, as an experiment to use a JIT compiler for generating 2D pipelines. The initial prototype did not use any SIMD and was already faster than both Qt and Cairo in composition of the output from the rasterizer. It was probably not just about the pipeline itself as the rasterizer was already optimized. However, the rasterizer was much worse than the one used today. The prototype was initially written for fun and without any intentions to make a library out of it. But the performace was so impressive that it built the foundation of Blend2D.
Later, when gradients, textures, and advanced composition operators were implemented, it was obvious that in terms of performance Blend2D competes very well with other 2D renderers while keeping the same quality. Yet it was uncertain which features the library should provide in the first place. Because of the high complexity with text rendering in general, the initial idea was to only create a library that is able to render vector paths. The users would simply pass such paths to render text. Finally, it became clear that a 2D framework without native text support is not practical, so the idea of having basic support for TTF and OTF fonts has been explored and finally implemented.
After basic text rendering, two more ideas were pursued: Faster rasterization and a better stroking engine that would be able to offset curves without flattening. The work on the rasterizer started in late 2017 and took around 3 months to complete, with some extra time to stabilize. Around 20 variations of different rasterizers have been implemented and benchmarked so that the best approach for Blend2D was chosen and further refined. Research into the new stroking engine started in 2018 and also required a few months of studying and experimenting before the initial prototype was implemented. Although not considered initially, because of positive results, the new stroking engine has been added to the beta release of Blend2D.
In the early 2020 a work on multi-threaded rendering context has begun. It took approximately 3 months to design and implement the initial prototype that was on par with features provided by the single-threaded rendering context. It was committed to master branch in April 2020 and released as a part of beta13 release. Blend2D was most likely the first open-source 2D vector graphics engine that offered a multi-threaded / asynchronous rendering.
The most important features were finalized in early 2019 and it became clear that with some additional bug fixing and testing it would be possible to release the beta version around March 2019. Blend2D was finally released to the public on April 2nd, including code samples and benchmarking tool.
Please take some time and read our FAQ to learn a little more about the project. You are welcome to contact us if you need more details, or if you would like to ask us something we have missed.
The only architecture specific code in Blend2D is a JIT pipeline generator, which will be ported to more architectures in the future. The rest of the code is portable C++ with optional optimizations that use compiler intrinsics to take advantage of SIMD. An experimental pipeline that doesn't use JIT is under development and even now it's possible to compile Blend2D on non-x86 targets with limited functionality.
It's worth mentioning that architecture-specific optimizations are used by many mainstream 2D rendering engines that offer software-based backend, because SIMD can significantly improve the performance of pixel processing. Blend2D doesn't really have more architecture-specific code than other libraries such as Qt or pixman.
Blend2D uses AsmJit's compiler tool for JIT code generation, which can generate over 100 MB/s of machine code on modern setups (around 170MB/s on an AMD Zen 4 machine). Typically, 2D rendering only requires few pipelines that may require about 20-50kB of total RAM as the size of a single pipeline varies between 0.2kB to 5kB. Pipelines are cached and never generated twice. The required time to create a pipeline is almost zero compared to the time spent by executing it.
The performance of multi-threaded rendering highly depends on the size of the target image and on the actual render calls. Multi-threaded rendering (also called asynchronous rendering in Blend2D) serializes all render calls to a render command, which is then executed later by a worker. Each worker acqures a band (several scanlines) and processes all commands within that band before acquring another band. This approach is very beneficial when rendering into a larger image (FullHD, 4K), because each band is processed only once, thus it can remain in the CPU cache for all commands that touch that band. This approach is also great for worker threads in general as it's guaranteed that worker threads do not need any synchronization and would never write to the same memory - once a worker acquires a band no other thread can acquire it.
The performance page clearly shows advantages of multi-threaded rendering in all tested scenarios, but how beneficial it would be for a particular workload would have to be benchmarked separately. Many Qt demos offered by blend2d-apps
package have the ability to select a multi-threaded rendering engine so such demos can also be used as a reference to compare the performance of various implementations (including Qt). It's worth mentioning that worker threads are not fully utilized in these demos, which means that they are usually clocked lower than the main thread. We cannot do much about this as those demos use Qt, which is responsible for blitting the framebuffer into the GPU memory, and this operation seems synchronous.
Please note that the current implementation is still a prototype that will be improved in the future.
The Blend2D project started as a challenge to compete with existing software-based 2D renderers in terms of performance. It was not planned to compete with GPU-accelerated renderers although the importance of that comparison is understandable and we would like to have a fair comparison at some time. However, we believe that such comparisons should not only include the average frame-rate achieved, but also output quality as well as power and memory consumption of both main and GPU memory. This information is often missing, which may result in misleading and biased conclusions.