Performance Comparison

Performance is one of the key features of Blend2D. This page visualizes the output of bl_bench, an open source tool (provided by blend2d-apps) that is used to tune Blend2D itself and to compare it with other 2D engines, either visually or performance-wise. The tool repeats all tests with various composition operators and styles and scales the size of each operation from 8x8 to 256x256 pixels, which shows how efficiently each engine renders both small and large vector art.

Rendering

[Interactive chart with selectors for Test Case, Composition, and Style]

Rendering time in [ms] of performing 1000 successive render operations on an AMD Ryzen 7950X CPU
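As a rough illustration of what such a measurement involves, the sketch below times a fixed number of successive operations with a monotonic clock. It is not taken from bl_bench; `timeRepeatedMs` and the callback shape are hypothetical names chosen for this example, and the actual tool may measure differently.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of bl_bench): runs `op` exactly `count` times
// in a row and returns the total elapsed wall-clock time in milliseconds.
template <typename Op>
double timeRepeatedMs(std::size_t count, Op&& op) {
  using Clock = std::chrono::steady_clock;  // Monotonic, unaffected by clock adjustments.
  const Clock::time_point start = Clock::now();
  for (std::size_t i = 0; i < count; i++) {
    op(i);  // One render operation; the index can be used to pick per-call inputs.
  }
  const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
  return elapsed.count();
}
```

A call such as `timeRepeatedMs(1000, renderOne)` would then correspond to one data point, where `renderOne` stands for whatever single render call is being measured.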

Notes

  • The AGG comparison is currently limited to the Solid, Linear, and Radial styles
  • Skia is linked statically and built via the vcpkg package manager
  • Blend2D and all statically linked dependencies are compiled with clang 16.0 or newer

Tests

The following table describes the meaning of test names used by bl_bench:

Test Name | Description
RectA | Fill or stroke an axis-aligned (or pixel-aligned) rectangle. The simplest operation, usually the most optimized by 2D engines. SIMD acceleration dominates the performance of FillRectA tests. The StrokeRectA test is usually tricky, as some engines can use 4 aligned rectangles to represent the stroke, whereas others use a polygon rasterizer. Blend2D uses a polygon rasterizer in this case.
RectU | Fill or stroke a rectangle that is not aligned to pixel boundaries. This test shows how efficient the operation is and whether it is special-cased. Blend2D contains a specialized rasterizer for this case, but the plan is to remove it at some point.
RectRot | Fill or stroke a rotated rectangle. Tests whether the engine uses a generic polygon rasterizer for such rendering or has a specialized rasterizer that can render convex polygons faster. Blend2D uses a generic rasterizer for almost all tests except FillRectA and FillRectU. Filling a rotated rectangle with Pattern_BI shows the performance of rendering rotated images that are bilinearly filtered.
RoundU | Fill or stroke a rounded rectangle that is not aligned to pixel boundaries. This test shows how rendering engines handle curves, as the arcs representing the rounded corners are usually described as cubic Bézier curves.
RoundRot | Fill or stroke a rotated rounded rectangle. The question is how much slower the rotation makes this operation.
Poly | Fill or stroke a polygon consisting of N vertices by using a non-zero or even-odd fill rule. Vertices are random, which means that many edges self-intersect, especially as N increases. This test was designed to stress the rasterizer and show its robustness.
World | Fill or stroke the world data, which is the same figure as shown on the Blend2D homepage, at various sizes. This is a real-world example that reveals the performance of rendering complex vector art. The world data is stored in a single path and contains only polygons (no curves).
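
The difference between the two fill rules mentioned for the Poly test matters mostly for self-intersecting input. The following sketch is a plain point-in-polygon test, not Blend2D's rasterizer; `Point`, `windingNumber`, and the pentagram data are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y; };

// Signed count of polygon edges crossed by a horizontal ray from `p` towards +x.
int windingNumber(const std::vector<Point>& poly, Point p) {
  int winding = 0;
  for (std::size_t i = 0; i < poly.size(); i++) {
    const Point a = poly[i];
    const Point b = poly[(i + 1) % poly.size()];  // The polygon is implicitly closed.
    // The sign of this cross product tells on which side of edge a->b the point lies.
    const double side = (b.x - a.x) * (p.y - a.y) - (p.x - a.x) * (b.y - a.y);
    if (a.y <= p.y && b.y > p.y && side > 0.0) {
      winding++;  // Edge runs towards +y and crosses the ray on the +x side of p.
    } else if (a.y > p.y && b.y <= p.y && side < 0.0) {
      winding--;  // Edge runs towards -y and crosses the ray on the +x side of p.
    }
  }
  return winding;
}

// Non-zero rule: a point is filled when the signed crossing count is not zero.
bool insideNonZero(const std::vector<Point>& poly, Point p) {
  return windingNumber(poly, p) != 0;
}

// Even-odd rule: a point is filled when the number of crossings is odd, which
// has the same parity as the signed count.
bool insideEvenOdd(const std::vector<Point>& poly, Point p) {
  return windingNumber(poly, p) % 2 != 0;
}

// A self-intersecting five-pointed star on the unit circle (illustrative data).
std::vector<Point> pentagram() {
  return {{0.0, 1.0}, {-0.5878, -0.8090}, {0.9511, 0.3090},
          {-0.9511, 0.3090}, {0.5878, -0.8090}};
}
```

For the pentagram, the center has a winding number of 2, so it is filled under the non-zero rule and left empty under the even-odd rule, while the tips are filled under both.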

The following table describes the meaning of styles used by bl_bench:

Style Name | Description
Solid | Solid color.
Linear | Linear gradient with 3 color stops.
Radial | Radial gradient with 3 color stops.
Conic | Conic gradient with 4 color stops (not supported by Cairo).
Pattern_NN | Pattern (image) using nearest-neighbor filter.
Pattern_BI | Pattern (image) using bilinear interpolation (works the same way as nearest-neighbor when running FillRectA test).
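
The remark that bilinear interpolation behaves like nearest-neighbor in the pixel-aligned case can be checked with a small sketch. It assumes a single-channel image, integer coordinates that coincide with texel positions, and repeat (wrap-around) addressing; `GrayImage`, `sampleNearest`, and `sampleBilinear` are hypothetical names and not part of any of the benchmarked libraries.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Single-channel image in row-major order; coordinates wrap around (repeat).
struct GrayImage {
  int width;
  int height;
  std::vector<double> pixels;

  double at(int x, int y) const {
    const int wx = ((x % width) + width) % width;
    const int wy = ((y % height) + height) % height;
    return pixels[static_cast<std::size_t>(wy) * width + wx];
  }
};

// Nearest-neighbor: pick the texel closest to the sample position.
double sampleNearest(const GrayImage& img, double x, double y) {
  return img.at(static_cast<int>(std::lround(x)), static_cast<int>(std::lround(y)));
}

// Bilinear: weight the four surrounding texels by the fractional position.
double sampleBilinear(const GrayImage& img, double x, double y) {
  const double fx = std::floor(x);
  const double fy = std::floor(y);
  const int x0 = static_cast<int>(fx);
  const int y0 = static_cast<int>(fy);
  const double tx = x - fx;  // Zero when the sample is pixel-aligned.
  const double ty = y - fy;
  const double top = img.at(x0, y0) * (1.0 - tx) + img.at(x0 + 1, y0) * tx;
  const double bottom = img.at(x0, y0 + 1) * (1.0 - tx) + img.at(x0 + 1, y0 + 1) * tx;
  return top * (1.0 - ty) + bottom * ty;
}
```

When both fractional parts are zero, all weight falls on a single texel, so the two filters return the same value; they only differ once the sample position is unaligned.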

Discussion

All tests were written to use the best capabilities of each rendering engine. The focus is on raw rendering performance, not on caching. A pseudo-random number generator produces the input coordinates and colors, and each test configures its generator with the same seed, so all engines render exactly the same content. This can be verified by running bl_bench with the --save command line argument and then visually comparing the saved outputs.
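
The idea of feeding every engine identical pseudo-random input can be sketched as below. This is an assumption-laden illustration: bl_bench's actual generator is not described here, so `std::mt19937` is used as a stand-in, and `RandomRect` and `makeRects` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct RandomRect {
  double x, y, w, h;
  std::uint32_t rgba;
};

// Hypothetical helper: generates `count` rectangles of a fixed size inside a
// square canvas. The same seed always yields the same sequence, so every
// engine under test can be given identical geometry and colors.
std::vector<RandomRect> makeRects(std::uint32_t seed, std::size_t count,
                                  double canvasSize, double rectSize) {
  std::mt19937 rng(seed);
  // Map the raw 32-bit output to [0, 1) by hand, because the standard
  // distribution classes may produce different values across implementations.
  auto unit = [&rng]() { return static_cast<double>(rng()) / 4294967296.0; };

  std::vector<RandomRect> rects;
  rects.reserve(count);
  for (std::size_t i = 0; i < count; i++) {
    RandomRect r;
    r.x = unit() * (canvasSize - rectSize);
    r.y = unit() * (canvasSize - rectSize);
    r.w = rectSize;
    r.h = rectSize;
    r.rgba = static_cast<std::uint32_t>(rng());
    rects.push_back(r);
  }
  return rects;
}
```

Re-creating the generator with the same seed before each engine's run is what keeps the inputs identical, regardless of how many values an earlier run consumed.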

Interpreting the Results

  • Blend2D wins almost all tests because of its high-performance rasterizer, which influences all tests except FillRectA and FillRectU. However, there are still some tests that Blend2D doesn't win, for example FillRectA with SrcOver composition and Pattern_NN fill. In this particular case Qt wins because it branches in the inner loop when the alpha is zero or full, whereas Blend2D doesn't use this trick and always blends pixels regardless of the source alpha. Since most source images have a majority of pixels with zero or full alpha, this trick works well. We will look into this optimization opportunity in the future.
  • AGG is slow, especially in fill-heavy operations (FillRect, FillRound), because it has no SIMD optimizations to accelerate pixel composition. On the other hand, in tests that involve rasterization more than composition it performs much better and can compete with both Cairo and Qt.
  • Cairo has performance problems with gradients. It uses a different approach than AGG, Blend2D, and Qt, which makes the rendering much slower. In addition, Cairo is the only library in which the SrcOver operator is sometimes faster than SrcCopy, most probably because its SIMD fast paths implement only SrcOver and not SrcCopy. This shows up when the two operators are compared.
  • Qt loses in all rasterization-heavy tests because of poor rasterization performance. Its rasterizer is not well suited as a general-purpose rasterizer because it doesn't handle complex vector art well. The FillPoly and StrokePoly tests show that its performance degrades quickly as the input path grows in size and complexity. On the other hand, Qt's compositing pipeline is more optimized than Cairo's, especially for gradients and bilinear filtering of images.
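
The alpha-branching trick mentioned in the first point can be sketched as follows. This is a generic scalar illustration on premultiplied 32-bit pixels with alpha in the top byte; it is neither Qt's nor Blend2D's actual code, and all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Rounded (x * a) / 255 for x and a in [0, 255].
inline std::uint32_t mulDiv255(std::uint32_t x, std::uint32_t a) {
  const std::uint32_t t = x * a + 128u;
  return (t + (t >> 8)) >> 8;
}

// SrcOver for one premultiplied pixel: dst = src + dst * (1 - srcAlpha).
// Assumes valid premultiplied input, so no channel overflows 255.
inline std::uint32_t srcOverPixel(std::uint32_t src, std::uint32_t dst) {
  const std::uint32_t inv = 255u - (src >> 24);
  std::uint32_t out = 0;
  for (int shift = 0; shift < 32; shift += 8) {
    const std::uint32_t s = (src >> shift) & 0xFFu;
    const std::uint32_t d = (dst >> shift) & 0xFFu;
    out |= (s + mulDiv255(d, inv)) << shift;
  }
  return out;
}

// Always blends, regardless of the source alpha.
void blendSpanAlways(std::uint32_t* dst, const std::uint32_t* src, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    dst[i] = srcOverPixel(src[i], dst[i]);
  }
}

// Branches in the inner loop: fully transparent pixels are skipped and fully
// opaque pixels are copied, so only partially transparent pixels are blended.
void blendSpanBranching(std::uint32_t* dst, const std::uint32_t* src, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    const std::uint32_t alpha = src[i] >> 24;
    if (alpha == 0u) continue;
    if (alpha == 255u) { dst[i] = src[i]; continue; }
    dst[i] = srcOverPixel(src[i], dst[i]);
  }
}
```

Both loops produce the same pixels; the branching variant only pays off when most source pixels are fully transparent or fully opaque, which is the situation described above.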

Border Cases

There are always border cases when it comes to 2D rendering. Some operations are handled differently across rendering engines, which is undesirable because we want to compare like with like. The following table documents each border case and the workarounds used to make the performance comparison as fair and unbiased as possible:

Library | Issue | Comment / Workarounds
Cairo | Global transparency | Cairo has very tricky support for global transparency. It only provides the cairo_paint_with_alpha() function, which doesn't match Blend2D's capabilities, so tests with global alpha are currently disabled.
Qt | Extend modes | Qt doesn't allow specifying extend modes for patterns (only gradients are supported). Implicitly, only repeating patterns are supported, although Qt internally also implements padding for blitting transformed images. This means that all pattern benchmarks are restricted to repeat mode to keep them comparable.
Qt | Aligned translations | Qt always tries to align a translation matrix that has no scaling or skewing to pixels when the source style is Pattern_BI. Since this style explicitly tests the non-aligned case, an insignificant scaling part is added to the transformation matrix to bypass the hardcoded check in Qt. The scaling part is very small and doesn't cause any visual difference; however, if there is a better way, we would like to hear about it. This workaround is only required to test FillRectU with the Pattern_BI style, because without it the test would be identical to Pattern_NN in Qt's case, which would be incorrect.
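
The scaling workaround for aligned translations can be pictured with a generic affine matrix. The sketch below does not use Qt's API and does not reproduce Qt's internal check; `Affine`, `isPureTranslation`, and the epsilon value are hypothetical stand-ins chosen only to show the principle.

```cpp
// Generic 2D affine matrix: x' = xx*x + yx*y + tx, y' = xy*x + yy*y + ty.
struct Affine {
  double xx, xy, yx, yy, tx, ty;
};

// Stand-in for an engine-side classification (hypothetical, not Qt's code):
// a matrix counts as a pure translation when it has no scaling or skewing.
bool isPureTranslation(const Affine& m) {
  return m.xx == 1.0 && m.yy == 1.0 && m.xy == 0.0 && m.yx == 0.0;
}

Affine makeTranslation(double tx, double ty) {
  return Affine{1.0, 0.0, 0.0, 1.0, tx, ty};
}

// Adds a tiny scaling part so the matrix is no longer classified as a pure
// translation. The default epsilon is an assumed value for illustration.
Affine makeNearTranslation(double tx, double ty, double epsilon = 1e-5) {
  return Affine{1.0 + epsilon, 0.0, 0.0, 1.0 + epsilon, tx, ty};
}

// Largest positional error the extra scaling introduces across `extent` pixels.
double maxDrift(double epsilon, double extent) {
  return epsilon * extent;
}
```

With the assumed epsilon, the drift across a 256 pixel wide rectangle stays well below a hundredth of a pixel, which is consistent with the statement that the scaling part causes no visual difference.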

Conclusion

Blend2D offers incredible performance compared to other libraries because it optimizes across the whole stack: building edges from geometries, a novel rasterization approach, JIT-optimized pipelines, and multithreading. But that's not all. Even the dispatching mechanism (the layer between calling a rendering function and the pipeline actually executing it) has been optimized so that tiny geometries render with low overhead. There are still areas that could be optimized further, so there is potential to make Blend2D even faster.

Output Images

The benchmarking tool generates hundreds of images like the ones shown below: