Performance Comparison

Performance is one of the key features of Blend2D. This page visualizes the output generated by the bl_bench tool, which is used to tune Blend2D itself and to compare it with other 2D engines, either visually or performance-wise. The tool repeats all tests with various composition operators and styles, and scales the size of each operation from 8x8 to 256x256 pixels, which shows how efficiently each engine renders both small and large art.
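
The measurement scheme described above can be sketched as a small timing loop. This is only an illustration, not bl_bench itself: the doubling size sequence, the function names, and the stand-in workload are assumptions made for the example.

```python
import time

def benchmark_sizes(render, sizes=(8, 16, 32, 64, 128, 256), repeats=1000):
    """Time `repeats` successive calls of render(size) for each size.

    Returns a dict that maps each size to the elapsed time in milliseconds.
    The doubling size sequence is an assumption, not taken from bl_bench.
    """
    results = {}
    for size in sizes:
        start = time.perf_counter()
        for _ in range(repeats):
            render(size)
        results[size] = (time.perf_counter() - start) * 1000.0
    return results

def fake_fill_rect(size):
    # Hypothetical stand-in workload: writes size*size bytes, one per "pixel".
    # A real benchmark would call into the rendering engine here.
    buf = bytearray(size * size)
    buf[:] = b"\xff" * (size * size)
    return buf

# Fewer repeats than the 1000 used on this page, to keep the sketch quick.
timings = benchmark_sizes(fake_fill_rect, repeats=100)
for size, ms in timings.items():
    print(f"{size}x{size}: {ms:.3f} ms")
```

Reporting one number per size makes it easy to see whether an engine is dominated by per-call overhead (small sizes) or by per-pixel work (large sizes).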

Rendering time in milliseconds [ms] for 1000 successive renderings on an AMD Ryzen 1700 CPU.

Test Data

The AGG comparison is currently limited to the SrcOver operator and the Solid style. We plan to address this limitation in the future.

Tests

The following table describes the meaning of test names used by bl_bench:

Test Name | Description
RectAA | Fill or stroke an axis-aligned (or pixel-aligned) rectangle. The simplest operation, usually the most optimized by 2D engines. SIMD acceleration dominates the performance of FillRectAA tests. The StrokeRectAA test is usually tricky, as some engines use 4 aligned rectangles to represent the stroke, whereas others use a polygon rasterizer. Blend2D uses a polygon rasterizer in this case.
RectNA | Fill or stroke a rectangle that is not aligned (NA) to pixel boundaries. This test shows how efficient this operation is and whether it is special-cased. Blend2D contains a specialized rasterizer for this case, but the idea is to remove it at some point.
RectRot | Fill or stroke a rotated rectangle. Tests whether the engine uses a generic polygon rasterizer for such rendering or has a specialized rasterizer that can render convex polygons faster. Blend2D uses a generic rasterizer for almost all tests except FillRectAA and FillRectNA. Filling a rotated rectangle with Pattern_BI shows the performance of rendering rotated images that are bilinearly filtered.
RoundNA | Fill or stroke a rounded rectangle that is not aligned to pixel boundaries. This test shows how rendering engines handle curves, as the arcs representing the rounded corners are usually described as cubic Bézier curves.
RoundRot | Fill or stroke a rotated rounded rectangle. The question is how much slower the rotation makes this operation.
Poly | Fill or stroke a polygon consisting of N vertices by using a non-zero or even-odd fill rule. Vertices are random, which means that the resulting polygon is likely to self-intersect, especially as N increases. This test was designed to stress the rasterizer and to show its robustness.
World | Fill or stroke the world data, which is the same figure as shown on the Blend2D homepage, at various sizes. This is a real-world example that reveals the performance of rendering complex vector art. The world data is stored in a single path and contains only polygons (no curves).
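
The two fill rules used by the Poly test only differ for self-intersecting shapes, which is why random vertices exercise them well. The following is a minimal sketch of both rules based on the winding number of a point. It is a point-in-polygon test for illustration, not how a rasterizer (Blend2D's or any other) is implemented, and the function names are made up for this example.

```python
def winding_number(point, polygon):
    """Signed number of times the closed polygon winds around the point."""
    px, py = point
    wn = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # cross > 0: the point lies to the left of the directed edge.
        cross = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
        if y0 <= py < y1 and cross > 0:
            wn += 1  # upward edge crossing the ray that goes right from the point
        elif y1 <= py < y0 and cross < 0:
            wn -= 1  # downward edge crossing the same ray
    return wn

def is_inside(point, polygon, rule="non-zero"):
    wn = winding_number(point, polygon)
    if rule == "non-zero":
        return wn != 0
    if rule == "even-odd":
        # The parity of the winding number equals the parity of the crossing count.
        return wn % 2 != 0
    raise ValueError(f"unknown fill rule: {rule}")

# A pentagram: a self-intersecting polygon whose center is wound around twice.
star = [(0, 10), (-6, -8), (10, 3), (-10, 3), (6, -8)]

print(is_inside((0, 0), star, "non-zero"))  # center, filled by non-zero
print(is_inside((0, 0), star, "even-odd"))  # center, left empty by even-odd
print(is_inside((0, 8), star, "even-odd"))  # tip of the star, filled by both
```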

The following table describes the meaning of styles used by bl_bench:

Style Name | Description
Solid | Solid color.
Linear | Linear gradient with 3 color stops.
Radial | Radial gradient with 3 color stops.
Conical | Conical gradient with 3 color stops (not supported by Cairo).
Pattern_NN | Pattern (image) using a nearest-neighbor filter.
Pattern_BI | Pattern (image) using bilinear interpolation (works the same way as nearest-neighbor when running the FillRectAA test).
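
The remark that bilinear interpolation behaves like nearest-neighbor in the aligned case can be checked with a small sketch. It assumes a grayscale image stored as nested lists, integer coordinates that land exactly on pixel samples, and clamping at the edges. The helper names are hypothetical and no engine's actual sampling code is shown.

```python
import math

def _clamped(img, ix, iy):
    # Read a pixel, clamping coordinates to the image bounds.
    h, w = len(img), len(img[0])
    return img[min(max(iy, 0), h - 1)][min(max(ix, 0), w - 1)]

def sample_nearest(img, x, y):
    # Round to the closest sample position.
    return _clamped(img, math.floor(x + 0.5), math.floor(y + 0.5))

def sample_bilinear(img, x, y):
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional weights in [0, 1)
    top = _clamped(img, x0, y0) * (1 - fx) + _clamped(img, x0 + 1, y0) * fx
    bottom = _clamped(img, x0, y0 + 1) * (1 - fx) + _clamped(img, x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy

image = [[0, 100],
         [200, 50]]

# Aligned sampling: the fractional weights are zero, so both filters agree.
for yy in range(2):
    for xx in range(2):
        assert sample_bilinear(image, xx, yy) == sample_nearest(image, xx, yy)

# Non-aligned sampling: bilinear blends all four neighbors.
print(sample_bilinear(image, 0.5, 0.5))  # 87.5
```

With zero fractional weights the three multiplications per pixel contribute nothing, so an engine can skip them in the aligned case and fall back to a plain copy.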

Discussion

All tests were written to use the best capabilities of each rendering engine. The focus is on raw rendering performance and not on caching. A pseudo-random number generator is used to generate random vertices, and each test initializes this generator with the same seed, which means that all engines render exactly the same content. This can be verified by using the --save command-line argument of the bl_bench tool and then visually comparing the outputs of all tests performed.
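
The idea of seeding the generator identically for every engine can be sketched as follows. This is a minimal illustration using Python's standard `random` module. The function name, the seed value, and the coordinate range are assumptions, and bl_bench uses its own generator.

```python
import random

def random_polygon(seed, n_vertices, size):
    """Generate n_vertices pseudo-random points inside a size x size area."""
    rng = random.Random(seed)  # private generator, unaffected by global state
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n_vertices)]

SEED = 1234  # hypothetical value; any fixed seed works

# Each "engine" regenerates its input from the same seed and gets the same vertices.
vertices_for_engine_a = random_polygon(SEED, 10, 256)
vertices_for_engine_b = random_polygon(SEED, 10, 256)
assert vertices_for_engine_a == vertices_for_engine_b
```

Using a private generator instance instead of shared global state keeps the sequence independent of whatever other code ran before the test.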

Interpreting the Results

  • Blend2D wins almost all tests because of its high-performance rasterizer, which influences everything except the FillRectAA and FillRectNA tests. Note that Blend2D currently uses only 128-bit SIMD in JIT-compiled code. It can still use AVX instructions to eliminate unnecessary moves that are present in SSE2+ code, but it only uses 128-bit XMM registers. This means that Blend2D can lose (and actually does lose) in tests where 256-bit AVX optimizations matter more than rasterization. Additionally, Blend2D's pipeline generator currently doesn't support emitting multiple loops in the source-fetching part of the pipeline, so it cannot use some tricks that other libraries use in aligned pattern fills (specific to Pattern_NN tests). We know about these limitations and will improve the performance in the near future.
  • AGG is slow, especially in operations that are heavy on filling (FillRect, FillRound), because it has no SIMD optimizations to accelerate pixel composition. On the other hand, AGG performs much better in tests that involve rasterization more than composition, where it can compete with both Cairo and Qt.
  • Cairo has performance problems with gradients. It uses a different approach compared to AGG, Blend2D, and Qt, which makes the rendering much slower. In addition, Cairo is the only library whose SrcOver operator is sometimes faster than SrcCopy, which is most probably caused by SIMD fast paths that implement only SrcOver and not SrcCopy. The difference shows up when the two operators are compared.
  • Qt loses in all rasterization-heavy tests because of its poor rasterization performance. The rasterizer is not a good general-purpose rasterizer, as it doesn't handle complex vector art well. The FillPoly and StrokePoly tests show that the rasterizer's performance degrades quickly as the input path grows in size and complexity. On the other hand, Qt's compositing pipeline is more optimized than Cairo's, especially for gradients and bilinear filtering of images.
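
For readers unfamiliar with the two operators compared above, the following sketch shows the standard Porter-Duff definitions of SrcCopy and SrcOver on premultiplied 8-bit RGBA pixels. It is a scalar illustration with made-up function names, while the engines benchmarked here process many pixels at once with SIMD.

```python
def div255(v):
    # Rounded division by 255 for values in the range 0..255*255.
    return (v + 127) // 255

def src_copy(src, dst):
    # SrcCopy: the destination is replaced and never has to be read.
    return src

def src_over(src, dst):
    # SrcOver on premultiplied pixels: D' = S + D * (1 - Sa).
    inv_alpha = 255 - src[3]
    return tuple(s + div255(d * inv_alpha) for s, d in zip(src, dst))

blue_opaque = (0, 0, 255, 255)   # premultiplied (r, g, b, a)
red_half = (128, 0, 0, 128)      # about 50% red, premultiplied

print(src_copy(red_half, blue_opaque))  # (128, 0, 0, 128)
print(src_over(red_half, blue_opaque))  # (128, 0, 127, 255)
```

SrcCopy does strictly less arithmetic per pixel than SrcOver, so an engine in which SrcOver is faster is most likely missing an optimized path for SrcCopy rather than doing cheaper math.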

Border Cases

There are always border cases when it comes to 2D rendering. Some operations are handled differently across rendering engines, which is undesirable because we always want to compare like with like. The following table attempts to document each border case and the possible workarounds that make the performance comparison as fair and unbiased as possible:

Library | Issue | Comment / Workarounds
Cairo | Global transparency | Cairo has very tricky support for global transparency. It only exports the cairo_paint_with_alpha() function, which doesn't match Blend2D's capabilities, so tests with global alpha are disabled.
Qt | Extend modes | Qt doesn't allow specifying extend modes for patterns (only gradients are supported). Implicitly, only repeating patterns are supported, but Qt internally also implements padding for blitting transformed images. This means that all pattern benchmarks are restricted to repeat mode only.
Qt | Aligned translations | Qt always tries to align a translation matrix that has no scaling or skewing to pixels when the source style is Pattern_BI. Since this style explicitly tests the non-aligned case, an insignificant scaling component is added to the transformation matrix to bypass the hardcoded condition in Qt. The scaling is very small and doesn't cause any visual difference; however, if there is a better way, we would like to hear about it. This workaround is only required to test FillRectNA with the Pattern_BI style, because without it the test would be identical to Pattern_NN in Qt's case.
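
The aligned-translation workaround can be illustrated with a small model of an affine matrix. Everything here is an assumption made for the sketch: the tuple layout, the exact form of the translation-only check, and the epsilon value. It does not reproduce Qt's internal condition or the value used by bl_bench.

```python
def is_pure_translation(m):
    """m = (a, b, c, d, tx, ty), mapping x' = a*x + c*y + tx, y' = b*x + d*y + ty."""
    a, b, c, d, _, _ = m
    return a == 1.0 and b == 0.0 and c == 0.0 and d == 1.0

def with_tiny_scale(m, eps=1e-5):
    # Hypothetical epsilon: scale the linear part just enough to fail the check.
    a, b, c, d, tx, ty = m
    return (a * (1.0 + eps), b, c, d * (1.0 + eps), tx, ty)

def apply(m, x, y):
    a, b, c, d, tx, ty = m
    return (a * x + c * y + tx, b * x + d * y + ty)

translate = (1.0, 0.0, 0.0, 1.0, 0.5, 0.5)  # sub-pixel translation only
tweaked = with_tiny_scale(translate)

print(is_pure_translation(translate))  # True
print(is_pure_translation(tweaked))    # False

# Displacement at the far corner of the largest (256x256) test area.
x0, y0 = apply(translate, 256, 256)
x1, y1 = apply(tweaked, 256, 256)
print(abs(x1 - x0), abs(y1 - y0))  # both well below one hundredth of a pixel
```

With this epsilon the largest shift within a 256x256 area stays around 0.003 pixels, which is consistent with the statement that the workaround causes no visible difference.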

Output Images

The benchmarking tool generates hundreds of images like those shown below: