<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://keaukraine.site/feed.xml" rel="self" type="application/atom+xml" /><link href="https://keaukraine.site/" rel="alternate" type="text/html" /><updated>2025-08-29T17:28:34+03:00</updated><id>https://keaukraine.site/feed.xml</id><title type="html">Oleksandr Popov</title><subtitle>A blog mostly about my 3D wallpapers and related stuff</subtitle><entry><title type="html">Automatically picking ASTC textures quality</title><link href="https://keaukraine.site/automatic-astc-compression/" rel="alternate" type="text/html" title="Automatically picking ASTC textures quality" /><published>2025-08-29T17:22:00+03:00</published><updated>2025-08-29T17:22:00+03:00</updated><id>https://keaukraine.site/automatic-astc-compression</id><content type="html" xml:base="https://keaukraine.site/automatic-astc-compression/"><![CDATA[<p>In this small article I’ll describe the simple script I’ve created while working on the Cartoon Lighthouse live wallpaper. You can find a live web demo of this app <a href="https://keaukraine.github.io/webgl-kmp-cartoonlighthouse/index.html">here</a> and take a look at its source code <a href="https://github.com/keaukraine/webgl-kmp-cartoonlighthouse">here</a>.</p>

<p>I’ve encountered a slight problem with encoding 31 textures into the ASTC format. Usually in all our apps geometries are batched and merged to reduce the number of draw calls, so typically we use a couple of large texture atlases for all geometries. In this scene, however, geometries were not batched, and I decided to keep it that way since the total number of draw calls is still rather small (around 30). This resulted in 31 textures which had to be converted into the ASTC format.</p>

<p>Usually I just encoded textures with a few ASTC block sizes (4x4, 6x6, 8x8 and 10x10) and manually picked the one which looked good enough for my liking. While this was OK for a handful of textures, it was quite a cumbersome process for 31 of them. This needed some automation, and fortunately I came across this <a href="https://x.com/castano/status/1953247742941380822">wonderful tweet</a> about quality improvements in the Spark realtime texture encoder (make sure to check it out, the project is fascinating by itself). What was interesting in it for me was the use of a tool called SSIMULACRA2. This <a href="https://github.com/cloudinary/ssimulacra2">open-source</a> tool compares two images and gives a quality score, and the algorithm is specifically tuned to look for both user-perceived and blocky compression-related artifacts. That’s exactly what I needed!</p>

<p>Over the weekend I created a bash script (bash is not my native language, so I used ChatGPT to help me here and there) and tested it. The algorithm of the script can be described with this pseudocode:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>for each input file  
	for each ASTC block size from the lowest to the highest quality  
		encode image and save its decoded image  
		compare quality between original and decoded images  
		if quality meets the threshold, use this block size and stop
</code></pre></div></div>
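<p>The loop above can be sketched in TypeScript (a hedged illustration, not the actual bash script; <code class="language-plaintext highlighter-rouge">scoreFor</code> is a hypothetical stand-in for “encode, decode, and score against the original with SSIMULACRA2”):</p>

```typescript
// Ordered from lowest to highest quality; an illustrative subset of the
// 14 block sizes ASTC supports.
const BLOCK_SIZES = ["12x12", "10x10", "10x8", "8x8", "6x6", "5x5", "4x4"];

function pickBlockSize(
  scoreFor: (blockSize: string) => number, // SSIMULACRA2 score, higher is better
  threshold = 70
): string {
  for (const size of BLOCK_SIZES) {
    // The first (most compressed) block size meeting the threshold wins.
    if (scoreFor(size) >= threshold) return size;
  }
  return "4x4"; // fallback to the highest-quality block size
}
```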

<p>You can optionally specify a target SSIMULACRA2 score, but the default is 70, which can be described as “there are some compression artifacts if you look for them specifically”.</p>

<p>There’s additional logic to detect mipmaps: all mip levels must be encoded with the same block size, so only the highest-resolution level is tested for quality.</p>

<p>There’s one limitation of the SSIMULACRA2 tool: it doesn’t validate images smaller than 8x8 pixels, so these are encoded with a fallback 4x4 block size.</p>

<p>I’ve also played around with the bash output a little so the script is not completely silent while working. It reports progress on the current image and shows scores for the processed images:<br />
<img src="/assets/blog/astc-compression/console.gif" alt="Sample console output" /></p>

<p>This saved not only my time (I didn’t have to check the quality of each file manually) but file sizes too! Out of laziness I had manually checked only a subset of ASTC block sizes, but this script relentlessly runs through all 14 block sizes supported by ASTC, choosing the best suitable one. For some smallish textures with quite uniform content even the 12x12 block size was good enough.</p>

<p>You can check the source code of the script <a href="https://github.com/keaukraine/astc-compression">here on GitHub</a> and adapt it to your needs. It also includes an ImageMagick script to create mipmaps.</p>]]></content><author><name>keaukraine</name></author><category term="blog" /><category term="astc" /><category term="ssimulacra2" /><category term="compression" /><summary type="html"><![CDATA[Automatically picking ASTC textures quality]]></summary></entry><entry><title type="html">How to improve MSAA performance of MTKView</title><link href="https://keaukraine.site/how-to-improve-msaa-in-metal/" rel="alternate" type="text/html" title="How to improve MSAA performance of MTKView" /><published>2025-01-10T12:44:00+02:00</published><updated>2025-01-10T12:44:00+02:00</updated><id>https://keaukraine.site/how-to-improve-msaa-in-metal</id><content type="html" xml:base="https://keaukraine.site/how-to-improve-msaa-in-metal/"><![CDATA[<p>The easiest and fastest way to use Metal in a macOS app is <code class="language-plaintext highlighter-rouge">MTKView</code>. It is a handy wrapper which initializes all the low-level stuff under the hood so you can get right to the fun part — implementing actual rendering.</p>

<p>However, because of its simplicity it does have a couple of shortcomings and for some reason doesn’t provide access to everything under its hood. One of these minor inconveniences is the way it initializes multisampled render targets.</p>

<p>To understand why this is important, let’s look at how Metal handles MSAA. It supports multiple ways of implementing it:</p>

<ol>
  <li>You can have a multisampled render target and then resolve it to the on-screen render target automatically.</li>
  <li>You can use this multisampled render target with custom resolve shaders.</li>
  <li>On supported hardware, you can omit the explicit multisampled render target and resolve automatically, directly into the final render target. This still uses a multisampled render target, but it is memoryless.</li>
  <li>The same approach but with tile shaders to resolve (apply custom tone mapping, etc).</li>
</ol>

<p>You can find a more detailed explanation of these methods, with a sample Xcode project, in this official Apple <a href="https://developer.apple.com/documentation/metal/metal_sample_code_library/improving_edge-rendering_quality_with_multisample_antialiasing_msaa">documentation article</a>.</p>

<p>Of particular interest to us are the memoryless multisampled render targets. They are very efficient since they are transient and reside only in the (extremely fast and tiny) temporary tile memory of the GPU. Because of this they don’t use main memory allocations and don’t consume precious VRAM bandwidth.</p>

<p>Here is the typical 4x MSAA rasterization process with the default render pass created by <code class="language-plaintext highlighter-rouge">MTKView</code>:</p>

<p><img src="/assets/blog/metal-msaa/default-msaa.webp" alt="Default MSAA pipeline" /></p>

<p>And here is the same one but using an efficient memoryless render target:</p>

<p><img src="/assets/blog/metal-msaa/memoryless-msaa.webp" alt="Memoryless MSAA pipeline" /></p>

<p>Basically, the only difference is that we substitute the transient multisampled render target with a memoryless one, which results in a huge improvement in memory allocation and bandwidth. Please note that according to the <a href="https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf">Metal Feature Set</a> tables, memoryless render targets are not supported on older devices. Namely, Intel-based Macs don’t support tiled rendering and cannot use them. But if you target shiny new Apple-silicon devices then you definitely should use them, since they are extremely efficient.</p>

<p>The thing is that (possibly for better support of all hardware) <code class="language-plaintext highlighter-rouge">MTKView</code> initializes MSAA only with classic in-memory render targets — the multisampled one for rendering and the final one for resolving and presenting the result on the screen.</p>

<p>In the aforementioned official Metal MSAA example you can find a proper way to initialize a memoryless MSAA resolve, but it doesn’t use this handy <code class="language-plaintext highlighter-rouge">MTKView</code> — instead there’s quite a lot of glue code to make it work.</p>

<p>However, I’ve found a hacky yet relatively simple and perfectly working way of initializing an efficient memoryless MSAA resolve using the default <code class="language-plaintext highlighter-rouge">MTKView</code> wrapper view.
Let’s take a look at what configuration options <code class="language-plaintext highlighter-rouge">MTKView</code> does provide.
Obviously there’s <code class="language-plaintext highlighter-rouge">sampleCount</code>, which will initialize MSAA render targets. There are also <code class="language-plaintext highlighter-rouge">depthStencilPixelFormat</code> and <code class="language-plaintext highlighter-rouge">depthStencilStorageMode</code> fields, and you can switch depth+stencil to memoryless storage too by setting <code class="language-plaintext highlighter-rouge">depthStencilStorageMode = .memoryless</code>, which also saves a lot of RAM and bandwidth in case you don’t need to read the depth information after the frame is rendered.
Here’s a typical <code class="language-plaintext highlighter-rouge">MTKView</code> initialization code (for Apple-silicon GPUs, which support memoryless textures):</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">_metalView</span><span class="o">.</span><span class="n">depthStencilPixelFormat</span> <span class="o">=</span> <span class="o">.</span><span class="n">depth32Float</span>
<span class="n">_metalView</span><span class="o">.</span><span class="n">depthStencilStorageMode</span> <span class="o">=</span> <span class="o">.</span><span class="n">memoryless</span>
<span class="n">_metalView</span><span class="o">.</span><span class="n">preferredFramesPerSecond</span> <span class="o">=</span> <span class="mi">60</span>
<span class="n">_metalView</span><span class="o">.</span><span class="n">sampleCount</span> <span class="o">=</span> <span class="mi">4</span> <span class="c1">// hard-coded 4 samples but you can query max available samples for GPU and set it accordingly</span>
</code></pre></div></div>

<p>That’s cool, so let’s switch the color render target to memoryless mode too! Unfortunately, for the color render target there is only <code class="language-plaintext highlighter-rouge">colorPixelFormat</code> available (typically set up automatically) and there is no <code class="language-plaintext highlighter-rouge">colorStorageMode</code>. So there’s no easy way to just set it up to use memoryless MSAA mode.</p>

<p>Still, there’s a relatively simple way of <em>switching</em> it to memoryless mode after it has been initialized!</p>

<p>The thing is that the Metal API allows you to change the textures of the current render pass. The descriptor of this render pass is provided to you by <code class="language-plaintext highlighter-rouge">MTKView</code> and is obviously pre-initialized with an in-memory multisampled texture.
So all you need to do, on the first frame you draw, is create a memoryless render target and substitute the default multisampled texture with the new one.</p>

<div class="language-swift highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// New memoryless MSAA texture</span>
<span class="k">var</span> <span class="nv">textureMsaa</span><span class="p">:</span> <span class="kt">MTLTexture</span><span class="p">?</span>

<span class="o">.................</span>

<span class="kd">func</span> <span class="nf">yourCodeToDrawStuff</span><span class="p">()</span> <span class="p">{</span>
    <span class="c1">// Before rendering, create the memoryless MSAA render target and swap it in.</span>
    <span class="k">do</span> <span class="p">{</span>
        <span class="k">let</span> <span class="nv">resolveTexture</span> <span class="o">=</span> <span class="n">view</span><span class="o">.</span><span class="n">currentRenderPassDescriptor</span><span class="p">?</span><span class="o">.</span><span class="n">colorAttachments</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">resolveTexture</span>
        <span class="k">if</span> <span class="n">resolveTexture</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
            <span class="k">let</span> <span class="nv">width</span> <span class="o">=</span> <span class="n">resolveTexture</span><span class="o">!.</span><span class="n">width</span>
            <span class="k">let</span> <span class="nv">height</span> <span class="o">=</span> <span class="n">resolveTexture</span><span class="o">!.</span><span class="n">height</span>
            <span class="k">if</span> <span class="n">textureMsaa</span> <span class="o">==</span> <span class="kc">nil</span> <span class="o">||</span> <span class="n">textureMsaa</span><span class="p">?</span><span class="o">.</span><span class="n">width</span> <span class="o">!=</span> <span class="n">width</span> <span class="o">||</span> <span class="n">textureMsaa</span><span class="p">?</span><span class="o">.</span><span class="n">height</span> <span class="o">!=</span> <span class="n">height</span> <span class="p">{</span>
                <span class="c1">// Auto-purge the old unused resolve texture</span>
                <span class="n">view</span><span class="o">.</span><span class="n">currentRenderPassDescriptor</span><span class="p">?</span><span class="o">.</span><span class="n">colorAttachments</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">texture</span><span class="p">?</span><span class="o">.</span><span class="nf">setPurgeableState</span><span class="p">(</span><span class="o">.</span><span class="n">volatile</span><span class="p">)</span>
                <span class="n">textureMsaa</span> <span class="o">=</span> <span class="k">try</span> <span class="nf">create2DRenderTargetMemoryless</span><span class="p">(</span><span class="nv">width</span><span class="p">:</span> <span class="n">width</span><span class="p">,</span> <span class="nv">height</span><span class="p">:</span> <span class="n">height</span><span class="p">,</span> <span class="nv">pixelFormat</span><span class="p">:</span> <span class="o">.</span><span class="n">bgra8Unorm</span><span class="p">,</span> <span class="nv">metalDevice</span><span class="p">:</span> <span class="n">device</span><span class="p">)</span>
                <span class="n">textureMsaa</span><span class="p">?</span><span class="o">.</span><span class="n">label</span> <span class="o">=</span> <span class="s">"Main pass RTT"</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span> <span class="k">catch</span> <span class="p">{</span>
        <span class="nf">fatalError</span><span class="p">(</span><span class="s">"Cannot create MSAA texture: </span><span class="se">\(</span><span class="n">error</span><span class="se">)</span><span class="s">"</span><span class="p">)</span>
    <span class="p">}</span>
    <span class="c1">// Use new memoryless texture</span>
    <span class="n">view</span><span class="o">.</span><span class="n">currentRenderPassDescriptor</span><span class="p">?</span><span class="o">.</span><span class="n">colorAttachments</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">texture</span> <span class="o">=</span> <span class="n">textureMsaa</span>

    <span class="c1">// Do your rendering here as usual</span>
    <span class="o">.....................</span>
<span class="p">}</span>

<span class="kd">func</span> <span class="nf">create2DRenderTargetMemoryless</span><span class="p">(</span><span class="nv">width</span><span class="p">:</span> <span class="kt">Int</span><span class="p">,</span> <span class="nv">height</span><span class="p">:</span> <span class="kt">Int</span><span class="p">,</span> <span class="nv">pixelFormat</span><span class="p">:</span> <span class="kt">MTLPixelFormat</span><span class="p">,</span> <span class="nv">metalDevice</span><span class="p">:</span> <span class="kt">MTLDevice</span><span class="p">)</span> <span class="k">throws</span> <span class="o">-&gt;</span> <span class="kt">MTLTexture</span> <span class="p">{</span>
    <span class="k">let</span> <span class="nv">descriptor</span> <span class="o">=</span> <span class="kt">MTLTextureDescriptor</span><span class="o">.</span><span class="nf">texture2DDescriptor</span><span class="p">(</span><span class="nv">pixelFormat</span><span class="p">:</span> <span class="n">pixelFormat</span><span class="p">,</span> <span class="nv">width</span><span class="p">:</span> <span class="n">width</span><span class="p">,</span> <span class="nv">height</span><span class="p">:</span> <span class="n">height</span><span class="p">,</span> <span class="nv">mipmapped</span><span class="p">:</span> <span class="kc">false</span><span class="p">)</span>
    <span class="n">descriptor</span><span class="o">.</span><span class="n">textureType</span> <span class="o">=</span> <span class="o">.</span><span class="n">type2DMultisample</span>
    <span class="n">descriptor</span><span class="o">.</span><span class="n">sampleCount</span> <span class="o">=</span> <span class="mi">4</span> <span class="c1">// Yes I use hard-coded 4 samples here too :)</span>
    <span class="n">descriptor</span><span class="o">.</span><span class="n">usage</span> <span class="o">=</span> <span class="p">[</span><span class="o">.</span><span class="n">renderTarget</span><span class="p">]</span>
    <span class="n">descriptor</span><span class="o">.</span><span class="n">resourceOptions</span> <span class="o">=</span> <span class="p">[</span><span class="o">.</span><span class="n">storageModeMemoryless</span><span class="p">]</span>
    <span class="k">let</span> <span class="nv">result</span> <span class="o">=</span> <span class="n">metalDevice</span><span class="o">.</span><span class="nf">makeTexture</span><span class="p">(</span><span class="nv">descriptor</span><span class="p">:</span> <span class="n">descriptor</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">result</span> <span class="o">!=</span> <span class="kc">nil</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">result</span><span class="o">!</span>
    <span class="p">}</span>
    <span class="k">throw</span> <span class="kt">RuntimeError</span><span class="p">(</span><span class="s">"Cannot create texture with pixelFormat </span><span class="se">\(</span><span class="n">pixelFormat</span><span class="se">)</span><span class="s"> of size </span><span class="se">\(</span><span class="n">width</span><span class="se">)</span><span class="s">x</span><span class="se">\(</span><span class="n">height</span><span class="se">)</span><span class="s">"</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This simple trick effectively substitutes the auto-created in-memory color render target with the memoryless one.</p>

<p>There is one important step to do — you must set the purgeable state of the old unused render target to volatile in order for it to free memory. Otherwise, even though it will never be used again, it will still keep a large amount of memory allocated. This is an extremely powerful and easy-to-use feature of the Metal API which I love — if you don’t use some resource, the API can get rid of it for you automagically. You don’t have to delete resources manually as in OpenGL.</p>

<p>Here are some final memory usage comparisons on a MacBook Air M1 with a full-screen 2560x1600 render target:</p>

<p>First, a default approach — <code class="language-plaintext highlighter-rouge">MTKView</code> with 4x MSAA in-memory resolve texture:</p>

<p><img src="/assets/blog/metal-msaa/memory-usage-msaa-in-memory.png" alt="Image description" /></p>

<p>This multisampled texture uses 78 MB of memory, which is accessed (both written and read) on every frame!</p>
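<p>As a quick sanity check on that number, here is some back-of-the-envelope arithmetic (my own, written in TypeScript for brevity; the gap to the reported 78 MB presumably comes from driver-side alignment and padding):</p>

```typescript
// A 4x MSAA bgra8Unorm render target stores 4 samples of 4 bytes per pixel.
const width = 2560;
const height = 1600;
const bytesPerPixel = 4; // bgra8Unorm
const samples = 4; // 4x MSAA
const totalBytes = width * height * bytesPerPixel * samples;
const totalMiB = totalBytes / (1024 * 1024); // 62.5 MiB of raw sample data
```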

<p>And here is the memoryless one:</p>

<p><img src="/assets/blog/metal-msaa/memory-usage-msaa-memoryless-final.png" alt="Image description" /></p>

<p>Notice that the 78 MB texture is now listed in the unused resources. It actually uses 0 bytes and is only listed as a “dormant” 78 MB resource which could be re-allocated if it is ever used again.
This can be confirmed in the Activity Monitor. Before optimization:</p>

<p><img src="/assets/blog/metal-msaa/ram-usage-default-msaa.png" alt="Image description" /></p>

<p>And after:</p>

<p><img src="/assets/blog/metal-msaa/ram-usage-memoryless-msaa.png" alt="Image description" /></p>

<p>Now my app uses just under 80 MB of total RAM instead of 150! This is a good result for a full-screen 3D app — it is about as much as just two stock macOS Calculators! (Yes, you can check it yourself — <em>Calculator</em> uses ~40 MB of RAM, which seems a bit excessive.)</p>

<p>Hope this little tutorial will be useful and will make your Metal app more memory- and power-efficient!</p>]]></content><author><name>keaukraine</name></author><category term="blog" /><category term="metal" /><category term="msaa" /><summary type="html"><![CDATA[How to improve MSAA performance of MTKView]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://keaukraine.site/assets/blog/metal-msaa/title.webp" /><media:content medium="image" url="https://keaukraine.site/assets/blog/metal-msaa/title.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Stylized Castle WebGL demo</title><link href="https://keaukraine.site/stylized-castle/" rel="alternate" type="text/html" title="Stylized Castle WebGL demo" /><published>2023-10-27T10:00:00+03:00</published><updated>2023-10-27T10:00:00+03:00</updated><id>https://keaukraine.site/stylized-castle</id><content type="html" xml:base="https://keaukraine.site/stylized-castle/"><![CDATA[<p>This is my first attempt to create a kitbashed scene from tiling assets. I have used assets by <a href="https://kenney.itch.io/">Kenney</a> and his <a href="https://kenney.itch.io/assetforge-deluxe">Asset Forge</a> Deluxe as an editor to create the scene. The result was exported to OBJ files and then converted into ready-to-use buffers with vertex data and indices.</p>

<p>The scene has a distinct stylized look, with no textures used except for the animated characters — knights and birds. To add depth to the scene, simple linear vertex fog and real-time shadows are applied.</p>

<p>The total scene polycount, with both static and dynamic objects, is 95k triangles.</p>

<h2 id="static-geometry">Static geometry</h2>

<p>All static objects in the scene are merged into 2 large meshes to reduce the number of draw calls. These assets don’t use textures; instead, vertex colors are used. The colors are stored as indices, which allows easier customization of color themes. Shaders accept up to 32 colors set via uniforms.</p>

<p>The stride of the static objects’ vertex data is 12 bytes — 3 FP16 values are used for position, 3 normalized signed bytes for normals, and 1 unsigned byte for color. 2 unused bytes pad the data to 4-byte alignment:</p>

<center> <img alt="static geometries vertex data" src="/assets/blog/stylized-castle/geometry-strides.webp" /> </center>
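<p>To make the layout concrete, here is an illustrative TypeScript sketch of packing one such 12-byte vertex (not the actual exporter code; the minimal float-to-half conversion skips NaN and denormal handling):</p>

```typescript
// Minimal float32 -> float16 bit conversion (no NaN/denormal handling).
function toHalf(f: number): number {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, f);
  const bits = view.getUint32(0);
  const sign = (bits >>> 16) & 0x8000;
  const exp = ((bits >>> 23) & 0xff) - 127 + 15;
  if (exp <= 0) return sign; // flush tiny values to zero
  if (exp >= 31) return sign | 0x7c00; // overflow to infinity
  return sign | (exp << 10) | ((bits >>> 13) & 0x3ff);
}

// 12-byte layout: 3 x FP16 position, 3 x signed byte normal,
// 1 byte color index, 2 bytes of padding for 4-byte alignment.
function packVertex(
  pos: [number, number, number],
  normal: [number, number, number],
  colorIndex: number
): ArrayBuffer {
  const buf = new ArrayBuffer(12);
  const view = new DataView(buf);
  pos.forEach((p, i) => view.setUint16(i * 2, toHalf(p), true));
  normal.forEach((n, i) => view.setInt8(6 + i, Math.round(n * 127)));
  view.setUint8(9, colorIndex);
  // bytes 10-11 stay zero (padding)
  return buf;
}
```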

<p>I also tried using the more compact packed <code class="language-plaintext highlighter-rouge">GL_INT_2_10_10_10_REV</code> type for vertex positions to fit the data in 8 bytes. Unfortunately, its precision was just not enough for this purpose — roughly 1 meter per 1 km. And since the scene uses quite large geometries batched into 2 meshes, this clearly wasn’t sufficient.</p>

<h2 id="shadow-maps">Shadow maps</h2>

<p>Lighting in the scene is not baked — shadow maps are used for all objects to cast dynamic shadows. Shadow maps have no cascades since the scene is rather small. However, the detail of shadows is adjusted in a different way: the light source FOV is slightly adjusted for each camera to have more detailed shadows for close-up scenes and less detailed ones for overviews.</p>

<p>Shadow map resolution is 2048x2048 which is sufficient to create detailed enough shadows.</p>

<p>To smooth out hard shadow edges, hardware bilinear filtering of the shadow map is used. Please note that OpenGL ES 3.0+ / WebGL 2.0 is required for this. OpenGL ES 2.0 supports only unfiltered sampling from shadow maps, which results in a boolean-like comparison of whether a fragment is in shadow or not. Hardware filtering is combined with 5-tap sub-texel percentage closer filtering (PCF). This results in smooth shadow edges with a relatively small number of texture samples.</p>

<p>I also considered a more expensive 9-tap PCF, which improved image quality with an unfiltered shadow texture, but with the filtered texture the improvement over the 5-tap one was negligible. So according to the golden rule of real-time graphics programming, “it looks good enough”, the final filtering used in the app is 5-tap PCF with hardware filtering.</p>
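<p>For illustration, here is a CPU-side TypeScript sketch of a 5-tap PCF kernel (the real filtering happens in the fragment shader; the exact tap pattern here, center plus four diagonals, is an assumption):</p>

```typescript
// `sample` stands in for a hardware-filtered shadow comparison that
// already returns a value in [0, 1] for a given shadow map coordinate.
type ShadowSampler = (u: number, v: number) => number;

function pcf5(sample: ShadowSampler, u: number, v: number, texel: number): number {
  const taps: [number, number][] = [
    [0, 0],
    [-texel, -texel],
    [texel, -texel],
    [-texel, texel],
    [texel, texel],
  ];
  let sum = 0;
  for (const [du, dv] of taps) sum += sample(u + du, v + dv);
  return sum / taps.length; // average of the 5 taps
}
```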

<p>Here you can see comparison of different shadow filtering modes:</p>

<center> <img alt="shadowmaps filtering" src="/assets/blog/stylized-castle/filtering-comparison.gif" /> </center>

<h3 id="performance">Performance</h3>

<p>To improve performance, a couple of optimizations are used.</p>

<p>The first is a quite typical, simple and widely used one — the shadow map is updated at half framerate. This is almost unnoticeable since nothing in the scene moves too fast. For cases when the camera and light direction are about to switch to a new position, it is rendered at full framerate to prevent one-frame flickering of a shadow map rendered with the old light source.</p>
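<p>The scheduling logic can be sketched like this (a TypeScript illustration; <code class="language-plaintext highlighter-rouge">switchingSoon</code> is a hypothetical flag raised when the camera or light is about to change):</p>

```typescript
// Update the shadow map every other frame, except when a camera/light
// switch is imminent: then render at full rate to avoid a one-frame
// flicker of a shadow map drawn with the old light source.
function shouldUpdateShadowMap(frameIndex: number, switchingSoon: boolean): boolean {
  if (switchingSoon) return true;
  return frameIndex % 2 === 0;
}
```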

<p>The second trick is that the PCF is not applied for distant fragments — instead a single sample of shadow texture is used. It is impossible to spot any difference in image quality in the distance because shadows are still hardware-filtered but the performance and efficiency are improved. But isn’t it considered a bad practice to use branching in the shaders? Yes and no. In general, it is not so bad on modern hardware — if <a href="https://solidpixel.github.io/2021/12/09/branches_in_shaders.html">used properly</a> and not in an attempt to create some all-in-one uber-shader. Actually, it is quite often used in raymarching where it can provide a measurable performance improvement by branching out empty/occluded parts. In this particular case branching helps to save one of the most critical resources on mobile GPUs — memory bandwidth.</p>

<p>So how can we test if this branching actually improved things or only unnecessarily complicated shaders?</p>

<p>First, let’s perform a static analysis of the shaders. To do this, I use the Mali Offline Compiler tool. Here are the results:</p>

<p><em>Non-optimized:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Work registers: 25
Uniform registers: 16
Stack spilling: false
16-bit arithmetic: 66%
                              FMA     CVT     SFU      LS       V       T    Bound
Total instruction cycles:    0.31    0.14    0.06    0.00    1.00    1.25        T
Shortest path cycles:        0.31    0.11    0.06    0.00    1.00    1.25        T
Longest path cycles:         0.31    0.14    0.06    0.00    1.00    1.25        T

FMA = Arith FMA, CVT = Arith CVT, SFU = Arith SFU, LS = Load/Store, V = Varying, T = Texture
</code></pre></div></div>

<p><em>Conditional PCF:</em></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Work registers: 21
Uniform registers: 16
Stack spilling: false
16-bit arithmetic: 68%
                              FMA     CVT     SFU      LS       V       T    Bound
Total instruction cycles:    0.31    0.22    0.06    0.00    1.00    1.50        T
Shortest path cycles:        0.17    0.11    0.06    0.00    1.00    0.25        V
Longest path cycles:         0.31    0.20    0.06    0.00    1.00    1.25        T

FMA = Arith FMA, CVT = Arith CVT, SFU = Arith SFU, LS = Load/Store, V = Varying, T = Texture
</code></pre></div></div>

<p>So according to these results, the new version is no longer texture-bound in the shortest path and still has the same cycle counts for the longest path. Also, the number of used registers is reduced. Looks good on paper, doesn’t it?</p>

<p>But of course both versions of the shaders perform identically in the Android app on my Pixel 7a — it always runs at a stable 90 fps. So to see if the GPU is less loaded, let’s run Android GPU Inspector on the two versions of the app and compare some metrics from the profiles:</p>

<center> <img alt="Image description" src="/assets/blog/stylized-castle/filtering-performance.webp" /> </center>

<p>As expected, it doesn’t affect overall GPU cycles much but reduces the load on texture units. As a result, the GPU is now less busy — it consumes less power and has more free resources to smoothly render the home screen UI on top of the live wallpaper.</p>

<h2 id="animation">Animation</h2>

<p>All animated objects in the scene are animated procedurally. No baked skeletal or vertex animations are used. The simple shapes of these objects allow them to be animated in vertex shaders relatively easily.</p>

<p>I have found some inspiration for procedural animations in the rats of “<em>King, Witch and Dragon</em>” (<a href="https://torchinsky.me/shader-animation-unity/">https://torchinsky.me/shader-animation-unity/</a>) and the fish of <em>ABZU</em> (<a href="https://www.youtube.com/watch?v=l9NX06mvp2E">https://www.youtube.com/watch?v=l9NX06mvp2E</a>). Animations in our scene are of course simpler than the ones in these games because the animated objects have a stylized boxy look, so their movements are also stylized and simplified.</p>

<h3 id="knights">Knights</h3>

<center> <img alt="knight" src="/assets/blog/stylized-castle/knight.gif" /> </center>

<p>The scene has 16 knights, each model made of just 48 triangles.</p>

<p>Animation is done in the <a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/KnightAnimatedShader.ts">KnightAnimatedShader.ts</a>. Let’s take a look at how it animates the model. First, it needs to detect which vertices belong to which body parts of the knight, but the vertex data doesn’t have a special “bone id” attribute for this. It cannot be done by testing vertex positions because some body parts overlap — for example, the head has 4 bottom coordinates identical to the body. Some texture coordinates also overlap, so we cannot rely on them either, as was done for the <a href="https://torchinsky.me/shader-animation-unity/">rats animation</a> in the “King, Witch and Dragon” game. So I grouped the vertices for each body part in the buffer, and the vertex shader determines body parts simply by comparing <code class="language-plaintext highlighter-rouge">gl_VertexID</code>. Of course this is not optimal because it introduces branching, but it is not overly excessive and it is done in a vertex shader for a very low-poly model. The model is grouped to have body vertices first, then the head, and then the arms:</p>

<center> <img alt="knight vertex data" src="/assets/blog/stylized-castle/knight-stride.webp" /> </center>

<p>And here is the knight model with applied test coloring to visualize body parts:</p>

<center> <img alt="knight colored" src="/assets/blog/stylized-castle/knight-colored.webp" /> </center>
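<p>The branching described above can be sketched as follows. The vertex-range boundaries here are hypothetical (the real values depend on how many vertices the exporter assigned to each body part):</p>

```typescript
// Sketch of the gl_VertexID range test used to detect body parts.
// Boundary values are assumptions; in the real buffer body vertices
// come first, then head, then arms.
const BODY_VERTICES_END = 24; // hypothetical
const HEAD_VERTICES_END = 48; // hypothetical

// 0 = body (static), 1 = head, 2 = arms
function bodyPart(vertexId: number): number {
  if (vertexId < BODY_VERTICES_END) return 0;
  if (vertexId < HEAD_VERTICES_END) return 1;
  return 2;
}
```

<p>In GLSL this is just a couple of <code class="language-plaintext highlighter-rouge">if</code> comparisons against <code class="language-plaintext highlighter-rouge">gl_VertexID</code> in the vertex shader.</p>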

<p>Now that the shader knows which vertex belongs to which body part, it applies rotations provided via uniforms. Rotation pivot points are hard-coded in the shader. You may notice that only the head and arms are animated. Because the models don’t have separate legs, the legs are not animated. Instead, bobbing is applied to the whole model to create a rather convincing “walking” effect. The bobbing is simply the absolute value of a sine wave:</p>

<center> <img alt="bobbing" src="/assets/blog/stylized-castle/walk-bobbing.png" /> </center>
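<p>As a minimal sketch, the bobbing offset is just this (the frequency and amplitude are made-up tuning values, not the ones used in the app):</p>

```typescript
// Walking bobbing: vertical offset as the absolute value of a sine wave.
// frequency and amplitude are hypothetical tuning parameters.
function walkBobbing(time: number, frequency = 10.0, amplitude = 0.2): number {
  return Math.abs(Math.sin(time * frequency)) * amplitude;
}
```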

<h3 id="birds">Birds</h3>

<center> <img alt="bird animation" src="/assets/blog/stylized-castle/bird.gif" /> </center>

<p>There are 6 eagles soaring in the sky, each model made of 70 triangles. They fly along different circular paths.</p>

<p>Birds are rendered with <a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/EagleAnimatedShader.ts">EagleAnimatedShader.ts</a>. Animations are done in the same way as for the knights but are simpler, since only the wings are animated and they rotate synchronously. So only a single rotation timer is passed into the shader via a uniform to control the animation.</p>

<h3 id="flags">Flags</h3>

<center> <img alt="flag animation" src="/assets/blog/stylized-castle/flag.gif" /> </center>

<p>The scene has 3 different flags, all animated with the same <a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/FlagSmShader.ts">FlagSmShader.ts</a>. The technique is inspired by the wavy animation of the fish in <em>ABZU</em>. A simple sine wave is applied to the vertices, with the amplitude reduced close to the flagpole and increased near the free end of the flag. To correctly apply lighting, normals are also bent. To bend them, a cosine wave of the same frequency is used, since cosine is the derivative of sine.</p>
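<p>Here is a rough TypeScript sketch of the idea (the real shader works on 3D vertices and normals; the frequency and amplitude values here are hypothetical):</p>

```typescript
// Flag wave sketch: displacement grows from the flagpole (x = 0) to the
// free end (x = 1); the slope of the wave (its cosine derivative) is what
// the shader uses to bend the normal.
function flagWave(x: number, time: number, frequency = 6.0, amplitude = 0.1) {
  const falloff = x; // zero amplitude at the pole, full at the free end
  const offset = Math.sin(x * frequency + time) * amplitude * falloff;
  // derivative of the sine displacement, used to tilt the normal
  const slope = Math.cos(x * frequency + time) * amplitude * frequency * falloff;
  return { offset, slope };
}
```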

<h3 id="wind-stripes">Wind Stripes</h3>

<center> <img alt="wind animation" src="/assets/blog/stylized-castle/wind.gif" /> </center>

<p>A small detail added to the scene as a final touch is mostly inspired by the <em>Sea Of Thieves</em> wind effect. In <em>Sea Of Thieves</em> these stripes serve the purpose of showing the wind direction so players can align their sails with it. In our scene they are purely for the looks, so they bend and twist much more.</p>

<p>Let’s take a look at the shader used to draw them — <a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/WindShader.ts">WindShader.ts</a>. It is even “more procedural” than the ones used for animated objects. It doesn’t use geometry buffers at all and generates triangles based on <code class="language-plaintext highlighter-rouge">gl_VertexID</code>. Indeed, as you can see in the source of its <code class="language-plaintext highlighter-rouge">draw()</code> method, it doesn’t set up any buffers with vertex data. Instead it uses the hard-coded <code class="language-plaintext highlighter-rouge">VERTICES</code> array of two triangles declaring a single square segment. So if we need to draw 50 segments, we issue a <code class="language-plaintext highlighter-rouge">glDrawArrays</code> call for 50 * 2 * 3 = 300 vertices (two triangles per segment). The vertex shader offsets each segment based on <code class="language-plaintext highlighter-rouge">gl_VertexID</code> and tapers both ends of the final stripe using <code class="language-plaintext highlighter-rouge">smoothstep</code>. Then the coordinates of the resulting stripe are shifted by an offset timer so it appears to move. Next, the shape is deformed by two sine wave noises in world space. Color is animated to fade in and out, and the offset is animated to move the stripe. All this results in a random snake-like movement of the stripes while keeping them aligned to the world-space path.</p>
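<p>A simplified CPU-side sketch of how a vertex position can be derived purely from <code class="language-plaintext highlighter-rouge">gl_VertexID</code> (the corner order and offset axis are illustrative, not taken from the actual shader):</p>

```typescript
// Two triangles of a unit quad segment, 6 vertices total.
const VERTICES: [number, number][] = [
  [0, 0], [1, 0], [1, 1], // first triangle
  [0, 0], [1, 1], [0, 1], // second triangle
];

// Map a vertex id to a position: which segment along the stripe,
// then which corner inside that segment, offset along the stripe axis.
function segmentVertex(vertexId: number): { x: number; y: number } {
  const segment = Math.floor(vertexId / 6);
  const corner = VERTICES[vertexId % 6];
  return { x: corner[0] + segment, y: corner[1] };
}
```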

<p>Here is a breakdown of the rendering of a short 20-segment wind stripe, with test coloring of segments to clearly visualize the geometry:</p>

<center> <img alt="wind stripe steps" src="/assets/blog/stylized-castle/wind-rendering.gif" /> </center>

<h2 id="depth-only-shaders">Depth-only shaders</h2>

<p>All objects in the scene except the wind stripes cast shadows, so they have to be rendered to a depth map texture from the light source camera. For this, simplified versions of the corresponding animated and static shaders are used. They perform all the same vertex transformations but don’t calculate lighting.</p>

<p>For example, let’s take a look at the <a href="https://github.com/keaukraine/webgl-stylized-castle/blob/main/src/shaders/KnightDepthShader.ts">KnightDepthShader.ts</a>. It performs all the same vertex transformations to animate the head and arms but does not calculate lighting based on normals. Moreover, you may notice that its fragment shader is empty — it provides no color output at all. Such shaders are perfectly valid in GLSL ES 3.00 since their only purpose is to write to the depth (shadow map) attachment.</p>

<h2 id="results-and-possible-additional-optimizations">Results and possible additional optimizations</h2>

<p>Live web demo: <a href="https://keaukraine.github.io/webgl-stylized-castle/index.html">https://keaukraine.github.io/webgl-stylized-castle/index.html</a></p>

<p>Source code: <a href="https://github.com/keaukraine/webgl-stylized-castle">https://github.com/keaukraine/webgl-stylized-castle</a></p>

<p>Android live wallpaper app: <a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.cartooncastle3d">https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.cartooncastle3d</a></p>

<p>Final web demo is ~2.2 MB which is not quite optimal because all objects are exported as 2 huge batched meshes. And there are a lot of repetitive objects like trees and cannons which are good candidates for instanced rendering.</p>]]></content><author><name>keaukraine</name></author><category term="blog" /><category term="webgl" /><category term="3d" /><category term="castle" /><category term="animation" /><summary type="html"><![CDATA[Stylized Castle WebGL demo]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://keaukraine.site/assets/blog/stylized-castle/title.webp" /><media:content medium="image" url="https://keaukraine.site/assets/blog/stylized-castle/title.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Voxel Airplanes 3D, or the story about fitting vertex data into 4 bytes</title><link href="https://keaukraine.site/voxel-airplanes/" rel="alternate" type="text/html" title="Voxel Airplanes 3D, or the story about fitting vertex data into 4 bytes" /><published>2023-10-27T10:00:00+03:00</published><updated>2023-10-27T10:00:00+03:00</updated><id>https://keaukraine.site/voxel-airplanes</id><content type="html" xml:base="https://keaukraine.site/voxel-airplanes/"><![CDATA[<p>While working on Floating Islands live wallpaper I stumbled upon these cute voxel <a href="https://maxparata.itch.io/voxel-plane">3D models of airplanes by Max Parata</a>. I already wanted to bring life to some stylized low-fi 3D art so in my mind I immediately saw how to create a stylized old-skool low-fi scene with these assets. You can watch a <a href="https://keaukraine.github.io/webgl-voxel-airplanes/index.html">live WebGL demo here</a>.
This project was quite fast to implement - the time span between the first commit with a rough WIP layout and the final version is about 20 days. Yet it was quite fun to create because during development I got some fresh ideas on how to improve the scene. All additions were kept really minor so the scene stays as simple as possible, in accordance with its art design. My brother provided valuable feedback on how to improve it and also helped with optimization of some geometries.</p>

<hr />

<h2 id="scene-composition">Scene composition</h2>

<p>The scene is aesthetically simple, so it contains just four objects: planes, ground, clouds and wind:</p>

<center> <img alt="Scene render order" src="/assets/blog/voxel-airplanes/scene-render-order.gif" /> </center>

<p>Next, let’s take a look at what shaders are used to render models, and how geometries of these models are optimized.</p>

<h2 id="planes">Planes</h2>

<p>Technically, planes are rendered not as voxels (each voxel cube individually) but as a ready mesh exported from <a href="https://www.voxelmade.com/magicavoxel/">MagicaVoxel</a>. They are not simplified with <a href="https://www.thestrokeforge.xyz/vox-cleaner">VoxCleaner</a> to use texture atlases and reduce the polycount - I decided to use them as is because this makes it easier to create alternate palettes, and the vertex data for planes has a ridiculously small memory footprint anyway.</p>

<p>Each plane consists of 3 parts - the plane body, the glass cockpit and the rotating propellers. A plane is rendered using 2 shaders - a simple directionally lit <a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/PlaneBodyLitShader.ts">PlaneBodyLitShader.ts</a> for the body and props, and its variation <a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/GlassShader.ts">GlassShader.ts</a> for the glass with stylized reflections.</p>

<p>The specifics of the plane models allow vertex data to be packed using really small data types. All plane models are small and fit into a -127…+127 bounding box. And since vertices represent voxels, they are always snapped to a 1x1 grid. So I chose to store vertex positions in signed bytes, which have just enough precision for the job.</p>

<p>The most interesting part, however, is storing normals and texture coordinates. They are packed together into a single byte.</p>

<p>First, normals can be stored with just 3 bits. But how can 3 bits be enough to store this kind of information? Since voxels are cubes, they can have only 6 variations of normals. So instead of storing normals as vectors, they can be represented with an index, and 3 bits is enough to store up to 8 variations. An array of the actual normal values is hard-coded in the vertex shader. After 3 bits of the byte are used by normals, we are left with 5 more bits, which allows our models to have a 32-color palette. And again, this is more than enough for our case because the stylized models use only 23 unique colors. Since the palette textures used by the models are 32x1 in size, the V texture coordinate is omitted and hard-coded to 0. <code class="language-plaintext highlighter-rouge">texelFetch</code> is used to sample a color from the palette, and after combining it with directional lighting it is ready to be passed to the fragment shader.</p>

<p>Please note that bitwise operators used to unpack normals and color indices are available only in OpenGL ES 3.0+ and WebGL 2.</p>
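<p>A minimal sketch of this packing scheme (which bits hold the normal index and which hold the color index is an assumption here; the real shader may use a different layout):</p>

```typescript
// Pack a 3-bit normal index and a 5-bit palette (color) index into one byte.
// Layout assumed here: normal in the low 3 bits, color in the high 5 bits.
function pack(normalIndex: number, colorIndex: number): number {
  return ((colorIndex & 0x1f) << 3) | (normalIndex & 0x07);
}

// The matching unpack, as a WebGL 2 / OpenGL ES 3.0+ vertex shader
// would do it with bitwise operators.
function unpack(packed: number): { normalIndex: number; colorIndex: number } {
  return { normalIndex: packed & 0x07, colorIndex: packed >> 3 };
}
```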

<p>The complete vertex data stride for plane and prop models is just 4 bytes, and it keeps the original information 100% intact:</p>

<center> <img alt="Plane stride" src="/assets/blog/voxel-airplanes/stride-plane-new.png" /> </center>

<p>Byte with packed color and normals indices:</p>

<center> <img alt="Packed color and normals byte" src="/assets/blog/voxel-airplanes/packed-color-normals.png" /> </center>

<p>In a separate draw call, the glass with a scrolling stylized fake reflection is drawn:</p>

<center> <img alt="Glass with reflection" src="/assets/blog/voxel-airplanes/glass.gif" /> </center>

<p>Glass is rendered without a palette texture — its color is set via a uniform. The texture passed to this shader is a mask for the reflection. Its UV coordinates are calculated in the vertex shader based on model-space vertex coordinates. Of course, it is unfiltered for artistic purposes. The stride for glass models is the same but without the texture coordinate for color — the whole byte is used to store the normal index:</p>

<center> <img alt="Glass with reflection" src="/assets/blog/voxel-airplanes/stride-glass-new.png" /> </center>

<p><a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/GlassShader.ts">GlassShader</a> samples the texture using <code class="language-plaintext highlighter-rouge">textureLod</code> with mipmap level 0. This is done to explicitly tell the OpenGL ES driver that we access the texture without mipmaps, and to reduce some overhead. You can read more about this and some other texture sampling optimization tricks in Pete Harris’s blog — <a href="https://solidpixel.github.io/2022/03/27/texture_sampling_tips.html">https://solidpixel.github.io/2022/03/27/texture_sampling_tips.html</a></p>

<p>Also, glass models have a small polycount, so they use unsigned byte indices, which further reduces memory bandwidth.</p>

<h2 id="wind-stripes">Wind stripes</h2>

<p>For the wind stripes I decided to create a shader which doesn’t perform any memory reads at all. A wind stripe has a very simple geometry — a 100x100 units quad, stretched into an appropriately thin line by the model matrix. Because of its simplicity, all this geometry can be hard-coded in the vertex shader code. It doesn’t use any textures either — the fragment color is passed via a uniform. You can find the implementation in <a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/WindStripeShader.ts">WindStripeShader.ts</a>. It uses <code class="language-plaintext highlighter-rouge">gl_VertexID</code> to get the position of a given vertex. When this shader is used to draw a wind stripe, no buffers or textures are bound.</p>
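<p>A tiny sketch of such buffer-less geometry generation (the vertex order is illustrative, not taken from the actual shader):</p>

```typescript
// A hard-coded 100x100 quad as two triangles; with no vertex buffers bound,
// the shader picks a corner purely from gl_VertexID.
const QUAD: [number, number][] = [
  [0, 0], [100, 0], [100, 100],
  [0, 0], [100, 100], [0, 100],
];

function quadPosition(vertexId: number): [number, number] {
  return QUAD[vertexId % 6];
}
```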

<p>Technically it could even be used as a “building block” to draw more complex shapes by issuing multiple draw calls that rotate/scale/shear its hard-coded base quad, but this would be too inefficient.</p>

<h2 id="terrain">Terrain</h2>

<p>Terrain textures are 256x256 tiling images. They are based on aerial photos with some GIMP magic sprinkled over them — contrast adjustments and colors reduced to just 10–12. This gives them a more old-skool look and makes each texel more pronounced.</p>

<p>The shader used to render the terrain is in the DiffuseScrollingFilteredShader.ts file. Let’s take a look at it.</p>

<p>It is a rather primitive shader which simply pans the UV coordinates to create an illusion of the ground moving beneath the airplane. However, there is one additional thing it does, and that is texture filtering. You may be wondering what filtering could possibly be used here - the ground clearly looks unfiltered, as if it used <code class="language-plaintext highlighter-rouge">GL_NEAREST</code> sampling! In fact, a custom antialiased blocky filtering is used. The thing is that regular <code class="language-plaintext highlighter-rouge">GL_NEAREST</code> sampling produces a lot of aliasing on the edges of texels. This becomes especially noticeable at certain angles of the continuously rotating camera. The <code class="language-plaintext highlighter-rouge">textureBlocky()</code> function alleviates these aliasing artifacts while preserving that extra crispy old-skool look of unfiltered textures. The ground texture actually uses <code class="language-plaintext highlighter-rouge">GL_LINEAR</code> filtering, and <code class="language-plaintext highlighter-rouge">textureBlocky()</code> calculates a sampling point to get either an interpolated filtered value at the edges or an exact unfiltered one from the center of a texel everywhere else.</p>

<p>The original author of this filtering is <a href="https://www.shadertoy.com/user/Permutator">Permutator</a>, and the code is used under the CC0 license from this Shadertoy — <a href="https://www.shadertoy.com/view/ltfXWS">https://www.shadertoy.com/view/ltfXWS</a> (you can find a deeper explanation of the math behind this filtering technique there).</p>
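<p>The core of the technique, sketched per axis on the CPU (the real shader does this in 2D and gets the screen-pixel footprint from <code class="language-plaintext highlighter-rouge">fwidth()</code>; here it is passed in as a parameter, and the exact formula may differ from the Shadertoy source):</p>

```typescript
// Adjust a texture coordinate (in texel units) so that away from texel
// edges it snaps to the texel center (crisp, unfiltered look), and only
// within one screen-pixel footprint (fw) of an edge it is allowed to
// interpolate, which antialiases the seams under GL_LINEAR filtering.
function blockySamplePoint(coordTexels: number, fw: number): number {
  const seam = Math.floor(coordTexels + 0.5); // nearest texel boundary
  const offset = Math.max(-0.5, Math.min(0.5, (coordTexels - seam) / fw));
  return seam + offset;
}
```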

<p>Here is a comparison (with 4x zoom) of regular <code class="language-plaintext highlighter-rouge">GL_NEAREST</code> filtering vs the custom blocky filtering. As you can see, both are pixelated but the latter is not aliased.</p>

<center>
    <img alt="Unfiltered terrain" src="/assets/blog/voxel-airplanes/jagged.webp" />
    <img alt="Filtered terrain" src="/assets/blog/voxel-airplanes/smooth.webp" />
</center>

<p>One of the last additions to the scene is a transition between two different terrain textures. When you switch them, they don’t just toggle; instead, a cute pixelated transition effect smoothly switches between the textures.</p>

<center>
    <img alt="Terrain transition" src="/assets/blog/voxel-airplanes/transition.gif" />
</center>

<p>You can find the code for this transition in the <a href="https://github.com/keaukraine/webgl-voxel-airplanes/blob/main/src/shaders/DiffuseScrollingFilteredTransitionShader.ts">DiffuseScrollingFilteredTransitionShader.ts</a> file. The transition uses a tiling blue noise texture so square blocks appear uniformly across the ground. To make the transition smoother, <code class="language-plaintext highlighter-rouge">smoothstep()</code> is used. However, there is a commented-out line with <code class="language-plaintext highlighter-rouge">step()</code> which makes the transition more abrupt, if you prefer that.</p>
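<p>A sketch of how the per-block blend factor can be computed (the 0.1 edge width is a hypothetical parameter, not the value used in the shader):</p>

```typescript
// GLSL-style smoothstep.
function smoothstep(edge0: number, edge1: number, x: number): number {
  const t = Math.max(0, Math.min(1, (x - edge0) / (edge1 - edge0)));
  return t * t * (3 - 2 * t);
}

// Each ground block gets a threshold from the blue noise texture;
// smoothstep turns the global transition progress into a blend factor.
// Using step(blueNoise, progress) instead would give the abrupt variant.
function transitionBlend(progress: number, blueNoise: number): number {
  return smoothstep(blueNoise, blueNoise + 0.1, progress);
}
```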

<h2 id="clouds">Clouds</h2>

<p>Clouds don’t use the antialiased blocky filtering because they don’t rotate, are quite transparent and move relatively fast. This makes it hard to spot aliasing artifacts on them, so they use the cheapest option available — <code class="language-plaintext highlighter-rouge">GL_NEAREST</code> sampling. Clouds use a custom mesh with cutouts where the texture is empty. This significantly reduces overdraw compared to a regular quad mesh. Here it is visualized by disabling blending:</p>

<center>
    <img alt="Cloud geometry" src="/assets/blog/voxel-airplanes/cloud-geometry.png" />
</center>

<h2 id="result">Result</h2>
<p>You can see a live web demo <a href="https://keaukraine.github.io/webgl-voxel-airplanes/index.html">here</a> and if you like to have it on the home screen of your Android phone you can get a live wallpaper app on <a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.voxelairplanes">Google Play</a>.</p>

<p>Source code is available <a href="https://github.com/keaukraine/webgl-voxel-airplanes">on GitHub</a>, feel free to play around with it.</p>

<p>As always, the web demo is heavily optimized for the fastest downloading of resources, and the Android app for the best efficiency and performance. The initial loading of the web demo is just 155 kB, and the size of all models and textures is 1.05 MB, so you could fit this data on a floppy disk.</p>

<h2 id="p-s">P. S.</h2>
<p>These WebGL demo and Android app have been made during war in Ukraine despite regular power outages caused by deliberate destruction of country’s electric infrastructure. Please support Ukraine however you can and refrain from buying imported russian goods since taxes from these sales are used to support war.</p>]]></content><author><name>keaukraine</name></author><category term="blog" /><category term="webgl" /><category term="3d" /><category term="airplanes" /><category term="voxel" /><category term="stylized" /><category term="pixelated" /><summary type="html"><![CDATA[Voxel Airplanes 3D, or the story about fitting vertex data into 4 bytes]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://keaukraine.site/assets/blog/voxel-airplanes/title.webp" /><media:content medium="image" url="https://keaukraine.site/assets/blog/voxel-airplanes/title.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Floating Islands WebGL demo</title><link href="https://keaukraine.site/floating-islands/" rel="alternate" type="text/html" title="Floating Islands WebGL demo" /><published>2022-10-29T11:00:00+03:00</published><updated>2022-10-29T11:00:00+03:00</updated><id>https://keaukraine.site/floating-islands</id><content type="html" xml:base="https://keaukraine.site/floating-islands/"><![CDATA[<h2 id="idea-and-inspiration">Idea and inspiration</h2>

<p>The idea for this <a href="https://keaukraine.github.io/webgl-rock-pillars/index.html">3D scene</a> comes from the magnificent Zhangjiajie National Forest Park in China. You can clearly see where the inspiration originates from - this majestic real-life location also has grassy rock pillars covered in dense clouds, and when observed from above, their bottom parts disappear in dense fog. To enhance the magical feeling of the scene, we decided to make some rocks float mid-air. This additional inspiration comes from the map Gateway to Na Pali from my favorite game, Unreal. That location has floating rocks in the distant background and is itself placed inside a huge floating rock. We decided to create a scene which would have a lot of similar floating islands densely packed in one area.</p>

<h2 id="ai-generated-concept-art">AI generated concept art</h2>

<p>We also tried to use Stable Diffusion to generate some concept art for the scene, hoping the AI would hallucinate some unusual points of view or incorporate some details we might find fitting for the scene. However, all images appeared to be virtually identical. The AI created a series of rather dull images - the same rocks in the same fog without any additional details. We only used a couple of them as a reference for vivid sunrise color palettes, which could just as well have been picked from any other source.</p>

<h2 id="scene-composition">Scene composition</h2>
<p>To create this scene we’ve used and reused some stylized hand-painted 3D models from packs we purchased quite some time ago for our previous projects. No new assets were purchased for this project. The scene uses just 3 rock models, some generic ferns and trees, and birds flying in the sky.</p>

<p>Render order is the following: depth pre-pass, rocks, birds, sky, soft cloud particles.</p>

<center> <img alt="Scene rendering order" src="/assets/blog/floating-islands/rendering.gif" /> </center>

<h2 id="camera-path-and-objects-placement">Camera path and objects placement</h2>

<p>To create the impression of an endless random scene there were 2 options: a truly random scene or a looped generated path. The first option requires placing objects on the fly in front of the camera, which means their positions have to be transferred to the GPU dynamically. So the better option is to generate a static looped path once and draw objects along it as the camera moves.
You can find the code that generates the base spline in the <code class="language-plaintext highlighter-rouge">positionOnSpline</code> function in the <a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/ObjectsPlacement.ts">ObjectsPlacement.ts</a> file. It creates a circular looped path for the camera with an oscillating radius. A couple of harmonics are applied to randomize the circle radius so it appears random but still loops perfectly. Then all objects are placed around this path - trees under the camera, rocks above and to the sides.</p>
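<p>A minimal sketch of such a looped path (the radii, amplitudes and phases here are made up; what matters is that integer harmonic frequencies make the loop close perfectly at t = 2π):</p>

```typescript
// A circle whose radius is modulated by a couple of sine harmonics:
// looks random, but is guaranteed to be a closed loop.
function positionOnLoop(t: number): [number, number] {
  const radius = 100 + 15 * Math.sin(3 * t) + 7 * Math.sin(7 * t + 1.3);
  return [radius * Math.cos(t), radius * Math.sin(t)];
}
```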

<p>Object positions and rotations are stored in typed Float32Arrays and uploaded to the GPU as textures.
The <code class="language-plaintext highlighter-rouge">drawInstances</code> method in <a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/Renderer.ts">Renderer.ts</a> renders only the objects visible from a certain point on the spline. Because of the scene’s simplicity there’s no need for frustum culling - objects are drawn within a certain distance in front of and behind the camera. This visibility distance is slightly larger than the fog start distance, so new objects appear fully covered in fog and don’t pop in. Instances are ordered front-to-back, so when drawn they make good use of Z-buffer culling.</p>

<p>Only rock and tree models are placed this way along the camera path. Bird flocks use hand-picked linear paths to cover the whole area of the scene with a minimal number of paths.</p>

<p>Here is the camera path visualized, with only a subset of objects rendered in its vicinity:</p>

<center> <img alt="Objects culling" src="/assets/blog/floating-islands/path.gif" /> </center>

<h2 id="fog-cubemaps">Fog cubemaps</h2>

<p>The initial implementation used fog of a uniform color, which looked rather bland. To add more color variation from different directions (like a sun halo) we decided to use cubemaps for the fog. This gives the artist (my brother) great flexibility - he can completely change the look of the whole scene by creating a cubemap and tweaking a couple of colors in the scene preset. Cubemaps were initially created as equirectangular images since they are easy to paint. Then we used an <a href="https://jaxry.github.io/panorama-to-cubemap/">online tool</a> to convert the equirectangular source image to 6 cubemap faces, and a simple ImageMagick script to fix their rotations to suit our coordinate system (Z-up).</p>

<p>You can find the cubemap fog implementation in the static constants of <a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/shaders/FogShader.ts">FogShader.ts</a>, which all fog shaders use. The final fog coefficient used by the vertex shader for color mixing also includes a height fog coefficient.</p>

<p>In the web demo UI you can adjust various fog parameters - start distance, transition distance, height offset and multiplier. Changing the scene’s time of day is done by using a different cubemap texture and a couple of colors for each preset.</p>

<p>Interestingly, after implementing this I found out that fog cubemaps are widely used in the Source engine, and of course this technique has been incorporated in some indie games too.</p>

<h2 id="grass-on-rocks">Grass on rocks</h2>

<p>To make the rocks less dull we also apply a grass texture on top of them. This technique is commonly used to simulate surfaces covered by snow or soaked by rain. The grass texture is mixed with the rock texture based on the vertex normal. You can play around with the <code class="language-plaintext highlighter-rouge">grassAmount</code> slider in the UI to see how it affects the grass spread on the rocks.</p>

<p>The source code of the shader which applies the grass texture on top of the rocks is in <a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/shaders/FogVertexLitGrassShader.ts">FogVertexLitGrassShader.ts</a>.</p>
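<p>The core of the mixing can be sketched like this (the thresholds are hypothetical; the actual shader works with full textures and lighting, but the idea is that upward-facing surfaces get more grass):</p>

```typescript
// Blend factor for grass over rock, driven by the up component of the
// vertex normal (Z-up coordinate system) and the grassAmount setting.
function grassBlend(normalZ: number, grassAmount: number): number {
  // smoothstep(0.2, 0.8, normalZ): thresholds are made-up values
  const t = Math.max(0, Math.min(1, (normalZ - 0.2) / (0.8 - 0.2)));
  const s = t * t * (3 - 2 * t);
  return s * grassAmount;
}
```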

<h2 id="soft-clouds-shader">Soft clouds shader</h2>

<p>Clouds are not instanced but are drawn one by one because the transformation matrices for these objects have to be adjusted to always face the camera. There are not that many of them, so this doesn’t add too many draw calls. Actually, if the GPU state is not changed (no uniforms updated, no blending mode switched, etc.) then even non-instanced rendering is quite fast on modern mobile and desktop GPUs. For test purposes we had a quick and dirty visualization of the camera spline with non-instanced rendering of 5000 small spheres, and it caused no slowdowns.</p>

<p>There’s also one minor trick in this shader. As the camera flies through clouds, they can be abruptly culled by the near clipping plane. To prevent this, a simple <code class="language-plaintext highlighter-rouge">smoothstep</code> fade is applied right in front of the camera. You can find the code in <a href="https://github.com/keaukraine/webgl-rock-pillars/blob/main/src/shaders/FogSpriteShader.ts">FogSpriteShader</a>.</p>

<h2 id="result">Result</h2>
<p>You can see a live web demo <a href="https://keaukraine.github.io/webgl-rock-pillars/index.html">here</a> and if you like to have it on the home screen of your Android phone you can get a <a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.floatingislands">live wallpaper app on Google Play</a>.</p>

<p><a href="https://github.com/keaukraine/webgl-rock-pillars">Source code</a> is available on GitHub, feel free to play around with it.</p>

<p>As always, the web demo is heavily optimized for the smallest data size and Android app for the best efficiency and performance. Web version uses WebP for textures which offer better compression than PNG, better image quality than JPEG and support alpha channel even with lossy compression. Mipmaps are generated for all textures. The total gzipped size of the web demo is just 374 kB so you can copy it to a floppy disk to show to friends who have no Internet :)</p>]]></content><author><name>keaukraine</name></author><category term="blog" /><category term="webgl" /><category term="3d" /><category term="islands" /><category term="animation" /><category term="instancing" /><summary type="html"><![CDATA[Floating Islands WebGL demo]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://keaukraine.site/assets/blog/floating-islands/title.webp" /><media:content medium="image" url="https://keaukraine.site/assets/blog/floating-islands/title.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Efficient WebGL vegetation rendering</title><link href="https://keaukraine.site/efficient-vegetation-rendering/" rel="alternate" type="text/html" title="Efficient WebGL vegetation rendering" /><published>2022-08-08T10:00:00+03:00</published><updated>2022-08-08T10:00:00+03:00</updated><id>https://keaukraine.site/efficient-vegetation-rendering</id><content type="html" xml:base="https://keaukraine.site/efficient-vegetation-rendering/"><![CDATA[<p>In this article I’ll explain the rendering pipeline of Spring Flowers WebGL Demo and its corresponding Android app. Also I will describe what problems we’ve encountered and what solutions we used to overcome during development and testing of the <a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.flowers">Android live wallpaper app</a>.</p>

<p>You can check out the <a href="https://keaukraine.github.io/webgl-flowers/index.html">live demo page</a> and play with various configuration options in the top right controls section.</p>

<h2 id="implementation">Implementation</h2>

<p>The scene is composed of the following main objects: the sky, the ground, and 3 types of grass: flowers (each containing individual instances of leaves, petals and stems), small round grass and tall animated grass. To make the scene more alive, a sphere for glare and moving ants and butterflies are also drawn.</p>

<center> <img alt="Scene rendering order" src="/assets/blog/vegetation/scene.gif" /> </center>

<p>The draw order is the following: first the objects closer to the camera and the larger ones, to use the z-buffer efficiently, then objects closer to the ground, then the sky and the ground. The ground plane has transparent edges which blend with the background sky sphere, so it is drawn last, after the sky.</p>

<p>For the sun glare effect we draw a sphere object with a specular highlight. It is drawn last, over the whole geometry, without a depth test. This way everything is slightly over-brightened when viewed against the sun, and the glare is less prominent when the camera is not facing the sun.</p>

<h3 id="tiled-culling-of-instances">Tiled culling of instances</h3>

<p>Grass and flowers are drawn using similar shaders, with the common part being instanced positioning. These instanced objects get their transformations from an FP32 RGB texture.</p>

<p>All instanced shaders use the same include, <a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/shaders/InstancedTexturePositionsShader.ts#L12">COMMON_TRANSFORMS</a>, which takes 2 samples from the texture to retrieve the translation in the XY plane, scale and rotation. Please note that rotation is stored in the form of the sine and cosine of the angle to save on rotation math.</p>
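<p>Storing the rotation this way means the shader applies it with a couple of multiply-adds instead of evaluating <code class="language-plaintext highlighter-rouge">sin</code>/<code class="language-plaintext highlighter-rouge">cos</code> per vertex:</p>

```typescript
// Rotate a vertex in the XY plane by an angle stored as (sin, cos):
// a standard 2D rotation with the trigonometry precomputed on the CPU.
function rotateXY(x: number, y: number, s: number, c: number): [number, number] {
  return [x * c - y * s, x * s + y * c];
}
```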

<p>The original transformations are stored in the arrays <code class="language-plaintext highlighter-rouge">FLOWERS</code>, <code class="language-plaintext highlighter-rouge">GRASS1</code> and <code class="language-plaintext highlighter-rouge">GRASS2</code> declared in <a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/GrassPositions.ts">GrassPositions.ts</a>. However, these arrays contain coordinates for all instances of the objects; they are not split into tiles yet. For this, they are processed by the <a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/GrassPositions.ts#L43">sortInstancesByTiles</a> function. It creates a new FP32 array with rearranged positions and rotations, and an array of tiles which specify the instance count and start offset in the final texture used by the shader. This prepared information is stored in the <a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/Utils.ts#L19">TiledInstances</a> object. The function can split all instances spread across a square ground area into an arbitrary NxN grid of tiles. In both the web demo and the Android app all instances are split into a reasonable 4x4 area with 16 tiles. Tiles have a small padding which allows them to slightly overlap. This padding is the size of a grass model, so instances placed at the very edge of a tile won’t disappear abruptly when the tile gets culled.</p>

<p>To visualize how instances are split into tiles, let’s imagine a sample area with 20 randomly placed objects which we would like to cull per tile. Let’s rearrange these instances into a 2x2 grid with 4 tiles in total:</p>

<center> <img alt="Sample scene" src="/assets/blog/vegetation/tiles.png" /> </center>

<p>Here is the structure of the texture containing these objects, showing the tiling and the data stored in each component per instance:</p>

<center> <img alt="Texture format" src="/assets/blog/vegetation/texture-data.png" /> </center>

<p>Here instances for tile 0 have offset=0 and count=5, for tile 1 offset=5 and count=4, and so on.</p>

<p>This structure allows us to draw all 20 instances in 4 draw calls and cull them in batches per tile without updating any data on the GPU.</p>

<p>Culling the tiles’ bounding boxes on the CPU is also relatively cheap. It is done every frame, and you can see how many tiles and individual instances are currently rendered in the “Stats” section of the controls.</p>

<p>And reducing grass density to scale performance is also really easy with this approach, because instances inside each tile are stored in random order. All we have to do is proportionally reduce the number of instances per draw call (you can use the density slider in the controls to test it):</p>

<center> <img alt="Changing grass density" src="/assets/blog/vegetation/density.gif" /> </center>
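<p>Combining the per-tile visibility and the density factor, building the per-frame draw list can be sketched like this (a hypothetical sketch; names are illustrative, not the actual demo code):</p>

```typescript
interface Tile { instancesOffset: number; instancesCount: number; visible: boolean; }
interface DrawCall { textureOffset: number; instanceCount: number; }

// Hypothetical sketch: for every tile that passed culling, issue a draw
// call starting at the tile's offset in the positions texture. Because the
// order of instances inside a tile is random, drawing only the first
// count * density of them uniformly thins the vegetation.
function buildDrawList(tiles: Tile[], density: number): DrawCall[] {
  const calls: DrawCall[] = [];
  for (const tile of tiles) {
    if (!tile.visible) continue;
    const count = Math.floor(tile.instancesCount * density);
    if (count > 0) {
      calls.push({ textureOffset: tile.instancesOffset, instanceCount: count });
    }
  }
  return calls;
}
```

<p>Each entry then maps to one instanced draw call, with the texture offset passed as a uniform so the shader samples the right slice of the positions texture.</p>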

<p>There are different instanced shaders for drawing different objects. Small grass and flower petals are the simplest ones: they use simple diffuse colored shading. Dandelion stems and leaves add specular highlights, and the <a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/shaders/InstancedTexturePositionsGrassAnimatedShader.ts">shader</a> used to render tall grass blades also applies vertex animation for wind simulation.</p>

<h3 id="random-ants">Random ants</h3>
<center> <img alt="Ants on the ground" src="/assets/blog/vegetation/ants.webp" /> </center>

<p>To make the ground more alive, we draw some ants on it. They are also instanced: a total of 68 ants is rendered in just 2 draw calls.</p>

<p>They move in circles with a random radius and center. They are drawn in two draw calls for clockwise and counterclockwise rotation within these circular paths. You can examine the math for positioning vertices in the shader’s <a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/shaders/AntsShader.ts">source code</a>.</p>
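<p>The positioning math can be illustrated with a small TypeScript sketch (hypothetical names; the real math lives in the shader):</p>

```typescript
// Hypothetical sketch of an ant's position: each instance moves along a
// circle with its own random center and radius; the rotation direction is
// the only difference between the two draw calls (clockwise vs
// counterclockwise).
function antPosition(
  centerX: number, centerY: number,
  radius: number, angularSpeed: number,
  time: number, clockwise: boolean
): [number, number] {
  const a = angularSpeed * time * (clockwise ? -1 : 1);
  return [centerX + radius * Math.cos(a), centerY + radius * Math.sin(a)];
}
```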

<p>It would be almost impossible to notice any animation on these small, fast-moving objects, so we don’t animate them at all.</p>

<h3 id="butterflies">Butterflies</h3>

<p>No summer can be imagined without butterflies, so we added them too. They are positioned similarly to ants, but a sine wave is added to their height. Each instance gets its color from a texture atlas with 4 different variants.</p>

<p>Unlike ants, butterflies must be animated. We don’t use any kind of baked animation for them. Instead, a really cheap trick is used in the <a href="https://github.com/keaukraine/webgl-flowers/blob/master/src/shaders/ButterflyShader.ts#L59">vertex shader</a>: the wing tips are simply moved up and down. Wing tips are identified as vertices with high absolute values of the X coordinate. Of course this is not a correct circular movement of the wings around the butterfly’s body (wings elongate noticeably with higher movement amplitude), but it simplifies the shader math and looks convincing enough in motion, as can be seen in this image:</p>

<center> <img alt="Butterfly animation" src="/assets/blog/vegetation/butterfly.gif" /> </center>
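<p>The wing-flap trick can be transcribed to CPU-side TypeScript like this (a hedged sketch; the names and the exact falloff are illustrative, not the shader’s actual code):</p>

```typescript
// Hypothetical sketch of the wing-flap trick: vertices far from the body
// (large |x|) are displaced vertically by a sine wave, while vertices near
// the body barely move. Wings stretch slightly at high amplitude, but the
// math stays trivial.
function flapOffsetZ(
  x: number,          // vertex X in model space; the body is near x = 0
  time: number,
  amplitude: number,
  frequency: number
): number {
  const wingFactor = Math.abs(x); // 0 at the body, grows toward the wing tips
  return Math.sin(time * frequency) * amplitude * wingFactor;
}
```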

<h2 id="android-specific-optimizations">Android-specific optimizations</h2>

<p>As always, our web demos are optimized for the smallest possible network data size and the fastest loading times, so the web version doesn’t use compressed or supercompressed textures. The <a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.flowers">Android app</a> is optimized for power efficiency, so it uses compressed textures (ASTC or ETC2, depending on hardware capabilities).</p>

<p>To further improve efficiency it uses variable rate shading (VRS) on <a href="https://opengles.gpuinfo.org/listreports.php?extension=GL_QCOM_shading_rate">supported hardware</a>.</p>

<p>And when the app detects that the device is in energy saving mode (triggered manually or when the battery is low), it reduces FPS and uses a simplified grass shader without animation to significantly reduce power draw. Additionally, in this mode the app applies more aggressive VRS.</p>

<p>We’ve encountered general performance issues with rendering lots of instanced geometry on low-end Android phones: the bottleneck turned out to be the vertex shaders. So when the app detects it is running on a low-end device, it renders grass with slightly reduced density and without wind animation.</p>

<h2 id="failed-implementations">Failed implementations</h2>

<p>Before arriving at this tiled rendering pipeline, a couple of more naive, less performant implementations were tried and tested.</p>

<h3 id="fully-randomized">Fully randomized</h3>

<p>The very first version of the grass used fully randomized positioning of instances. It didn’t use a texture to store pre-calculated random transformations but calculated them in the shader instead. This introduced extra complexity in the vertex shaders (random and noise functions involve quite a bit of math). Additionally, the random values differed across GPUs, which made it impossible to finely hand-pick camera paths. Take a look at this photo where we tested this version on different devices; while the code is identical, the placement of instances differs:</p>

<center> <img alt="Random positions" src="/assets/blog/vegetation/fully-randomized.jpg" /> </center>

<p>This version had no visibility calculation or frustum culling, which also hurt performance.</p>

<h3 id="per-instance-culling">Per-instance culling</h3>

<p>The first naive implementation of culling worked per instance. This version already used a texture to reduce vertex shader math, but each instance was tested for visibility, and the texture was then updated with only the visible instances.</p>

<p>This worked just fine on PC and on high-end Android devices but proved to be way too slow on low-end phones: the CPU took about 10 ms to calculate instance visibility. Updating the texture on the fly with <code class="language-plaintext highlighter-rouge">glTexSubImage2D()</code> was also unacceptably slow, taking ~20 ms. For comparison, tiled culling takes ~1 ms of CPU time on low-end devices.</p>

<h2 id="final-result">Final result</h2>

<p>Total size of the demo page is just 741 kB so you can carry it on a floppy disk.</p>

<p>You can play around with different parameters on the <a href="https://keaukraine.github.io/webgl-flowers/index.html">live demo page</a>. You can alter the time of day, grass density and other settings. Double-click toggles a free camera mode with WASD movement and rotation while holding the right mouse button (similar to viewport navigation in Unreal Engine).</p>

<p>And as usual you can get <a href="https://github.com/keaukraine/webgl-flowers">source code</a> which is licensed under MIT license, so feel free to play around with it.</p>]]></content><author><name>keaukraine</name></author><category term="blog" /><category term="webgl" /><category term="3d" /><category term="vegetation" /><category term="culling" /><category term="instancing" /><category term="tiling" /><summary type="html"><![CDATA[Efficient WebGL vegetation rendering]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://keaukraine.site/assets/blog/vegetation/title.webp" /><media:content medium="image" url="https://keaukraine.site/assets/blog/vegetation/title.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Variable Rate Shading on Adreno GPUs</title><link href="https://keaukraine.site/variable-rate-shading-on-adreno/" rel="alternate" type="text/html" title="Variable Rate Shading on Adreno GPUs" /><published>2022-07-03T19:00:00+03:00</published><updated>2022-07-03T19:00:00+03:00</updated><id>https://keaukraine.site/variable-rate-shading-on-adreno</id><content type="html" xml:base="https://keaukraine.site/variable-rate-shading-on-adreno/"><![CDATA[<p>“With high screen DPI doesn’t come high GPU fillrate” — that’s the main problem of GPUs nowadays. Modern consoles struggle to sustain stable 30, let alone 60 fps on large 4k screens. The common technique to increase FPS is rendering at lower resolution with fancy upscaling techniques like DLSS, FSR, PSSR and XeSS. But modern VR-capable hardware has to be able to target both very high frame rates and high image quality, and upscaling does show its limitations here — depending on implementation the image will be either blurry, too sharpened or will introduce ghosting artifacts. Variable rate shading (VRS) is a temporally stable approach of improving performance with (if applied correctly) virtually unnoticeable quality reduction.</p>

<p>Modern mobile Adreno GPUs by Qualcomm support Variable Rate Shading, and phones with these GPUs have been available since autumn 2021. Because our live wallpapers have to be power-efficient, we got a test device with an Adreno 642L to implement this feature in our apps.</p>

<h2 id="what-is-variable-rate-shading">What is Variable Rate Shading</h2>

<p>The idea behind VRS is to rasterize a single fragment and then interpolate its color across the adjacent pixels on screen.</p>

<p>A good explanation of how VRS is implemented on Adreno GPUs can be found in the official <a href="https://developer.qualcomm.com/blog/variable-rate-shading-has-arrived-mobile-impressive-results">Qualcomm Developer blog here</a>. You can understand how simple it is by looking at this image from aforementioned blog post:</p>

<center> <img alt="VRS - image by Qualcomm" src="/assets/blog/adreno-vrs/vrs.webp" /> </center>

<p>VRS is better than generic downsample of the whole frame because:</p>

<ol>
  <li>It preserves geometry edges (except cases when the shape is determined by discarding fragments).</li>
  <li>It can be adjusted per draw call — one object can be rendered at full detail while another one has reduced quality.</li>
  <li>It can be applied dynamically to maintain a target FPS by gradually reducing image quality.</li>
</ol>

<h2 id="implementation">Implementation</h2>

<p>On Snapdragon SoCs it is implemented with <a href="https://www.khronos.org/registry/OpenGL/extensions/QCOM/QCOM_shading_rate.txt">QCOM_shading_rate</a> extension. Adreno GPUs support blocks of 1x1, 1x2, 2x1, 2x2, 4x2, and 4x4 pixels. Please note that some useful dimensions like 2x4 or 4x1 are not available because they are not supported by hardware.</p>

<p>To apply VRS to certain objects you simply make a call to <code class="language-plaintext highlighter-rouge">glShadingRateQCOM</code> with desired rate before the corresponding draw calls.</p>

<p>To disable VRS for geometries which should preserve details and be rendered at native shading rate, simply call <code class="language-plaintext highlighter-rouge">glShadingRateQCOM</code> with 1x1 block size.</p>

<p>One of the first apps we’ve added VRS support to is <a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaperbonsai">Bonsai Live Wallpaper</a>. This is a good example because it has 3 very different types of geometries ranging from perfect candidates for VRS optimizations to the very unsuitable ones.</p>

<p>Let’s take a look at a typical scene from the app and how different parts of image can benefit from reduced shading rate:</p>

<center> <img alt="Bonsai 3D live wallpaper screenshot" src="/assets/blog/adreno-vrs/bonsai-screenshot.webp" /> </center>

<p>The best type of geometry to optimize with VRS is one which is blurred and has small color variation between fragments. So, for the sky background we apply a quite heavy 4x2 rate, which still introduces virtually no quality degradation, especially with constantly moving cameras.</p>

<p>At the opposite end of the scale is the leaf geometry. In the screenshot below we applied 4x4 VRS to the whole scene to showcase the issue with alpha testing. Please note that the branches, while also using the same heavy 4x4 reduction in this example, keep smooth, anti-aliased edges, clearly showing a benefit of VRS over traditional upscaling.</p>

<center> <img alt="VRS distortions on geometries with discarded fragments" src="/assets/blog/adreno-vrs/distortions.webp" /> </center>

<p>Needless to say, VRS is clearly not suitable for geometries with discarded fragments.</p>

<p>Also, because VRS is applied in screen space, it introduces significant distortions to the transparent dust particles: their size is comparable to a VRS block, and they start flickering during movement. I’ve noticed a somewhat similar rendering technique used in the COD:MW game on PC when enabling half-resolution particles — sparks and other small particles flicker way too much and look very blocky.</p>

<p>And somewhere between these two geometries lies the ground plane. Here we apply a 2x1 rate reduction. This still gives acceptable image quality because there is a larger color difference between vertically adjacent pixels than between horizontally adjacent ones.</p>

<p>Where VRS definitely shines is when it is applied to geometries with very little color difference between adjacent fragments, and the Bonsai wallpaper has a stylized silhouette mode where fragments use literally a single color:</p>

<center> <img alt="Bonsai live wallpaper, silhouette mode" src="/assets/blog/adreno-vrs/silhouette.webp" /> </center>

<p>Here we have 3 types of shaders:</p>

<ol>
  <li>Alpha-testing for leaves. We already know that we should not apply VRS to these geometries.</li>
  <li>Solid black silhouette and ground. The heaviest 4x4 VRS introduces literally zero quality degradation.</li>
  <li>For the sky gradient we use 2x1 blocks. Technically it would be perfect to have 4x1 or even 16x1 blocks, because the gradient changes vertically and horizontally adjacent fragments have identical colors, but Adreno hardware only supports 2x1 ones.</li>
</ol>

<p>Applying all of these to the scene results in identical rendering (a screenshot comparison found 0 differing pixels) and a 1.5x shading speed improvement.</p>

<h2 id="dynamic-quality">Dynamic quality</h2>

<p>All our wallpapers have some means of reducing GPU load when the battery is low. Usually this is done by limiting FPS and omitting a couple of effects.</p>

<p>For more efficient power usage, we apply stronger VRS to certain objects in low battery mode. Tree trunks are shaded with 2x1 blocks, and the sky and transparent effects (light shafts and vignette) are shaded with 4x4 blocks instead of 4x2 or 2x2. This reduction of quality is still almost unnoticeable but reduces GPU load by an additional 3%.</p>
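<p>The per-object rate policy can be summarized in a small sketch (the object names are illustrative; the leaves, ground, sky and effects rates follow the text, while the trunk’s regular-mode rate is an assumption, since only its low-battery rate is stated):</p>

```typescript
type ShadingRate = "1x1" | "2x1" | "2x2" | "4x2" | "4x4";

// Per-object shading rates: [regular mode, low-battery mode].
// The trunk's regular rate (1x1) is assumed, not stated in the article.
const RATES: Record<string, [ShadingRate, ShadingRate]> = {
  leaves:  ["1x1", "1x1"], // alpha-tested geometry is never reduced
  trunk:   ["1x1", "2x1"],
  ground:  ["2x1", "2x1"],
  sky:     ["4x2", "4x4"],
  effects: ["2x2", "4x4"], // light shafts and vignette
};

function pickRate(object: string, lowBattery: boolean): ShadingRate {
  const entry = RATES[object];
  return entry ? entry[lowBattery ? 1 : 0] : "1x1"; // default: full rate
}
```

<p>The chosen rate is then applied with a single <code class="language-plaintext highlighter-rouge">glShadingRateQCOM</code> call before the object’s draw call.</p>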

<h2 id="performance-gains-vs-quality-tradeoff">Performance gains vs quality tradeoff</h2>

<p>You will be hard-pressed to find any difference between the original and VRS-optimized rendering — color deviation is negligible, and blocky artifacts are really hard to spot. Only ImageMagick was able to find the differing pixels:</p>

<center> <a href="/assets/blog/adreno-vrs/diff.webp" target="_blank"><img alt="Image quality comparison" src="/assets/blog/adreno-vrs/diff.webp" /></a> </center>

<p>Both the VRS-enabled and regular rendering pipelines result in a steady 120 FPS on our test device (a Samsung Galaxy A52s). So we ran Snapdragon Profiler to analyze the performance and efficiency of the optimized build. Here are the numbers:</p>

<p>Bonsai 3D Live Wallpaper, regular mode:</p>

<center> <img alt="VRS performance table" src="/assets/blog/adreno-vrs/perf-regular.webp" /> </center>

<p>Bonsai 3D Live Wallpaper, battery saving mode:</p>

<center> <img alt="VRS performance table" src="/assets/blog/adreno-vrs/perf-battery-saving.webp" /> </center>

<p>Bonsai 3D Live Wallpaper, silhouette mode:</p>

<center> <img alt="VRS performance table" src="/assets/blog/adreno-vrs/perf-silhouette.webp" /> </center>

<p>In the silhouette scene we don’t use different VRS blocks for regular and power saving modes because the scene already uses the maximum block size and still renders an image identical to the non-VRS one.</p>

<hr />

<p>Long story short, we’ve improved rendering efficiency by approximately 30% with little to (literally) no image quality reduction.</p>

<p>You can find a live web demo <a href="https://keaukraine.github.io/webgl-reaper/index.html">here</a>, and for people sensitive to flickering lights there is a version without lightning <a href="https://keaukraine.github.io/webgl-reaper/index.html#nolights">here</a>. You can interact with it by clicking on the screen — this will change the animation. You can also enter a free-camera mode with WASD navigation by pressing the Enter key.</p>

<p>As usual, source code is <a href="https://github.com/keaukraine/webgl-reaper">available on Github</a>.</p>

<p>And of course you can get an <a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.reaper">Android live wallpaper app</a>.</p>

<hr />

<h2 id="scene-composition">Scene Composition</h2>
<p>The scene is pretty simple, so it doesn’t require any sorting of objects — a carefully chosen, hardcoded render order achieves minimal overdraw:</p>

<center> <img alt="Scene rendering stages" src="/assets/blog/grim-reaper/rendering.webp" /> </center>

<p>First, opaque geometries are rendered (the cloth is alpha-masked, so it is also opaque). These animated objects use vertex animation with data stored in FP16 textures, so WebGL 2 is required for the demo.
After rendering the opaque geometries, writing to depth is disabled with <code class="language-plaintext highlighter-rouge">glDepthMask(false)</code>, and then transparent effects — smoke, dust and ghosts — are drawn over them with blending. The sky is also drawn at this stage. Because it is the most distant object, it doesn’t have to contribute to depth — it is basically treated as a far clipping plane.</p>

<h2 id="effects">Effects</h2>
<p>That’s where most of the time was spent — thinking of, creating, tweaking and rejecting various effects for a really simple scene with literally a single character in it.</p>

<p>Every time I had an idea on how to improve a look I added it to the Trello board. Then I had some time to think about it — how will it fit the scene, how to implement it, etc. So here is a breakdown of all used effects.</p>

<p>First, particles are added to the reaper. Half of them rise upwards and half sink down from roughly the centre of the reaper model, which fluctuates a little depending on the animation. To get the best visual appearance these are rendered as soft particles, hence the depth pre-pass. You can read about the implementation of soft particles in one of my previous articles.</p>

<p>Then some flickering dust is rendered. You may notice that its brightness is synchronized with the lightning strikes — usually the dust slowly fades in and out, but during lightning strikes it is more visible.</p>

<p>As a final touch, a rather heavy vignette is applied. This effect blends nicely with the gloomy atmosphere, helps to draw attention to the centre of the screen, and visually conceals the bland void in the corners of the screen.</p>

<p>There are still a couple of effect ideas noted in my Trello board, but I think adding them would only clutter the scene without adding any more noticeable eye candy.</p>

<h2 id="sky-shader">Sky shader</h2>
<p>Sky is used to fill in the void around the main character. To add some dynamics and movement to these empty parts of the scene, it is rendered with a shader which applies simple distortion and lightning to a static clouds texture.</p>

<p>Let’s analyse the <a href="https://github.com/keaukraine/webgl-reaper/blob/main/src/shaders/SkyShader.ts">shader code</a>. It combines three simple effects to create a dynamic sky:</p>

<p>It starts with applying colour to a rather bland-looking greyscale base sky texture:</p>
<center> <img alt="Colorized sky" src="/assets/blog/grim-reaper/colorize.webp" /> </center>

<p>Then, waves from a small distortion texture are applied (a similar but more pronounced effect can be used for water ripples). The effect is subtle but noticeably improves the overall look:</p>
<center>
    <video controls="" autoplay="" loop="">
        <source src="/assets/blog/grim-reaper/sky-no-lightning.webm" type="video/webm" />
        Download video <a href="/assets/blog/grim-reaper/sky-no-lightning.webm">WebM</a>
    </video>
</center>

<p>And the final touch is lightning. To recreate somewhat realistic-looking lightning which cannot get through dense clouds but shines through clear areas, brightness is increased exponentially — darker parts get very little increase in brightness while bright areas are highlighted. The final result with all effects combined looks like this:</p>

<center>
    <video controls="" autoplay="" loop="">
        <source src="/assets/blog/grim-reaper/sky.webm" type="video/webm" />
        Download video <a href="/assets/blog/grim-reaper/sky.webm">WebM</a>
    </video>
</center>
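<p>The exponential brightening can be sketched like this (a hypothetical transcription of the idea; the exponent and clamping are illustrative, not the shader’s actual values):</p>

```typescript
// Hypothetical sketch of the lightning flash: brightness is added in
// proportion to a power of the base sky colour, so dense (dark) clouds
// receive almost no extra light while thin (bright) areas flare up.
function flashPixel(base: number, flash: number, exponent = 4): number {
  // base is a channel value in [0, 1]; flash is the current strike intensity
  return Math.min(1, base + flash * Math.pow(base, exponent));
}
```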

<p>The timer for the lightning strikes is a periodic function of several combined sine waves, clamped to the range [0…2]. I’ve used the really handy <a href="https://www.desmos.com/calculator">Desmos graphing calculator</a> to visualize and tweak the coefficients of this function — you can clearly see that the “spikes” of positive values create short, periodic, randomized bursts:</p>

<center> <img alt="Lightning intensity graph" src="/assets/blog/grim-reaper/graph.webp" /> </center>
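<p>An illustrative version of such a timer function (the coefficients here are made up for demonstration, not the demo’s actual values):</p>

```typescript
// Illustrative lightning timer: a sum of incommensurate sine waves,
// shifted down so it is negative most of the time, then clamped to [0, 2].
// Positive "spikes" appear only when the waves momentarily align.
function lightningIntensity(t: number): number {
  const raw = Math.sin(t) + Math.sin(t * 2.37) + Math.sin(t * 4.1) - 1.5;
  return Math.min(2, Math.max(0, raw));
}
```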

<p>Additionally, the sky sphere slowly rotates to make the background less static.</p>

<h2 id="ghosts-shader">Ghosts shader</h2>
<p>Ghostly trails floating around the grim reaper are inspired by this Unreal Engine 4 Niagara tutorial — <a href="https://www.artstation.com/artwork/ba4mNn">https://www.artstation.com/artwork/ba4mNn</a>.</p>

<p>The initial idea was to use a geometry shaped like a cut-out from the side of a cylinder and rotate it around the centre of the reaper model. However, my brother created <a href="https://github.com/keaukraine/webgl-reaper/blob/main/src/shaders/BendShader.ts">a shader</a> for a more flexible approach: a single geometry which can be bent at an arbitrary radius and stretched to an arbitrary length.</p>

<p>To achieve this, the vertex shader changes the geometry of the original mesh. It modifies the X and Y coordinates of the input model, bending them around a circle of a given radius. The Z coordinate gets no additional transformation; it is responsible for scaling the final effect vertically (world space is Z-up). The shader is tailored to work with a specific model — a tessellated sheet in the XZ plane (all Y coordinates are zero):</p>

<center> <img alt="Ghost geometry" src="/assets/blog/grim-reaper/ghost-geometry.webp" /> </center>

<p>Later, geometry was optimized to tightly fit our sprite texture in order to reduce overdraw:</p>

<center> <img alt="Ghost geometry optimized" src="/assets/blog/grim-reaper/ghost-geometry-optimized.webp" /> </center>

<p>Based on arc length math, the X and Y coordinates of the bent model are:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = R * sin(theta);
y = R * cos(theta);
</code></pre></div></div>
<p>where <code class="language-plaintext highlighter-rouge">theta = rm_Vertex.x / R</code> and <code class="language-plaintext highlighter-rouge">R</code> is the bend radius. However, theta is calculated slightly differently in the shader:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>float theta = rm_Vertex.x * lengthToRadius;
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">lengthToRadius</code> is a uniform, but it is not just the reciprocal of <code class="language-plaintext highlighter-rouge">R</code> — passing values greater than <code class="language-plaintext highlighter-rouge">1/R</code> stretches the effect, because the uniform essentially pre-multiplies <code class="language-plaintext highlighter-rouge">rm_Vertex.x</code>.
This minor change eliminates redundant uniform-only math in the shader: the division of length by radius is done once on the CPU, and the result is passed to the shader via the <code class="language-plaintext highlighter-rouge">lengthToRadius</code> uniform.
I’ve tried to improve this effect by applying displacement distortion in the fragment shader, but it was virtually unnoticeable in motion, so we kept the original, simpler version with a static texture, which is also cheaper for the GPU.</p>

<h2 id="reduced-colours-filter">Reduced colours filter</h2>
<p>Not implemented in the web version but present in the <a href="https://play.google.com/store/apps/details?id=org.androidworks.livewallpaper.reaper">Android app</a> is a reduced-colours post-processing effect. This gritty effect perfectly fits the overall atmosphere and sets the right mood for the scene. It is implemented not as a separate post-processing render pass but directly in the fragment shader, so rendering is still essentially single-pass.</p>

<center> <img alt="Reduced colours filter" src="/assets/blog/grim-reaper/reduced-colors.webp" /> </center>

<p>It is based on code from the Q1K3 WebGL game (<a href="https://github.com/phoboslab/q1k3">https://github.com/phoboslab/q1k3</a>), and I highly recommend reading the blog post about the making of the seemingly impossible Q1K3 — <a href="https://phoboslab.org/log/2021/09/q1k3-making-of">https://phoboslab.org/log/2021/09/q1k3-making-of</a>.</p>
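<p>A minimal sketch of such a colour-reduction step (not the actual Q1K3 or app code): each channel is snapped to a small number of levels at the end of the fragment shader, so no extra pass is needed.</p>

```typescript
// Hypothetical sketch of a reduced-colours filter: snap a channel value
// in [0, 1] to the nearest of `levels` evenly spaced values.
function quantizeChannel(value: number, levels: number): number {
  return Math.round(value * (levels - 1)) / (levels - 1);
}
```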

<h2 id="textures-compression">Textures compression</h2>

<p>The Android live wallpaper targets OpenGL ES 3.0+ and uses efficient ETC2 and ASTC compressed textures. The WebGL demo, however, is optimized only for the fastest possible loading time. I really hate when a simple WebGL demo takes forever to load its unjustifiably huge resources. Because of this, we decided not to use hardware-compressed textures; instead, textures are compressed as lossy WebP. The total size of all assets, including HTML/CSS/JS, is just 2.7 MB, so it loads pretty fast.
Recently, our <a href="https://github.com/keaukraine/webgl-mountains">mountains WebGL demo</a> has also been updated with smaller resources but it is still way larger than the Reaper one — it downloads 10.8 MB of data on initial load.</p>]]></content><author><name>keaukraine</name></author><category term="blog" /><category term="android" /><category term="webgl" /><category term="grim reaper" /><summary type="html"><![CDATA[WebGL Grim Reaper demo]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://keaukraine.site/assets/blog/grim-reaper/title.webp" /><media:content medium="image" url="https://keaukraine.site/assets/blog/grim-reaper/title.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Android apps</title><link href="https://keaukraine.site/android-apps/" rel="alternate" type="text/html" title="Android apps" /><published>1980-01-02T12:00:00+03:00</published><updated>1980-01-02T12:00:00+03:00</updated><id>https://keaukraine.site/android-apps</id><content type="html" xml:base="https://keaukraine.site/android-apps/"><![CDATA[<p>You can get our Android live wallpaper apps from the <a href="https://play.google.com/store/apps/dev?id=6428268730053234821">Google Play store</a>.</p>]]></content><author><name>keaukraine</name></author><category term="projects" /><category term="android" /><category term="apps" /><summary type="html"><![CDATA[Android apps]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://keaukraine.site/assets/blog/metal-msaa/title.webp" /><media:content medium="image" url="https://keaukraine.site/assets/blog/metal-msaa/title.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Mac apps</title><link href="https://keaukraine.site/mac-apps/" rel="alternate" type="text/html" title="Mac apps" 
/><published>1980-01-01T12:00:00+03:00</published><updated>1980-01-01T12:00:00+03:00</updated><id>https://keaukraine.site/mac-apps</id><content type="html" xml:base="https://keaukraine.site/mac-apps/"><![CDATA[<p>Here you can download our live wallpapers for macOS. They work on macOS 14 Sonoma and higher and run on all modern Macs with Apple Silicon chips.</p>

<div class="side-by-side">
    <div class="toleft">
        <img class="image" src="/assets/images/apps/arctic.jpeg" alt="Arctic 3D live wallpaper" />
    </div>

    <div class="toright">
        <h2>Arctic 3D</h2>
        <p>
            Dive into the serene and captivating world of the Arctic with this app! It brings the magic of Arctic waters right to your phone, featuring a fully animated 3D scene with a unique, cartoonish style. Whether you're watching seals and penguins by the icebergs or spotting stars at night, this wallpaper adds a calm, natural beauty to your home screen.
        </p>
        <p>
            <strong>Download: <a href="/assets/downloads/Arctic 3D.dmg">Arctic 3D.dmg</a></strong>
        </p>
        <p>
            <strong>Live web demo: <a target="_blank" href="https://keaukraine.github.io/webgl-arctic/index.html">Arctic 3D</a></strong>
        </p>
    </div>
</div>

<hr />

<div class="side-by-side">
    <div class="toleft">
        <h2>Brutalism 3D</h2>
        <p>
            Transform your home screen with the unique Brutalism 3D Live Wallpaper, featuring an immersive, true 3D scene inspired by iconic brutalist architecture. Dive into a concrete wonderland with sunlight streaming from above, showcasing intricate staircases and multiple floors, all designed to give your device a modern, minimalist look.
        </p>
        <p>
            <strong>Download: <a href="/assets/downloads/Brutalism 3D.dmg">Brutalism 3D.dmg</a></strong>
        </p>
        <p>
            <strong>Live web demo: <a target="_blank" href="https://keaukraine.github.io/webgl-kmp-brutalism/index.html">Brutalism 3D</a></strong>
        </p>
    </div>

    <div class="toright">
        <img class="image" src="/assets/images/apps/brutalism.jpeg" alt="Brutalism 3D live wallpaper" />
    </div>
</div>

<hr />

<div class="side-by-side">
    <div class="toleft">
        <img class="image" src="/assets/images/apps/skyscrapers.jpeg" alt="Skyscrapers 3D live wallpaper" />
    </div>

    <div class="toright">
        <h2>Skyscrapers 3D</h2>
        <p>
            Featuring towering skyscrapers with glowing windows, realistic light reflections, and vibrant neon ad banners, this app brings the urban skyline to life right on your screen.
        </p>
        <p>
            <strong>Download: <a href="/assets/downloads/Skyscrapers 3D.dmg">Skyscrapers 3D.dmg</a></strong>
        </p>
        <p>
            <strong>Live web demo: <a target="_blank" href="https://keaukraine.github.io/webgl-kmp-skyscrapers-internal/index.html">Skyscrapers 3D</a></strong>
        </p>
    </div>
</div>]]></content><author><name>keaukraine</name></author><category term="projects" /><category term="mac" /><category term="apps" /><summary type="html"><![CDATA[Mac apps]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://keaukraine.site/assets/blog/metal-msaa/title.webp" /><media:content medium="image" url="https://keaukraine.site/assets/blog/metal-msaa/title.webp" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>