Official C3 Addon / Runtime?

MikalDev · May 22, 2024

I did ask him to change indexArr to a flat array, with every 3 indices forming a triangle, instead of an array of arrays. I will also mention the same for the others.

It's useful for 3DObject that it supports 3D. For 2D only, you will need to zero-fill the z component in posArr.
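
As a rough illustration of the zero-filling mentioned above, here is a minimal sketch that assumes the 2D positions arrive as a flat [x0, y0, x1, y1, ...] array. `padPositionsTo3D` is a hypothetical helper name, not part of the C3 SDK.

```javascript
// Hypothetical helper: expand flat 2D positions [x, y, ...] into the
// 3-component layout [x, y, 0, ...] that a 3D-capable drawMesh() expects.
function padPositionsTo3D(pos2D) {
    if (pos2D.length % 2 !== 0) {
        throw new Error("2D position array length must be a multiple of 2");
    }
    const vertexCount = pos2D.length / 2;
    // Typed arrays are zero-initialized, so every z component stays 0.
    const pos3D = new Float32Array(vertexCount * 3);
    for (let i = 0; i < vertexCount; i++) {
        pos3D[i * 3] = pos2D[i * 2];
        pos3D[i * 3 + 1] = pos2D[i * 2 + 1];
    }
    return pos3D;
}
```

In a real plugin the output buffer would probably be allocated once and reused each frame to avoid garbage.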

MikalDev · May 22, 2024

Hmm, or could ask for drawMesh2D() also.

Nate · May 22, 2024

Sounds like a decent improvement, though colors are pretty important. Good idea for the flat array.

MikalDev · May 23, 2024

Nate, we will likely get per tri/quad color passed as vertex data in the C3 webGPU implementation, since it already has it. Do we really need per vertex color, or is it just used for tintColor, so that per triangle is OK? (Of course we also want darkTintColor for twoColorTint.)

Or at least that's what we've been using, twoColorTint per attachment/slot.

Is there something else where per vertex color is important for Spine effects, etc.?

Mario · May 23, 2024

@MikalDev what do they mean with "emulation"? Constructing degenerate quads? That would explain why they only want to do per triangle (quad really) colors. But it's still a strange decision. Their batcher ultimately must expand the per triangle (quad) colors to individual vertex attributes, both in WebGL and WebGPU.

Then I had a look at c3runtime.js, specifically their WebGLShaderProgram class. It turns out that they are passing colors not as vertex attributes but as uniforms. That means you cannot batch sprites with different colors. Any time you have to change a uniform, you have to submit a new rendering command to the GPU. I'm sure they have their reasons, though I can't think of any myself. I haven't checked if there's a WebGPU implementation in there.
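
To make the batching consequence concrete, here is a toy model, not Construct's actual batcher. It assumes all sprites share the same texture and blend state, so color is the only thing that can split a batch. `countDrawCalls` and the sprite objects are made up for illustration.

```javascript
// Toy model: count draw calls for a sequence of sprites.
// With a color uniform, every color change forces a new draw call.
// With color stored as vertex (or per-quad) data, color never splits the batch.
function countDrawCalls(sprites, colorIsUniform) {
    if (sprites.length === 0) return 0;
    if (!colorIsUniform) return 1;
    let calls = 1;
    for (let i = 1; i < sprites.length; i++) {
        if (sprites[i].color !== sprites[i - 1].color) calls++;
    }
    return calls;
}

const sprites = [
    { color: "red" }, { color: "red" }, { color: "blue" }, { color: "red" }
];
const withUniform = countDrawCalls(sprites, true);
const withVertexData = countDrawCalls(sprites, false);
```

A skeleton with many differently tinted attachments is close to the worst case for the uniform approach, since neighboring attachments often differ in color.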

It seems the only reason WebGLRenderer can only draw quads is that they statically fill the index buffer via FillIndexBufferData, which assumes that the vertex buffer consists of sequences of 4 points, each making up a quad. If instead the index data were also written "on-the-fly" each frame, e.g. for quads or arbitrary triangle meshes, then the quad restriction would go away. I'm not intimately familiar with the whole renderer, but from a look at related code, e.g. WebGLBatchJob, it doesn't look like such a change would influence anything else in the renderer.

WebGLRenderer has a handful of QuadXXX methods, which would also need to write data to the index buffer. Then it's just a matter of adding one more method to WebGLRenderer, which allows specifying vertex and index data directly.
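
The difference between the two approaches can be sketched as follows. This is an assumption about what a static quad index fill typically looks like, not the contents of FillIndexBufferData. `fillQuadIndices` and `appendMeshIndices` are hypothetical names.

```javascript
// Assumed static fill: every 4 consecutive vertices form a quad that is
// split into the two triangles (0, 1, 2) and (0, 2, 3).
function fillQuadIndices(quadCount) {
    const indices = new Uint16Array(quadCount * 6);
    for (let q = 0; q < quadCount; q++) {
        const v = q * 4; // first vertex of this quad
        const i = q * 6; // first index slot of this quad
        indices[i] = v;
        indices[i + 1] = v + 1;
        indices[i + 2] = v + 2;
        indices[i + 3] = v;
        indices[i + 4] = v + 2;
        indices[i + 5] = v + 3;
    }
    return indices;
}

// On-the-fly alternative: append a mesh's own indices to the batch,
// offset by the number of vertices already written to the vertex buffer.
// Returns the new write position in the batch index buffer.
function appendMeshIndices(batchIndices, writePos, meshIndices, baseVertex) {
    for (let i = 0; i < meshIndices.length; i++) {
        batchIndices[writePos + i] = meshIndices[i] + baseVertex;
    }
    return writePos + meshIndices.length;
}
```

With the second function, a quad is just a mesh whose indices are [0, 1, 2, 0, 2, 3], so quads and arbitrary triangle meshes can share one batch.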

I can see how this could be a scary change though, and ultimately it is their software and up to them how they want to move forward. The "emulation" will likely not be much faster judging by the renderer code, but it will at least reduce the amount of temporary JS objects that need to be created and garbage collected each frame.

That the parameters to the new method are going to be arrays of arrays is also unfortunate. We (and likely other plugins) would have to shuffle our data around and create temporary arrays each frame, which need garbage collection. On the Construct side, they then have to untangle those arrays and convert them to linear typed arrays again.
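
The shuffling described above looks roughly like this on the receiving side. It is a minimal sketch with a hypothetical `flattenTriangles` helper; with linear typed arrays as input, this whole step disappears.

```javascript
// Convert an array of [i0, i1, i2] triangles into one flat Uint16Array.
// Each inner array is a temporary object the caller had to allocate.
function flattenTriangles(triangles) {
    const flat = new Uint16Array(triangles.length * 3);
    for (let t = 0; t < triangles.length; t++) {
        const tri = triangles[t];
        if (tri.length !== 3) {
            throw new Error("each triangle needs exactly 3 indices");
        }
        flat.set(tri, t * 3); // copy the 3 indices to their flat position
    }
    return flat;
}
```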

In any case, it's great that there's some movement on this. If the initial implementation is slow, that's likely not a big deal. Having a stable API for rendering triangle meshes plugin creators can rely on is a good first step. Ideally, the parameters aren't arrays of arrays but linear (typed) arrays instead.

MikalDev · May 24, 2024

So, some improvements:

- The proposed SDK has changed to use linear typed arrays.
- The webGPU implementation does pass color, instead of changing a uniform, so it does not break the batch on color change. I think it uses a single value for all vertices, passed as a colorData buffer with all four values the same. It is then used directly in the fragment shader to multiply the final color output. In the original description for the webGPU renderer, Ashley mentioned that this implementation will not break the batch.
- As you saw, this is not the case in the webgl renderer (though I think it could be, and I have a vague memory that it was similar to the final webGPU implementation at some point, but I haven't verified that and Ashley doesn't remember it).

Yeah, I realized afterwards typed arrays are probably faster and also would be faster with built-in mesh rendering in future, as it could potentially just copy them directly to the GPU. So I've changed it to work with typed arrays like this:

const posArr = new Float32Array([
	quad.p1.x + 50, quad.p1.y, 0,
	quad.p2.x + 50, quad.p2.y, 0,
	quad.p3.x, quad.p3.y, 0,
	quad.p4.x, quad.p4.y, 0,

	quad.p1.x + 250, quad.p1.y, 0,
	quad.p2.x + 250, quad.p2.y, 0,
	quad.p3.x + 200, quad.p3.y, 0,
	quad.p4.x + 200, quad.p4.y, 0
]);

const uvArr = new Float32Array([
	rcTex.left, rcTex.top,
	rcTex.right, rcTex.top,
	rcTex.right, rcTex.bottom,
	rcTex.left, rcTex.bottom,

	rcTex.left, rcTex.top,
	rcTex.right, rcTex.top,
	rcTex.right, rcTex.bottom,
	rcTex.left, rcTex.bottom
]);

const indexArr = new Uint16Array([
	0, 1, 2,
	0, 2, 3,

	4, 5, 6,
	4, 6, 7,
]);

renderer.drawMesh(posArr, uvArr, indexArr);
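
The example above implies a layout of 3 position floats and 2 UV floats per vertex, with indices grouped in threes. A caller could sanity-check its arrays before handing them over. This is a sketch only: `validateMesh` is a hypothetical helper, and whether drawMesh() performs any of these checks itself is an assumption I have not verified.

```javascript
// Hypothetical pre-flight check for the posArr / uvArr / indexArr layout
// used in the example: 3 floats per position, 2 per UV, 3 indices per triangle.
function validateMesh(posArr, uvArr, indexArr) {
    if (posArr.length % 3 !== 0) {
        throw new Error("posArr needs 3 floats (x, y, z) per vertex");
    }
    const vertexCount = posArr.length / 3;
    if (uvArr.length !== vertexCount * 2) {
        throw new Error("uvArr needs 2 floats (u, v) per vertex");
    }
    if (indexArr.length % 3 !== 0) {
        throw new Error("indexArr needs 3 indices per triangle");
    }
    for (const index of indexArr) {
        if (index >= vertexCount) {
            throw new Error("index " + index + " is out of range");
        }
    }
    return { vertexCount, triangleCount: indexArr.length / 3 };
}
```

For the two quads in the example, this would report 8 vertices and 4 triangles.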

MikalDev · May 24, 2024

Updated text (corrections), not sure why I can't edit the original post anymore...

The webGPU implementation does pass color, instead of changing a uniform, so it does not break the batch on color change. I think it uses a single value for all 4 vertices of a quad, passed as a colorData buffer along with the vertex and texcoord buffers. It is then used in the fragment shader to multiply the final color output. In the original description for the webGPU C3 renderer, Ashley mentioned that its implementation will not break the batch on color change.

    queue["writeBuffer"](this._vertexBuffer, 0, this._vertexData.buffer, 0, quads * 12 * SIZEOF_F32);
    queue["writeBuffer"](this._texcoordBuffer, 0, this._texcoordData.buffer, 0, quads * 12 * SIZEOF_F32);
    queue["writeBuffer"](this._colorBuffer, 0, this._colorData.buffer, 0, quads * 4 * SIZEOF_F32);
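
The per-quad sizes in those three calls can be read as follows. The interpretation of the 12 texcoord floats as (u, v, texture index) for each of 4 vertices comes from the webGPU DrawMesh listing quoted later in this thread. `quadBatchByteSizes` is a hypothetical helper for illustration.

```javascript
const SIZEOF_F32 = Float32Array.BYTES_PER_ELEMENT; // 4 bytes

// Bytes uploaded per buffer for a batch of `quads` quads.
function quadBatchByteSizes(quads) {
    return {
        vertexBytes: quads * 12 * SIZEOF_F32,   // 4 vertices * (x, y, z)
        texcoordBytes: quads * 12 * SIZEOF_F32, // 4 vertices * (u, v, texture index)
        colorBytes: quads * 4 * SIZEOF_F32      // one (r, g, b, a) per quad
    };
}
```

Since color costs only 4 floats per quad, it is per quad rather than per vertex: true per vertex color would need 16 floats per quad.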

MikalDev · May 24, 2024

An older high level description of the new C3 webGPU renderer:

https://www.construct.net/en/blogs/construct-official-blog-1/introducing-constructs-new-1768

MikalDev · May 24, 2024

what do they mean with "emulation"? Constructing degenerate quads?

Yes, for the first implementation - with the possibility of improvements over time, perhaps adding vertex color, perhaps submitting the arrays directly as a batch, etc.

Nate · May 25, 2024

The new APIs improve the situation, but I can't understand why they don't just implement triangles and vertex colors. This is very standard stuff and would not take much effort.

Mario · May 27, 2024

@MikalDev that sounds much better! Let's hope future C3 releases will allow proper triangle mesh rendering.

MikalDev · May 28, 2024

Nate, all I can say is that I agree with you, as do others in the C3 dev community. C3 is pretty much a one person shop from the standpoint of the engine and (private) roadmap.

MikalDev · May 28, 2024

Mario, would per triangle color help (e.g. same color on all vertices)? This would at least cover the single color tint case, right?

It should be straightforward to implement in the 'emulation' implementation and map well to the webgpu implementation (and possible future batching).

When are different vertex colors for a single tri used in Spine? In the light renderer (haven't used that with C3 Spine yet in general.)

Nate · May 28, 2024

Currently Spine uses a single color per texture region, so a single color per triangle would be sufficient for single color tinting. Two color tinting uses 2 colors per texture region.
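
For readers unfamiliar with the two modes, here is a simplified model of what the tints do to a texel. It works on RGB components in the range 0 to 1 and ignores alpha and premultiplied alpha, so it is an illustration of the idea and not Spine's actual shader. The function names are made up.

```javascript
// Single color tint: multiply each texel component by the tint color.
function singleColorTint(texelRgb, color) {
    return texelRgb.map((c, i) => c * color[i]);
}

// Simplified two color tint: the light color scales the bright parts of the
// texel, and the dark color fills in where the texel is dark.
function twoColorTint(texelRgb, light, dark) {
    return texelRgb.map((c, i) => c * light[i] + (1 - c) * dark[i]);
}
```

In this model a white texel comes out as the light color and a black texel as the dark color, which is why two color tinting needs both colors available per texture region.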

If it could be hacked to get tinting, that would make Spine rendering in C3 a lot more useful, as not being able to tint at all is very limiting. It still seems unfortunate that it can't just be done right the first time. It's vertex colors at the lowest levels and sane batching would just provide that.

MikalDev · May 28, 2024

If I was but the dev, I would give you everything you need, wrapped up pretty in a bow

Will work on per quad color (and later two color tint).

MikalDev · May 31, 2024

I've now seen the actual drawMesh() implementation. It is still quad focused, but it is now more efficient than calling Quad3D2 repeatedly, because it writes directly to the internal batch formats (webgl version):

DrawMesh(posArr, uvArr, indexArr) {
    const vd = this._vertexData;
    const td = this._texcoordData;

    // Ensure the index array length is a multiple of 3
    if (indexArr.length % 3 !== 0) {
        throw new Error("Invalid index buffer length");
    }

    // Iterate over index array in steps of 3
    for (let i = 0, len = indexArr.length; i < len; ) {
        const index0 = indexArr[i++];
        const index1 = indexArr[i++];
        const index2 = indexArr[i++];

        // Calculate position and UV indices
        const pos0 = index0 * 3;
        const pos1 = index1 * 3;
        const pos2 = index2 * 3;
        const uv0 = index0 * 2;
        const uv1 = index1 * 2;
        const uv2 = index2 * 2;

        // Extend quad batch
        this._ExtendQuadBatch();

        // Update vertex and texture coordinate data
        let v = this._vertexPtr;
        let t = this._texPtr;

        // Vertex positions
        vd[v++] = posArr[pos0];
        vd[v++] = posArr[pos0 + 1];
        vd[v++] = posArr[pos0 + 2];
        vd[v++] = posArr[pos1];
        vd[v++] = posArr[pos1 + 1];
        vd[v++] = posArr[pos1 + 2];
        vd[v++] = posArr[pos2];
        vd[v++] = posArr[pos2 + 1];
        vd[v++] = posArr[pos2 + 2];
        // Repeat the third vertex so the triangle becomes a degenerate quad
        vd[v++] = posArr[pos2];
        vd[v++] = posArr[pos2 + 1];
        vd[v++] = posArr[pos2 + 2];

        // UV coordinates (third UV repeated for the fourth quad corner)
        td[t++] = uvArr[uv0];
        td[t++] = uvArr[uv0 + 1];
        td[t++] = uvArr[uv1];
        td[t++] = uvArr[uv1 + 1];
        td[t++] = uvArr[uv2];
        td[t++] = uvArr[uv2 + 1];
        td[t++] = uvArr[uv2];
        td[t++] = uvArr[uv2 + 1];

        // Update pointers
        this._vertexPtr = v;
        this._texPtr = t;
    }
}

webGPU version (you can see how close we are to having color per quad here: it is one step away from adding a color array input and updating the color batch buffer):

    DrawMesh(posArr, uvArr, indexArr) {
    const vd = this._vertexData;
    const td = this._texcoordData;
    const cd = this._colorData;
    if (indexArr.length % 3 !== 0)
        throw new Error("invalid index buffer length");
    const currentMultiTextureIndex = this._currentMultiTextureIndex;
    const currentColor = this._currentColor;
    const currentColorR = currentColor.getR();
    const currentColorG = currentColor.getG();
    const currentColorB = currentColor.getB();
    const currentColorA = currentColor.getA();
    for (let i = 0, len = indexArr.length; i < len; ) {
        const index0 = indexArr[i++];
        const index1 = indexArr[i++];
        const index2 = indexArr[i++];
        const pos0 = index0 * 3;
        const pos1 = index1 * 3;
        const pos2 = index2 * 3;
        const uv0 = index0 * 2;
        const uv1 = index1 * 2;
        const uv2 = index2 * 2;
        this._AddQuadToDrawBatch();
        const qPtr = this._quadPtr++;
        let v = qPtr * 12;
        let t = qPtr * 12;
        let c = qPtr * 4;
        vd[v++] = posArr[pos0 + 0];
        vd[v++] = posArr[pos0 + 1];
        vd[v++] = posArr[pos0 + 2];
        vd[v++] = posArr[pos1 + 0];
        vd[v++] = posArr[pos1 + 1];
        vd[v++] = posArr[pos1 + 2];
        vd[v++] = posArr[pos2 + 0];
        vd[v++] = posArr[pos2 + 1];
        vd[v++] = posArr[pos2 + 2];
        vd[v++] = posArr[pos2 + 0];
        vd[v++] = posArr[pos2 + 1];
        vd[v++] = posArr[pos2 + 2];
        td[t++] = uvArr[uv0 + 0];
        td[t++] = uvArr[uv0 + 1];
        td[t++] = currentMultiTextureIndex;
        td[t++] = uvArr[uv1 + 0];
        td[t++] = uvArr[uv1 + 1];
        td[t++] = currentMultiTextureIndex;
        td[t++] = uvArr[uv2 + 0];
        td[t++] = uvArr[uv2 + 1];
        td[t++] = currentMultiTextureIndex;
        td[t++] = uvArr[uv2 + 0];
        td[t++] = uvArr[uv2 + 1];
        td[t++] = currentMultiTextureIndex;
        cd[c++] = currentColorR;
        cd[c++] = currentColorG;
        cd[c++] = currentColorB;
        cd[c++] = currentColorA;
    }
}
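
To show what that one extra step could look like, here is a standalone sketch that expands a triangle mesh into degenerate quads and takes one color per triangle. The `colorArr` parameter and the function name `expandMeshToQuads` are hypothetical, not part of the C3 SDK. It uses the WebGL-style layout of 2 UV floats per vertex and leaves out the multi-texture index for brevity.

```javascript
// Corner order 0, 1, 2, 2: repeating the last corner turns each
// triangle into a degenerate quad, as in the listings above.
const QUAD_CORNERS = [0, 1, 2, 2];

// Hypothetical: colorArr holds one (r, g, b, a) per triangle.
function expandMeshToQuads(posArr, uvArr, indexArr, colorArr) {
    if (indexArr.length % 3 !== 0) {
        throw new Error("invalid index buffer length");
    }
    const triCount = indexArr.length / 3;
    if (colorArr.length !== triCount * 4) {
        throw new Error("colorArr needs 4 floats per triangle");
    }
    const vd = new Float32Array(triCount * 12); // 4 corners * (x, y, z)
    const td = new Float32Array(triCount * 8);  // 4 corners * (u, v)
    const cd = new Float32Array(triCount * 4);  // one (r, g, b, a) per quad
    let v = 0;
    let t = 0;
    for (let tri = 0; tri < triCount; tri++) {
        for (const corner of QUAD_CORNERS) {
            const index = indexArr[tri * 3 + corner];
            vd[v++] = posArr[index * 3];
            vd[v++] = posArr[index * 3 + 1];
            vd[v++] = posArr[index * 3 + 2];
            td[t++] = uvArr[index * 2];
            td[t++] = uvArr[index * 2 + 1];
        }
    }
    cd.set(colorArr); // per-triangle colors map one-to-one onto per-quad colors
    return { vd, td, cd };
}
```

Because each triangle becomes exactly one quad, a per-triangle color array lines up with the per-quad color buffer without any reshuffling.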

Mario · May 31, 2024

Neat, that's not too bad!

MikalDev · June 20, 2024

In general, we'll see how perf goes. I think for reasonably sized Spine objects with meshes it will do ok.
If we need to go deeper, I have started looking at creating my own C3 command buffers, etc. and it is all pretty doable. There's a batch command system with things like set texture, doQuad, etc.

Let me know if I can help, here's their comments on SDK V2:
https://www.construct.net/en/make-games/manuals/addon-sdk/guide/porting-addon-sdk-v2

There is also work on a framework for developing C3 addons, which makes the process much smoother: C3IDE2 and branches that use TypeScript. I'll add links after I investigate the newer framework.

Qq3olegka · Feb 12

any news?

MikalDev · 14 days ago

There have been improvements with drawMesh() in terms of perf (direct vertex buffer) and also now direct support for per vertex color. Hopefully this rekindles the possibility of 'official' spine support for C3.

Esoteric - thoughts?

https://www.construct.net/en/make-games/releases/beta/r441 - They even do a Spine call out