Official C3 Addon / Runtime?

I think I asked them about such a method a few years ago and they didn't think it was worth it.

What we need is a method/function that looks like this:

drawVertices(positions, uvs, colors, indices, texture, blendMode)

Where:

  • positions is an array of numbers (x, y coordinates)
  • uvs is an array of numbers (u, v coordinates, same length as positions)
  • colors is an array of numbers (ideally 32-bit rgba "ints", otherwise 4 floats r, g, b, a)
  • indices is an array of numbers, 3 consecutive numbers represent indices into the other arrays forming a triangle
  • texture depends on what type C3 uses to represent textures used on the GPU. Would basically map to an image in the atlas used by the skeleton
  • blendMode depends on what's available in C3

That'd be the most minimal thing we'd need, though it wouldn't support two-color tinting. We could also conform to whatever internal vertex/index format C3's batcher uses, of course, as long as we can specify vertex positions, UVs, and colors.
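
For illustration, submitting a single textured quad through this hypothetical API could look like the following. Nothing here exists in C3; renderer, texture, and blendMode are assumed context, and the packing is just the proposal spelled out:

// Hypothetical drawVertices call for one textured quad (two triangles).
const positions = [0, 0,  100, 0,  100, 100,  0, 100];  // x, y per vertex
const uvs       = [0, 0,  1, 0,    1, 1,      0, 1];    // u, v per vertex
const colors    = [0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff]; // 32-bit RGBA per vertex
const indices   = [0, 1, 2,  0, 2, 3];  // two triangles into the arrays above
renderer.drawVertices(positions, uvs, colors, indices, texture, blendMode);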

QUAD3D and friends would be way too inefficient, batching notwithstanding. E.g. Spineboy is a few hundred triangles. That'd mean we'd have to create a few hundred instances of QUAD3D, which would then have to be batched, and so on. That's a lot of unnecessary work, as we already have all the data in a form that's simple to submit to the GPU in one go.

Related Discussions
...

Cheers!

Ok, no joy on adding this specific feature to SDK V2. If we run into perf issues, perhaps they will consider the proposal (no guarantee). So: is it still doable with the current SDK? Some thoughts.

So, I'm doing a little benchmarking; you tell me if this falls into the ballpark of making Spine usable. I have another addon, 3DObject, which renders glTF 3D models. As you might expect, the glTF data is basically formatted as above, with a vertex buffer, vertex index buffer, UV buffer, etc. I decompose it to the Quad3D format at runtime (e.g. fetching the actual vertex data via the vertex index, and similarly for the UV buffer and UV index). I then create a degenerate quad (triangle) from this when I call QUAD3D2() (which does some buffer size housekeeping, but mostly assembles the vertices into the vertex buffer and the UVs into the UV buffer). At some point later this goes into the jobBatch buffer, which then does the glDrawElements (I can't remember at what point it copies the buffers/indices over to the GPU, etc.). Yes, this is close to the format we want, of course, but they don't want to change the renderer to pass data directly to a job batch like this as of now.

Ok, so that being said, I am basically doing something similar to a wrapper for drawVertices around QUAD3D2.
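
Roughly, the wrapper does something like the sketch below. The names are illustrative, and emitQuad stands in for the housekeeping plus the actual QUAD3D2() call, whose real signature differs:

// Walk the index buffer and emit one degenerate quad per triangle,
// repeating the third vertex as the fourth.
for (let i = 0; i < indices.length; i += 3) {
	const a = indices[i], b = indices[i + 1], c = indices[i + 2];
	emitQuad(
		pos[3 * a], pos[3 * a + 1], pos[3 * a + 2],  // vertex 0
		pos[3 * b], pos[3 * b + 1], pos[3 * b + 2],  // vertex 1
		pos[3 * c], pos[3 * c + 1], pos[3 * c + 2],  // vertex 2
		pos[3 * c], pos[3 * c + 1], pos[3 * c + 2],  // vertex 3 = vertex 2
		uv[2 * a], uv[2 * a + 1],  // UVs fetched the same way
		uv[2 * b], uv[2 * b + 1],
		uv[2 * c], uv[2 * c + 1],
		uv[2 * c], uv[2 * c + 1]
	);
}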

The perf I get on a 2017 iMac with an ancient RX580 (at low resolution, to ignore the fill-rate bottleneck) is 50k tris per frame at 60fps, using 20% GPU and 20% CPU. Magically scale that to 100% and you get 250k tris/frame at 60fps. So let's settle on around 20% as usable alongside other CPU logic and GPU requirements. Does 50k tris/frame get us to a usable spot for a few medium-complexity Spine objects? I would think so?

Some further instrumentation: here's what I see submitted on the GL side when rendering each model (each model is about 5k tris). Note the model is made of a few separate meshes with different textures, so it must split the batches up (and the meshes are not ordered to optimize texture reuse, though I should look at that!)

This looks roughly right for 5k triangles made of degenerate quads: 6 * 5k = 30k indices. If using the same texture, as we can with some Spine models, it should not have to break up the batch as much.

I suppose it's better than nothing. I'm a little surprised that the addition of the requested method isn't on their roadmap, but it's their software so they'll likely know why they don't want to do that.

We could also try to hack the renderer from the "outside" 😃

Heh, please not 'outside' (at least not yet) - we are having a big kerfuffle over in the C3 community with Ashley about that. No need to add fuel to the fire 🙂

Some good-ish news though: he came back with this proposal. No vertex color, but a drawMesh() with possibilities for future improvements. Any changes needed for the parameters, etc.? I'm going to look at 3DObject as well and see what it might need (beyond vertex color, for now).

"Tell you what - it makes sense to do the emulation inside Construct's own renderer where it's probably a bit faster, and it's fairly easy to write. So the next beta has a drawMesh(posArr, uvArr, indexArr) method, where:

posArr is array of [x, y, z]
uvArr is array of [u, v]
indexArr is array of [i, j, k], so each entry corresponds to rendering a single triangle

For example in the drawingPlugin sample you can replace the call to renderer.quad3(quad, rcTex) with this and it renders the texture twice with a skew:

const posArr = [
	[quad.p1.x + 50, quad.p1.y, 0],
	[quad.p2.x + 50, quad.p2.y, 0],
	[quad.p3.x, quad.p3.y, 0],
	[quad.p4.x, quad.p4.y, 0],

	[quad.p1.x + 250, quad.p1.y, 0],
	[quad.p2.x + 250, quad.p2.y, 0],
	[quad.p3.x + 200, quad.p3.y, 0],
	[quad.p4.x + 200, quad.p4.y, 0]
];

const uvArr = [
	[rcTex.left, rcTex.top],
	[rcTex.right, rcTex.top],
	[rcTex.right, rcTex.bottom],
	[rcTex.left, rcTex.bottom],
	[rcTex.left, rcTex.top],
	[rcTex.right, rcTex.top],
	[rcTex.right, rcTex.bottom],
	[rcTex.left, rcTex.bottom]
];

const indexArr = [
	[0, 1, 2],
	[0, 2, 3],

	[4, 5, 6],
	[4, 6, 7],
];

renderer.drawMesh(posArr, uvArr, indexArr);

It doesn't do per-vertex colors, since as discussed it would take more work to do that, but it can do the textured mesh drawing, including in 3D; colors could be added in future. As it moves the implementation to Construct's own renderer, any future optimizations can be done as an internal change and addons won't need changing. It also gives us scope to do something like proper mesh submission in the WebGPU renderer only while leaving WebGL with an emulated implementation, to be ultimately phased out. So hopefully that's at least a bit of an improvement on the current situation."

I did ask him to change indexArr to just be a flat array, with every 3 consecutive indices forming a triangle, instead of an array of arrays. I will also suggest the same for the other parameters.

It's useful for 3DObject that it supports 3D; you will need to zero-stuff the posArr for 2D-only content (see the sketch below).

Hmm, or we could ask for a drawMesh2D() also.
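
The zero-stuffing itself is cheap. A minimal sketch, assuming the flat typed-array layout requested above (drawMesh as first proposed takes arrays of [x, y, z] arrays instead):

// Pad a flat 2D [x, y, ...] array into the [x, y, z, ...] layout drawMesh
// expects, with z = 0 (illustrative helper).
function padTo3D(xy) {
	const xyz = new Float32Array((xy.length / 2) * 3);
	for (let i = 0, j = 0; i < xy.length; i += 2, j += 3) {
		xyz[j] = xy[i];         // x
		xyz[j + 1] = xy[i + 1]; // y
		xyz[j + 2] = 0;         // z = 0 for 2D content
	}
	return xyz;
}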

Sounds like a decent improvement, though colors are pretty important. Good idea for the flat array.

    Nate We will likely get per-tri/quad color passed with the vertices in the C3 WebGPU implementation, since it already has it. Do we really need per-vertex color, or is it just used for tintColor, so per-triangle is OK? (Of course we also want darkTintColor for twoColorTint.)

    Or at least that's what we've been using, twoColorTint per attachment/slot.

    Is there something else where per vertex color is important for Spine effects, etc.?

    @MikalDev what do they mean by "emulation"? Constructing degenerate quads? That would explain why they only want to do per-triangle (really per-quad) colors. But it's still a strange decision: their batcher must ultimately expand the per-triangle (quad) colors to individual vertex attributes, both in WebGL and WebGPU.

    Then I had a look at c3runtime.js, specifically their WebGLShaderProgram class. It turns out that they are passing colors not as vertex attributes but as uniforms. That means you cannot batch sprites with different colors: any time you have to change a uniform, you have to submit a new rendering command to the GPU. I'm sure they have their reasons, though I can't think of any myself. I haven't checked whether there's a WebGPU implementation in there.

    It seems the only reason WebGLRenderer can only draw quads is that they statically fill the index buffer via FillIndexBufferData, which assumes the vertex buffer consists of sequences of 4 points, each making up a quad. If the index data were instead also written on the fly each frame, e.g. for quads or arbitrary triangle meshes, the quad restriction would go away. I'm not intimately familiar with the whole renderer, but from a look at related code, e.g. WebGLBatchJob, it doesn't look like such a change would influence anything else in the renderer.
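
    For reference, a static quad index fill typically looks like the following. This is a generic sketch, not Construct's actual FillIndexBufferData:

    // Every 4 consecutive vertices form a quad drawn as two triangles.
    function fillQuadIndices(maxQuads) {
    	const indices = new Uint16Array(maxQuads * 6);
    	for (let q = 0, v = 0, i = 0; q < maxQuads; q++, v += 4, i += 6) {
    		indices[i] = v;         // triangle 1: 0, 1, 2
    		indices[i + 1] = v + 1;
    		indices[i + 2] = v + 2;
    		indices[i + 3] = v;     // triangle 2: 0, 2, 3
    		indices[i + 4] = v + 2;
    		indices[i + 5] = v + 3;
    	}
    	return indices;
    }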

    WebGLRenderer has a handful of QuadXXX methods, which would also need to write data to the index buffer. Then it's just a matter of adding one more method to WebGLRenderer that allows specifying vertex and index data directly, as sketched below.
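
    A hypothetical sketch of that method, with made-up buffer names rather than real C3 internals:

    // Append caller-supplied vertices and indices to the current batch,
    // offsetting the indices by the number of vertices already buffered.
    function drawTriangles(batch, positions, uvs, colors, indices) {
    	const baseVertex = batch.vertexCount;
    	batch.vertexData.set(positions, baseVertex * 2);  // x, y per vertex
    	batch.texcoordData.set(uvs, baseVertex * 2);      // u, v per vertex
    	batch.colorData.set(colors, baseVertex * 4);      // r, g, b, a per vertex
    	for (let i = 0; i < indices.length; i++)
    		batch.indexData[batch.indexCount + i] = baseVertex + indices[i];
    	batch.vertexCount += positions.length / 2;
    	batch.indexCount += indices.length;
    }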

    I can see how this could be a scary change though, and ultimately it is their software and on them how they want to move forward. Judging by the renderer code, the "emulation" will likely not be much faster, but it will at least reduce the amount of temporary JS objects that need to be created and garbage collected each frame.

    That the parameters to the new method are going to be arrays of arrays is also unfortunate. We (and likely other plugins) would have to shuffle our data around and create temporary arrays each frame, which then need garbage collection. On the Construct side, they have to untangle those arrays and convert them back to linear typed arrays.

    In any case, it's great that there's some movement on this. If the initial implementation is slow, that's likely not a big deal. Having a stable API for rendering triangle meshes plugin creators can rely on is a good first step. Ideally, the parameters aren't arrays of arrays but linear (typed) arrays instead.

    So, some improvements:

    • The proposed SDK has been changed to use linear typed arrays.
    • The WebGPU implementation does pass color per quad instead of changing a uniform, so it does not break the batch on color change. I think it uses a single value for all vertices of a quad, passed as a colorData buffer with all four values the same, which is then used directly in the fragment shader to multiply the final color output (see the sketch below). In the original description of the WebGPU renderer, Ashley mentioned that this implementation will not break the batch.
      • As you saw, this is not the case in the WebGL renderer (though I think it could be, and I have a vague memory that it was similar to the final WebGPU implementation at some point, but I haven't verified that and Ashley doesn't remember it).
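
    If that reading is right, filling the color buffer would look something like this (a guess based on the writeBuffer call quoted below, not actual C3 code):

    // Guessed layout (not actual C3 code): one RGBA value per quad, stored
    // alongside the vertex and texcoord data and uploaded per batch.
    const MAX_QUADS = 1000;                            // illustrative capacity
    const colorData = new Float32Array(MAX_QUADS * 4); // 4 floats per quad
    function setQuadColor(quadIndex, r, g, b, a) {
    	colorData.set([r, g, b, a], quadIndex * 4);
    }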

    Yeah, I realized afterwards typed arrays are probably faster and also would be faster with built-in mesh rendering in future, as it could potentially just copy them directly to the GPU. So I've changed it to work with typed arrays like this:

    const posArr = new Float32Array([
    	quad.p1.x + 50, quad.p1.y, 0,
    	quad.p2.x + 50, quad.p2.y, 0,
    	quad.p3.x, quad.p3.y, 0,
    	quad.p4.x, quad.p4.y, 0,
    
    	quad.p1.x + 250, quad.p1.y, 0,
    	quad.p2.x + 250, quad.p2.y, 0,
    	quad.p3.x + 200, quad.p3.y, 0,
    	quad.p4.x + 200, quad.p4.y, 0
    ]);
    
    const uvArr = new Float32Array([
    	rcTex.left, rcTex.top,
    	rcTex.right, rcTex.top,
    	rcTex.right, rcTex.bottom,
    	rcTex.left, rcTex.bottom,
    
    	rcTex.left, rcTex.top,
    	rcTex.right, rcTex.top,
    	rcTex.right, rcTex.bottom,
    	rcTex.left, rcTex.bottom
    ]);
    
    const indexArr = new Uint16Array([
    	0, 1, 2,
    	0, 2, 3,
    
    	4, 5, 6,
    	4, 6, 7,
    ]);
    
    renderer.drawMesh(posArr, uvArr, indexArr);

    Updated text (corrections), not sure why I can't edit the original post anymore...

    • The WebGPU implementation does pass color instead of changing a uniform, so it does not break the batch on color change. I think it uses a single value for all 4 vertices of a quad, passed as a colorData buffer along with the vertex and texcoord buffers; it is then used in the fragment shader to multiply the final color output. In the original description of the WebGPU C3 renderer, Ashley mentioned that its implementation will not break the batch on color change.

          queue["writeBuffer"](this._vertexBuffer, 0, this._vertexData.buffer, 0, quads * 12 * SIZEOF_F32);
          queue["writeBuffer"](this._texcoordBuffer, 0, this._texcoordData.buffer, 0, quads * 12 * SIZEOF_F32);
          queue["writeBuffer"](this._colorBuffer, 0, this._colorData.buffer, 0, quads * 4 * SIZEOF_F32);

    what do they mean by "emulation"? Constructing degenerate quads?

    Yes, for the first implementation - with the possibility of improvements over time, perhaps adding vertex color, perhaps submitting the arrays directly as a batch, etc.

    The new APIs improve the situation, but I can't understand why they don't just implement triangles and vertex colors. This is very standard stuff and would not take much effort.

      @MikalDev that sounds much better! Let's hope future C3 releases will allow proper triangle mesh rendering.

        Nate all I can say is that I agree with you, as do others in the C3 dev community; C3 is pretty much a one-person shop from the standpoint of the engine and the (private) roadmap.

        Mario Would per-triangle color help (e.g. the same color on all vertices)? That would at least cover the single-color tint case, right?

        It should be straightforward to implement in the 'emulation' implementation, and it should map well to the WebGPU implementation (and possible future batching).

        When are different vertex colors within a single tri used in Spine? In the light renderer? (I haven't used that with C3 Spine yet in general.)

        Currently Spine uses a single color per texture region, so a single color per triangle would be sufficient for single-color tinting. Two-color tinting uses 2 colors per texture region.
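
        Mapping that single tint onto a per-triangle/quad color would then be simple: the tint for a region is the skeleton, slot, and attachment colors multiplied together. A sketch, where setMeshColor is a hypothetical per-draw hook alongside drawMesh:

        // Combine skeleton, slot, and attachment colors into the region tint
        // (standard Spine runtime behavior), then apply it for this draw.
        const r = skeleton.color.r * slot.color.r * attachment.color.r;
        const g = skeleton.color.g * slot.color.g * attachment.color.g;
        const b = skeleton.color.b * slot.color.b * attachment.color.b;
        const a = skeleton.color.a * slot.color.a * attachment.color.a;
        setMeshColor(r, g, b, a);  // hypothetical hook, not a real C3 API
        renderer.drawMesh(posArr, uvArr, indexArr);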

        If it could be hacked to get tinting, that would make Spine rendering in C3 a lot more useful, as not being able to tint at all is very limiting. It still seems unfortunate that it can't just be done right the first time: it's vertex colors at the lowest level, and sane batching would just provide that.