QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning
arXiv:2605.16813 (2026)
Explained assuming: you know quad meshes, valence, and point clouds at a working level. The new
deep-learning idea (contrastive/triplet-margin association learning) is explained from scratch below.
The problem: triangle-first generation, then convert, is fragile
Autoregressive mesh generators like MeshAnything V2 are triangle-only by construction: every face token they emit encodes exactly 3 vertices. When a production pipeline wants quad-dominant output (the norm in film and game asset pipelines, and what both Catmull-Clark subdivision and clean UV layout prefer), the standard workaround is to generate triangles and then run a separate triangle-to-quad conversion pass that merges pairs of adjacent triangles into quads by some heuristic. QuadLink's motivating claim is that this two-step approach "typically produces quad meshes with poor topology": a conversion heuristic tuned on clean, human-made triangle meshes does not transfer cleanly to the discretization noise and small inconsistencies in a generative model's triangle output.
The paper's own training pipeline needs a triangle-to-quad step too: a "Tri-to-Quad Operator" that converts clean, artist-made ground-truth triangle meshes into quad-dominant supervision targets. The paper is explicit that this operator's assumptions "can be violated by triangle-based generative models whose outputs contain discretization errors and perturbations." In other words, the conversion works on clean human-made data, which is all it is ever applied to here, but becomes unreliable when the same idea is applied downstream to a model's own noisy output. That downstream use is exactly the convert-after-generation approach the paper positions itself against.
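To make the convert-after-generation workaround concrete, here is a minimal sketch of one possible pairing heuristic: merge two triangles whenever they share an edge and neither has been used yet. This is a generic illustration, not the paper's Tri-to-Quad Operator; the function name `greedy_tri_to_quad` and the first-come pairing rule are assumptions made for the example.

```python
from collections import defaultdict

def greedy_tri_to_quad(triangles):
    """Greedily merge pairs of edge-adjacent triangles into quads.

    triangles: list of (a, b, c) vertex-index tuples with consistent winding.
    Returns (quads, leftover_triangles).
    """
    # Map each undirected edge to the triangles that use it.
    edge_to_tris = defaultdict(list)
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_tris[frozenset((u, v))].append(t)

    used = set()
    quads = []
    for edge, tris in edge_to_tris.items():
        if len(tris) != 2:
            continue  # boundary or non-manifold edge: nothing to merge
        t0, t1 = tris
        if t0 in used or t1 in used:
            continue  # first-come pairing: an earlier edge already claimed one
        tri = triangles[t0]
        # Rotate t0 so it starts at the vertex opposite the shared edge.
        i = next(k for k in range(3) if tri[k] not in edge)
        p, u, v = tri[i], tri[(i + 1) % 3], tri[(i + 2) % 3]
        # The vertex of t1 that is not on the shared edge.
        q = next(x for x in triangles[t1] if x not in edge)
        quads.append((p, u, q, v))  # the shared diagonal u-v is dropped
        used.update((t0, t1))

    leftovers = [tri for t, tri in enumerate(triangles) if t not in used]
    return quads, leftovers

# Three triangles: the middle one could pair with either neighbor.
quads, leftovers = greedy_tri_to_quad([(0, 1, 2), (0, 2, 3), (1, 4, 2)])
print(quads, leftovers)
```

Which diagonal gets removed here depends only on the order in which edges are visited, and one triangle is left unpaired. Practical converters usually score candidate pairs by shape quality, but any such score still depends on the input triangulation, which is where noisy generated triangles can cause trouble.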
Three-stage pipeline
1. Anchor prediction
From the input point cloud, predict two kinds of "anchor" points at once: candidate mesh vertices, and candidate face centroids (one representative point per quad face, before the face's actual boundary is known). Predicting both together, rather than vertices alone, is what gives the later stage something to associate vertices to.
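As a rough picture of the two anchor sets, the sketch below derives them from a known quad mesh, the way training targets might be built. It assumes a face centroid is the mean of the face's corner positions; the paper's exact definition and the network that predicts these points are not shown, and `face_centroids` is a hypothetical helper.

```python
def face_centroids(vertices, faces):
    """Mean of each face's corner positions (works for triangles and quads)."""
    centroids = []
    for face in faces:
        n = len(face)
        centroids.append(tuple(sum(vertices[i][k] for i in face) / n
                               for k in range(3)))
    return centroids

# Two quads side by side, sharing the edge between vertices 1 and 2.
vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0),
            (0.0, 1.0, 0.0), (4.0, 0.0, 0.0), (4.0, 1.0, 0.0)]
faces = [(0, 1, 2, 3), (1, 4, 5, 2)]

# The two anchor sets: one point per vertex, one point per face.
anchors = {"vertices": vertices, "centroids": face_centroids(vertices, faces)}
print(anchors["centroids"])  # [(1.0, 0.5, 0.0), (3.0, 0.5, 0.0)]
```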
2. Point-Relation Learning — the new mechanism
Given a soup of predicted vertices and face centroids, which vertices belong to which face? The naive answer — nearest centroid wins — fails for production-style quad meshes, which deliberately have anisotropic (non-uniform, direction-dependent) face density: a long thin quad strip following an edge loop has vertices much closer to a neighboring face's centroid than to their own face's centroid in raw Euclidean distance, so proximity alone gives wrong associations.
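A small worked example shows the failure. The 2D layout below is hypothetical: one long thin quad A sits next to two small quads B and C, and we ask which two centroids are nearest to a corner shared by A and B.

```python
import math

# Hypothetical 2D layout with anisotropic face density.
faces = {
    "A": [(0, 0), (10, 0), (10, 1), (0, 1)],    # long thin strip quad
    "B": [(-1, 0), (0, 0), (0, 1), (-1, 1)],    # small quad touching A at x = 0
    "C": [(-2, 0), (-1, 0), (-1, 1), (-2, 1)],  # small quad that does not touch A
}

def centroid(corners):
    n = len(corners)
    return (sum(x for x, _ in corners) / n, sum(y for _, y in corners) / n)

centroids = {name: centroid(corners) for name, corners in faces.items()}

vertex = (0, 0)  # a corner of A and B, but not of C
true_faces = {name for name, corners in faces.items() if vertex in corners}

# Rank centroids by plain Euclidean distance from the vertex.
ranked = sorted(centroids, key=lambda name: math.dist(vertex, centroids[name]))
nearest_two = set(ranked[:2])

print(ranked)                     # ['B', 'C', 'A']
print(true_faces == nearest_two)  # False
```

The vertex is about 5.02 units from A's centroid but only about 1.58 from C's, so a proximity rule links it to C, a face it is not part of, and misses A.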
QuadLink instead learns a mapping into a separate feature space where correct vertex-to-centroid pairs end up close together and incorrect pairs end up far apart — regardless of how close or far they were in ordinary 3D distance. This is trained with a triplet margin loss: take a (vertex, correct-centroid, wrong-centroid) triplet from training data, and penalize the network whenever the wrong-centroid distance isn't larger than the correct-centroid distance by at least some margin. Repeated over many triplets, the network learns an embedding where "belongs to the same face" becomes a learned, geometry-aware notion of closeness rather than literal 3D proximity — the same general contrastive-learning idea used to train, e.g., face-recognition embeddings (pull same-identity photos together, push different-identity photos apart), applied here to mesh connectivity instead of identity.
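The loss described above has a standard form, max(0, d(anchor, positive) - d(anchor, negative) + margin), sketched here on hand-picked 2D embeddings. The margin of 1.0, the Euclidean distance, and the embedding values are assumptions for illustration; the paper's actual embedding network and hyperparameters are not reproduced.

```python
import math

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    anchor:   embedding of a vertex
    positive: embedding of a centroid of a face the vertex belongs to
    negative: embedding of a centroid of a face it does not belong to
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

vertex_emb = (0.0, 0.0)

# Bad embedding: the wrong centroid is closer than the correct one.
bad = triplet_margin_loss(vertex_emb, positive=(3.0, 4.0), negative=(1.0, 0.0))

# Good embedding: the correct centroid is closer by more than the margin.
good = triplet_margin_loss(vertex_emb, positive=(0.0, 1.0), negative=(3.0, 4.0))

print(bad, good)  # 5.0 0.0
```

A zero loss means the triplet already satisfies the margin and contributes no gradient, so training effort concentrates on the associations the embedding still gets wrong.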
3. Face assembly
Once every vertex has a learned association to its face centroid, faces are assembled deterministically — not by another learned step — with geometric validation to reject degenerate or invalid results. Only the association step is learned; turning "these vertices go together" into an actual polygon is ordinary geometric bookkeeping.
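One way the bookkeeping could look is sketched below: group vertices by their linked centroid, sort each group by angle around that centroid, and reject groups that are not triangles or quads or that have zero area. The `links` format, the angular-sort ordering, and the specific validity checks are assumptions; the paper's actual assembly and validation rules may differ. The sketch also assumes each face is roughly planar and convex around its centroid, and it does not enforce a consistent winding direction across faces.

```python
import math
from collections import defaultdict

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def assemble_faces(vertices, centroids, links, min_area=1e-9):
    """Group vertices by linked centroid, order them, and validate each face.

    links: iterable of (vertex_index, centroid_index) pairs (hypothetical format).
    Returns a list of vertex-index tuples, one per accepted face.
    """
    groups = defaultdict(set)
    for v, c in links:
        groups[c].add(v)

    faces = []
    for c in sorted(groups):
        members = sorted(groups[c])
        if len(members) not in (3, 4):
            continue  # validation: only triangles and quads are accepted
        rays = {v: sub(vertices[v], centroids[c]) for v in members}
        # Local frame: u is the first ray, n a normal estimate, w completes the plane.
        u = rays[members[0]]
        n = max((cross(u, rays[v]) for v in members[1:]), key=lambda t: dot(t, t))
        if dot(n, n) == 0.0:
            continue  # validation: vertices are collinear with the centroid
        w = cross(n, u)
        # Sort by angle around the centroid. u and w are orthogonal but not unit
        # length, which rescales the angles without changing their order.
        ordered = sorted(members,
                         key=lambda v: math.atan2(dot(rays[v], w), dot(rays[v], u)))
        # Vector area from a triangle fan rooted at the first ordered vertex.
        p0 = vertices[ordered[0]]
        area_vec = (0.0, 0.0, 0.0)
        for a, b in zip(ordered[1:], ordered[2:]):
            t = cross(sub(vertices[a], p0), sub(vertices[b], p0))
            area_vec = tuple(x + y for x, y in zip(area_vec, t))
        if 0.5 * math.sqrt(dot(area_vec, area_vec)) < min_area:
            continue  # validation: zero-area face
        faces.append(tuple(ordered))
    return faces

vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0),
            (0.0, 1.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
centroids = [(0.5, 0.5, 0.0), (1.5, 0.5, 0.0), (5.0, 5.0, 0.0)]
# Centroid 2 is linked to only two vertices and should be rejected.
links = [(0, 0), (2, 0), (1, 0), (3, 0),
         (1, 1), (4, 1), (5, 1), (2, 1),
         (4, 2), (5, 2)]
faces = assemble_faces(vertices, centroids, links)
print(faces)
```

Nothing in this step is trained: given the same associations it always produces the same faces, which is the sense in which assembly is deterministic.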
How this contrasts with classical quad-remeshing
| | Classical field/combinatorial methods | QuadLink |
|---|---|---|
| When is the "hard problem" solved? | Fresh, per input shape, every time (a field smooth, a flow solve, a linear system, a pattern match) | Once, offline, during training — inference is a single fast forward pass |
| Needs training data? | No | Yes — learns from (point cloud, quad mesh) pairs |
| Can iterate/refine per shape? | Yes, in principle (more solver iterations, tighter tolerances) | No — one shot, the same feed-forward limitation MeshAnything V2 has for triangles: no per-shape optimization loop, so output quality is bounded by training data and whatever single point cloud it's handed |
| Explicit singularity control? | Yes, for methods designed around prescribed or user-guided singularity placement | No — whatever the training distribution taught it to produce |