QuadLink: Autoregressive Quad-Dominant Mesh Generation via Point-Relation Learning

arXiv:2605.16813 (2026)
Explained assuming: you know quad meshes, valence, and point clouds at a working level. The new deep-learning idea (contrastive/triplet-margin association learning) is explained from scratch below.

One-sentence version: Where classical quad-remeshing methods solve a fresh per-shape combinatorial or field-based problem every time, QuadLink learns once, offline, to predict a quad-dominant mesh directly from a point cloud with no per-shape optimization at inference, sidestepping the common practice of generating triangles first and converting them to quads afterward, which the paper argues is unreliable when the triangles being converted are themselves machine-generated rather than clean ground truth.

The problem: triangle-first generation, then convert, is fragile

Autoregressive mesh generators like MeshAnything V2 are triangle-only by construction: every face they emit has exactly 3 vertices. If a production pipeline wants quad-dominant output (the norm in film and game asset pipelines, and what both Catmull-Clark subdivision and clean UV layout favor), the standard workaround is to generate triangles and then run a separate triangle-to-quad conversion pass that merges pairs of adjacent triangles into quads by some heuristic. QuadLink's motivating claim is that this two-step approach "typically produces quad meshes with poor topology": a conversion heuristic tuned on clean, human-made triangle meshes does not transfer cleanly to the discretization noise and small inconsistencies that a generative model's own triangle output contains.

The paper's own training pipeline also needs a triangle-to-quad step: a "Tri-to-Quad Operator" that converts clean, artist-made ground-truth triangle meshes into quad-dominant supervision targets. The paper is explicit that this operator's assumptions "can be violated by triangle-based generative models whose outputs contain discretization errors and perturbations." In other words, the conversion works on clean human-made data, which is the only thing it is applied to here, but becomes unreliable when applied downstream to a model's own noisy output. That downstream use is exactly the convert-after-generation approach the paper positions itself against.
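To make the fragility argument concrete, here is a minimal greedy triangle-to-quad merge of the generic kind described above. It is not the paper's Tri-to-Quad Operator, whose rules are not spelled out here; `greedy_tri_to_quad`, its 5° coplanarity threshold, and its squareness score are illustrative assumptions. On a clean input it recovers both squares, but nudging a single vertex slightly out of plane is enough to leave one pair unmerged.

```python
import numpy as np


def _unit_normal(p, q, r):
    """Unit normal of the triangle (p, q, r)."""
    n = np.cross(q - p, r - p)
    return n / np.linalg.norm(n)


def _corner_angles_deg(pts):
    """Interior angle, in degrees, at each corner of a closed polygon."""
    k = len(pts)
    angles = []
    for i in range(k):
        u, w = pts[i - 1] - pts[i], pts[(i + 1) % k] - pts[i]
        cos = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return angles


def _is_convex(pts, normal):
    """True if every turn of the closed polygon has the same sign about `normal`."""
    k = len(pts)
    turns = [np.cross(pts[(i + 1) % k] - pts[i], pts[(i + 2) % k] - pts[(i + 1) % k]) @ normal
             for i in range(k)]
    return all(t > 0 for t in turns) or all(t < 0 for t in turns)


def greedy_tri_to_quad(verts, tris, max_dihedral_deg=5.0):
    """Hypothetical greedy merge of adjacent triangle pairs; returns (quads, leftover_tris)."""
    verts = np.asarray(verts, dtype=float)
    edge_owners = {}  # undirected edge -> indices of triangles using it
    for t, tri in enumerate(tris):
        for i in range(3):
            edge_owners.setdefault(tuple(sorted((tri[i], tri[(i + 1) % 3]))), []).append(t)

    candidates = []  # (score, t0, t1, quad); lower score = better quad
    for (a, b), owners in edge_owners.items():
        if len(owners) != 2:
            continue  # boundary or non-manifold edge: nothing to merge across
        t0, t1 = owners
        c = next(v for v in tris[t0] if v not in (a, b))
        d = next(v for v in tris[t1] if v not in (a, b))
        n0, n1 = _unit_normal(*verts[list(tris[t0])]), _unit_normal(*verts[list(tris[t1])])
        dihedral = np.degrees(np.arccos(np.clip(abs(n0 @ n1), 0.0, 1.0)))
        quad = (a, c, b, d)  # the shared edge a-b becomes the quad's diagonal
        pts = verts[list(quad)]
        normal = n0 + (n1 if n0 @ n1 >= 0 else -n1)
        if dihedral > max_dihedral_deg or not _is_convex(pts, normal):
            continue  # too bent, or the merged quad would be non-convex
        skew = max(abs(ang - 90.0) for ang in _corner_angles_deg(pts))
        candidates.append((dihedral + skew, t0, t1, quad))

    used, quads = set(), []
    for _, t0, t1, quad in sorted(candidates, key=lambda cand: cand[0]):
        if t0 not in used and t1 not in used:
            used.update((t0, t1))
            quads.append(quad)
    leftover = [tuple(tri) for t, tri in enumerate(tris) if t not in used]
    return quads, leftover


# Two unit squares, each split along a diagonal into two triangles.
verts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0)]
tris = [(0, 1, 4), (0, 4, 3), (1, 2, 5), (1, 5, 4)]
print(greedy_tri_to_quad(verts, tris))  # -> ([(0, 1, 4, 3), (1, 2, 5, 4)], [])

# Nudge one vertex 0.1 out of plane: that pair's dihedral (about 8 deg) now fails the test.
noisy = list(verts)
noisy[3] = (0, 1, 0.1)
print(greedy_tri_to_quad(noisy, tris))  # -> ([(1, 2, 5, 4)], [(0, 1, 4), (0, 4, 3)])
```

A production converter would use a better global pairing than this greedy pass, but any merge rule gated on local geometric thresholds shares the failure mode shown here.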

Three-stage pipeline

1. Anchor prediction

From the input point cloud, predict two kinds of "anchor" points at once: candidate mesh vertices, and candidate face centroids (one representative point per quad face, before the face's actual boundary is known). Predicting both together, rather than vertices alone, is what gives the later stage something to associate vertices to.
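As a sketch of what the two anchor sets look like as training targets, the snippet below derives them from a ground-truth quad-dominant mesh. The function name `anchor_targets` is hypothetical, and defining a face centroid as the plain mean of its vertices is an assumption; the text above does not say how the paper defines it. The `vertex_faces` map also makes explicit that one vertex typically belongs to several faces.

```python
import numpy as np


def anchor_targets(verts, faces):
    """Hypothetical ground-truth anchors for a quad-dominant mesh.

    verts : (V, 3) vertex positions.
    faces : list of faces, each a tuple of 3 or 4 vertex indices.
    Returns (vertex_anchors, centroid_anchors, vertex_faces), where
    vertex_faces[v] lists every face that vertex v belongs to.
    Assumption: a face "centroid" is the plain mean of its vertices.
    """
    verts = np.asarray(verts, dtype=float)
    centroids = np.array([verts[list(f)].mean(axis=0) for f in faces])
    vertex_faces = {v: [] for v in range(len(verts))}
    for fi, face in enumerate(faces):
        for v in face:
            vertex_faces[v].append(fi)
    return verts, centroids, vertex_faces


# Two quads sharing an edge, plus one triangle (quad-dominant, not pure quad).
verts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0), (1, 2, 0)]
faces = [(0, 1, 4, 3), (1, 2, 5, 4), (3, 4, 6)]
_, centroids, vertex_faces = anchor_targets(verts, faces)
print(centroids[0])     # centre of the first quad
print(vertex_faces[4])  # -> [0, 1, 2]: vertex 4 is shared by all three faces
```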

2. Point-Relation Learning — the new mechanism

Given a soup of predicted vertices and face centroids, which vertices belong to which face? (A vertex usually belongs to several faces, so this is a many-to-many assignment.) The naive answer, grouping by raw proximity so that the nearest centroid wins, fails for production-style quad meshes, which deliberately use anisotropic (stretched, direction-dependent) faces and non-uniform face density. In a long, thin quad strip following an edge loop, Euclidean distance stops tracking membership: a neighboring face's vertices can sit closer to a face's centroid than that face's own corners do, so proximity alone gives wrong associations.
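A tiny 2D example makes the failure concrete. It uses one simple proximity rule (give each face the k vertices nearest its centroid) on made-up geometry: a long, thin quad A resting on a tapering neighbor B from the same strip. The names and coordinates are illustrative only.

```python
import numpy as np

# Illustrative 2D geometry (z omitted): a long, thin quad A sitting on a
# tapering neighbour B from the same edge strip. Both share edge 0-1.
verts = np.array([
    [0.0, 0.0], [10.0, 0.0], [10.0, 1.0], [0.0, 1.0],  # 0-3: corners of A
    [9.0, -1.0], [1.0, -1.0],                           # 4-5: B's lower corners
])
faces = {"A": (0, 1, 2, 3), "B": (0, 1, 4, 5)}
centroids = {name: verts[list(f)].mean(axis=0) for name, f in faces.items()}


def knn_grouping(centroid, verts, k):
    """Naive proximity rule: a face's vertices are the k vertices nearest its centroid."""
    dist = np.linalg.norm(verts - centroid, axis=1)
    return set(np.argsort(dist, kind="stable")[:k].tolist())


for name, face in faces.items():
    guess = knn_grouping(centroids[name], verts, k=len(face))
    print(name, "guess", sorted(guess), "true", sorted(face))
# A guess [0, 1, 4, 5] true [0, 1, 2, 3]   <- A's centroid grabs B's corners
# B guess [0, 1, 4, 5] true [0, 1, 4, 5]
```

B's lower corners sit about 4.27 units from A's centroid, while A's own corners sit about 5.02 away, so the rule hands A exactly B's vertex set even though B itself is grouped correctly.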

QuadLink instead learns a mapping into a separate feature space where correct vertex-to-centroid pairs end up close together and incorrect pairs end up far apart — regardless of how close or far they were in ordinary 3D distance. This is trained with a triplet margin loss: take a (vertex, correct-centroid, wrong-centroid) triplet from training data, and penalize the network whenever the wrong-centroid distance isn't larger than the correct-centroid distance by at least some margin. Repeated over many triplets, the network learns an embedding where "belongs to the same face" becomes a learned, geometry-aware notion of closeness rather than literal 3D proximity — the same general contrastive-learning idea used to train, e.g., face-recognition embeddings (pull same-identity photos together, push different-identity photos apart), applied here to mesh connectivity instead of identity.
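Written out, the per-triplet loss is max(0, ‖f(v) − f(c⁺)‖ − ‖f(v) − f(c⁻)‖ + m), where f is the learned embedding, v a vertex, c⁺ its correct centroid, c⁻ a wrong one, and m the margin. The NumPy sketch below computes the batch mean; the Euclidean distance, the margin of 1.0, and the function name are assumptions, since the paper's exact settings are not given here.

```python
import numpy as np


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean triplet margin loss over a batch; each input has shape (N, D).

    anchor   : embeddings of vertices
    positive : embeddings of each vertex's correct face centroid
    negative : embeddings of a wrong face centroid for the same vertex
    A triplet contributes zero only once the wrong centroid is farther
    from the vertex than the correct one by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())


# Two toy triplets in a 2D embedding space.
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.5, 0.0], [3.0, 0.0]])  # correct centroids
negative = np.array([[3.0, 0.0], [0.0, 1.0]])  # wrong centroids
print(triplet_margin_loss(anchor, positive, negative))  # -> 1.5
```

The first triplet already clears the margin and contributes nothing; only the second, where the wrong centroid is closer than the correct one, is penalized. In a PyTorch training loop, `torch.nn.TripletMarginLoss` (default `margin=1.0`, `p=2`) computes essentially the same quantity, up to a small epsilon inside the distance.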

3. Face assembly

Once every vertex has been associated with the centroids of the faces it belongs to, faces are assembled deterministically, not by another learned step, with geometric validation to reject degenerate or invalid results. Only the association step is learned; turning "these vertices go together" into an actual polygon is ordinary geometric bookkeeping.
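One plausible shape for the deterministic assembly step is sketched below: deduplicate the associated vertex ids, order them by angle around their mean in a best-fit plane, and reject degenerate results. `assemble_face`, the 3-or-4 vertex rule, and the area threshold are hypothetical stand-ins; the text above does not specify the paper's actual validation checks.

```python
import numpy as np


def assemble_face(verts, vertex_ids, min_area=1e-8):
    """Turn a set of associated vertex ids into an ordered polygon, or None.

    Hypothetical stand-in for deterministic face assembly:
      1. keep only triangles or quads with distinct vertices,
      2. order vertices by angle around their mean, in their best-fit plane,
      3. reject polygons whose projected area is (near) zero.
    """
    ids = list(dict.fromkeys(vertex_ids))  # drop repeated ids, keep first occurrence
    if len(ids) not in (3, 4):
        return None
    pts = np.asarray(verts, dtype=float)[ids]
    centered = pts - pts.mean(axis=0)
    # The two leading right-singular vectors span the best-fit plane.
    _, _, vt = np.linalg.svd(centered)
    u, w = centered @ vt[0], centered @ vt[1]
    order = np.argsort(np.arctan2(w, u))
    pu, pw = u[order], w[order]
    # Shoelace formula: area of the projected polygon.
    area = 0.5 * abs(pu @ np.roll(pw, -1) - pw @ np.roll(pu, -1))
    if area < min_area:
        return None
    return tuple(ids[i] for i in order)


verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (2, 0, 0)]
print(assemble_face(verts, [2, 0, 3, 1]))     # a cyclic ordering of the unit square
print(assemble_face(verts, [0, 1, 4]))        # collinear, degenerate -> None
print(assemble_face(verts, [0, 1, 2, 3, 4]))  # 5 vertices -> None
```

For the unit square the starting vertex and direction depend on the SVD basis, but the result is always a valid cyclic ordering of its four corners.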

[Figure: pipeline diagram (not reproduced). Input point cloud, the same modality other conditioned mesh generators take → learned vertex-to-centroid association, the paper's actual contribution, replacing nearest-neighbor grouping → face assembly, ordinary bookkeeping that turns the learned associations back into explicit quad faces.]

How this contrasts with classical quad-remeshing

	Classical field/combinatorial methods	QuadLink
When is the "hard problem" solved?	Fresh, per input shape, every time (a field smoothing, a flow solve, a linear system, a pattern match)	Once, offline, during training; inference runs the trained network with no per-shape solve
Needs training data?	No	Yes — learns from (point cloud, quad mesh) pairs
Can iterate/refine per shape?	Yes, in principle (more solver iterations, tighter tolerances)	No — one shot, the same feed-forward limitation MeshAnything V2 has for triangles: no per-shape optimization loop, so output quality is bounded by training data and whatever single point cloud it's handed
Explicit singularity control?	Yes, for methods designed around prescribed or user-guided singularity placement	No — whatever the training distribution taught it to produce

Availability note. As of this writing, no public code or pretrained weights have been released for this paper — the ideas above are described from the paper's own text, but there is nothing to run yet.