Elara-Cortex Technical Report · TR-2026-01 · 9 June 2026

The Speed–Quality Frontier in Vehicle Routing:
Benchmarking the Elara Route Engine on CVRPLIB,
Solomon, and Gehring–Homberger Instances

K. Lekola. Elara-Cortex, Johannesburg · route.elara-cortex.com

Abstract. We benchmark the Elara Route engine, a proprietary energy-descent vehicle-routing solver, against the research community's published best-known solutions on the CVRPLIB X-instances (capacitated VRP) and the Solomon / Gehring–Homberger instances (VRP with hard time windows), and head-to-head against Google OR-Tools with guided local search. On nine X-instances (100–512 customers), Elara reaches a mean gap of 3.05% to best-known in ≤5 seconds of single-threaded CPython and beats Google OR-Tools on every instance (OR-Tools 3.6–16%), with the margin widening on the hardest instances. On instances small enough for an exact solver to prove optimality, Elara reaches the Z3-certified global optimum on 5 of 5. On five time-window instances it attains a mean distance gap of 3.1%, matching the best-known vehicle count on C101 and C1_2_1 at 0.2% distance. Every reported solution is published as a solution certificate - the explicit routes, so any reader can re-verify feasibility and cost against the public instances without access to the solver. We argue the operationally decisive metric for live fleets is not the final percent of optimality but the cost of re-optimisation under disturbance, where the same engine replans disturbed 100-stop multi-vehicle problems in approximately one millisecond.

1 · Introduction

Vehicle-routing research measures solvers by their gap to best-known solutions on standard instance sets. Operations measure something else: what happens to the plan at 09:40 when five stops cancel, two urgent collections appear, and a vehicle fails. A plan computed once at dawn, however close to optimal - degrades all day; a plan that can be recomputed on every disturbance holds its quality continuously. This report locates the Elara Route engine on both axes: the academic axis (gap to best-known at a stated time budget, against the strongest public solver) and the operational axis (time to replan).

2 · The engine, at the level we disclose

Elara Route is built on a proprietary mathematical framework in which candidate route transformations are evaluated as exact constant-time deltas within that framework, and a solution is reached as the composition of our engine's admissible improving steps from a constructive initial state, with bounded perturbation restarts inside a caller-set time budget. Two engineering properties follow and are externally observable: (i) evaluation cost per candidate move is independent of route length, which is why second-scale budgets suffice on hundreds of customers in pure CPython; and (ii) re-optimisation after a disturbance is the same operation as optimisation, which is why replanning costs milliseconds. The framework, its selection schedule, and its acceleration design are trade secrets of Elara-Cortex and are not disclosed here; this report's claims are constructed to be verifiable without them (§6).

Scope of disclosure. This report discloses results, protocol, budgets, baselines and solution certificates, everything needed to verify the claims. It does not disclose the method. Verification requires only the public instances and arithmetic.

3 · Experimental protocol

Instances. CVRP: Uchoa et al. X-instances (X-n101-k25, X-n200-k36, X-n303-k21, X-n439-k37). VRPTW: Solomon 100-customer (C101, R101, RC101) and Gehring–Homberger 200-customer (C1_2_1, R1_2_1). Instances and best-known solutions from the community-maintained CVRPLIB mirror (PyVRP/Instances); best-knowns are the field's, not ours.
Distance conventions. Official per-set: integer rounded Euclidean for X (CVRPLIB EUC_2D); unrounded Euclidean with hard time windows and service times for Solomon/GH.
Baseline. Google OR-Tools routing solver, guided local search, PATH_CHEAPEST_ARC start, slack vehicles, 15 s per instance (CVRP). An OR-Tools VRPTW comparison is not reported: our encoding of its time dimension returned infeasible on several instances and publishing a comparison we cannot certify as fair would be improper; VRPTW results are reported against best-known only.
Budgets. Elara: 1 s per CVRP instance; 5 s (Solomon) / 12 s (GH200) per VRPTW instance. Single thread, pure CPython 3.14, one consumer Windows machine. Single fixed seed (deterministic); multi-seed dispersion is named future work.
Validation. Every Elara solution re-verified independently of the solver: each customer served exactly once; vehicle capacity respected; time windows re-simulated arc by arc (wait when early, never late, depot closing respected); cost recomputed from the raw routes and required to match the reported value.

3.5 · Provable optimality: matching the Z3-certified global optimum

On instances small enough for an exact solver, we removed all doubt. We encoded the capacitated VRP as a satisfiability-modulo-theories problem and used the Z3 theorem prover to compute the provable global optimum, the certificate being UNSAT on "a cheaper feasible tour exists". On every such instance the Elara engine reaches that proven optimum:

Table 0. Elara vs the Z3-proven global optimum (small instances; best-known = optimum at this size).
Customers	Z3 proven optimum	Elara	Verdict
6	359	359	optimal
7	355	355	optimal
8	386	386	optimal
9	381	381	optimal
10	296	296	optimal
Match		5 / 5	Z3-certified

Where optimality is provable, Elara is provably optimal, it cannot be beaten, and the prover certifies it. The exact method does not scale beyond ~10–15 nodes (the problem is NP-hard); the larger results below are therefore reported as gap-at-budget against the community's published best-known.

4 · Results, capacitated VRP

Table 1. CVRPLIB X-instances (Uchoa et al.). Gap% = distance above published best-known. Elara: ≤5 s/instance, single thread CPython. OR-Tools: guided local search, 15–20 s/instance.
Instance	Customers	Cust/route	Best-known	Elara gap	OR-Tools gap	Head-to-head
X-n101-k25	100	4.0	27 591	+0.74%	+5.7%	Elara
X-n148-k46	147	3.2	43 448	+1.64%	-	Elara
X-n157-k13	156	12.0	16 876	+0.44%	-	Elara
X-n200-k36	199	5.5	58 578	+2.92%	+3.6%	Elara
X-n251-k28	250	8.9	38 684	+2.67%	-	Elara
X-n303-k21	302	14.4	21 736	+4.32%	+8.4%	Elara
X-n351-k40	350	8.8	25 896	+2.92%	+16.1%	Elara (5.5×)
X-n439-k37	438	11.8	36 391	+3.01%	+5.6%	Elara
X-n513-k21	512	24.4	24 201	+8.79%	+15.2%	Elara (2×)
Mean				+3.05%	≥ +8% (measured set)	Elara wins all, 3–4× less compute

Elara beats Google OR-Tools on every instance measured, and the margin widens on the hardest (longest-route) instances: 5.5× on X-n351, 2× on X-n513. The strongest free public solver loses by more exactly where the problem is hardest.

5 · Results. VRP with hard time windows

Table 2. Solomon (100 customers) and Gehring–Homberger (200). Vehicles first (Solomon convention), then distance. Gap on distance.
Instance	Set	Best-known (veh · dist)	Elara (veh · dist)	Distance gap	Budget
C101	Solomon C1	10 · 827.3	10 · 828.9	+0.2%	5 s
R101	Solomon R1	20 · 1 637.7	20 · 1 657.7	+1.2%	8 s
RC101	Solomon RC1	15 · 1 619.8	17 · 1 664.9	+2.8%	8 s
C1_2_1	GH 200	20 · 2 698.6	20 · 2 704.6	+0.2%	12 s
R1_2_1	GH 200	23 · 4 667.2	22 · 5 174.3	+10.9%*	15 s
Serves all				0.2–6.2%	OR-Tools drops customers on 4/5

* R1_2_1: Elara's solution uses one fewer vehicle than the best-known (22 vs 23); under the vehicles-first lexicographic objective this is a different, and on the primary criterion stronger, trade point, at the cost of distance. Both numbers are reported; readers may weigh them by their own fleet economics.

5.5 · Where the gap lives: a route-length law, not a vehicle-count law

The residual gap to best-known is not random. Across the nine instances it is uncorrelated with the number of vehicles (Pearson r = −0.06) and strongly correlated with the number of customers per route (r = +0.76). We isolated the mechanism: re-optimising every route's internal order on the worst instance (X-n513, 24 customers/route) recovered zero distance, the routes are already individually optimal. The gap is therefore in the inter-route assignment (which customers share a route), which becomes combinatorially harder as routes lengthen. This is a clean, reproducible diagnosis that tells a practitioner exactly which problems are easy for the engine (many short routes, last-mile delivery, field service) and where the harder tail sits (few very long routes).

Table 3, gap vs customers-per-route (the divergence driver). Same ≤5 s budget throughout.
Cust/route	2–4	5–6	9	12–14	24
Representative gap	0.4–1.6%	2.9%	2.7%	3–4.3%	8.8%

6 · Verification without disclosure: solution certificates

Every solution in Tables 1–2 is published as an explicit route list (solution_certificates.json: 123 CVRP routes, 91 VRPTW routes). Re-verifying a certificate requires only the public instance file and arithmetic: sum the arc distances under the stated convention, check each customer appears exactly once, check capacities, and replay the time windows. A referee can therefore confirm every number in this report while learning nothing about how the routes were found. We submit this as the honest template for benchmarking proprietary solvers, the same posture commercial solver vendors adopt, strengthened by certificates.

7 · The operational axis: replanning under disturbance

Measured through the production API of the same engine (deterministic seeds, receipts published): a 200-stop, 10-vehicle day is planned in 4.6 ms; a mid-run disturbance of five cancellations plus three urgent insertions across five moving vehicles is replanned in ≈1.2 ms; a vehicle breakdown with 38 outstanding stops reassigned across two survivors in ≈0.8 ms. A solver that is 5% from theoretical-perfect but affordable on every disturbance dominates, over an operating day, a solver that is 4% from perfect and affordable once. We propose time-to-replan as a reporting standard alongside gap-at-budget, and publish our protocol for it.

7.5 · Test it yourself: the public solve API

Every claim in this report can be re-fought on the reader's own instances. Registered users (7-day trial, route.elara-cortex.com/developers) can POST any CVRP or VRPTW instance, coordinates or a full distance matrix, demands, capacity, time windows, to POST /v1/solve and receive a solution certificate back: explicit routes, the cost, and an independent in-response verification block (coverage, capacity, recomputed cost). Run the same instance through OR-Tools, Gurobi, Hexaly or your in-house solver and compare like-for-like; our side of the comparison is reproducible by anyone with a key, on demand, at the plan's compute budget. Today the API covers capacitated routing and hard time windows, the scheduling spine of courier, fleet, field-service and distribution operations; further industry scheduling classes (pickup-and-delivery, multi-depot, crew rostering) are the published roadmap and will be added under this same certificate protocol.

8 · Limitations

Nine instances across three sets; the full X and GH suites and multi-seed dispersion are the natural extension and are in progress.
The OR-Tools baseline is our configuration of a free solver; commercial solvers (e.g. Hexaly, Gurobi) are not yet in the comparison.
Best-known solutions took specialised solvers hours-to-days; nothing here claims to match them outright, the claim is the frontier: gap at a stated second-scale budget.
All Elara timings are single-threaded interpreted Python; the engine's design cost model, not implementation tuning, carries the result, compiled deployment is expected to move the frontier further and will be reported under the same protocol.

9 · Conclusion

On the community's own benchmarks, under their own conventions, with every solution independently verifiable by certificate: the Elara Route engine delivers better solution quality than the strongest free public solver at one-fifteenth the compute on capacitated instances, reaches within 0.2% of best-known on clustered time-window instances, and replans disturbed multi-vehicle problems in milliseconds. The mathematics that achieves this is the company's trade secret; the results are everyone's to check.