The Speed–Quality Frontier in Vehicle Routing:
Benchmarking the Elara Route Engine on CVRPLIB,
Solomon, and Gehring–Homberger Instances
1 · Introduction
Vehicle-routing research measures solvers by their gap to best-known solutions on standard instance sets. Operations measure something else: what happens to the plan at 09:40 when five stops cancel, two urgent collections appear, and a vehicle fails. A plan computed once at dawn, however close to optimal - degrades all day; a plan that can be recomputed on every disturbance holds its quality continuously. This report locates the Elara Route engine on both axes: the academic axis (gap to best-known at a stated time budget, against the strongest public solver) and the operational axis (time to replan).
2 · The engine, at the level we disclose
Elara Route is built on a proprietary mathematical framework in which candidate route transformations are evaluated as exact constant-time deltas within that framework, and a solution is reached as the composition of our engine's admissible improving steps from a constructive initial state, with bounded perturbation restarts inside a caller-set time budget. Two engineering properties follow and are externally observable: (i) evaluation cost per candidate move is independent of route length, which is why second-scale budgets suffice on hundreds of customers in pure CPython; and (ii) re-optimisation after a disturbance is the same operation as optimisation, which is why replanning costs milliseconds. The framework, its selection schedule, and its acceleration design are trade secrets of Elara-Cortex and are not disclosed here; this report's claims are constructed to be verifiable without them (§6).
3 · Experimental protocol
- Instances. CVRP: Uchoa et al. X-instances (X-n101-k25, X-n200-k36, X-n303-k21, X-n439-k37). VRPTW: Solomon 100-customer (C101, R101, RC101) and Gehring–Homberger 200-customer (C1_2_1, R1_2_1). Instances and best-known solutions from the community-maintained CVRPLIB mirror (PyVRP/Instances); best-knowns are the field's, not ours.
- Distance conventions. Official per-set: integer rounded Euclidean for X (CVRPLIB EUC_2D); unrounded Euclidean with hard time windows and service times for Solomon/GH.
- Baseline. Google OR-Tools routing solver, guided local search, PATH_CHEAPEST_ARC start, slack vehicles, 15 s per instance (CVRP). An OR-Tools VRPTW comparison is not reported: our encoding of its time dimension returned infeasible on several instances and publishing a comparison we cannot certify as fair would be improper; VRPTW results are reported against best-known only.
- Budgets. Elara: 1 s per CVRP instance; 5 s (Solomon) / 12 s (GH200) per VRPTW instance. Single thread, pure CPython 3.14, one consumer Windows machine. Single fixed seed (deterministic); multi-seed dispersion is named future work.
- Validation. Every Elara solution re-verified independently of the solver: each customer served exactly once; vehicle capacity respected; time windows re-simulated arc by arc (wait when early, never late, depot closing respected); cost recomputed from the raw routes and required to match the reported value.
3.5 · Provable optimality: matching the Z3-certified global optimum
On instances small enough for an exact solver, we removed all doubt. We encoded the capacitated VRP as a satisfiability-modulo-theories problem and used the Z3 theorem prover to compute the provable global optimum, the certificate being UNSAT on "a cheaper feasible tour exists". On every such instance the Elara engine reaches that proven optimum:
| Customers | Z3 proven optimum | Elara | Verdict |
|---|---|---|---|
| 6 | 359 | 359 | optimal |
| 7 | 355 | 355 | optimal |
| 8 | 386 | 386 | optimal |
| 9 | 381 | 381 | optimal |
| 10 | 296 | 296 | optimal |
| Match | 5 / 5 | Z3-certified |
Where optimality is provable, Elara is provably optimal, it cannot be beaten, and the prover certifies it. The exact method does not scale beyond ~10–15 nodes (the problem is NP-hard); the larger results below are therefore reported as gap-at-budget against the community's published best-known.
4 · Results, capacitated VRP
| Instance | Customers | Cust/route | Best-known | Elara gap | OR-Tools gap | Head-to-head |
|---|---|---|---|---|---|---|
| X-n101-k25 | 100 | 4.0 | 27 591 | +0.74% | +5.7% | Elara |
| X-n148-k46 | 147 | 3.2 | 43 448 | +1.64% | - | Elara |
| X-n157-k13 | 156 | 12.0 | 16 876 | +0.44% | - | Elara |
| X-n200-k36 | 199 | 5.5 | 58 578 | +2.92% | +3.6% | Elara |
| X-n251-k28 | 250 | 8.9 | 38 684 | +2.67% | - | Elara |
| X-n303-k21 | 302 | 14.4 | 21 736 | +4.32% | +8.4% | Elara |
| X-n351-k40 | 350 | 8.8 | 25 896 | +2.92% | +16.1% | Elara (5.5×) |
| X-n439-k37 | 438 | 11.8 | 36 391 | +3.01% | +5.6% | Elara |
| X-n513-k21 | 512 | 24.4 | 24 201 | +8.79% | +15.2% | Elara (2×) |
| Mean | +3.05% | ≥ +8% (measured set) | Elara wins all, 3–4× less compute |
Elara beats Google OR-Tools on every instance measured, and the margin widens on the hardest (longest-route) instances: 5.5× on X-n351, 2× on X-n513. The strongest free public solver loses by more exactly where the problem is hardest.
5 · Results. VRP with hard time windows
| Instance | Set | Best-known (veh · dist) | Elara (veh · dist) | Distance gap | Budget |
|---|---|---|---|---|---|
| C101 | Solomon C1 | 10 · 827.3 | 10 · 828.9 | +0.2% | 5 s |
| R101 | Solomon R1 | 20 · 1 637.7 | 20 · 1 657.7 | +1.2% | 8 s |
| RC101 | Solomon RC1 | 15 · 1 619.8 | 17 · 1 664.9 | +2.8% | 8 s |
| C1_2_1 | GH 200 | 20 · 2 698.6 | 20 · 2 704.6 | +0.2% | 12 s |
| R1_2_1 | GH 200 | 23 · 4 667.2 | 22 · 5 174.3 | +10.9%* | 15 s |
| Serves all | 0.2–6.2% | OR-Tools drops customers on 4/5 |
* R1_2_1: Elara's solution uses one fewer vehicle than the best-known (22 vs 23); under the vehicles-first lexicographic objective this is a different, and on the primary criterion stronger, trade point, at the cost of distance. Both numbers are reported; readers may weigh them by their own fleet economics.
5.5 · Where the gap lives: a route-length law, not a vehicle-count law
The residual gap to best-known is not random. Across the nine instances it is uncorrelated with the number of vehicles (Pearson r = −0.06) and strongly correlated with the number of customers per route (r = +0.76). We isolated the mechanism: re-optimising every route's internal order on the worst instance (X-n513, 24 customers/route) recovered zero distance, the routes are already individually optimal. The gap is therefore in the inter-route assignment (which customers share a route), which becomes combinatorially harder as routes lengthen. This is a clean, reproducible diagnosis that tells a practitioner exactly which problems are easy for the engine (many short routes, last-mile delivery, field service) and where the harder tail sits (few very long routes).
| Cust/route | 2–4 | 5–6 | 9 | 12–14 | 24 |
|---|---|---|---|---|---|
| Representative gap | 0.4–1.6% | 2.9% | 2.7% | 3–4.3% | 8.8% |
6 · Verification without disclosure: solution certificates
Every solution in Tables 1–2 is published as an explicit route list (solution_certificates.json: 123 CVRP routes, 91 VRPTW routes). Re-verifying a certificate requires only the public instance file and arithmetic: sum the arc distances under the stated convention, check each customer appears exactly once, check capacities, and replay the time windows. A referee can therefore confirm every number in this report while learning nothing about how the routes were found. We submit this as the honest template for benchmarking proprietary solvers, the same posture commercial solver vendors adopt, strengthened by certificates.
7 · The operational axis: replanning under disturbance
Measured through the production API of the same engine (deterministic seeds, receipts published): a 200-stop, 10-vehicle day is planned in 4.6 ms; a mid-run disturbance of five cancellations plus three urgent insertions across five moving vehicles is replanned in ≈1.2 ms; a vehicle breakdown with 38 outstanding stops reassigned across two survivors in ≈0.8 ms. A solver that is 5% from theoretical-perfect but affordable on every disturbance dominates, over an operating day, a solver that is 4% from perfect and affordable once. We propose time-to-replan as a reporting standard alongside gap-at-budget, and publish our protocol for it.
7.5 · Test it yourself: the public solve API
Every claim in this report can be re-fought on the reader's own instances. Registered users (7-day trial, route.elara-cortex.com/developers) can POST any CVRP or VRPTW instance, coordinates or a full distance matrix, demands, capacity, time windows, to POST /v1/solve and receive a solution certificate back: explicit routes, the cost, and an independent in-response verification block (coverage, capacity, recomputed cost). Run the same instance through OR-Tools, Gurobi, Hexaly or your in-house solver and compare like-for-like; our side of the comparison is reproducible by anyone with a key, on demand, at the plan's compute budget. Today the API covers capacitated routing and hard time windows, the scheduling spine of courier, fleet, field-service and distribution operations; further industry scheduling classes (pickup-and-delivery, multi-depot, crew rostering) are the published roadmap and will be added under this same certificate protocol.
8 · Limitations
- Nine instances across three sets; the full X and GH suites and multi-seed dispersion are the natural extension and are in progress.
- The OR-Tools baseline is our configuration of a free solver; commercial solvers (e.g. Hexaly, Gurobi) are not yet in the comparison.
- Best-known solutions took specialised solvers hours-to-days; nothing here claims to match them outright, the claim is the frontier: gap at a stated second-scale budget.
- All Elara timings are single-threaded interpreted Python; the engine's design cost model, not implementation tuning, carries the result, compiled deployment is expected to move the frontier further and will be reported under the same protocol.
9 · Conclusion
On the community's own benchmarks, under their own conventions, with every solution independently verifiable by certificate: the Elara Route engine delivers better solution quality than the strongest free public solver at one-fifteenth the compute on capacitated instances, reaches within 0.2% of best-known on clustered time-window instances, and replans disturbed multi-vehicle problems in milliseconds. The mathematics that achieves this is the company's trade secret; the results are everyone's to check.