Insights
Graph databases: from graph theory to enterprise applications
A graph database treats the relationships between data points as being just as important as the data points themselves. This primer traces the idea from Euler’s bridges of Königsberg to where connected data now drives AI, fraud detection, payment routing and grid optimisation, and explains why, for connected problems, it leaves the relational model standing still.
- 01Origins: graph theory
- 02What is a graph database?
- 03Why graph databases outperform relational databases
- 04Applications in artificial intelligence
- 05Cross-industry applications
- 06Industry-specific applications
- 07Sources and further reading
01 Origins: graph theory
The Königsberg bridge problem
The city of Königsberg was divided into four landmasses by the Pregel River, all connected by seven bridges. Citizens wanted to know whether they could take a stroll through the town and cross every bridge exactly once, without repeating any.
Euler’s solution
In 1736, Euler proved that the walk was mathematically impossible. He realised that the size of the landmasses and the length of the bridges did not matter. He simplified the map into a diagram made of two elements: vertices (points) representing the landmasses, and edges (lines) representing the bridges connecting them.
He showed that to enter and leave a landmass without repeating a bridge, each point must have an even number of lines connected to it. Because Königsberg’s landmasses all had an odd number of bridges (three had three bridges, and one had five), the path could not exist.
What Euler invented
By stripping away physical geography and focusing only on how objects connect, Euler founded two major branches of modern mathematics:
- Graph theory: the study of networks, points and lines. Today this underpins GPS routing, search algorithms, airline schedules and social media friend recommendations.1, 2
- Topology: a kind of geometry that ignores exact measurements, shapes and distances, focusing instead on the structural properties of space that hold when it is stretched or twisted.3, 4, 5
02 What is a graph database?
A graph database is a database designed to treat the relationships between data points with exactly the same importance as the data points themselves.
Instead of storing data in rigid, isolated tables, like a spreadsheet, it stores data as a network of connected pieces. It rests on three simple concepts:
- Nodes (vertices): the entities or “things”, for example a person, a product or a company.
- Edges (relationships): the lines connecting nodes that show how they relate, for example friends with, bought or works at. Edges always have a direction.
- Properties: relevant detail stored inside the nodes or edges. A node’s property might be name: “Alice”; an edge’s property might be date: “2026-06-16”.
03 Why graph databases outperform relational databases
In a traditional relational (SQL) database, connecting data across multiple tables requires costly operations called JOINs. If you want to trace a connection through five or six steps, for example finding friends of friends of friends, a SQL database slows down drastically because it has to search through millions of rows over and over again.
Graph databases use a technique called index-free adjacency. Every piece of data points directly to its connected neighbours in physical memory. Traversing a relationship takes fractions of a millisecond, no matter how large the database grows.
04 Applications in artificial intelligence
Graph databases have become a foundational technology for modern AI systems. They give models context, allowing algorithms to understand how real-world entities relate to one another.
1. Knowledge graphs for LLMs and GenAI (RAG)
Large language models are powerful, but they often hallucinate or lack access to specific, private company data.
- How graphs help: companies build knowledge graphs to act as a structured source of truth.
- GraphRAG: Retrieval-Augmented Generation systems query a graph database to fetch relevant facts and relationships, feeding them into the model prompt. This makes answers more accurate, factual and fully contextual.
2. Fraud detection and anti-money laundering (AML)
Fraud rings rarely use a single identity; they use synthetic networks of stolen data.
- How graphs help: pattern-matching algorithms scan graph networks to detect suspicious connections instantly. If ten different bank accounts are secretly linked to the same phone number or IP address across multiple degrees of separation, the graph exposes the ring immediately.
3. Advanced recommendation engines
Standard recommendation engines look at what similar users bought. Graph-powered AI takes this further.
- How graphs help: Graph Neural Networks (GNNs) analyse multi-layered relationships in real time. They map current browsing behaviour, past purchases, social connections and the granular attributes of products to recommend items with high precision.
4. Supply chain optimisation and impact analysis
Modern supply chains are vulnerable to global disruption.
- How graphs help: AI uses graph data to map factories, suppliers, shipping lanes and weather patterns. If a factory in one country shuts down, the system can simulate the ripple effects across the entire global network and suggest alternative routes.
5. Explainable AI (XAI)
Deep learning models are often criticised as “black boxes”: it is difficult to see why a model made a specific choice.
- How graphs help: when a model reasons over a knowledge graph, developers can trace the exact semantic path the algorithm took to reach a conclusion. That makes the decision auditable, transparent and easier to defend under strict regulation.
05 Cross-industry applications
1. Identity and access management (IAM)
- The problem: in a traditional SQL database, checking whether “User X” has access to “File Y” means tracing a long chain of inheritance (user to team to department to region to resource folder), which produces slow, deeply nested joins.
- The graph solution: identity structures are naturally a graph. Nodes represent users, roles, devices and assets; edges represent relationships such as MEMBER_OF or HAS_PERMISSION. Checking permissions becomes a fast traversal from the user node to the resource node, supporting secure, real-time authorisation.
2. Network and IT operations (dependency mapping)
- The problem: when an outage occurs, finding the root cause is hard because systems rely on intricate, multi-layered dependencies across cloud servers, microservices, container clusters and routing hardware.
- The graph solution: teams map the entire infrastructure as a graph. If a virtual router fails, a query instantly traces all outgoing paths to show which applications, customers and services are affected. Teams can also simulate a shutdown or upgrade before it happens, avoiding accidental downtime.
3. Master data management (MDM)
- The problem: large corporations have customer or product data fractured across dozens of systems (CRM, billing, shipping, support). Merging these silos into a single “360-degree view” usually produces duplication and conflicting records.
- The graph solution: a graph database acts as an overarching structure that links identical entities without forcing the legacy systems to change. It maps that “John Doe” in billing, “J. Doe” in shipping and account ID 12345 are the same physical person, creating a unified corporate source of truth.
4. Social networks and collaboration tools
- The problem: features such as “People You May Know” require tracking second, third and fourth-degree connections across millions of accounts.
- The graph solution: graphs natively represent human connections. A traversal from your node to your friends’ nodes, then to theirs, takes fractions of a millisecond, allowing instant friend recommendations, mutual connection counts and activity feeds.
5. Lineage and data governance
- The problem: under strict privacy law such as GDPR, organisations must prove exactly how they handle, move and alter sensitive data, which flows through a complex pipeline of databases, analytics tools and reporting software.
- The graph solution: compliance teams use graph databases to track data lineage. Every column, table, script and API is a node. If a regulator asks where a piece of customer financial data originated and who has touched it, the graph provides an automated, visual audit trail from source to destination.
06 Industry-specific applications
Financial optimisation problems
Collateral optimisation and liquidity management
In investment banking and derivatives trading, institutions must pledge collateral (cash, government bonds or equities) to back their trades and mitigate risk.
- The challenge: banks hold pools of assets scattered across global entities and custodians, each subject to different regulatory haircuts, interest rates and clearing-house rules.
- The graph approach: assets, legal entities, custodians and clearing houses become nodes. Edges represent the legal pathways, transfer speeds and transaction costs of moving assets.
- The optimisation: network-flow algorithms run over this graph in real time, calculating the cheapest, fastest path to shift assets so the bank meets its margin calls using the lowest-quality eligible collateral, preserving premium liquidity.
Liquidity saving mechanisms (LSM) in payment networks
High-value payment networks such as CHIPS or Fedwire process trillions of dollars daily. If Bank A waits for Bank B to pay before it pays Bank C, the whole system gridlocks.
- The challenge: safely settling billions in payments without requiring banks to hold massive, expensive intraday cash reserves.
- The graph approach: pending payments are submitted to a central queue structured as a directed graph. Nodes are banks; directed edges are outstanding payment obligations from one bank to another.
- The optimisation: cycle-detection and netting algorithms scan the graph for closed loops of obligations (Bank A owes B $10M, B owes C $10M, C owes A $10M). The algorithm cancels out these multi-party loops simultaneously, settling billions without moving a single dollar of actual liquidity.
Arbitrage detection (foreign exchange and crypto)
- The challenge: finding split-second price discrepancies across multiple currencies or trading pairs.
- The graph approach: currencies are nodes, and the current exchange rates are edges.
- The optimisation: the algorithm transforms each rate using negative logarithms (−ln(rate)) and applies the Bellman-Ford algorithm. This finds “negative weight cycles”, instantly flagging a loop of trades where USD to EUR to GBP to USD ends with more money than it started with.
The cheapest payment route between two banks
a. Representing the network as a graph. To find the cheapest route, the routing engine models the financial network mathematically:
- Nodes (V): individual banks, financial institutions or digital wallets.
- Edges (E): a direct financial relationship or liquidity channel between two banks, for example a Nostro/Vostro account or an active credit line.
- Edge weights (W): the total financial cost of moving money across that link.
Unlike a physical map, where the distance from A to B equals the distance from B to A, payment graph edges are directed and asymmetric. Sending USD to EUR via Bank X might cost 0.5%, while sending EUR to USD back through the same bank might cost 1.2%.
b. Formulating the cost function. The cost of a route is not a single number. It combines three variables the algorithm must optimise:
Total cost = Fixed fees + Percentage fees + Liquidity costs (slippage)
- Fixed fees: a flat charge per hop, for example $15 per intermediary bank.
- Percentage fees: a fee proportional to payment size, for example 0.1% or an FX spread margin.
- Liquidity and slippage: if you are sending $10,000,000, a smaller intermediary may not hold enough cash to clear it. Forcing a large payment through a small channel causes slippage, or higher premium rates.
Because percentage fees compound at every hop, the algorithm cannot simply add costs together. It uses logarithmic transformations to turn compounding percentages into simple addition, so standard pathfinding maths works cleanly.
c. The core routing algorithms. Engines pick an algorithm to match the network’s complexity and whether the graph is centralised or decentralised:
- Dijkstra’s algorithm (with a modified cost function): where a central hub such as Swift sees all fees and rates, a modified Dijkstra greedily explores the neighbouring correspondent banks offering the lowest combined fee and FX spread, building a tree of cheapest paths to the destination.
- Multi-path payment and network flow: for a very large amount, say $50 million, a single chain of banks may lack the capital to pass it through without astronomical fees. The engine treats the payment like water through pipes, running a minimum-cost maximum-flow algorithm, then splits the payment into smaller streams ($20M via Path A, $20M via Path B, $10M via Path C) routed simultaneously and reassembled at the destination for the lowest aggregate rate.
- On-the-fly pathfinding (decentralised networks): in crypto liquidity networks or the Bitcoin Lightning Network, no central server knows every node’s balance. Nodes broadcast their fees, and algorithms such as Spira or Flare use localised gossip protocols to map a peer-to-peer path, favouring nodes with high liquidity and low routing fees.
d. A worked example. Sending a payment from Bank A (UK) to Bank Z (Japan), the engine weighs two routes: via Bank B (0.5% + $10, then 0.2% + $5) or via Bank C (0.1% + $50, then 0.1% + $5). The winner depends entirely on the amount.
Scenario 1: a small $1,000 payment.
Path via Bank B (cheapest)
(1,000 × 0.005 + 10) + (1,005 × 0.002 + 5) = 15.00 + 7.01 = $22.01
Lower fixed fees favour small amounts.
Path via Bank C
(1,000 × 0.001 + 50) + (1,051 × 0.001 + 5) = 51.00 + 6.05 = $57.05
The high flat setup fee dominates.
Scenario 2: a large $500,000 payment.
Path via Bank B
(500,000 × 0.005 + 10) + high slippage cost = > $3,500
Percentage fees and slippage punish large amounts.
Path via Bank C (cheapest)
(500,000 × 0.001 + 50) + (500,550 × 0.001 + 5) = 550.00 + 505.55 = $1,055.55
Low percentage fees dominate large transactions.
The same graph, the same edges, two opposite routing decisions. That is the point: cost is a function of the payment, not a fixed property of the path.
Supply chain and physical logistics optimisation
Delivery fleet routing (vehicle routing problem)
- The challenge: companies such as UPS or Amazon dispatch thousands of trucks to deliver millions of packages, minimising fuel, driver hours and missed delivery windows.
- The graph approach: distribution centres and customer addresses are nodes. Edges are roads, weighted with real-time traffic, historical delays and toll costs.
- The optimisation: heuristic algorithms such as genetic algorithms or ant colony optimisation solve the vehicle routing problem, optimising paths for an entire fleet simultaneously under strict capacity constraints rather than for a single vehicle.
Global freight and intermodal shipping
- The challenge: moving cargo containers across oceans, rail and trucks while balancing cost against transit speed.
- The graph approach: ports, rail yards and highway hubs are nodes. Edges are transport links constrained by shipping schedules, container capacities and customs clearance times.
- The optimisation: mixed-integer linear programming finds the shortest or cheapest path for cargo, rerouting dynamically if a port faces a strike or a rail line shuts down.
Energy, utilities and telecommunications
Power grid load balancing and optimal power flow (OPF)
- The challenge: grids must match generation to demand in real time to prevent blackouts, while handling unpredictable inputs such as solar and wind.
- The graph approach: power plants, substations and homes are nodes. Transmission lines are edges, weighted by maximum capacity and impedance (power loss over distance).
- The optimisation: algorithms solve optimal power flow across the topology, deciding exactly how much each generator should output to satisfy demand at the lowest cost without overloading any line.
Telecom routing and content delivery networks (CDNs)
- The challenge: streaming platforms such as Netflix or YouTube must deliver high-definition video to millions of users at once without lag or local congestion.
- The graph approach: the internet is mapped as a graph of data centres, routers and users. Edges are fibre-optic links weighted by current latency and bandwidth.
- The optimisation: max-flow min-cut algorithms split data packets across multiple pathways, avoiding bottlenecks and routing traffic to the server physically closest to the user.
07 Sources and further reading
- Bellew, on Euler and the origins of graph theory (NSHSS). nshss.org/media/29809/bellew.pdf
- Leonhard Euler: the blind mathematician (Math is Figure-Out-Able). mathisfigureoutable.com
- The bridges of Königsberg (Plus Magazine). plus.maths.org/bridges-k-nigsberg
- Applications of topology (University of Bergen). uib.no/en/math/55540/applications
- Topology (Maths Careers). mathscareers.org.uk/topology
Building something where the relationships in your data matter as much as the data itself? Get in touch.