Graph databases: from graph theory to enterprise applications

A graph database treats the relationships between data points as being just as important as the data points themselves. This primer traces the idea from Euler’s bridges of Königsberg to the present, where connected data drives AI, fraud detection, payment routing and grid optimisation, and explains why graph databases outperform the relational model on highly connected problems.

01  Origins: graph theory

The Königsberg bridge problem

The city of Königsberg was divided into four landmasses by the Pregel River, connected to one another by seven bridges. Citizens wanted to know whether they could stroll through the town and cross every bridge exactly once.

Euler’s solution

In 1736, Euler proved that the walk was mathematically impossible. He realised that the size of the landmasses and the length of the bridges did not matter. He simplified the map into a diagram made of two elements: vertices (points) representing the landmasses, and edges (lines) representing the bridges connecting them.

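He showed that to enter and leave a landmass without repeating a bridge, each point the walk passes through must have an even number of lines connected to it: one to arrive by and one to leave by on every visit. Only the start and end of the walk can be exceptions, so at most two points may have an odd number of lines. Because all four of Königsberg’s landmasses had an odd number of bridges (three had three bridges, and one had five), the walk could not exist.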
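Euler’s parity argument can be checked mechanically. The sketch below is a minimal illustration, not a reconstruction of Euler’s own working: the landmass labels are hypothetical, and the function assumes the graph is connected, which holds for Königsberg. It counts the bridge ends at each landmass and applies the rule that a walk using every bridge once exists only if zero or two landmasses have an odd count.

```python
from collections import Counter

# The seven bridges as (landmass, landmass) pairs. The labels are
# hypothetical: "island" is the central island, the others are the two
# river banks and the eastern landmass.
BRIDGES = [
    ("north", "island"), ("north", "island"),
    ("south", "island"), ("south", "island"),
    ("east", "island"),
    ("north", "east"),
    ("south", "east"),
]


def degrees(edges):
    """Count how many bridge ends touch each landmass."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count


def has_euler_walk(edges):
    """True if a walk using every edge exactly once can exist.

    Assumes the graph is connected; only the parity rule is checked.
    """
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


print(dict(degrees(BRIDGES)))   # three landmasses with 3 bridges, one with 5
print(has_euler_walk(BRIDGES))  # False: all four landmasses are odd
```

Removing or adding a single bridge changes the parity of two landmasses, which is why small changes to the map can make the walk possible.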

What Euler invented

By stripping away physical geography and focusing only on how objects connect, Euler laid the foundations of two major branches of modern mathematics: graph theory and topology.

02  What is a graph database?

A graph database is a database designed to treat the relationships between data points with exactly the same importance as the data points themselves.

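Instead of storing data in rigid, isolated tables, like a spreadsheet, it stores data as a network of connected pieces. It rests on three simple concepts: nodes (the entities), relationships (the connections between them) and properties (the attributes attached to either).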

03  Why graph databases outperform relational databases

In a traditional relational (SQL) database, connecting data across multiple tables requires operations called JOINs. If you want to trace a connection through five or six steps, for example finding friends of friends of friends, every extra step adds another JOIN, and each JOIN has to match rows across large tables or their indexes again. The work grows quickly with the depth of the query, and a SQL database slows down drastically.

Native graph databases use a technique called index-free adjacency. Each record holds direct pointers to its connected neighbours, so following a relationship is a pointer lookup rather than an index search. The cost of a traversal depends on how much of the graph the query actually touches, not on the total size of the database.
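The idea can be sketched in a few lines of Python. This is an illustration only, not how any particular graph database stores data: the `FRIENDS` graph and the `within_hops` helper are hypothetical, and a Python dictionary stands in for the direct pointers a native engine would hold. The point it shows is that each hop only visits the neighbours of nodes already reached, without scanning the rest of the data.

```python
from collections import deque

# Hypothetical social graph: each person maps directly to their neighbours.
FRIENDS = {
    "ana": ["ben", "cat"],
    "ben": ["ana", "dev"],
    "cat": ["ana", "dev"],
    "dev": ["ben", "cat", "eve"],
    "eve": ["dev"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search from start.

    Returns {person: hops} for everyone reachable in at most max_hops
    steps, excluding the start node itself.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


# Friends of friends of friends of "ana".
print(within_hops(FRIENDS, "ana", 3))
```

The equivalent relational query would need one self-JOIN on a friendship table per hop.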

04  Applications in artificial intelligence

Graph databases have become a foundational technology for modern AI systems. They give models context, allowing algorithms to understand how real-world entities relate to one another.

1. Knowledge graphs for LLMs and GenAI (RAG)

Large language models are powerful, but they often hallucinate or lack access to specific, private company data.

2. Fraud detection and anti-money laundering (AML)

Fraud rings rarely use a single identity; they use synthetic networks of stolen data.

3. Advanced recommendation engines

Standard recommendation engines look at what similar users bought. Graph-powered AI takes this further.

4. Supply chain optimisation and impact analysis

Modern supply chains are vulnerable to global disruption.

5. Explainable AI (XAI)

Deep learning models are often criticised as “black boxes”: it is difficult to see why a model made a specific choice.

05  Cross-industry applications

1. Identity and access management (IAM)

2. Network and IT operations (dependency mapping)

3. Master data management (MDM)

4. Social networks and collaboration tools

5. Lineage and data governance

06  Industry-specific applications

Financial optimisation problems

Collateral optimisation and liquidity management

In investment banking and derivatives trading, institutions must pledge collateral (cash, government bonds or equities) to back their trades and mitigate risk.

Liquidity saving mechanisms (LSM) in payment networks

High-value payment networks such as CHIPS or Fedwire process trillions of dollars daily. If Bank A waits for Bank B to pay before it pays Bank C, the whole system gridlocks.

Arbitrage detection (foreign exchange and crypto)

The cheapest payment route between two banks

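a. Representing the network as a graph. To find the cheapest route, the routing engine models the financial network as a graph: banks are the vertices, the payment channels between them are the edges, and each edge carries the cost of using it.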

Unlike a physical map, where the distance from A to B equals the distance from B to A, payment graph edges are directed and asymmetric. Sending USD to EUR via Bank X might cost 0.5%, while sending EUR to USD back through the same bank might cost 1.2%.

b. Formulating the cost function. The cost of a route is not a single number. It combines three variables the algorithm must optimise:

Total cost  =  Fixed fees  +  Percentage fees  +  Liquidity costs (slippage)

Because percentage fees compound at every hop, the algorithm cannot simply add costs together. It uses logarithmic transformations to turn compounding percentages into simple addition, so standard pathfinding maths works cleanly.
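A minimal sketch of the logarithmic trick, under stated assumptions: the network, its fee values and the function names are hypothetical, fixed fees and slippage are left out, and Dijkstra’s algorithm is used as one standard pathfinding choice rather than as the method any specific engine uses. A hop with percentage fee f passes on a fraction (1 − f) of the amount, and those fractions multiply along a path. Taking −log(1 − f) as the edge weight turns the product into a sum, and because the weight is non-negative for 0 ≤ f < 1, Dijkstra’s algorithm applies.

```python
import heapq
import math

# Hypothetical directed network: EDGES[src][dst] = percentage fee as a fraction.
# Fees are asymmetric: A -> B costs 0.5% while B -> A costs 1.2%.
EDGES = {
    "A": {"B": 0.005, "C": 0.001},
    "B": {"Z": 0.002, "A": 0.012},
    "C": {"Z": 0.001},
    "Z": {},
}


def edge_weight(fee):
    """Map a fee f to -log(1 - f), so that multiplying retention
    factors (1 - f) along a path becomes adding weights."""
    return -math.log(1.0 - fee)


def cheapest_route(edges, source, target):
    """Dijkstra over log weights.

    Returns (path, fraction of the amount retained at the target),
    or (None, 0.0) if the target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            break
        if dist > best.get(node, math.inf):
            continue  # stale heap entry
        for neighbour, fee in edges[node].items():
            candidate = dist + edge_weight(fee)
            if candidate < best.get(neighbour, math.inf):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(heap, (candidate, neighbour))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    # exp(-sum of weights) recovers the product of the retention factors.
    return path, math.exp(-best[target])


path, retained = cheapest_route(EDGES, "A", "Z")
print(path, retained)
```

Fixed fees are charged in currency rather than as a proportion, so they do not fit this transformation directly; a real engine has to account for them separately.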

c. The core routing algorithms. Engines pick an algorithm to match the network’s complexity and whether the graph is centralised or decentralised:

d. A worked example. Sending a payment from Bank A (UK) to Bank Z (Japan), the engine weighs two routes: via Bank B (0.5% + $10, then 0.2% + $5) or via Bank C (0.1% + $50, then 0.1% + $5). The winner depends entirely on the amount.

Scenario 1: a small $1,000 payment.

Path via Bank B (cheapest)

(1,000 × 0.005 + 10) + (1,015 × 0.002 + 5) = 15.00 + 7.03 = $22.03

Lower fixed fees favour small amounts.

Path via Bank C

(1,000 × 0.001 + 50) + (1,051 × 0.001 + 5) = 51.00 + 6.05 = $57.05

The high flat setup fee dominates.

Scenario 2: a large $500,000 payment.

Path via Bank B

(500,000 × 0.005 + 10) + (502,510 × 0.002 + 5) = 2,510.00 + 1,010.02 = $3,520.02, plus a high slippage cost

Percentage fees and slippage punish large amounts.

Path via Bank C (cheapest)

(500,000 × 0.001 + 50) + (500,550 × 0.001 + 5) = 550.00 + 505.55 = $1,055.55

Low percentage fees dominate large transactions.

The same graph, the same edges, two opposite routing decisions. That is the point: cost is a function of the payment, not a fixed property of the path.
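The comparison can be reproduced with a short script. This is a sketch under two assumptions that are mine rather than the engine’s: each hop charges its percentage on the original amount plus all fees accumulated on earlier hops (the convention the Bank C figures follow), and slippage is left out because the example gives no figure for it. The names `ROUTES`, `route_fees` and `cheapest` are hypothetical.

```python
# Each hop is (percentage fee as a fraction, fixed fee in dollars),
# taken from the two routes in the worked example.
ROUTES = {
    "via Bank B": [(0.005, 10.0), (0.002, 5.0)],
    "via Bank C": [(0.001, 50.0), (0.001, 5.0)],
}


def route_fees(amount, hops):
    """Total fees for a route, excluding slippage.

    Assumption: each hop charges its percentage on the original amount
    plus all fees accumulated on earlier hops, then adds its fixed fee.
    """
    total = 0.0
    for pct, fixed in hops:
        total += (amount + total) * pct + fixed
    return total


def cheapest(amount):
    """Name of the route with the lowest fees for this amount."""
    return min(ROUTES, key=lambda name: route_fees(amount, ROUTES[name]))


for amount in (1_000, 500_000):
    for name, hops in ROUTES.items():
        print(f"{amount:>9,} {name}: ${route_fees(amount, hops):,.2f}")
    print(f"{amount:>9,} cheapest: {cheapest(amount)}")
```

Running `cheapest` over a range of amounts shows where the decision flips from Bank B to Bank C as the percentage fees overtake the fixed ones.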

Supply chain and physical logistics optimisation

Delivery fleet routing (vehicle routing problem)

Global freight and intermodal shipping

Energy, utilities and telecommunications

Power grid load balancing and optimal power flow (OPF)

Telecom routing and content delivery networks (CDNs)
