public/stateres.md
2024-05-22 04:03:59 -07:00

166 lines
4.9 KiB
Markdown

::: header
# how matrix works
:::
::: section
[Matrix](https://matrix.org/) is a merkle-dag based operational transform
crdt, primarily used for instant messaging. This article will try to
break it down into easier to understand parts.
## a crdt
Let's say you want to edit a document with your friends without
a central server. How would you implement it?
```rust
struct Event {
content: Any,
}
```
Let's say you send *event*s containing the changes you're making to
the document. Then, you send this event to your friends so they can apply
it. But what happens if events arrive out of order, or your friends edit
the same document you're editing? We need to know which event came first
using *ordering*!
Could we use a timestamp? It would work, but clocks can get out of sync
and can be set to different times. Luckily, having an exact real-world
time isn't necessary. All you need to know is which events came first,
or a *causual order*. What if there was a simple incrementing counter?
While we're at it, I might as well add an event_id field to know which
event is which.
```rust
struct Event {
event_id: EventId,
content: Any,
count: u64,
}
```
This is a bit better, but is still not great. Counters can be spoofed,
which will be an issue in public documents, like chatrooms or forums. We
can take a page out of git and implement a *directed acyclic graph*.
A graph is a set of nodes, like `Event`s, that are linked together. A
directed graph is a graph that where the links have directions, like
pointers or a parent and child. A directed acyclic graph is an acyclic
graph that has no loops (aka cycles).
```rust
// note: i'm using rust syntax very loosely to make this more approachable
struct Event {
event_id: EventId,
content: Any,
references: List<EventId>,
}
```
[todo: insert image of nodes linked together here]
Similarity to git, we can prevent loops by making event ids cryptographic
hashes. People can still create events pointing to very old parents,
but there's no way to make those parents point to new events.
```rust
type EventId = Sha256Hash;
fn get_event_id(event_without_id: EventWithoutId) -> Sha256Hash {
return sha256(serialize(event_without_id));
}
```
We now have a way to order events! Well, it's not a *total order*, but a
*partial order*. If two events have the same parents, they effectively
happen at the same time and forked the timeline. If two users set the
document name to the same thing, how to we resolve conflicts?
```rust
// the first event doesn't have any references
let createEvent = Event { content: { }, references: [], .. };
let name1Event = Event { content: { name: "foo" }, references: [createEvent.id], .. };
let name2Event = Event { content: { name: "bar" }, references: [createEvent.id], .. };
```
For now we can use a simple and blunt method of resolving state:
whichever event who's id bigger. Technically, we've made a way to
replicate data between peers without any conflicts. This is called a
conflict-free replicated data type, or [crdt](https://crdt.tech/). In
matrix terminology, a merkle dag of events is called a *room*.
## surely there's a better way
Ok, so maybe deciding who wins based on the event id isn't the best
idea. To improve this, we should first at least know who's sending
the events. Let's use public/private key pairs and verify events with
signatures.
```rust
struct Event {
event_id: EventId,
content: Any,
references: List<EventId>,
// new! you can use any pub/privkey system, like rsa or ed25519
sender: UserId,
signature: UserSignature,
}
```
Also, event contents have only been defined as "operations that
change state". Let's improve that a little by defining a few specific
operations. We'll have a create operation to define the root of a room,
an authorize operation to give specific users specific powers, and a
change room name operation.
```rust
struct Event {
// --- existing stuff hidden --
// new! the type of operation this event is
type: EventType,
// new! Lets give each room its own id as a quality of life change
room_id: RoomId,
}
enum EventType {
Create,
Authorize,
SetName,
}
// while we're here, lets define a basic authorization system
type AuthorizeContent = Map<UserId, Capability>;
enum Capability {
/// the user can't send any events
ReadOnly,
/// the user can only send SetName events
SetName,
/// the user can authorize more ReadOnly and SetName users
AddUsers,
/// the user can send any Authorize event
Admin,
}
```
When a room is created, it authorizes the creator to send SetName and
Authorize ops operations. Don't worry about how the authorize op is
sent for now. And with that setup out of the way, it's time to get into
state resolution.
todo: state resolution, full mesh federation
## making things go fast
todo: auth_events, partial state/fast joins, state events and message events, future matrix changes
## things matrix does
todo: the final matrix specific things that are useful to know, like redactions or soft failing
:::