public/stateres.md
2024-05-22 04:03:59 -07:00

4.9 KiB

::: header

how matrix works

:::

::: section Matrix is a merkle-dag based operational transform crdt, primarily used for instant messaging. This article will try to break it down into easier to understand parts.

a crdt

Let's say you want to edit a document with your friends without a central server. How would you implement it?

struct Event {
  content: Any,
}

Let's say you send events containing the changes you're making to the document. Then, you send this event to your friends so they can apply it. But what happens if events arrive out of order, or your friends edit the same document you're editing? We need to know which event came first using ordering!

Could we use a timestamp? It would work, but clocks can get out of sync and can be set to different times. Luckily, having an exact real-world time isn't necessary. All you need to know is which events came first, or a causual order. What if there was a simple incrementing counter?

While we're at it, I might as well add an event_id field to know which event is which.

struct Event {
  event_id: EventId,
  content: Any,
  count: u64,
}

This is a bit better, but is still not great. Counters can be spoofed, which will be an issue in public documents, like chatrooms or forums. We can take a page out of git and implement a directed acyclic graph.

A graph is a set of nodes, like Events, that are linked together. A directed graph is a graph that where the links have directions, like pointers or a parent and child. A directed acyclic graph is an acyclic graph that has no loops (aka cycles).

// note: i'm using rust syntax very loosely to make this more approachable
struct Event {
  event_id: EventId,
  content: Any,
  references: List<EventId>,
}

[todo: insert image of nodes linked together here]

Similarity to git, we can prevent loops by making event ids cryptographic hashes. People can still create events pointing to very old parents, but there's no way to make those parents point to new events.

type EventId = Sha256Hash;

fn get_event_id(event_without_id: EventWithoutId) -> Sha256Hash {
  return sha256(serialize(event_without_id));
}

We now have a way to order events! Well, it's not a total order, but a partial order. If two events have the same parents, they effectively happen at the same time and forked the timeline. If two users set the document name to the same thing, how to we resolve conflicts?

// the first event doesn't have any references
let createEvent = Event { content: { }, references: [], .. };
let name1Event = Event { content: { name: "foo" }, references: [createEvent.id], .. };
let name2Event = Event { content: { name: "bar" }, references: [createEvent.id], .. };

For now we can use a simple and blunt method of resolving state: whichever event who's id bigger. Technically, we've made a way to replicate data between peers without any conflicts. This is called a conflict-free replicated data type, or crdt. In matrix terminology, a merkle dag of events is called a room.

surely there's a better way

Ok, so maybe deciding who wins based on the event id isn't the best idea. To improve this, we should first at least know who's sending the events. Let's use public/private key pairs and verify events with signatures.

struct Event {
  event_id: EventId,
  content: Any,
  references: List<EventId>,

  // new! you can use any pub/privkey system, like rsa or ed25519
  sender: UserId,
  signature: UserSignature,
}

Also, event contents have only been defined as "operations that change state". Let's improve that a little by defining a few specific operations. We'll have a create operation to define the root of a room, an authorize operation to give specific users specific powers, and a change room name operation.

struct Event {
  // --- existing stuff hidden --  
  // new! the type of operation this event is
  type: EventType,
  
  // new! Lets give each room its own id as a quality of life change
  room_id: RoomId,
}

enum EventType {
  Create,
  Authorize,
  SetName,
}

// while we're here, lets define a basic authorization system
type AuthorizeContent = Map<UserId, Capability>;

enum Capability {
  /// the user can't send any events
  ReadOnly,

  /// the user can only send SetName events
  SetName,
  
  /// the user can authorize more ReadOnly and SetName users
  AddUsers,
  
  /// the user can send any Authorize event
  Admin,
}

When a room is created, it authorizes the creator to send SetName and Authorize ops operations. Don't worry about how the authorize op is sent for now. And with that setup out of the way, it's time to get into state resolution.

todo: state resolution, full mesh federation

making things go fast

todo: auth_events, partial state/fast joins, state events and message events, future matrix changes

things matrix does

todo: the final matrix specific things that are useful to know, like redactions or soft failing :::