maintenance, refactor docs, use real issue tracker

This commit is contained in:
tezlm 2023-08-17 23:32:07 -07:00
parent d00c35bc70
commit f8eadc666d
Signed by: tezlm
GPG key ID: 649733FCD94AFBBA
21 changed files with 259 additions and 341 deletions

1
docs/README.md Normal file
View file

@ -0,0 +1 @@
there is currently almost no documentation here; it's more like random thoughts

88
docs/notes.md Normal file
View file

@ -0,0 +1,88 @@
# notes
Takes ideas from ipfs, perkeep, and matrix.
Theoretically, you only need 3 api endpoints for this to work.
- `POST /things` Upload a blob
- `GET /things/:hash` Get a blob by hash
- `GET /things` Enumerate blobs
You can upload media, but most of the time you'll upload small json
objects called "events". Events can relate to each other. You can specify
access control around event types and relations.
To read events, you upload a json query via the upload endpoint and use
it with the enumerate endpoint.
This is a fairly minimal example.
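A minimal sketch of that flow in TypeScript, assuming a hypothetical server at `http://localhost:3000`; only the three endpoint paths come from the list above, while the request/response shapes (the `ref` field, the `query` parameter) are assumptions:
```ts
const BASE = "http://localhost:3000"; // hypothetical index server

// upload a blob (or a small json event) and get its ref back
async function uploadThing(body: string | Uint8Array): Promise<string> {
  const res = await fetch(`${BASE}/things`, { method: "POST", body });
  const { ref } = await res.json(); // assumed response shape: { ref: "sha224-..." }
  return ref;
}

// fetch a blob back by its hash
async function getThing(hash: string): Promise<Uint8Array> {
  const res = await fetch(`${BASE}/things/${hash}`);
  return new Uint8Array(await res.arrayBuffer());
}

// to read events: upload a json query, then enumerate with it
async function queryThings(query: unknown): Promise<unknown> {
  const queryRef = await uploadThing(JSON.stringify(query));
  const res = await fetch(`${BASE}/things?query=${queryRef}`); // `query` parameter is assumed
  return res.json();
}
```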
## index servers
Index servers take events/blobs and index them.
### core api
intentionally uses `/things/` instead of `/blobs/`
- `GET /things/:hash/blob` get a thing as a blob
- `GET /things/:hash` get a thing
- `POST /things` uploads a thing
there are also extensions
- accounts
- `POST /accounts`
- `GET /accounts/:id`
- `DELETE /accounts/:id`
- sessions
- `POST /sessions`
- `DELETE /sessions`
- `GET /sessions`
- `GET /sessions/:id`
- `DELETE /sessions/:id`
- `PATCH /sessions/:id`
- thumbnails
- `GET /things/:hash/thumbnail`
## queries
You can query events by uploading a query and enumerating with it:
```ts
interface Query {
ids: Array<string>,
types: Array<string>,
relations: Array<Array<string>>,
}
```
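For example, a query for hypothetical comment events that relate to a post might look like this; the event types, the relation name, and the exact meaning of each inner array are illustrative assumptions:
```ts
const query: Query = {
  ids: [],                                          // no specific refs, match by type/relations
  types: ["org.example.forum.comment"],             // hypothetical namespaced event type
  relations: [["reply", "org.example.forum.post"]], // assumed (relation type, target type) pairs
};
```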
## access tokens
todo: things will change a lot in the future. each bit grants one capability; see the sketch after this list.
- `1 << 0` get things
- `1 << 1` enumerate things
- `1 << 2` create things
- `1 << 3` remove things (x.redact event specifically)
- `1 << 4` manage shares
- `1 << 5` manage sessions
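A sketch of how these bits could compose into a token's permission mask; the constant names are just labels for the bits listed above:
```ts
const GET_THINGS       = 1 << 0;
const ENUMERATE_THINGS = 1 << 1;
const CREATE_THINGS    = 1 << 2;
const REMOVE_THINGS    = 1 << 3; // x.redact specifically
const MANAGE_SHARES    = 1 << 4;
const MANAGE_SESSIONS  = 1 << 5;

// a read/write token without the management bits
const perms = GET_THINGS | ENUMERATE_THINGS | CREATE_THINGS;

const can = (mask: number, flag: number) => (mask & flag) === flag;
can(perms, CREATE_THINGS);   // true
can(perms, MANAGE_SESSIONS); // false
```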
## features
Different servers can have different features. Here are the official
ones so far:
- `core`: supports the core api (`/things/...`)
- `aliases`: server name -> hash mappings
- `thumbnail`: generates small images/icons for x.file events
- `account`: users can manage accounts
- `session`: users can manage sessions
- `search`: has full text search
- `share`: deprecated
## searching
see <https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html>
deriving takes a lot of dependencies, which might not be desirable

36
docs/spec/blobs.md Normal file
View file

@ -0,0 +1,36 @@
# blob servers
Blob servers have only one goal: store blobs, somewhere. You can have a
blob server that stores on the local filesystem, s3, over sftp, and so
on. Blob servers are content-addressed.
## api contract
| method | path | description |
|--------|--------------|------------------------------------|
| POST | /blobs | upload a new blob |
| GET | /blobs/:hash | get a blob by hash |
| DELETE | /blobs/:hash | delete a blob by hash |
| GET | /blobs | get new blobs, or via long polling |
POST uploads raw bytes directly, GET returns the raw bytes, and DELETE
returns nothing.
GET /blobs enumerates blobs:
```ts
// the query parameters (?limit=10&after=asdf)
interface EnumerateParameters {
// the maximum number of blobs to return
limit?: number,
// an opaque cursor for paginating through blobs
after?: string,
// if specified, the server should wait up to `timeout` ms before returning (for long polling)
timeout?: number,
}
interface EnumerateResponse {
blobs: Array<Ref>,
after: string,
}
```
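A sketch of a client following new blobs with these parameters, using the `timeout` field for long polling; the base URL is hypothetical and `Ref` is assumed to serialize as a string:
```ts
async function followBlobs(base: string, onBlob: (ref: string) => void) {
  let after: string | undefined;
  while (true) {
    const params = new URLSearchParams({ limit: "100", timeout: "30000" });
    if (after) params.set("after", after);
    const res = await fetch(`${base}/blobs?${params}`);
    const page: { blobs: string[]; after: string } = await res.json();
    for (const ref of page.blobs) onBlob(ref);
    after = page.after; // opaque pagination cursor from the response
  }
}
```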

37
docs/spec/events.md Normal file
View file

@ -0,0 +1,37 @@
## core event types
Summary of core event types
- `x.actor`: creates an actor
- `x.file`: creates a file (also the only event able to reference blobs)
- `x.update`: updates another event
- `x.redact`: removes another event
- `x.acl`: access control
- `x.annotate`: TODO
- `x.annotate.local`: TODO
- `x.tag`: TODO
- `x.tag.local`: TODO
If an event type starts with `x.` and isn't listed here, it's
invalid.
## namespaces
- `x.`: core events
- `l.`: namespace for "stdlib", or mutually agreed upon event definitions
- `?.`: "local definitions". if `org.foo.bar.baz.forum` exists,
`?.comment` should be usable instead of the verbose
`org.foo.bar.baz.forum.comment`. (NOTE: a letter should be chosen
instead of `?`.)
- `tld.dns.name`: namespaced events, where everything else goes. if you
don't own a domain name, make something up that hopefully nobody else uses.
## formats
```ts
interface ActorEvent {
type: "user" | "bot" | "node",
name: string,
// extraneous k/v pairs allowed
}
```
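For illustration, the content of an `x.actor` event in this shape might be (only `type` and `name` come from the interface above; the extra field is just an example of an extraneous k/v pair):
```ts
const actorContent = {
  type: "user",
  name: "alice",
  // extraneous pairs are allowed, e.g. a hypothetical avatar ref
  avatar: "sha224-20a65162a52771d0e2e2a2552bd55b13bf4953404f145e0e1800af78",
};
```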

91
docs/spec/types.md Normal file
View file

@ -0,0 +1,91 @@
# types
all base64 is urlsafe and unpadded
## ref
AKA `ItemRef` in rust code, since `ref` is a reserved keyword. Refs are
unique identifiers that refer to a specific blob by its hash. They
contain a hash type and a digest.
When serialized, a ref has the hash type, a hyphen, and the digest in
hex. The only currently specified hash is `sha224`. Here is an example
ref: `sha224-20a65162a52771d0e2e2a2552bd55b13bf4953404f145e0e1800af78`
NOTE: in the future, i may switch to something like multihash, or at
least switch from hex to base64 for space savings.
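A sketch of producing a ref from raw bytes with Node's `crypto` module (which supports sha224); illustrative only, not the actual implementation:
```ts
import { createHash } from "node:crypto";

// format: "<hash type>-<digest in hex>", sha224 being the only specified hash
function refOf(blob: Uint8Array | string): string {
  const digest = createHash("sha224").update(blob).digest("hex");
  return `sha224-${digest}`;
}

refOf("hello world"); // => "sha224-<56 hex chars>"
```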
## actor
Any entity that can create events. The 3 types are
- `User`: this actor sends events from a human
- `Bot`: this one sends events automatically
- `Node`: this is a server
### actor id
An ed25519 public key, encoded in base64 and prefixed
with a `%` sigil.
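A sketch of deriving an actor id from a raw 32-byte ed25519 public key (Node's `base64url` encoding is already urlsafe and unpadded):
```ts
import { Buffer } from "node:buffer";

function actorIdOf(rawPublicKey: Uint8Array): string {
  // 32 raw key bytes -> urlsafe, unpadded base64, prefixed with the `%` sigil
  return "%" + Buffer.from(rawPublicKey).toString("base64url");
}
```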
## event
The primary atomic piece of data.
```rust
struct Event {
/// the ref of the event blob (not serialized during hashing, signing)
id: Ref,
type: String,
/// event contents depends on type
content: any,
relations: Map<Ref, Relation>,
/// data the server derived from the event and/or its relations
derived: Derived,
sender: ActorId,
/// the actor's signature of this event (not serialized during signing)
signature: Signature,
/// the timestamp at the event's creator (in milliseconds, utc)
/// also acts as a nonce
origin_ts: u64,
}
interface Relation {
/// this is the relation type used in permission checks
type: string,
/// an optional key for keyed relations (for aggregations, eg. reactions)
key?: string,
}
```
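An illustrative event in this shape, with placeholder refs, sender, and signature; the event type is borrowed from examples elsewhere in these notes, and the relation types are assumptions:
```ts
const event = {
  id: "sha224-20a65162a52771d0e2e2a2552bd55b13bf4953404f145e0e1800af78", // ref of the event blob
  type: "l.chat.message",
  content: { body: "hello" },
  relations: {
    "sha224-aaaa...": { type: "reply" },               // plain relation
    "sha224-bbbb...": { type: "reaction", key: "👍" }, // keyed relation, aggregated by key
  },
  derived: {},                      // filled in by the server
  sender: "%<base64url ed25519 public key>",
  signature: "<signature over the event, minus id/signature/derived>",
  origin_ts: 1692340000000,         // ms, utc; also acts as a nonce
};
```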
## acl
An acl can be specified on an event. It determines which other events can relate to that event, and it applies recursively.
One event can have many users.
One user can have many roles.
One role can have many permissions.
```rust
// WARNING: `Set`s must be ordered for canonical json
type RoleId = String;
/// a permission pattern to allow sending
/// the tuple corresponds to (source event type, relation type, target event type)
/// any tuple field can have "*" for a catch-all
type Permission = (String, String, String);
/// a role that can be applied to a user
struct Role {
name: Option<String>,
description: Option<String>,
perms: Set<Permission>,
}
struct Acl {
pub roles: Map<RoleId, Role>,
pub users: Map<ActorId, Set<RoleId>>,
pub admins: Set<ActorId>,
}
```
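A sketch of a permission check against this structure: gather the sender's roles and test each permission tuple against the (source event type, relation type, target event type) triple, treating `"*"` as a wildcard. The shapes mirror the structs above; letting admins bypass the role check is an assumption:
```ts
type Permission = [string, string, string]; // (source type, relation type, target type)
interface Role { name?: string; description?: string; perms: Permission[] }
interface Acl {
  roles: Record<string, Role>;
  users: Record<string, string[]>; // actor id -> role ids
  admins: string[];
}

function allowed(acl: Acl, sender: string, source: string, relation: string, target: string): boolean {
  if (acl.admins.includes(sender)) return true; // assumed: admins can do anything
  const match = (pattern: string, value: string) => pattern === "*" || pattern === value;
  return (acl.users[sender] ?? []).some((roleId) =>
    (acl.roles[roleId]?.perms ?? []).some(
      ([s, r, t]) => match(s, source) && match(r, relation) && match(t, target),
    ),
  );
}
```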

View file

@ -1,47 +0,0 @@
# laundry list
- [ ] base blob system
- [x] upload/download blobs
- [x] enumerate blobs
- [ ] garbage collect redacted blobs
- [ ] process events
- [x] x.file
- [x] x.redact
- [-] x.actor
- [x] x.acl
- [x] x.tag.local
- [ ] x.tag
- [ ] x.annotate
- [ ] x.annotate.local
- [-] query events
- [x] by ref
- [x] by type
- [x] by tags
- [x] by relations
- [x] by sender
- [ ] decentralization
- [-] other servers can get events
- [ ] other servers can query relations
- [ ] other servers can watch for relations
- [-] works through dht
- [-] misc
- [x] files as blobs
- [x] file thumbnails
- [x] full text search
- [ ] website embed api?
- [-] ui [0]
- [x] fuse
- [x] image/file view
- [ ] link view (eg. bookmarks or rss feeds)
- [ ] forum
- [ ] i18n
- [ ] tests [1]
- [-] library (important)
- [ ] blobserver (hard)
- [ ] server
[0]: The current testing utilities don't count. I'm focusing on building
things that I personally need (and would use) for now.
[1]: For dev speed, i'm skipping tests for now unless I need to verify
something. This will probably come back to haunt me...

View file

@ -68,6 +68,10 @@ pub struct WipEvent {
pub origin_ts: u64,
}
// TODO: make x.tag and x.acl special annotations instead of special events
// note: maybe more things could be made annotations? eg x.update annotation?
// note: i'd then need to rework acls to work with annotations then
#[derive(Debug, Serialize, Deserialize, Clone, PartialEq, Eq)]
#[serde(tag = "type", content = "content")]
pub enum EventContent {

View file

@ -1,5 +1,7 @@
# ufh
a global graph database with access control
## how to run
1. `cargo run --bin store-fs`

View file

@ -57,12 +57,6 @@ use crate::state::db::{Database, Location};
const MAX_SIZE: u64 = 1024 * 1024;
// TODO: split out indexers a bit more quickly than anticipated: build
// times are getting long!
// TODO (future): maybe use a websocket instead of long polling?
// for receiving new events, would potentially be a lot more complicated though
type Relations = HashMap<ItemRef, (Event, RelInfo)>;
type RowId = u32;
@ -100,15 +94,8 @@ pub struct ServerState {
server_event: Event,
}
// TODO: replace sharing system with aliases
// eg: like how matrix has #alias:server.tld -> !roomid aliases
// note: sharing previously existed to let people anonymously view blobs, which isn't needed now
// note: do i use #alias:server.tld like matrix or something else?
// TODO (future): use generic database trait instead of `Sqlite`
// FIXME: return json for all errors (some endpoints return plain text currently)
#[tokio::main]
async fn main() -> Result<(), Error> {
let log_subscriber = tracing_subscriber::FmtSubscriber::builder()

View file

@ -10,9 +10,6 @@ use ufh::query::{Query, QueryRelation};
use crate::routes::things::thumbnail::ThumbnailSize;
use crate::Error;
// TODO: make a proper type for shares
// type Share = String;
// TODO: abstract the database
pub mod sqlite;

View file

@ -1,4 +0,0 @@
# spec
This is a loose collection of notes for now! A real spec will come once
I work out how I want to implement stuff.

View file

@ -1,274 +0,0 @@
# notes
Takes ideas from ipfs, perkeep, and matrix.
Theoretically, you only need 3 api endpoints for this to work.
- `POST /things` Upload a blob
- `GET /things/:hash` Get a blob by hash
- `GET /things` Enumerate blobs
You can upload media, but most of the time you'll upload small json
objects called "events". Events can relate to each other. You can specify
access control around event types and relations.
To read events, you upload a json query via the upload endpoint and use
it with the enumerate endpoint.
This is a fairly minimal example.
## stages
1. i have a content addressed blob store, blobs can be uploaded then retrieved from its hash (eg. put "hello world", later retrieve by hash)
2. i can store specific json objects called events (eg. a url event with a href, title, description)
3. events can relate to each other (eg. url events with relations to an rss feed event)
4. events can be queried based on thing types and relations (eg. query url events that relate to a rss feed event)
5. access control can be specified based on things and relations (eg. comment things that relate to this rss feed thing can only be made by certain people)
For implementation
1. i have a content addressed blob store
2. i can query events
3. i can query relations/structure
4. servers can synchronize with each other
5. i can specify access control on events
This project is working on impl stage 4/5, currently.
TODO: work out a proper api spec for blob servers and index servers
## blob servers
Blob servers have only one goal: store blobs, somewhere. You can have a
blob server that stores on the local filesystem, s3, over sftp, and so
on. Blob servers are content-addressed.
### rough api
| method | path | description |
|--------|--------------|------------------------------------|
| POST | /blobs | upload a new blob |
| GET | /blobs/:hash | get a blob by hash |
| DELETE | /blobs/:hash | delete a blob by hash |
| GET | /blobs | get new blobs, or via long polling |
POST uploads raw bytes directly, GET returns the raw bytes, and DELETE
returns nothing.
GET /blobs takes in a few query parameters: limit, to limit the number of
blobs returned; after, to paginate blobs; and timeout, to long poll new
blobs. It returns a json object in the form `{ "blobs": Vec<String> }`.
## index servers
Index servers take events/blobs and index them.
### core api
intentionally uses `/things/` instead of `/blobs/`
- `GET /things/:hash/blob` get a thing as a blob
- `GET /things/:hash` get a thing
- `POST /things` uploads a thing
there are also extensions
- accounts
- `POST /accounts`
- `GET /accounts/:id`
- `DELETE /accounts/:id`
- sessions
- `POST /sessions`
- `DELETE /sessions`
- `GET /sessions`
- `GET /sessions/:id`
- `DELETE /sessions/:id`
- `PATCH /sessions/:id`
- thumbnails
- `GET /things/:hash/thumbnail`
## queries
You can query events
```ts
interface Query {
ids: Array<string>,
types: Array<string>,
relations: Array<Array<string>>,
}
```
## derived
The server can derive data for you
```ts
const ev = {
type: "x.file",
content: {
parts: ["ref:sha224-1234"],
},
derived: {
// tbd
"x.file": {
size: 1234,
type: "image/png",
width: 12,
height: 34,
}
},
};
```
It is excluded from hash calculations/signatures. Here are some example derives:
```ts
{
file: {
size: 1234,
name: "filename.ext",
type: "mime/type",
},
media: {
width: 1234,
height: 1234,
duration: 1234,
},
keys: {
reaction: {
"foo": 1,
"bar": 2,
"baz": 3,
},
}
}
```
```ts
// playing around with different possible event formats...
const userEvent = {
type: "x.user",
sender: "ed25519-pubkey",
signature: "ed25519-sig",
};
const roomEvent = {
type: "x.nexus",
content: {
type: "room", // or space, or forum, or torment (please don't)
},
sender: "ed25519-pubkey",
signature: "ed25519-sig",
};
const messageEvent = {
type: "l.chat.message",
content: {
body: "hello",
},
relations: {
"ref:sha224-1234abcd": { type: "prev_event" },
"ref:sha224-5678wxyz": { type: "reply" },
},
sender: "ed25519-pubkey",
signature: "ed25519-sig",
};
const fileEvent = {
type: "x.file",
content: {},
sender: "ed25519-pubkey",
signature: "ed25519-sig",
};
const aclEvent = {
type: "x.acl",
content: {
roles: {
"roleid": [
["l.chat.message", "room", "this"],
["l.chat.message", "reply", "l.chat.message"],
],
"roleid2": [
["l.chat.message", "room", "this"],
["l.chat.message", "reply", "l.chat.message"],
["x.file", "attach", "l.chat.message"],
],
"default": [],
},
admins: ["sha224-userid"],
inherit: "ref:sha224-asdf", // another x.acl event
},
sender: "ed25519-pubkey",
signature: "ed25519-sig",
};
```
## core events
Summary of core event types
- `x.user`: creates a user
- `x.file`: creates a file (also the only event able to reference blobs)
- `x.redact`: removes another event
- `x.acl`: TODO
- `x.annotate`: TODO
- `x.annotate.local`: TODO
- `x.tag`: TODO
## access tokens
todo in the future, things will change a lot
- `1 << 0` get things
- `1 << 1` enumerate things
- `1 << 2` create things
- `1 << 3` remove things (x.redact event specifically)
- `1 << 4` manage shares
- `1 << 5` manage sessions
## acls
- you specify an acl on an event
- it determines how events can relate to that event
- it determines how events which relate to that event can relate to each other
Example of an acl:
```js
{
roles: {
"send": [
["l.chat.message", "chat", "l.chat.room"],
],
"react": [
["l.chat.react", "chat", "l.chat.message"],
],
},
users: {
"%bar": ["send", "react"],
"%baz": ["send"],
},
admins: ["%foobar"],
}
```
## features
Different servers can have different features. Here are the official
ones so far:
- `core`: supports the core api (`/things/...`)
- `aliases`: server name -> hash mappings
- `thumbnail`: generates small images/icons for x.file events
- `account`: users can manage accounts
- `session`: users can manage sessions
- `search`: has full text search
- `share`: deprecated
## searching
see <https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html>
deriving takes a lot of dependencies, which might not be desirable