batch and sync

::: ::: section I'd like to introduce a new api architecture. Someone has probably invented it before me, and I can't wait to find out who, so I won't try to come up with a new name for it.

In this api architecture, you have a single querying endpoint and many batched action endpoints. At a high level, the client sends a QUERY message and receives continual SYNCs when any interesting data in the query updates. Think of this kind of like the elm architecture over the network.

For example, imagine the client dispatches a query for user preferences. Later the user settings are changed on a different (or the same) client. The first client automatically gets the new settings, because it has an active query for it.

on the client

What does this look like on the client side in practice? You may be tempted to use websockets, but use long polling instead. It isn't as inefficient as you may think, and it has some benefits:

able to use quic/http3
automatic throttling on low bandwidth
can do single/one time queries easily, much easier to script
simpler to implement since you don't need to do the websocket authentication dance

This can be implemented with POST /sync[?after=position], where a new position is returned with each sync. You can also cache the current state and position to resume when the tab is loaded later. Maybe even put the api client in a SharedWorker so every tab shares one logical connection.
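Here's a minimal sketch of the client loop described above. It's written in python and talks to an injected `transport` function instead of the network, so it runs as-is. The `SyncResponse` shape (`position` plus a `changes` dict), `SyncClient`, and `fake_transport` are all assumptions made up for illustration; the only thing taken from above is that each sync returns a new position which gets sent back as `?after=`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class SyncResponse:
    # Hypothetical response shape, not a defined wire format.
    position: int               # cursor to send back as ?after=
    changes: Dict[str, object]  # key -> new value for everything that changed


# A transport takes the last seen position (or None) and blocks until the
# server has something newer, e.g. by doing POST /sync?after=<position>.
Transport = Callable[[Optional[int]], SyncResponse]


@dataclass
class SyncClient:
    transport: Transport
    state: Dict[str, object] = field(default_factory=dict)
    position: Optional[int] = None  # None means "never synced": omit ?after

    def poll_once(self) -> Dict[str, object]:
        """Do one long-poll round trip and fold the result into local state."""
        response = self.transport(self.position)
        self.state.update(response.changes)
        self.position = response.position
        return response.changes

    def snapshot(self) -> Tuple[Dict[str, object], Optional[int]]:
        """State and position to cache so a later session can resume."""
        return dict(self.state), self.position


def fake_transport(after: Optional[int]) -> SyncResponse:
    """Stand-in for the server: replays a fixed change log."""
    log = [(1, {"theme": "dark"}), (2, {"lang": "en"})]
    for position, changes in log:
        if after is None or position > after:
            return SyncResponse(position, changes)
    return SyncResponse(after, {})  # nothing new; a real server would keep waiting
```

In a real client, `poll_once` would run in a loop, and `snapshot()` is what you'd write to local storage so a reloaded tab can resume from its last position instead of starting over.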

Feel free to use whatever you like though; this high level architecture can apply to most things. Just don't be tempted to create multiple ways to receive data: keep a single source of truth. Conceptually, querying/syncing isn't request/response. It's fully asynchronous.

For querying, it can be anything from fetching a few keys and values, to graphql, or even a miniature sql dialect if you need that much flexibility. Some queries don't need to be reactive, like searching.

Besides the single data receiving endpoint, have batching for all other endpoints where it makes sense (i.e. don't allow bulk account registration).
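As a rough sketch of what a batched action endpoint could look like on the inside: it takes a list of actions and returns one result per action, in order, so one bad item doesn't sink the whole batch. The action names (`set`, `delete`), the `{"type", "args"}` shape, and the `MAX_BATCH` cap are all made up for this example.

```python
from typing import Any, Callable, Dict, List

Store = Dict[str, Any]
Result = Dict[str, Any]


def set_preference(store: Store, args: Dict[str, Any]) -> Result:
    store[args["key"]] = args["value"]
    return {"ok": True}


def delete_preference(store: Store, args: Dict[str, Any]) -> Result:
    key = args["key"]
    if key not in store:
        return {"ok": False, "error": "not found"}
    del store[key]
    return {"ok": True}


# Hypothetical action table; a real api would have its own set of actions.
HANDLERS: Dict[str, Callable[[Store, Dict[str, Any]], Result]] = {
    "set": set_preference,
    "delete": delete_preference,
}
MAX_BATCH = 100  # assumed cap so one request can't carry unbounded work


def handle_batch(store: Store, actions: List[Dict[str, Any]]) -> List[Result]:
    """Apply each action in order and return one result per action."""
    if len(actions) > MAX_BATCH:
        raise ValueError("batch too large")
    results: List[Result] = []
    for action in actions:
        handler = HANDLERS.get(action.get("type"))
        if handler is None:
            results.append({"ok": False, "error": "unknown action"})
            continue
        try:
            results.append(handler(store, action.get("args", {})))
        except KeyError as missing:
            # A required argument was absent; report it for this item only.
            results.append({"ok": False, "error": f"missing field {missing}"})
    return results
```

Whether a batch should be all-or-nothing or per-item like this is a design choice. Per-item results are easier on clients that queue up unrelated actions, but a transactional batch may fit better when the items depend on each other.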

on the server

As long as you have only one writer/reducer (but as many readers/syncers as you want), it's fairly simple. Give every record in your database that you want to sync a "last updated at" timestamp. The position you return to the client is the largest timestamp of all the items you're returning. For higher availability, use something like raft to choose a new reducer/writer if the main one fails.
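The position logic above can be sketched in a few lines. This assumes records live in a plain list and that `updated_at` is an integer assigned by the single writer; `Record` and `sync_since` are names invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    key: str
    value: object
    updated_at: int  # assumed: integer timestamp set by the single writer


def sync_since(records: List[Record], after: Optional[int]) -> Tuple[List[Record], Optional[int]]:
    """Return the records changed since `after`, plus the new position.

    The new position is the largest updated_at among the returned records.
    If nothing changed, the position stays where it was.
    """
    changed = [r for r in records if after is None or r.updated_at > after]
    if not changed:
        return [], after
    return changed, max(r.updated_at for r in changed)
```

One thing to watch: because the comparison is strictly "greater than", two writes that land on the same timestamp can cause a client to miss the second one if it synced in between. With a single writer it's easy to avoid this by making sure every write gets a strictly larger timestamp than the last.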

If you must be Web Scale™ and have multiple reducers, timestamps don't quite cut it, since the servers' clocks can be out of sync with each other.

To keep readers up to date, consider using a pubsub mechanism. If you're using postgres, there's already one built in (LISTEN/NOTIFY). If you need more scale, there's plenty of existing software. I advise against making your own, since it's one of those things that looks simple at first but gets more complex over time.

For ratelimiting, limit things that only affect the local user by requests per second, and limit things that affect other users by total batched items per second.
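Here's one way the two kinds of limits could look, using a token bucket (my choice for the sketch, any limiter would do). The rates, and the `TokenBucket` and `UserLimits` names, are assumptions. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows `rate` units per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_take(self, cost: float = 1.0) -> bool:
        # Refill according to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if cost > self.tokens:
            return False
        self.tokens -= cost
        return True


class UserLimits:
    """Per-user limits: one bucket counts requests, the other counts batched items."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        # Assumed numbers, purely for illustration.
        self.requests = TokenBucket(rate=10, capacity=10, clock=clock)
        self.items = TokenBucket(rate=50, capacity=50, clock=clock)

    def allow(self, batch_size: int, affects_others: bool) -> bool:
        if affects_others:
            # Every item in the batch counts, so batching can't dodge the limit.
            return self.items.try_take(batch_size)
        # Only touches the caller's own data: the whole batch costs one request.
        return self.requests.try_take(1)
```

The point of the split is that a user batching 500 changes to their own settings only costs them one request, while a batch of 500 messages to other people is charged as 500.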

issues

For small apps that don't need realtime data or complex querying, you probably don't need to bother with syncing: while simple to implement, it's still unneeded complexity. I do recommend making apis batched whenever you can though, since it's pretty much free performance.

Sorting can be hard with reactivity, but is fine if you only ever answer queries once.

Pagination is annoying to implement, but still reasonable. Again, this is one of those things that this architecture doesn't handle well and rest is better at. I'd probably add extra GET endpoints to paginate.

conclusion

I'm not sure how well this scales in practice, but it seems nice from an initial inspection. Rest is a fairly leaky abstraction, since not every action happens on a single object. :::
