Unfortunately we need to pull tracing-opentelemetry from git because
there hasn't been a release including the dependency bump on the other
opentelemetry crates.
`create_cf` already fails when the column family already exists, but
this gives us a much better error message; namely, it tells us what
column family name is at fault.
Also adds a utility for formatting an error message in addition to all
of its sources, i.e. errors that came before it that led to the current
error.
This helps us to provide better error messages to users by including
more information: both our own error message and all of the underlying
error messages, if any, instead of only one or the other.
`panic!()` (and things that invoke it, such as `expect` and `unwrap`)
produces terrible looking error messages and `std::process::exit()`
doesn't run destructors. Instead, we'll make a `try_main` that can
return a `Result` with a structured error type, but for now I'm going to
be lazy and just use `Box<dyn Error>`. Then, `main` will call it and
return the appropriate `ExitCode` value based on `try_main`'s `Result`.
This gives us the opportunity to produce good error messages and doesn't
violently terminate the program.
Previously, we were returning redundant member count updates or encrypted
device updates from the /sync endpoint in some cases. The extra member
count updates are spec-compliant, but unnecessary, while the extra
encrypted device updates violate the spec.
The refactor necessary to fix this bug is also necessary to support
filtering on state events in sync.
Details:
Joined room incremental sync needs to examine state events for four
purposes:
1. determining whether we need to return an update to room member counts
2. determining the set of left/joined devices for encrypted rooms
(returned in `device_lists`)
3. returning state events to the client (in `rooms.joined.*.state`)
4. tracking which member events we have sent to the client, so they can
be omitted on future requests when lazy-loading is enabled.
The state events that we need to examine for the first two cases is member
events in the delta between `since` and the end of `timeline`. For the
second two cases, we need the delta between `since` and the start of
`timeline`, plus contextual member events for any senders that occur in
`timeline`. The second list is subject to filtering, while the first is
not.
Before this change, we were using the same set of state events that we are
returning to the client (cases 3/4) to do the analysis for cases 1/2.
In a compliant implementation, this would result in us missing some
relevant member events in 1/2 in addition to seeing redundant member
events. In current grapevine this is not the case because the set of
events that we return to the client is always a superset of the set that
is needed for cases 1/2. This is because we don't support filtering, and
we have an existing bug[1] where we are returning the delta between
`since` and the end of `timeline` rather than the start.
[1]: https://gitlab.computer.surgery/matrix/grapevine-fork/-/issues/5
Fixing this is necessary to implement filtering because otherwise
we would start missing some member events for member count or encrypted
device updates if the relevant member events are rejected by the filter.
This would be much worse than our current behavior.
tokio::spawn is a span boundary, the spawned future has no parent span.
For short futures, we simply inherit the current span with
`.in_current_span()`.
For long running futures containing a sleeping infinite loop, we don't
actually want a span on the entire task or even the entire loop body,
both would result in very long spans. Instead, we put the outermost span
(created using #[tracing::instrument] or .instrument()) around the
actual work happening after the sleep, which results in a new root span
being created after every sleep.
Previously, `Content-Disposition` was always set to `inline`, even for
HTML, which means that XSS could be easily acheived by uploading
malicious HTML and getting someone to click on the Matrix HTTP API link
for that piece of media. Now, we have an allowlist of safe values for
`Content-Type` that use `inline` while everything else defaults to
`attachment`, including HTML and SVG, which prevents XSS.
We also set the `Content-Security-Policy` header because why not.
A `set_header_or_panic` function is introduced to do what it says in
case Ruma begins providing better or worse values for the relevant
headers in the future. The safest way to handle such a case is simply
to panic.