Commit graph

5 commits

Author SHA1 Message Date
crapStone
044c684a47 Fix compression (#405)
closes #404

Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/405
Co-authored-by: crapStone <me@crapstone.dev>
Co-committed-by: crapStone <me@crapstone.dev>
2024-11-25 12:21:55 +00:00
crapStone
85059aa46e cache pages (#403)
taken from https://codeberg.org/Codeberg/pages-server/pulls/301

Co-authored-by: Moritz Marquardt <git@momar.de>

Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/403
Co-authored-by: crapStone <me@crapstone.dev>
Co-committed-by: crapStone <me@crapstone.dev>
2024-11-21 23:26:30 +00:00
ltdk
5b120f0488 Implement static serving of compressed files (#387)
This provides an option for #223 without fully resolving it. (I think.)

Essentially, it acts very similar to the `gzip_static` and similar options for nginx, where it will check for the existence of pre-compressed files and serve those instead if the client allows it. I couldn't find a pre-existing way to actually parse the Accept-Encoding header properly (admittedly didn't look very hard) and just implemented one on my own that should be fine.

This should hopefully not have the same DOS vulnerabilities as #302, since it relies on the existing caching system. Compressed versions of files will be cached just like any other files, and that includes cache for missing files as well.

The compressed files will also be accessible directly, and this won't automatically decompress them. So, if you have a `tar.gz` file that you access directly, it will still be downloaded as the gzipped version, although you will now gain the option to download the `.tar` directly and decompress it in transit. (Which doesn't affect the server at all, just the client's way of interpreting it.)

----

One key thing this change also adds is a short-circuit when accessing directories: these always return 404 via the API, although they'd try the cache anyway and go through that route, which was kind of slow. Adding in the additional encodings, it's going to try for .gz, .br, and .zst files in the worst case as well, which feels wrong. So, instead, it just always falls back to the index-check behaviour if the path ends in a slash or is empty. (Which is implicitly just a slash.)

----

For testing, I set up this repo: https://codeberg.org/clarfonthey/testrepo

I ended up realising that LFS wasn't supported by default with `just dev`, so, it ended up working until I made sure the files on the repo *didn't* use LFS.

Assuming you've run `just dev`, you can go directly to this page in the browser here: https://clarfonthey.localhost.mock.directory:4430/testrepo/
And also you can try a few cURL commands:

```shell
curl https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure
curl -H 'Accept-Encoding: gz' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | gunzip -
curl -H 'Accept-Encoding: br' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | brotli --decompress -
curl -H 'Accept-Encoding: zst' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | zstd --decompress -
```

Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/387
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: ltdk <usr@ltdk.xyz>
Co-committed-by: ltdk <usr@ltdk.xyz>
2024-09-29 21:00:54 +00:00
Peter Gerber
bc9111a05f Use correct timestamp format for Last-Modified header (#365)
HTTP uses GMT [1,2] rather than UTC as timezone for timestamps. However,
the Last-Modified header used UTC which confused at least wget.

Before, UTC was used:

$ wget --no-check-certificate -S --spider https://cb_pages_tests.localhost.mock.directory:4430/images/827679288a.jpg
...
  Last-Modified: Sun, 11 Sep 2022 08:37:42 UTC
...
Last-modified header invalid -- time-stamp ignored.
...

After, GMT is used:

$ wget --no-check-certificate -S --spider https://cb_pages_tests.localhost.mock.directory:4430/images/827679288a.jpg
...
  Last-Modified: Sun, 11 Sep 2022 08:37:42 GMT
...
(no last-modified-header-invalid warning)

[1]: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Last-Modified
[2]: https://www.rfc-editor.org/rfc/rfc9110#name-date-time-formats

Fixes #364

---

Whatt I noticed is that the If-Modified-Since header isn't accepted (neither with GMT nor with UTC):

```
$ wget --header "If-Modified-Since: Sun, 11 Sep 2022 08:37:42 GMT" --no-check-certificate -S --spider https://cb_pages_tests.localhost.mock.directory:4430/images/827679288a.jpg
Spider mode enabled. Check if remote file exists.
--2024-07-15 23:31:41--  https://cb_pages_tests.localhost.mock.directory:4430/images/827679288a.jpg
Resolving cb_pages_tests.localhost.mock.directory (cb_pages_tests.localhost.mock.directory)... 127.0.0.1
Connecting to cb_pages_tests.localhost.mock.directory (cb_pages_tests.localhost.mock.directory)|127.0.0.1|:4430... connected.
WARNING: The certificate of ‘cb_pages_tests.localhost.mock.directory’ is not trusted.
WARNING: The certificate of ‘cb_pages_tests.localhost.mock.directory’ doesn't have a known issuer.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Allow: GET, HEAD, OPTIONS
  Cache-Control: public, max-age=600
  Content-Length: 124635
  Content-Type: image/jpeg
  Etag: "073af1960852e2a4ef446202c7974768b9881814"
  Last-Modified: Sun, 11 Sep 2022 08:37:42 GMT
  Referrer-Policy: strict-origin-when-cross-origin
  Server: pages-server
  Strict-Transport-Security: max-age=63072000; includeSubdomains; preload
  Date: Mon, 15 Jul 2024 21:31:42 GMT
Length: 124635 (122K) [image/jpeg]
Remote file exists
```

I would have expected a 304 (Not Modified) rather than a 200 (OK). I assume this is simply not supported and on production 304 is returned by a caching proxy in front of pages-server.

Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/365
Reviewed-by: crapStone <codeberg@crapstone.dev>
Co-authored-by: Peter Gerber <peter@arbitrary.ch>
Co-committed-by: Peter Gerber <peter@arbitrary.ch>
2024-07-23 18:42:24 +00:00
6543
6c63b66ce4 Refactor split long functions (#135)
we have big functions that handle all stuff ... we should split this into smaler chuncks so we could test them seperate and make clear cuts in what happens where

Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/135
2022-11-12 20:43:44 +01:00