pages-server/server/upstream/upstream.go

319 lines
8.9 KiB
Go
Raw Normal View History

2021-12-05 13:47:33 +00:00
package upstream
import (
Implement static serving of compressed files (#387) This provides an option for #223 without fully resolving it. (I think.) Essentially, it acts very similar to the `gzip_static` and similar options for nginx, where it will check for the existence of pre-compressed files and serve those instead if the client allows it. I couldn't find a pre-existing way to actually parse the Accept-Encoding header properly (admittedly didn't look very hard) and just implemented one on my own that should be fine. This should hopefully not have the same DOS vulnerabilities as #302, since it relies on the existing caching system. Compressed versions of files will be cached just like any other files, and that includes cache for missing files as well. The compressed files will also be accessible directly, and this won't automatically decompress them. So, if you have a `tar.gz` file that you access directly, it will still be downloaded as the gzipped version, although you will now gain the option to download the `.tar` directly and decompress it in transit. (Which doesn't affect the server at all, just the client's way of interpreting it.) ---- One key thing this change also adds is a short-circuit when accessing directories: these always return 404 via the API, although they'd try the cache anyway and go through that route, which was kind of slow. Adding in the additional encodings, it's going to try for .gz, .br, and .zst files in the worst case as well, which feels wrong. So, instead, it just always falls back to the index-check behaviour if the path ends in a slash or is empty. (Which is implicitly just a slash.) ---- For testing, I set up this repo: https://codeberg.org/clarfonthey/testrepo I ended up realising that LFS wasn't supported by default with `just dev`, so, it ended up working until I made sure the files on the repo *didn't* use LFS. Assuming you've run `just dev`, you can go directly to this page in the browser here: https://clarfonthey.localhost.mock.directory:4430/testrepo/ And also you can try a few cURL commands: ```shell curl https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure curl -H 'Accept-Encoding: gz' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | gunzip - curl -H 'Accept-Encoding: br' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | brotli --decompress - curl -H 'Accept-Encoding: zst' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | zstd --decompress - ``` Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/387 Reviewed-by: Gusted <gusted@noreply.codeberg.org> Co-authored-by: ltdk <usr@ltdk.xyz> Co-committed-by: ltdk <usr@ltdk.xyz>
2024-09-29 21:00:54 +00:00
"cmp"
"errors"
"fmt"
2021-12-05 13:47:33 +00:00
"io"
"net/http"
Implement static serving of compressed files (#387) This provides an option for #223 without fully resolving it. (I think.) Essentially, it acts very similar to the `gzip_static` and similar options for nginx, where it will check for the existence of pre-compressed files and serve those instead if the client allows it. I couldn't find a pre-existing way to actually parse the Accept-Encoding header properly (admittedly didn't look very hard) and just implemented one on my own that should be fine. This should hopefully not have the same DOS vulnerabilities as #302, since it relies on the existing caching system. Compressed versions of files will be cached just like any other files, and that includes cache for missing files as well. The compressed files will also be accessible directly, and this won't automatically decompress them. So, if you have a `tar.gz` file that you access directly, it will still be downloaded as the gzipped version, although you will now gain the option to download the `.tar` directly and decompress it in transit. (Which doesn't affect the server at all, just the client's way of interpreting it.) ---- One key thing this change also adds is a short-circuit when accessing directories: these always return 404 via the API, although they'd try the cache anyway and go through that route, which was kind of slow. Adding in the additional encodings, it's going to try for .gz, .br, and .zst files in the worst case as well, which feels wrong. So, instead, it just always falls back to the index-check behaviour if the path ends in a slash or is empty. (Which is implicitly just a slash.) ---- For testing, I set up this repo: https://codeberg.org/clarfonthey/testrepo I ended up realising that LFS wasn't supported by default with `just dev`, so, it ended up working until I made sure the files on the repo *didn't* use LFS. Assuming you've run `just dev`, you can go directly to this page in the browser here: https://clarfonthey.localhost.mock.directory:4430/testrepo/ And also you can try a few cURL commands: ```shell curl https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure curl -H 'Accept-Encoding: gz' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | gunzip - curl -H 'Accept-Encoding: br' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | brotli --decompress - curl -H 'Accept-Encoding: zst' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | zstd --decompress - ``` Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/387 Reviewed-by: Gusted <gusted@noreply.codeberg.org> Co-authored-by: ltdk <usr@ltdk.xyz> Co-committed-by: ltdk <usr@ltdk.xyz>
2024-09-29 21:00:54 +00:00
"slices"
"strconv"
2021-12-05 13:47:33 +00:00
"strings"
"time"
"github.com/rs/zerolog/log"
2021-12-05 14:53:46 +00:00
"codeberg.org/codeberg/pages/html"
"codeberg.org/codeberg/pages/server/cache"
"codeberg.org/codeberg/pages/server/context"
"codeberg.org/codeberg/pages/server/gitea"
2021-12-05 13:47:33 +00:00
)
const (
headerLastModified = "Last-Modified"
headerIfModifiedSince = "If-Modified-Since"
Implement static serving of compressed files (#387) This provides an option for #223 without fully resolving it. (I think.) Essentially, it acts very similar to the `gzip_static` and similar options for nginx, where it will check for the existence of pre-compressed files and serve those instead if the client allows it. I couldn't find a pre-existing way to actually parse the Accept-Encoding header properly (admittedly didn't look very hard) and just implemented one on my own that should be fine. This should hopefully not have the same DOS vulnerabilities as #302, since it relies on the existing caching system. Compressed versions of files will be cached just like any other files, and that includes cache for missing files as well. The compressed files will also be accessible directly, and this won't automatically decompress them. So, if you have a `tar.gz` file that you access directly, it will still be downloaded as the gzipped version, although you will now gain the option to download the `.tar` directly and decompress it in transit. (Which doesn't affect the server at all, just the client's way of interpreting it.) ---- One key thing this change also adds is a short-circuit when accessing directories: these always return 404 via the API, although they'd try the cache anyway and go through that route, which was kind of slow. Adding in the additional encodings, it's going to try for .gz, .br, and .zst files in the worst case as well, which feels wrong. So, instead, it just always falls back to the index-check behaviour if the path ends in a slash or is empty. (Which is implicitly just a slash.) ---- For testing, I set up this repo: https://codeberg.org/clarfonthey/testrepo I ended up realising that LFS wasn't supported by default with `just dev`, so, it ended up working until I made sure the files on the repo *didn't* use LFS. Assuming you've run `just dev`, you can go directly to this page in the browser here: https://clarfonthey.localhost.mock.directory:4430/testrepo/ And also you can try a few cURL commands: ```shell curl https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure curl -H 'Accept-Encoding: gz' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | gunzip - curl -H 'Accept-Encoding: br' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | brotli --decompress - curl -H 'Accept-Encoding: zst' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | zstd --decompress - ``` Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/387 Reviewed-by: Gusted <gusted@noreply.codeberg.org> Co-authored-by: ltdk <usr@ltdk.xyz> Co-committed-by: ltdk <usr@ltdk.xyz>
2024-09-29 21:00:54 +00:00
headerAcceptEncoding = "Accept-Encoding"
headerContentEncoding = "Content-Encoding"
rawMime = "text/plain; charset=utf-8"
)
2021-12-05 13:47:33 +00:00
// upstreamIndexPages lists pages that may be considered as index pages for directories.
var upstreamIndexPages = []string{
"index.html",
}
// upstreamNotFoundPages lists pages that may be considered as custom 404 Not Found pages.
var upstreamNotFoundPages = []string{
"404.html",
}
2021-12-05 13:47:33 +00:00
// Options provides various options for the upstream request.
type Options struct {
TargetOwner string
TargetRepo string
TargetBranch string
TargetPath string
// Used for debugging purposes.
Host string
TryIndexPages bool
BranchTimestamp time.Time
2021-12-05 16:57:54 +00:00
// internal
appendTrailingSlash bool
redirectIfExists string
ServeRaw bool
2021-12-05 13:47:33 +00:00
}
Implement static serving of compressed files (#387) This provides an option for #223 without fully resolving it. (I think.) Essentially, it acts very similar to the `gzip_static` and similar options for nginx, where it will check for the existence of pre-compressed files and serve those instead if the client allows it. I couldn't find a pre-existing way to actually parse the Accept-Encoding header properly (admittedly didn't look very hard) and just implemented one on my own that should be fine. This should hopefully not have the same DOS vulnerabilities as #302, since it relies on the existing caching system. Compressed versions of files will be cached just like any other files, and that includes cache for missing files as well. The compressed files will also be accessible directly, and this won't automatically decompress them. So, if you have a `tar.gz` file that you access directly, it will still be downloaded as the gzipped version, although you will now gain the option to download the `.tar` directly and decompress it in transit. (Which doesn't affect the server at all, just the client's way of interpreting it.) ---- One key thing this change also adds is a short-circuit when accessing directories: these always return 404 via the API, although they'd try the cache anyway and go through that route, which was kind of slow. Adding in the additional encodings, it's going to try for .gz, .br, and .zst files in the worst case as well, which feels wrong. So, instead, it just always falls back to the index-check behaviour if the path ends in a slash or is empty. (Which is implicitly just a slash.) ---- For testing, I set up this repo: https://codeberg.org/clarfonthey/testrepo I ended up realising that LFS wasn't supported by default with `just dev`, so, it ended up working until I made sure the files on the repo *didn't* use LFS. Assuming you've run `just dev`, you can go directly to this page in the browser here: https://clarfonthey.localhost.mock.directory:4430/testrepo/ And also you can try a few cURL commands: ```shell curl https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure curl -H 'Accept-Encoding: gz' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | gunzip - curl -H 'Accept-Encoding: br' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | brotli --decompress - curl -H 'Accept-Encoding: zst' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | zstd --decompress - ``` Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/387 Reviewed-by: Gusted <gusted@noreply.codeberg.org> Co-authored-by: ltdk <usr@ltdk.xyz> Co-committed-by: ltdk <usr@ltdk.xyz>
2024-09-29 21:00:54 +00:00
// allowed encodings
var allowedEncodings = map[string]string{
"gzip": ".gz",
"br": ".br",
"zstd": ".zst",
"identity": "",
}
// parses Accept-Encoding header into a list of acceptable encodings
func AcceptEncodings(header string) []string {
log.Trace().Msgf("got accept-encoding: %s", header)
encodings := []string{}
globQuality := 0.0
qualities := make(map[string]float64)
for _, encoding := range strings.Split(header, ",") {
name, quality_str, has_quality := strings.Cut(encoding, ";q=")
quality := 1.0
if has_quality {
var err error
quality, err = strconv.ParseFloat(quality_str, 64)
if err != nil || quality < 0 {
continue
}
}
name = strings.TrimSpace(name)
if name == "*" {
globQuality = quality
} else {
_, allowed := allowedEncodings[name]
if allowed {
qualities[name] = quality
if quality > 0 {
encodings = append(encodings, name)
}
}
}
}
if globQuality > 0 {
for encoding := range allowedEncodings {
_, exists := qualities[encoding]
if !exists {
encodings = append(encodings, encoding)
qualities[encoding] = globQuality
}
}
} else {
_, exists := qualities["identity"]
if !exists {
encodings = append(encodings, "identity")
qualities["identity"] = -1
}
}
slices.SortStableFunc(encodings, func(x, y string) int {
// sort in reverse order; big quality comes first
return cmp.Compare(qualities[y], qualities[x])
})
log.Trace().Msgf("decided encoding order: %v", encodings)
return encodings
}
2021-12-05 13:47:33 +00:00
// Upstream requests a file from the Gitea API at GiteaRoot and writes it to the request context.
func (o *Options) Upstream(ctx *context.Context, giteaClient *gitea.Client, redirectsCache cache.ICache) bool {
log := log.With().Str("ReqId", ctx.ReqId).Strs("upstream", []string{o.TargetOwner, o.TargetRepo, o.TargetBranch, o.TargetPath}).Logger()
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Debug().Msg("Start")
if o.TargetOwner == "" || o.TargetRepo == "" {
html.ReturnErrorPage(ctx, "forge client: either repo owner or name info is missing", http.StatusBadRequest)
return true
}
2021-12-05 13:47:33 +00:00
// Check if the branch exists and when it was modified
if o.BranchTimestamp.IsZero() {
branchExist, err := o.GetBranchTimestamp(giteaClient)
// handle 404
if err != nil && errors.Is(err, gitea.ErrorNotFound) || !branchExist {
html.ReturnErrorPage(ctx,
fmt.Sprintf("branch <code>%q</code> for <code>%s/%s</code> not found", o.TargetBranch, o.TargetOwner, o.TargetRepo),
http.StatusNotFound)
return true
}
2021-12-05 13:47:33 +00:00
// handle unexpected errors
if err != nil {
html.ReturnErrorPage(ctx,
fmt.Sprintf("could not get timestamp of branch <code>%q</code>: '%v'", o.TargetBranch, err),
http.StatusFailedDependency)
2021-12-05 13:47:33 +00:00
return true
}
}
// Check if the browser has a cached version
if ctx.Response() != nil {
if ifModifiedSince, err := time.Parse(time.RFC1123, ctx.Response().Header.Get(headerIfModifiedSince)); err == nil {
if ifModifiedSince.After(o.BranchTimestamp) {
ctx.RespWriter.WriteHeader(http.StatusNotModified)
log.Trace().Msg("check response against last modified: valid")
return true
}
2021-12-05 13:47:33 +00:00
}
log.Trace().Msg("check response against last modified: outdated")
2021-12-05 13:47:33 +00:00
}
log.Debug().Msg("Preparing")
2021-12-05 13:47:33 +00:00
Implement static serving of compressed files (#387) This provides an option for #223 without fully resolving it. (I think.) Essentially, it acts very similar to the `gzip_static` and similar options for nginx, where it will check for the existence of pre-compressed files and serve those instead if the client allows it. I couldn't find a pre-existing way to actually parse the Accept-Encoding header properly (admittedly didn't look very hard) and just implemented one on my own that should be fine. This should hopefully not have the same DOS vulnerabilities as #302, since it relies on the existing caching system. Compressed versions of files will be cached just like any other files, and that includes cache for missing files as well. The compressed files will also be accessible directly, and this won't automatically decompress them. So, if you have a `tar.gz` file that you access directly, it will still be downloaded as the gzipped version, although you will now gain the option to download the `.tar` directly and decompress it in transit. (Which doesn't affect the server at all, just the client's way of interpreting it.) ---- One key thing this change also adds is a short-circuit when accessing directories: these always return 404 via the API, although they'd try the cache anyway and go through that route, which was kind of slow. Adding in the additional encodings, it's going to try for .gz, .br, and .zst files in the worst case as well, which feels wrong. So, instead, it just always falls back to the index-check behaviour if the path ends in a slash or is empty. (Which is implicitly just a slash.) ---- For testing, I set up this repo: https://codeberg.org/clarfonthey/testrepo I ended up realising that LFS wasn't supported by default with `just dev`, so, it ended up working until I made sure the files on the repo *didn't* use LFS. Assuming you've run `just dev`, you can go directly to this page in the browser here: https://clarfonthey.localhost.mock.directory:4430/testrepo/ And also you can try a few cURL commands: ```shell curl https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure curl -H 'Accept-Encoding: gz' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | gunzip - curl -H 'Accept-Encoding: br' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | brotli --decompress - curl -H 'Accept-Encoding: zst' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | zstd --decompress - ``` Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/387 Reviewed-by: Gusted <gusted@noreply.codeberg.org> Co-authored-by: ltdk <usr@ltdk.xyz> Co-committed-by: ltdk <usr@ltdk.xyz>
2024-09-29 21:00:54 +00:00
var reader io.ReadCloser
var header http.Header
var statusCode int
var err error
// pick first non-404 response for encoding, *only* if not root
if o.TargetPath == "" || strings.HasSuffix(o.TargetPath, "/") {
err = gitea.ErrorNotFound
} else {
for _, encoding := range AcceptEncodings(ctx.Req.Header.Get(headerAcceptEncoding)) {
log.Trace().Msgf("try %s encoding", encoding)
// add extension for encoding
path := o.TargetPath + allowedEncodings[encoding]
reader, header, statusCode, err = giteaClient.ServeRawContent(ctx, o.TargetOwner, o.TargetRepo, o.TargetBranch, path, true)
if statusCode == http.StatusNotFound {
Implement static serving of compressed files (#387) This provides an option for #223 without fully resolving it. (I think.) Essentially, it acts very similar to the `gzip_static` and similar options for nginx, where it will check for the existence of pre-compressed files and serve those instead if the client allows it. I couldn't find a pre-existing way to actually parse the Accept-Encoding header properly (admittedly didn't look very hard) and just implemented one on my own that should be fine. This should hopefully not have the same DOS vulnerabilities as #302, since it relies on the existing caching system. Compressed versions of files will be cached just like any other files, and that includes cache for missing files as well. The compressed files will also be accessible directly, and this won't automatically decompress them. So, if you have a `tar.gz` file that you access directly, it will still be downloaded as the gzipped version, although you will now gain the option to download the `.tar` directly and decompress it in transit. (Which doesn't affect the server at all, just the client's way of interpreting it.) ---- One key thing this change also adds is a short-circuit when accessing directories: these always return 404 via the API, although they'd try the cache anyway and go through that route, which was kind of slow. Adding in the additional encodings, it's going to try for .gz, .br, and .zst files in the worst case as well, which feels wrong. So, instead, it just always falls back to the index-check behaviour if the path ends in a slash or is empty. (Which is implicitly just a slash.) ---- For testing, I set up this repo: https://codeberg.org/clarfonthey/testrepo I ended up realising that LFS wasn't supported by default with `just dev`, so, it ended up working until I made sure the files on the repo *didn't* use LFS. Assuming you've run `just dev`, you can go directly to this page in the browser here: https://clarfonthey.localhost.mock.directory:4430/testrepo/ And also you can try a few cURL commands: ```shell curl https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure curl -H 'Accept-Encoding: gz' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | gunzip - curl -H 'Accept-Encoding: br' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | brotli --decompress - curl -H 'Accept-Encoding: zst' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | zstd --decompress - ``` Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/387 Reviewed-by: Gusted <gusted@noreply.codeberg.org> Co-authored-by: ltdk <usr@ltdk.xyz> Co-committed-by: ltdk <usr@ltdk.xyz>
2024-09-29 21:00:54 +00:00
continue
}
if err != nil {
break
}
Implement static serving of compressed files (#387) This provides an option for #223 without fully resolving it. (I think.) Essentially, it acts very similar to the `gzip_static` and similar options for nginx, where it will check for the existence of pre-compressed files and serve those instead if the client allows it. I couldn't find a pre-existing way to actually parse the Accept-Encoding header properly (admittedly didn't look very hard) and just implemented one on my own that should be fine. This should hopefully not have the same DOS vulnerabilities as #302, since it relies on the existing caching system. Compressed versions of files will be cached just like any other files, and that includes cache for missing files as well. The compressed files will also be accessible directly, and this won't automatically decompress them. So, if you have a `tar.gz` file that you access directly, it will still be downloaded as the gzipped version, although you will now gain the option to download the `.tar` directly and decompress it in transit. (Which doesn't affect the server at all, just the client's way of interpreting it.) ---- One key thing this change also adds is a short-circuit when accessing directories: these always return 404 via the API, although they'd try the cache anyway and go through that route, which was kind of slow. Adding in the additional encodings, it's going to try for .gz, .br, and .zst files in the worst case as well, which feels wrong. So, instead, it just always falls back to the index-check behaviour if the path ends in a slash or is empty. (Which is implicitly just a slash.) ---- For testing, I set up this repo: https://codeberg.org/clarfonthey/testrepo I ended up realising that LFS wasn't supported by default with `just dev`, so, it ended up working until I made sure the files on the repo *didn't* use LFS. Assuming you've run `just dev`, you can go directly to this page in the browser here: https://clarfonthey.localhost.mock.directory:4430/testrepo/ And also you can try a few cURL commands: ```shell curl https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure curl -H 'Accept-Encoding: gz' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | gunzip - curl -H 'Accept-Encoding: br' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | brotli --decompress - curl -H 'Accept-Encoding: zst' https://clarfonthey.localhost.mock.directory:4430/testrepo/ --verbose --insecure | zstd --decompress - ``` Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/387 Reviewed-by: Gusted <gusted@noreply.codeberg.org> Co-authored-by: ltdk <usr@ltdk.xyz> Co-committed-by: ltdk <usr@ltdk.xyz>
2024-09-29 21:00:54 +00:00
log.Debug().Msgf("using %s encoding", encoding)
if encoding != "identity" {
header.Set(headerContentEncoding, encoding)
}
break
}
if reader != nil {
defer reader.Close()
}
2021-12-05 13:47:33 +00:00
}
2022-11-07 22:09:41 +00:00
log.Debug().Msg("Aquisting")
2021-12-05 13:47:33 +00:00
// Handle not found error
if err != nil && errors.Is(err, gitea.ErrorNotFound) {
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Debug().Msg("Handling not found error")
// Get and match redirects
redirects := o.getRedirects(ctx, giteaClient, redirectsCache)
if o.matchRedirects(ctx, giteaClient, redirects, redirectsCache) {
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Trace().Msg("redirect")
return true
}
2021-12-05 16:57:54 +00:00
if o.TryIndexPages {
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Trace().Msg("try index page")
2021-12-05 16:57:54 +00:00
// copy the o struct & try if an index page exists
optionsForIndexPages := *o
2021-12-05 13:47:33 +00:00
optionsForIndexPages.TryIndexPages = false
2021-12-05 16:57:54 +00:00
optionsForIndexPages.appendTrailingSlash = true
2021-12-05 13:47:33 +00:00
for _, indexPage := range upstreamIndexPages {
optionsForIndexPages.TargetPath = strings.TrimSuffix(o.TargetPath, "/") + "/" + indexPage
if optionsForIndexPages.Upstream(ctx, giteaClient, redirectsCache) {
2021-12-05 13:47:33 +00:00
return true
}
}
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Trace().Msg("try html file with path name")
2021-12-05 13:47:33 +00:00
// compatibility fix for GitHub Pages (/example → /example.html)
2021-12-05 16:57:54 +00:00
optionsForIndexPages.appendTrailingSlash = false
optionsForIndexPages.redirectIfExists = strings.TrimSuffix(ctx.Path(), "/") + ".html"
optionsForIndexPages.TargetPath = o.TargetPath + ".html"
if optionsForIndexPages.Upstream(ctx, giteaClient, redirectsCache) {
2021-12-05 13:47:33 +00:00
return true
}
}
log.Debug().Msg("not found")
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
ctx.StatusCode = http.StatusNotFound
if o.TryIndexPages {
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Trace().Msg("try not found page")
// copy the o struct & try if a not found page exists
optionsForNotFoundPages := *o
optionsForNotFoundPages.TryIndexPages = false
optionsForNotFoundPages.appendTrailingSlash = false
for _, notFoundPage := range upstreamNotFoundPages {
optionsForNotFoundPages.TargetPath = "/" + notFoundPage
if optionsForNotFoundPages.Upstream(ctx, giteaClient, redirectsCache) {
return true
}
}
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Trace().Msg("not found page missing")
}
2021-12-05 13:47:33 +00:00
return false
}
// handle unexpected client errors
if err != nil || reader == nil || statusCode != http.StatusOK {
log.Debug().Msg("Handling error")
var msg string
if err != nil {
msg = "forge client: returned unexpected error"
log.Error().Err(err).Msg(msg)
msg = fmt.Sprintf("%s: '%v'", msg, err)
}
if reader == nil {
msg = "forge client: returned no reader"
log.Error().Msg(msg)
}
if statusCode != http.StatusOK {
msg = fmt.Sprintf("forge client: couldn't fetch contents: <code>%d - %s</code>", statusCode, http.StatusText(statusCode))
log.Error().Msg(msg)
}
html.ReturnErrorPage(ctx, msg, http.StatusInternalServerError)
2021-12-05 13:47:33 +00:00
return true
}
// Append trailing slash if missing (for index files), and redirect to fix filenames in general
2021-12-05 16:57:54 +00:00
// o.appendTrailingSlash is only true when looking for index pages
if o.appendTrailingSlash && !strings.HasSuffix(ctx.Path(), "/") {
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Trace().Msg("append trailing slash and redirect")
ctx.Redirect(ctx.Path()+"/", http.StatusTemporaryRedirect)
2021-12-05 13:47:33 +00:00
return true
}
if strings.HasSuffix(ctx.Path(), "/index.html") && !o.ServeRaw {
FIX blank internal pages (#164) (#292) Hello 👋 since it affected my deployment of the pages server I started to look into the problem of the blank pages and think I found a solution for it: 1. There is no check if the file response is empty, neither in cache retrieval nor in writing of a cache. Also the provided method for checking for empty responses had a bug. 2. I identified the redirect response to be the issue here. There is a cache write with the full cache key (e. g. rawContent/user/repo|branch|route/index.html) happening in the handling of the redirect response. But the written body here is empty. In the triggered request from the redirect response the server then finds a cache item to the key and serves the empty body. A quick fix is the check for empty file responses mentioned in 1. 3. The decision to redirect the user comes quite far down in the upstream function. Before that happens a lot of stuff that may not be important since after the redirect response comes a new request anyway. Also, I suspect that this causes the caching problem because there is a request to the forge server and its error handling with some recursions happening before. I propose to move two of the redirects before "Preparing" 4. The recursion in the upstream function makes it difficult to understand what is actually happening. I added some more logging to have an easier time with that. 5. I changed the default behaviour to append a trailing slash to the path to true. In my tested scenarios it happened anyway. This way there is no recursion happening before the redirect. I am not developing in go frequently and rarely contribute to open source -> so feedback of all kind is appreciated closes #164 Reviewed-on: https://codeberg.org/Codeberg/pages-server/pulls/292 Reviewed-by: 6543 <6543@obermui.de> Reviewed-by: crapStone <codeberg@crapstone.dev> Co-authored-by: Hoernschen <julian.hoernschemeyer@mailbox.org> Co-committed-by: Hoernschen <julian.hoernschemeyer@mailbox.org>
2024-02-26 22:21:42 +00:00
log.Trace().Msg("remove index.html from path and redirect")
ctx.Redirect(strings.TrimSuffix(ctx.Path(), "index.html"), http.StatusTemporaryRedirect)
2021-12-05 13:47:33 +00:00
return true
}
2021-12-05 16:57:54 +00:00
if o.redirectIfExists != "" {
ctx.Redirect(o.redirectIfExists, http.StatusTemporaryRedirect)
2021-12-05 13:47:33 +00:00
return true
}
2022-11-07 22:09:41 +00:00
// Set ETag & MIME
o.setHeader(ctx, header)
2021-12-05 13:47:33 +00:00
log.Debug().Msg("Prepare response")
2021-12-05 13:47:33 +00:00
ctx.RespWriter.WriteHeader(ctx.StatusCode)
2021-12-05 13:47:33 +00:00
// Write the response body to the original request
if reader != nil {
_, err := io.Copy(ctx.RespWriter, reader)
if err != nil {
log.Error().Err(err).Msgf("Couldn't write body for %q", o.TargetPath)
html.ReturnErrorPage(ctx, "", http.StatusInternalServerError)
return true
2021-12-05 13:47:33 +00:00
}
}
2022-11-07 22:09:41 +00:00
log.Debug().Msg("Sending response")
2021-12-05 13:47:33 +00:00
return true
}