fix(force-command): serialize deploys with a per-container flock

busybox httpd forks a CGI child per HTTP request, so two near-simultaneous
deploy calls (an upstream proxy retrying a slow cold deploy, a browser
double-fire, an overlapping webhook) ran two `git reset/pull` in the same
repo at once and collided on .git/index.lock or a remote-tracking ref
("cannot lock ref ... is at X but expected Y"). Nothing in the chain
HTTP -> CGI -> wrapper -> deploy.sh serialized them.

Hold a non-blocking flock on fd 9 in the generated FORCE_COMMAND wrapper,
which is exec'd by BOTH the HTTP CGI and sshd ForceCommand. A second
concurrent request returns a friendly 200 and leaves the in-flight winner
alone, so an upstream proxy won't retry-storm and connections don't pile
up on a stuck build (busybox flock has no -w timeout). fd 9 stays open
across the exec, so the lock is held for the whole command and releases
when the process tree exits -- even on SIGKILL.
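
The skip-if-busy behaviour described above can be tried outside the bastion
with a small sketch. It is illustrative only and not the generated wrapper:
`run_serialized` and the demo lock path are hypothetical names, a subshell
stands in for the wrapper process, and the `sleep 1` is a crude way to let
the first caller take the lock before the second one starts. It assumes a
`flock` utility (busybox or util-linux) is on PATH.

```shell
#!/bin/sh
# Hypothetical demo lock path; the real wrapper uses its own fixed path.
LOCK="${TMPDIR:-/tmp}/flock-demo.$$.lock"

# Run "$@" only if the lock is free; otherwise report and succeed.
# The subshell plays the role of the wrapper process: fd 9 is opened
# there, inherited by the command, and closed when both have exited.
run_serialized() {
    (
        exec 9>"$LOCK"
        if ! flock -n 9; then
            echo "skipped"
            exit 0
        fi
        "$@"
    )
}

# First caller takes the lock and holds it for a few seconds.
run_serialized sleep 3 &
holder=$!
sleep 1    # crude: give the holder time to acquire the lock

# Second caller overlaps with the first and backs off immediately.
second=$(run_serialized echo "ran")

# Once the holder has exited, the lock is free again.
wait "$holder"
third=$(run_serialized echo "ran")

echo "second=$second third=$third"
rm -f "$LOCK"
```

The overlapping call reports "skipped" and exits 0 without waiting, while the
call made after the holder has finished runs normally.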

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fabian @ Blax Software 2026-06-16 10:02:18 +02:00
parent 964bb394db
commit 256a8c4571
1 changed files with 17 additions and 0 deletions

@@ -190,6 +190,23 @@ if [ -f /home/agent/.ssh/id_rsa ]; then
    export GIT_SSH_COMMAND="ssh -o IdentityFile=/home/agent/.ssh/id_rsa -o UserKnownHostsFile=/home/agent/.ssh/known_hosts -o StrictHostKeyChecking=accept-new"
fi
# Serialize invocations: only one FORCE_COMMAND runs at a time in this
# container. Two near-simultaneous deploy requests (an upstream proxy
# retrying a slow cold deploy, a browser double-fire, an overlapping
# webhook) otherwise spawn two `git reset/pull` runs that collide on
# .git/index.lock or a remote-tracking ref ("cannot lock ref"). busybox
# httpd forks a CGI child per request, so nothing upstream serializes them.
# Non-blocking: a second concurrent request returns a friendly 200 and
# leaves the in-flight winner alone, so an upstream proxy won't retry-storm
# and connections don't pile up on a stuck build (busybox flock has no -w).
# fd 9 stays open across the exec, so the lock is held for the whole command
# and releases when the process tree exits — even on SIGKILL.
exec 9>/tmp/bastion-force-command.lock
if ! flock -n 9; then
    echo "==> A deploy is already running in this bastion — skipping this request."
    exit 0
fi
# Forward args from the caller (CGI passes one optional --patch|--minor|
# --major arg; SSH ForceCommand passes none). The user's FORCE_COMMAND in
# compose can reference "$@" to thread these through to deploy.sh. With