Extending Docker images with sidecar services
adding sidecar services to Docker images of third-party services
Many open-source services are distributed as Docker images, but sometimes you’ll want to extend their functionality slightly - whether by adding your own endpoints, manipulating the service’s configuration within the Docker image, or something along those lines.
In some cases, such as for manipulating configuration, most images will allow you to mount configuration within the container or use environment variables, so you can build a proper sidecar service to do whatever updates you want and restart the target container. The same goes for extending endpoints - a proper sidecar can serve you well. You can have one service manage a large number of containers, which is what I did for a project I worked on at RTrade, Nexus.
There’s a significant convenience factor to keeping your service as a single container, however - it’s far easier to distribute and deploy, and if you are trying to extend an off-the-shelf service like Grafana that lives within a large, multi-service deployment like Sourcegraph, adding extra services becomes quite a pain. Heck, even adding an additional port requires configuration to be propagated across an entire fleet of services and various deployment methods.
This article goes over the approach I took to achieve the following without significantly changing the public interface of our Grafana image:
- subscribe to core Sourcegraph configuration from another service
- apply changes to the Grafana instance through API calls or configuration changes
- report problems in the sidecar process
While I’ll generally refer to Grafana in this writeup, you can apply it to pretty much any service image out there. I also use Go here, but you can draw from the same concepts to leverage your language of choice as well.
⚠️ Update: Since the writing of this post, we have pivoted on the plan (sourcegraph#11452) and most of the work here no longer lives in our Grafana distribution, but is instead a part of our Prometheus distribution - see sourcegraph#11832 for the new implementation. You can explore the source code on Sourcegraph, and relevant documentation here.
Most of this article still applies though, but with Prometheus + Alertmanager instead of Grafana.
- Wrapping the sidecar and the service
- Implementing the wrapper
- Adding endpoints
- Restarting the service
- Source code and pull requests
- About Sourcegraph
# Wrapping the sidecar and the service
In a nutshell, the primary change made to the Grafana image is an adjustment to the entrypoint script:
- exec "/run.sh" # run the Grafana image's default entrypoint
+ exec "/bin/grafana-wrapper" # run our sidecar program, implemented as a wrapper
I’ll go over the specifics of the wrapper in the next section, since I think it’ll help to understand how we’re extending the vanilla image. You’ll want to set up a Dockerfile that builds the program and copies it over to the final image, which should be based on the vanilla image:
```dockerfile
FROM golang:latest AS builder
# ... build your sidecar

FROM grafana/grafana:latest AS final
# copy your compiled program from the builder into the final image
COPY --from=builder /go/bin/grafana-wrapper /bin/grafana-wrapper
ENTRYPOINT ["/entry.sh"]
```
The goal here is to start a wrapper program that will start up your sidecar and the actual service within the image you are trying to extend (`grafana/grafana` in this case).
# Implementing the wrapper
Depending on what level of functionality you want to achieve, this program can be as simple as a server that makes API calls to the main service. For example:
```mermaid
sequenceDiagram
    participant Sidecar
    participant Service
    note right of Service: the program<br />you are extending
    activate Sidecar
    Sidecar->>Service: cmd.Start
    activate Service
    loop stuff
        Sidecar->>Service: Requests
        Service->>Sidecar: Responses
    end
    Service->>Sidecar: cmd.Wait returns
    deactivate Service
    deactivate Sidecar
```
This can be achieved using the Go standard library’s `os/exec` package to run the main image’s entrypoint, start up the sidecar, and simply wait for the entrypoint to exit.
```go
import (
	"errors"
	"os"
	"os/exec"
)

// newGrafanaRunCmd instantiates a new command to run grafana.
func newGrafanaRunCmd() *exec.Cmd {
	cmd := exec.Command("/run.sh")
	cmd.Env = os.Environ() // propagate env to grafana
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	return cmd
}

func main() {
	grafanaErrs := make(chan error)
	go func() {
		grafanaErrs <- newGrafanaRunCmd().Run()
	}()

	go func() {
		// your sidecar
	}()

	// wait for grafana to exit
	err := <-grafanaErrs
	if err != nil {
		// propagate exit code outwards
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			os.Exit(exitErr.ProcessState.ExitCode())
		}
		os.Exit(1)
	} else {
		os.Exit(0)
	}
}
```
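The `// your sidecar` placeholder above is where the interesting work happens. As a rough illustration of the goals from the start of this post (subscribing to configuration and applying changes through API calls), here is a minimal sketch - `subscribeToConfig` is a hypothetical stand-in for however you receive configuration updates, and the update call uses Grafana’s dashboard API (authentication omitted for brevity):

```go
import (
	"bytes"
	"log"
	"net/http"
)

// subscribeToConfig is a hypothetical stand-in: it should yield raw dashboard
// definitions whenever the upstream configuration changes.
func subscribeToConfig() <-chan []byte {
	updates := make(chan []byte)
	// ... watch your configuration source and send updates on the channel
	return updates
}

func runSidecar() {
	for dashboard := range subscribeToConfig() {
		// push the updated dashboard to Grafana's dashboard API on its
		// internal port (3000 by default in the grafana/grafana image)
		resp, err := http.Post(
			"http://127.0.0.1:3000/api/dashboards/db",
			"application/json",
			bytes.NewReader(dashboard),
		)
		if err != nil {
			log.Printf("failed to update dashboard: %v", err)
			continue
		}
		resp.Body.Close()
	}
}
```

In `main`, the empty goroutine would then become `go runSidecar()`.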
# Adding endpoints
What if both your sidecar and the extended service expose endpoints over the network? Sure, you could simply have the sidecar listen on a separate port, but that would involve adding another port to expose on your container, and would add another point of configuration that dependents need to be aware of before they can connect to your service.
My solution to this is to keep the same container “interface” by having a reverse proxy listen on the exposed port, which would handle forwarding requests to either the main service or the sidecar.
```mermaid
graph TB
    subgraph Container
        R{Router}
        Sidecar
        Service
        ReverseProxy
    end
    Dependent <-- $PORT --> R
    R <-- sidecarHandler --> Sidecar
    R <--> ReverseProxy
    ReverseProxy <-- internalServicePort --> Service
```
Again, the Go standard library comes to the rescue with the `net/http/httputil` package. We also use `gorilla/mux` for routing in this example, but you can choose any routing library that serves your needs.
```go
import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httputil"
	"os"

	"github.com/gorilla/mux"
)

func main() {
	// ... as before

	router := mux.NewRouter()
	// route specific paths to your sidecar's endpoints
	router.PathPrefix("/sidecar/api").Handler(sidecar.Handler())
	// if a request doesn't route to the sidecar, route to your main service
	router.PathPrefix("/").Handler(&httputil.ReverseProxy{
		// the Director of a ReverseProxy handles transforming requests and
		// sending them on to the correct location, in this case another port
		// in this container (our service's internal port)
		Director: func(req *http.Request) {
			req.URL.Scheme = "http"
			req.URL.Host = fmt.Sprintf(":%s", serviceInternalPort)
		},
	})

	go func() {
		// listen on our external port - the port that will be exposed by the
		// container - to handle routing
		err := http.ListenAndServe(fmt.Sprintf(":%s", exportPort), router)
		if err != nil && !errors.Is(err, http.ErrServerClosed) {
			os.Exit(1)
		}
		os.Exit(0)
	}()

	// ... as before
}
```
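As for what `sidecar.Handler()` might return: anything that serves the paths under `/sidecar/api`. A hypothetical sketch that exposes a status endpoint (in the spirit of the “report problems in the sidecar process” goal from the start of this post - the `sidecar` package and its status tracking are stand-ins):

```go
package sidecar

import (
	"encoding/json"
	"net/http"

	"github.com/gorilla/mux"
)

// Handler returns the routes the sidecar mounts under /sidecar/api. Note that
// the parent router forwards the full request path, so routes here include
// the prefix.
func Handler() http.Handler {
	router := mux.NewRouter()
	router.HandleFunc("/sidecar/api/status", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// hypothetical status payload - report sidecar problems here
		json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
	})
	return router
}
```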
# Restarting the service
In my case, I eventually had to add restart capabilities, since some configuration changes required the service to be restarted.
Simply restarting the container was not an option, since it would complicate how the configuration persists, and would cost us the advantage of having a single self-contained container that requires no external care.
Fortunately, `exec.Cmd`, once started, provides an `*os.Process` that we can use to stop an existing process. I introduced a controller that would expose functions through which the sidecar can stop and start the main service:
```go
type grafanaController struct {
	log  log15.Logger // assumption: a log15-style logger, used by the methods below
	mux  sync.Mutex
	proc *os.Process
}
```
Stopping is pretty straightforward - if the service is running, `proc` will be non-nil, and we can simply signal it to stop:
```go
func (c *grafanaController) Stop() error {
	c.mux.Lock()
	defer c.mux.Unlock()
	if c.proc != nil {
		if err := c.proc.Signal(os.Interrupt); err != nil {
			return fmt.Errorf("failed to stop Grafana instance: %w", err)
		}
		_, _ = c.proc.Wait() // this can error for a variety of irrelevant reasons
		if err := c.proc.Release(); err != nil {
			c.log.Warn("failed to release process", "error", err)
		}
		c.proc = nil
	}
	return nil
}
```
Notice how this is starting to look a bit gnarly:
- A failed `proc.Wait()` does not strictly indicate that the shutdown failed - it could also mean that the process shut down immediately (before `proc.Wait()` could run). However, it is still important to wait, since a signal alone does not mean the service has stopped completely.
- A failed `proc.Release()` does not strictly indicate a fatal error, so we log it and continue as if nothing happened.
Starting the service is even less appealing - we can’t just start the service on a goroutine and ignore it, since we want to be aware of and log errors. However, not every error should be fatal, and the line is blurry.
```go
func (c *grafanaController) RunServer() error {
	c.mux.Lock()
	defer c.mux.Unlock()

	// spin up grafana and track process
	c.log.Debug("starting Grafana server")
	cmd := newGrafanaRunCmd()
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("failed to start Grafana: %w", err)
	}
	c.proc = cmd.Process

	// capture results from grafana process
	go func() {
		// cmd.Wait output:
		// * exits with status 0 => nil
		// * command fails to run or is stopped => *exec.ExitError
		// * other IO error => error
		if err := cmd.Wait(); err != nil {
			var exitErr *exec.ExitError
			if errors.As(err, &exitErr) {
				exitCode := exitErr.ProcessState.ExitCode()
				// unfortunately grafana exits with code 1 on sigint
				if exitCode > 1 {
					c.log.Crit("grafana exited with unexpected code", "exitcode", exitCode)
					os.Exit(exitCode)
				}
				c.log.Info("grafana has stopped", "exitcode", exitCode)
				return
			}
			c.log.Warn("error waiting for grafana to stop", "error", err)
		}
	}()
	return nil
}
```
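With both of these in place, a restart is just a stop followed by a start. A hypothetical convenience method (not part of the original controller) that the sidecar could call after applying configuration changes:

```go
// Restart is a hypothetical convenience method: the sidecar can call this
// after applying configuration changes that require a full Grafana restart.
func (c *grafanaController) Restart() error {
	if err := c.Stop(); err != nil {
		return fmt.Errorf("failed to restart Grafana: %w", err)
	}
	return c.RunServer()
}
```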
Just like errors in libraries, exit codes are often up to the discretion of the developer, and in this case Grafana does not give us a useful indication of whether a process stopped because of an intentional `SIGINT`, or because a fatal error occurred (which would mean our controller should exit as well).
You can add some additional management (e.g. a thread-safe flag or channel to indicate that a shutdown was triggered intentionally, and only exit on code 1 if that flag is not set - see the sketch below), but the complexity of what is meant to be a simple wrapper will quickly ramp up.
At the time of writing I’m not sure that any additional handling is required for a reasonable experience, but I’ll be keeping an eye on how this code behaves.
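For illustration, a minimal sketch of that flag-based approach - the `shutdownRequested` field and the `shouldExit` helper are hypothetical additions, not part of the actual implementation:

```go
import (
	"os"
	"sync"
	"sync/atomic"
)

// A hypothetical extension of grafanaController: Stop would set
// shutdownRequested before signalling the process, so the goroutine waiting
// on cmd.Wait can tell an expected exit code 1 apart from a crash.
type grafanaController struct {
	mux               sync.Mutex
	proc              *os.Process
	shutdownRequested atomic.Bool
}

// shouldExit reports whether an exit code from cmd.Wait should be treated as
// fatal to the wrapper.
func (c *grafanaController) shouldExit(exitCode int) bool {
	if exitCode > 1 {
		return true // unexpected codes are always fatal
	}
	// code 1 is only acceptable if we triggered the shutdown ourselves
	return exitCode == 1 && !c.shutdownRequested.Load()
}
```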
Note that restartability also means we can no longer just block in the main program until the service exits, since the service can now exit (intentionally) at any time - instead, we must depend on an external `SIGINT` to tell us when to stop:
```go
import (
	"os"
	"os/signal"
)

func main() {
	// ... mostly as before

	c := make(chan os.Signal, 1)
	signal.Notify(c, os.Interrupt)
	<-c
	if err := grafana.Stop(); err != nil {
		log.Warn("failed to stop Grafana server", "error", err)
	}
}
```
# Source code and pull requests
And that’s it for a rudimentary sidecar service that allows you to continue treating a service container as a completely isolated unit!
Some relevant pull requests implementing these features:
- sourcegraph#11427 - I ended up reverting this due to bugs in certain environments and adding it back in sourcegraph#11483, but both PRs include relevant discussions. These PRs implement a basic sidecar without start and restart capabilities.
- sourcegraph#11554 adds the ability for the sidecar to start and restart the main service.
Note that most of the above work has been superseded by a pivot to Prometheus (see the update at the start of this post). Following the pivot, a lot of other work was enabled by the addition of this sidecar:
- sourcegraph#12010 (implementation: sourcegraph#12491) proposed a mechanism for denoting ownership in our monitoring and routing alerts appropriately.
- sourcegraph#17602 demonstrated potential summary capabilities a sidecar can export.
- sourcegraph#17014 and sourcegraph#17034 add timestamped links to relevant Grafana panels to alert messages.
# About Sourcegraph
Learn more about Sourcegraph here.