Jekyll2023-02-28T00:01:36+00:00https://bobheadxi.dev/feed.xmlrobert lin📰 Source for Robert's site, blog and portfolio - https://bobheadxi.dev/Investing in the development of the developer experience2022-10-10T00:00:00+00:002022-10-10T00:00:00+00:00https://bobheadxi.dev/investing-in-development-of-devx<p>At <a href="https://github.com/sourcegraph/sourcegraph">Sourcegraph</a> we have a developer tool called <a href="https://docs.sourcegraph.com/dev/background-information/sg"><code class="language-plaintext highlighter-rouge">sg</code></a>, which has become the way we ensure the development of tooling continues to scale at Sourcegraph.
But why invest in making sure contributions to your dev tooling scale?</p>
<p>Imagine you’re developing a sizable application spanning multiple services - say, a code search and code intelligence platform like Sourcegraph.
You’ll want to be able to spin up everything to some degree locally to help you experiment.</p>
<p>So you pick up an off-the-shelf tool like <a href="https://github.com/mattn/goreman"><code class="language-plaintext highlighter-rouge">goreman</code></a>, a Procfile runner we used to use - but this could be any tool, really, like docker-compose or something else.
A tool like this usually takes a bit of configuration, but it works well enough to start off!</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>goreman <span class="nt">-f</span> dev/Procfile
</code></pre></div></div>
<p>Inevitably you add a few layers of configuration specific to your project for your tool of choice:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">SRC_LOG_LEVEL</span><span class="o">=</span><span class="k">${</span><span class="nv">SRC_LOG_LEVEL</span><span class="k">:-</span><span class="nv">info</span><span class="k">}</span>
<span class="nb">export </span><span class="nv">SRC_LOG_FORMAT</span><span class="o">=</span><span class="k">${</span><span class="nv">SRC_LOG_FORMAT</span><span class="k">:-</span><span class="nv">condensed</span><span class="k">}</span>
goreman <span class="nt">--set-ports</span><span class="o">=</span><span class="nb">false</span> <span class="nt">--exit-on-error</span> <span class="nt">-f</span> dev/Procfile
</code></pre></div></div>
<p>This ends up going in a script or Makefile, to encode this setup as the de-facto way of running things that you can share with your team.</p>
<p>Then you realise your tool doesn’t have hot-reloading, or some other feature, so you end up writing some automation for it.</p>
<p><a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@a73993357386775c55d28cc2f3de69b1b6328b56/-/blob/dev/start.sh?L42">Your little start script</a> ends up with several hundred lines of configuration options, which you can only discover by reading it, and alongside that you have <a href="https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/sourcegraph/sourcegraph$@a739933+file:%5Edev/.*.sh&patternType=standard">dozens of scripts that do various dev-related tasks</a>:</p>
<ul>
<li>running adjacent services,</li>
<li>generating code,</li>
<li>running linters,</li>
<li>running just parts of linters in particular ways in CI,</li>
<li>or combining and configuring scripts in mysterious ways…</li>
</ul>
<p>This eventually leads to a frustrating and brittle developer experience.</p>
<blockquote>
<p>It’s nearly impossible to find out which development tasks I can run. It’s really hard to run them standalone without knowing about some global state they depend on. It’s really hard to extend these, because who knows which global state might influence them or depend on their global state.</p>
<p>— Thorsten Ball, <a href="https://docs.google.com/document/d/18hrRIN0pUBRwUFF7vkcVmstJccqWeHiecNF2t1GAZfU/edit#heading=h.trqab8y0kufp">RFC 348: Lack of conventions</a></p>
</blockquote>
<p>It became hard to find out what tooling was available, how each script was configured, and how to extend and add to them - hindering progress in our tooling.</p>
<p>That’s why we started <code class="language-plaintext highlighter-rouge">sg</code>, Sourcegraph’s developer tool, to become the centralised home for all development tasks.</p>
<p><code class="language-plaintext highlighter-rouge">sg</code> started as a single command to run Sourcegraph locally in March 2021 - today it features over 60 commands covering all sorts of functionality and utilities that you might need throughout your development lifecycle:</p>
<ul>
<li>dev environment setup</li>
<li>linters</li>
<li>RFC/ADR browser</li>
<li>migrations tooling</li>
<li>CI status checker, flakes investigation tooling, etc.</li>
<li>monitoring tooling</li>
<li>and more!</li>
</ul>
<p>The tool is built in Go, and thus has the usual good Go stuff - it compiles to a self-contained, portable binary, which makes self-updating easy to build.
Installation is a simple one-liner, making <code class="language-plaintext highlighter-rouge">sg</code> very easy to distribute to teammates:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">--proto</span> <span class="s1">'=https'</span> <span class="nt">--tlsv1</span>.2 <span class="nt">-sSLf</span> https://install.sg.dev | sh
</code></pre></div></div>
<p>Introducing Go also enables more powerful, type-safe programming on top of just running commands - programming that is trickier to do in Bash, where you have to contend with a more limited syntax, differing variants of Unix commands across platforms, and so on.</p>
<p>Using a CLI library with commands to represent tasks effectively encodes the available scripts in a powerful structured format, making documentation and configuration options easier to write and discover:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dbCommand</span> <span class="o">=</span> <span class="o">&</span><span class="n">cli</span><span class="o">.</span><span class="n">Command</span><span class="p">{</span>
	<span class="n">Name</span><span class="o">:</span> <span class="s">"db"</span><span class="p">,</span>
	<span class="n">Usage</span><span class="o">:</span> <span class="s">"Interact with local Sourcegraph databases for development"</span><span class="p">,</span>
	<span class="n">UsageText</span><span class="o">:</span> <span class="s">`
# Reset the Sourcegraph 'frontend' database
sg db reset-pg
# Reset the 'frontend' and 'codeintel' databases
sg db reset-pg -db=frontend,codeintel
# Reset all databases ('frontend', 'codeintel', 'codeinsights')
sg db reset-pg -db=all
# Reset the redis database
sg db reset-redis
# Create a site-admin user whose email and password are foo@sourcegraph.com and sourcegraph.
sg db add-user -name=foo
`</span><span class="p">,</span>
	<span class="n">Category</span><span class="o">:</span> <span class="n">CategoryDev</span><span class="p">,</span>
	<span class="n">Subcommands</span><span class="o">:</span> <span class="p">[]</span><span class="o">*</span><span class="n">cli</span><span class="o">.</span><span class="n">Command</span><span class="p">{</span>
		<span class="p">{</span>
			<span class="n">Name</span><span class="o">:</span> <span class="s">"reset-pg"</span><span class="p">,</span>
			<span class="n">Usage</span><span class="o">:</span> <span class="s">"Drops, recreates and migrates the specified Sourcegraph database"</span><span class="p">,</span>
			<span class="n">Description</span><span class="o">:</span> <span class="s">`If -db is not set, then the "frontend" database is used (what's set as PGDATABASE in env or the sg.config.yaml). If -db is set to "all" then all databases are reset and recreated.`</span><span class="p">,</span>
			<span class="n">Flags</span><span class="o">:</span> <span class="p">[]</span><span class="n">cli</span><span class="o">.</span><span class="n">Flag</span><span class="p">{</span>
				<span class="o">&</span><span class="n">cli</span><span class="o">.</span><span class="n">StringFlag</span><span class="p">{</span>
					<span class="n">Name</span><span class="o">:</span> <span class="s">"db"</span><span class="p">,</span>
					<span class="n">Value</span><span class="o">:</span> <span class="n">db</span><span class="o">.</span><span class="n">DefaultDatabase</span><span class="o">.</span><span class="n">Name</span><span class="p">,</span>
					<span class="n">Usage</span><span class="o">:</span> <span class="s">"The target database instance."</span><span class="p">,</span>
					<span class="n">Destination</span><span class="o">:</span> <span class="o">&</span><span class="n">dbDatabaseNameFlag</span><span class="p">,</span>
				<span class="p">},</span>
			<span class="p">},</span>
			<span class="n">Action</span><span class="o">:</span> <span class="n">dbResetPGExec</span><span class="p">,</span>
		<span class="p">},</span>
	<span class="p">},</span>
<span class="p">}</span>
</code></pre></div></div>
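<p>The <code>reset-pg</code> description above implies some input handling: an unset <code>-db</code> means <code>frontend</code>, <code>all</code> expands to every database, and a comma-separated list selects several. A minimal sketch of that logic in plain Go, assuming a hypothetical <code>parseDatabases</code> helper and the three database names from the usage text (this is not <code>sg</code>’s actual implementation):</p>

```go
package main

import (
	"fmt"
	"strings"
)

// knownDatabases lists the databases named in the usage text above.
var knownDatabases = []string{"frontend", "codeintel", "codeinsights"}

// parseDatabases is a hypothetical helper that turns a -db flag value into
// a validated list of database names.
func parseDatabases(value string) ([]string, error) {
	value = strings.TrimSpace(value)
	switch value {
	case "":
		return []string{"frontend"}, nil // default when -db is not set
	case "all":
		return append([]string(nil), knownDatabases...), nil
	}
	var dbs []string
	for _, name := range strings.Split(value, ",") {
		name = strings.TrimSpace(name)
		if !isKnown(name) {
			return nil, fmt.Errorf("unknown database %q", name)
		}
		dbs = append(dbs, name)
	}
	return dbs, nil
}

// isKnown reports whether name is one of knownDatabases.
func isKnown(name string) bool {
	for _, db := range knownDatabases {
		if db == name {
			return true
		}
	}
	return false
}

func main() {
	for _, v := range []string{"", "frontend,codeintel", "all", "nope"} {
		dbs, err := parseDatabases(v)
		fmt.Printf("%q -> %v (err: %v)\n", v, dbs, err)
	}
}
```

<p>Validating the value up front means a typo like <code>-db=fronted</code> fails with a clear error instead of silently resetting the wrong thing.</p>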
<p>But to make this kind of tool effective, you need more than just converting scripts into a Go program.
In developing <code class="language-plaintext highlighter-rouge">sg</code>, I’ve noticed some patterns come up that I believe are crucial to its utility - tooling should:</p>
<ul>
<li><a href="#tooling-should-be-approachable">be approachable</a></li>
<li><a href="#tooling-should-work-with-your-tools">work with your tools</a></li>
<li><a href="#tooling-should-codify-standards">codify standards</a></li>
</ul>
<h2 id="tooling-should-be-approachable">Tooling should be approachable</h2>
<p>Firstly, tooling should be approachable: easy to learn and easy to discover.
The goal is to abstract implementation details away behind a friendly, usable interface.</p>
<p>For example, with documentation, you might want to meet your users where they are, and provide options for learning -
whether it be through complete single-page references in the browser, or directly in the command line.</p>
<figure>
<img src="../../assets/images/posts/investing-in-devx/generated-docs.png" />
</figure>
<p>A structured CLI makes all this easy to generate from a single source of truth so that your documentation is available everywhere and always up-to-date.</p>
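<p>To illustrate the single-source-of-truth idea, here is a minimal sketch that renders Markdown reference docs by walking a command tree. The <code>command</code> struct is a hypothetical stand-in for a CLI library’s command type; this is not how <code>sg</code> or its CLI library actually generates docs:</p>

```go
package main

import (
	"fmt"
	"strings"
)

// command is a hypothetical stand-in for a CLI library's command type.
type command struct {
	Name        string
	Usage       string
	Subcommands []command
}

// renderDocs emits one Markdown heading per command, nesting subcommands
// one heading level deeper, so the docs are derived from the definitions.
func renderDocs(cmd command, parent string, depth int) string {
	path := strings.TrimSpace(parent + " " + cmd.Name)
	var b strings.Builder
	fmt.Fprintf(&b, "%s %s\n\n%s\n\n", strings.Repeat("#", depth), path, cmd.Usage)
	for _, sub := range cmd.Subcommands {
		b.WriteString(renderDocs(sub, path, depth+1))
	}
	return b.String()
}

func main() {
	db := command{
		Name:  "db",
		Usage: "Interact with local Sourcegraph databases for development",
		Subcommands: []command{
			{Name: "reset-pg", Usage: "Drops, recreates and migrates the specified Sourcegraph database"},
		},
	}
	fmt.Print(renderDocs(db, "sg", 2))
}
```

<p>The same tree can feed <code>--help</code> output, so the terminal and browser references cannot drift apart.</p>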
<p>Using the tool should be intuitive - to help with this, you can provide usability features like autocompletions, which in <code class="language-plaintext highlighter-rouge">sg</code> are configured for you during installation. This makes it easy to figure out what you can do on the fly!</p>
<figure>
<img src="../../assets/images/posts/investing-in-devx/autocomplete.gif" />
</figure>
<p>When developing new sg commands, adding custom completions is also easy for commands that have a fixed set of possible arguments:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">BashComplete</span><span class="o">:</span> <span class="n">cliutil</span><span class="o">.</span><span class="n">CompleteOptions</span><span class="p">(</span><span class="k">func</span><span class="p">()</span> <span class="p">(</span><span class="n">options</span> <span class="p">[]</span><span class="kt">string</span><span class="p">)</span> <span class="p">{</span>
	<span class="n">config</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">getConfig</span><span class="p">()</span>
	<span class="k">if</span> <span class="n">config</span> <span class="o">==</span> <span class="no">nil</span> <span class="p">{</span>
		<span class="k">return</span>
	<span class="p">}</span>
	<span class="k">for</span> <span class="n">name</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">config</span><span class="o">.</span><span class="n">Commands</span> <span class="p">{</span>
		<span class="n">options</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">options</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
	<span class="p">}</span>
	<span class="k">return</span>
<span class="p">}),</span>
</code></pre></div></div>
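<p>One detail worth knowing: Go randomises map iteration order, so ranging over a map like <code>config.Commands</code> yields candidates in a different order on each run. If deterministic output matters (for example, in tests), a small variation sorts the keys first. A minimal sketch, assuming a hypothetical generic <code>commandNames</code> helper with optional prefix filtering; the service names are illustrative:</p>

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// commandNames is a hypothetical helper returning the keys of commands that
// start with prefix, sorted so completion output is deterministic
// (Go map iteration order is not).
func commandNames[T any](commands map[string]T, prefix string) []string {
	names := make([]string, 0, len(commands))
	for name := range commands {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Illustrative service names.
	commands := map[string]struct{}{"oss-frontend": {}, "frontend": {}, "gitserver": {}}
	fmt.Println(commandNames(commands, ""))  // [frontend gitserver oss-frontend]
	fmt.Println(commandNames(commands, "f")) // [frontend]
}
```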
<h2 id="tooling-should-work-with-your-tools">Tooling should work with your tools</h2>
<p>Secondly, tooling should interoperate with your tools - one of <code class="language-plaintext highlighter-rouge">sg</code>’s goals is specifically to not become a build system or container orchestrator, but to provide a uniform and programmable layer on top of them that is specific to Sourcegraph’s needs.</p>
<p>Take <code class="language-plaintext highlighter-rouge">sg start</code>, the command that replaced the <code class="language-plaintext highlighter-rouge">goreman</code> setup we talked about earlier, for example.
<code class="language-plaintext highlighter-rouge">sg start</code> just uses whatever tools each service normally uses to build, run, and update itself, and provides some additional features on top that are specific to how Sourcegraph works.
A service configuration might look like:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">oss-frontend</span><span class="pi">:</span>
  <span class="na">cmd</span><span class="pi">:</span> <span class="s">.bin/oss-frontend</span>
  <span class="na">install</span><span class="pi">:</span> <span class="pi">|</span>
    <span class="s">if [ -n "$DELVE" ]; then</span>
      <span class="s">export GCFLAGS='all=-N -l'</span>
    <span class="s">fi</span>
    <span class="s">go build -gcflags="$GCFLAGS" -o .bin/oss-frontend github.com/sourcegraph/sourcegraph/cmd/frontend</span>
  <span class="na">checkBinary</span><span class="pi">:</span> <span class="s">.bin/oss-frontend</span>
  <span class="na">env</span><span class="pi">:</span>
    <span class="na">CONFIGURATION_MODE</span><span class="pi">:</span> <span class="s">server</span>
    <span class="na">USE_ENHANCED_LANGUAGE_DETECTION</span><span class="pi">:</span> <span class="no">false</span>
    <span class="c1"># frontend processes need this to be so that the paths to the assets are rendered correctly</span>
    <span class="na">WEBPACK_DEV_SERVER</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">watch</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">lib</span>
    <span class="pi">-</span> <span class="s">internal</span>
    <span class="pi">-</span> <span class="s">cmd/frontend</span>
</code></pre></div></div>
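<p>The <code>watch</code> list names directories whose changes should trigger a rebuild. A minimal sketch of how a runner might decide whether a changed file falls under one of those entries; <code>shouldRebuild</code> is hypothetical and not <code>sg</code>’s actual watcher, which would also need a file-watching mechanism to produce the changed paths:</p>

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// shouldRebuild is a hypothetical helper reporting whether changed (a path
// relative to the repository root) is inside one of the watched directories.
// Matching whole path segments avoids treating "cmd/frontend-tools" as part
// of "cmd/frontend".
func shouldRebuild(changed string, watch []string) bool {
	changed = filepath.ToSlash(filepath.Clean(changed))
	for _, dir := range watch {
		dir = filepath.ToSlash(filepath.Clean(dir))
		if changed == dir || strings.HasPrefix(changed, dir+"/") {
			return true
		}
	}
	return false
}

func main() {
	watch := []string{"lib", "internal", "cmd/frontend"}
	for _, f := range []string{"cmd/frontend/main.go", "cmd/frontend-tools/x.go", "internal/db/db.go", "client/web/app.tsx"} {
		fmt.Printf("%-28s rebuild=%v\n", f, shouldRebuild(f, watch))
	}
}
```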
<p>You’re not constrained to using <code class="language-plaintext highlighter-rouge">sg start</code> - you can still run all these steps yourself with tools of your choice, but <code class="language-plaintext highlighter-rouge">sg start</code> combines everything into tidy output, complete with configuration, colours, hot-reloading, and everything you might need to start experimenting with your new features!</p>
<figure>
<img src="../../assets/images/posts/investing-in-devx/sg-start.gif" />
</figure>
<h2 id="tooling-should-codify-standards">Tooling should codify standards</h2>
<p>Lastly, tooling should codify standards.
Automation and scripting encode best practices that, when shared, build on past learnings to provide a smooth experience for everyone.</p>
<p>Consider the typical process of setting up your development environment. We’ve all been there - a big page of things to install and set up in certain ways:</p>
<div class="language-md highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gu">### Prerequisites</span>
<span class="p">
-</span> Install <span class="sb">`A`</span>
<span class="p">-</span> Configure the thing
<span class="p">-</span> Install <span class="sb">`B`</span>
<span class="p">-</span> Install <span class="sb">`C`</span> (but not that version!)
</code></pre></div></div>
<p>Instead, at Sourcegraph we have <code class="language-plaintext highlighter-rouge">sg setup</code>, which automatically figures out what’s missing on your machine…</p>
<p><img src="../../assets/images/posts/investing-in-devx/sg-setup-check.png" alt="" /></p>
<p>…and <code class="language-plaintext highlighter-rouge">sg</code> will take the steps required to get you set up!</p>
<p><img src="../../assets/images/posts/investing-in-devx/sg-setup-fix.png" alt="" /></p>
<p>Programming these fixes enables us to standardise installations over time, automatically addressing issues teammates run into so that future teammates won’t have to.</p>
<p>For example, we can configure <code class="language-plaintext highlighter-rouge">PATH</code> for you, or make sure things are installed in the right place and configured in the appropriate manner - building on top of other tool managers like <a href="https://brew.sh/">Homebrew</a> and <a href="https://asdf-vm.com/">asdf</a> to provide a smooth experience.</p>
<h2 id="wrap-up">Wrap-up</h2>
<p>Enabling the development of good tooling, scripting, and automation makes a difference.
There’s a lot that can be done to improve how tooling is developed, like the ideas I’ve brought up in this post - we don’t have to settle for cryptic tooling everywhere!</p>
<p>If you’re interested in how all this is implemented, <a href="https://github.com/sourcegraph/sourcegraph/tree/main/dev/sg"><code class="language-plaintext highlighter-rouge">sg</code> is open source - come check us out on GitHub</a>!</p>
<p><em>Note - I had originally hoped to present this as a lightning talk at GopherCon Chicago 2022, but I was too late to queue up on the day of the presentations, so I figured I might as well turn it into a post.</em></p>
<p><br /></p>
<h2 id="about-sourcegraph">About Sourcegraph</h2>
<p>Sourcegraph builds universal code search for every developer and company so they can innovate faster. We help developers and companies with billions of lines of code create the software you use every day.
Learn more about Sourcegraph <a href="https://about.sourcegraph.com/">here</a>.</p>
<p>Interested in joining? <a href="https://about.sourcegraph.com/jobs/">We’re hiring</a>!</p>robertAt Sourcegraph we have a developer tool called sg, which has become the way we ensure the development of tooling continues to scale at Sourcegraph. But why invest in making sure contributions to your dev tooling scale?Anatomy of a logger2022-05-21T00:00:00+00:002022-05-21T00:00:00+00:00https://bobheadxi.dev/anatomy-of-a-logger<p><a href="https://github.com/uber-go/zap">Zap</a> is a structured logging library from Uber that is built on top of a “reflection-free, zero-allocation JSON encoder” to achieve very impressive performance compared to other popular logging libraries for Go. As part of developing integrations for it at <a href="/content/_experience/2021-7-5-sourcegraph.md">Sourcegraph</a>, I thought I’d take the time to look at what goes on under the hood.</p>
<p>Logging seems like a simple thing that should be tangential to your application’s concerns - how complicated could writing some output be? Why bother making logging faster at all? The first item in <a href="https://github.com/uber-go/zap/blob/v1.21.0/FAQ.md">Zap’s FAQ</a> provides a brief explanation:</p>
<blockquote>
<p>Of course, most applications won’t notice the impact of a slow logger: they already take tens or hundreds of milliseconds for each operation, so an extra millisecond doesn’t matter.</p>
<p>On the other hand, why not make structured logging fast? […] Across a fleet of Go microservices, making each application even slightly more efficient adds up quickly.</p>
</blockquote>
<p>In my personal experience, I’ve seen logging cause some very real issues - <a href="/mirroring-github-permissions-at-scale/#debug-logging">a debug statement I left in a Sourcegraph service once caused a customer instance to stall completely</a>!</p>
<blockquote>
<p>Metrics indicated jobs were timing out, and a look at the logs revealed thousands upon thousands of lines of random comma-delimited numbers. It seemed that printing all this junk was causing the service to stall, and sure enough setting the log driver to none to disable all output on the relevant service allowed the sync to proceed and continue. […] At scale these entries could contain many thousands of entries, causing the system to degrade. Be careful what you log!</p>
</blockquote>
<p>At <a href="/content/_experience/2021-7-5-sourcegraph.md">Sourcegraph</a> we currently use the cheekily named <a href="https://github.com/inconshreveable/log15"><code class="language-plaintext highlighter-rouge">log15</code> logging library</a>. Of course, a faster logger likely would not have prevented the above scenario from occurring (though we are in the process of <a href="https://github.com/sourcegraph/sourcegraph/issues/33192">migrating to our new Zap-based logger</a>), but here’s a set of (very unscientific) profiles to demonstrate just how differently Zap and <code class="language-plaintext highlighter-rouge">log15</code> handle rendering a log entry behind the scenes. They compare a somewhat “average” scenario: logging 3 fields in JSON format with a logger that already carries 3 fields of context:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">(</span>
	<span class="s">"os"</span>
	<span class="s">"runtime/pprof"</span>
	<span class="s">"time"</span>

	<span class="s">"github.com/inconshreveable/log15"</span>
	<span class="s">"go.uber.org/zap"</span>
<span class="p">)</span>

<span class="c">// thing is a small struct used as a non-primitive log field</span>
<span class="k">type</span> <span class="n">thing</span> <span class="k">struct</span> <span class="p">{</span>
	<span class="n">Field</span> <span class="kt">string</span>
	<span class="n">Date</span> <span class="n">time</span><span class="o">.</span><span class="n">Time</span>
<span class="p">}</span>

<span class="k">const</span> <span class="n">iters</span> <span class="o">=</span> <span class="m">100000</span>

<span class="k">var</span> <span class="p">(</span>
	<span class="n">thing1</span> <span class="o">=</span> <span class="o">&</span><span class="n">thing</span><span class="p">{</span><span class="n">Field</span><span class="o">:</span> <span class="s">"field1"</span><span class="p">,</span> <span class="n">Date</span><span class="o">:</span> <span class="n">time</span><span class="o">.</span><span class="n">Now</span><span class="p">()}</span>
	<span class="n">thing2</span> <span class="o">=</span> <span class="o">&</span><span class="n">thing</span><span class="p">{</span><span class="n">Field</span><span class="o">:</span> <span class="s">"field2"</span><span class="p">,</span> <span class="n">Date</span><span class="o">:</span> <span class="n">time</span><span class="o">.</span><span class="n">Now</span><span class="p">()}</span>
<span class="p">)</span>

<span class="k">func</span> <span class="n">profileZap</span><span class="p">(</span><span class="n">f</span> <span class="o">*</span><span class="n">os</span><span class="o">.</span><span class="n">File</span><span class="p">)</span> <span class="p">{</span>
	<span class="c">// Create a JSON-format logger with fields, configured to roughly match log15's features</span>
	<span class="n">cfg</span> <span class="o">:=</span> <span class="n">zap</span><span class="o">.</span><span class="n">NewProductionConfig</span><span class="p">()</span>
	<span class="n">cfg</span><span class="o">.</span><span class="n">Sampling</span> <span class="o">=</span> <span class="no">nil</span>
	<span class="n">cfg</span><span class="o">.</span><span class="n">DisableCaller</span> <span class="o">=</span> <span class="no">true</span>
	<span class="n">cfg</span><span class="o">.</span><span class="n">DisableStacktrace</span> <span class="o">=</span> <span class="no">true</span>
	<span class="c">// Build from cfg so the settings above actually take effect</span>
	<span class="n">l</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">cfg</span><span class="o">.</span><span class="n">Build</span><span class="p">()</span>
	<span class="n">l</span> <span class="o">=</span> <span class="n">l</span><span class="o">.</span><span class="n">With</span><span class="p">(</span>
		<span class="n">zap</span><span class="o">.</span><span class="n">String</span><span class="p">(</span><span class="s">"1"</span><span class="p">,</span> <span class="s">"foobar"</span><span class="p">),</span>
		<span class="n">zap</span><span class="o">.</span><span class="n">Int</span><span class="p">(</span><span class="s">"2"</span><span class="p">,</span> <span class="m">123</span><span class="p">),</span>
		<span class="n">zap</span><span class="o">.</span><span class="n">Any</span><span class="p">(</span><span class="s">"3"</span><span class="p">,</span> <span class="n">thing1</span><span class="p">),</span>
	<span class="p">)</span>
	<span class="c">// Start profile and log a lot</span>
	<span class="n">pprof</span><span class="o">.</span><span class="n">StartCPUProfile</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
	<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">iters</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
		<span class="n">l</span><span class="o">.</span><span class="n">Info</span><span class="p">(</span><span class="s">"message"</span><span class="p">,</span>
			<span class="n">zap</span><span class="o">.</span><span class="n">String</span><span class="p">(</span><span class="s">"4"</span><span class="p">,</span> <span class="s">"foobar"</span><span class="p">),</span>
			<span class="n">zap</span><span class="o">.</span><span class="n">Int</span><span class="p">(</span><span class="s">"5"</span><span class="p">,</span> <span class="m">123</span><span class="p">),</span>
			<span class="n">zap</span><span class="o">.</span><span class="n">Any</span><span class="p">(</span><span class="s">"6"</span><span class="p">,</span> <span class="n">thing2</span><span class="p">),</span>
		<span class="p">)</span>
	<span class="p">}</span>
	<span class="n">l</span><span class="o">.</span><span class="n">Sync</span><span class="p">()</span>
	<span class="n">pprof</span><span class="o">.</span><span class="n">StopCPUProfile</span><span class="p">()</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">profileLog15</span><span class="p">(</span><span class="n">f</span> <span class="o">*</span><span class="n">os</span><span class="o">.</span><span class="n">File</span><span class="p">)</span> <span class="p">{</span>
	<span class="c">// Create a JSON-format logger with fields</span>
	<span class="n">l</span> <span class="o">:=</span> <span class="n">log15</span><span class="o">.</span><span class="n">New</span><span class="p">(</span>
		<span class="s">"1"</span><span class="p">,</span> <span class="s">"foobar"</span><span class="p">,</span>
		<span class="s">"2"</span><span class="p">,</span> <span class="m">123</span><span class="p">,</span>
		<span class="s">"3"</span><span class="p">,</span> <span class="n">thing1</span><span class="p">,</span>
	<span class="p">)</span>
	<span class="n">l</span><span class="o">.</span><span class="n">SetHandler</span><span class="p">(</span><span class="n">log15</span><span class="o">.</span><span class="n">StreamHandler</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Stdout</span><span class="p">,</span> <span class="n">log15</span><span class="o">.</span><span class="n">JsonFormat</span><span class="p">()))</span>
	<span class="c">// Start profile and log a lot</span>
	<span class="n">pprof</span><span class="o">.</span><span class="n">StartCPUProfile</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
	<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">iters</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span> <span class="p">{</span>
		<span class="n">l</span><span class="o">.</span><span class="n">Info</span><span class="p">(</span><span class="s">"message"</span><span class="p">,</span>
			<span class="s">"4"</span><span class="p">,</span> <span class="s">"foobar"</span><span class="p">,</span>
			<span class="s">"5"</span><span class="p">,</span> <span class="m">123</span><span class="p">,</span>
			<span class="s">"6"</span><span class="p">,</span> <span class="n">thing2</span><span class="p">,</span>
		<span class="p">)</span>
	<span class="p">}</span>
	<span class="n">pprof</span><span class="o">.</span><span class="n">StopCPUProfile</span><span class="p">()</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The resulting call graphs, generated using <code class="language-plaintext highlighter-rouge">go tool pprof -prune_from=^os -png</code>, with <code class="language-plaintext highlighter-rouge">log15</code> on the left and Zap on the right:</p>
<figure>
<div>
<img src="/assets/images/posts/anatomy-logger/log15-to-os.png" alt="log15" style="max-height:48rem;width:auto !important" />
<img src="/assets/images/posts/anatomy-logger/zap-to-os.png" alt="zap" style="max-height:48rem;width:auto !important" />
</div>
<figcaption>
Profiles showing CPU time spent throughout log calls, up until it reaches package <code>os</code> code where work begins to write data to disk - <code>log15</code> is on the left, and <code>zap</code> is on the right. You might have to zoom in a bit.
<br /><br />
Check out the <a href="https://github.com/google/pprof/blob/master/doc/README.md#interpreting-the-callgraph">pprof documentation for interpreting the callgraph</a> to learn more.
</figcaption>
</figure>
<p>It is not immediately evident how the Zap logger is supposed to be better than the <code class="language-plaintext highlighter-rouge">log15</code> logger, since both finish running pretty quickly, have similar-looking call graphs, and ultimately have I/O as the major bottleneck (the big red <code class="language-plaintext highlighter-rouge">os.(*File).write</code> blocks).
However, a closer look (like, <em>really</em> close - you gotta zoom all the way in!) reveals a key hint - both loggers spend enough time in JSON encoding stages for the profiler to pick up, but the details of their JSON encoding differ somewhat:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">log15</code> quickly delegates what appears to be the entire log entry to <code class="language-plaintext highlighter-rouge">json.Marshal</code>, which accounts for ~6ms.</li>
<li>Zap delegates fields to several different handlers: we see an <code class="language-plaintext highlighter-rouge">AddString</code> and an <code class="language-plaintext highlighter-rouge">AddReflected</code>, and only the latter ends up in the <code class="language-plaintext highlighter-rouge">json</code> library, where it accounts for only ~2ms. Presumably, Zap handles certain fields differently from others, and in some cases skips encoding with the <code class="language-plaintext highlighter-rouge">json</code> library entirely!</li>
</ul>
<p>Zap’s documentation provides a brief explanation of why delegating to <code class="language-plaintext highlighter-rouge">json</code> is an issue:</p>
<blockquote>
<p>For applications that log in the hot path, reflection-based serialisation and string formatting are prohibitively expensive — they’re CPU-intensive and make many small allocations. Put differently, using <code class="language-plaintext highlighter-rouge">encoding/json</code> and <code class="language-plaintext highlighter-rouge">fmt.Fprintf</code> to log tons of <code class="language-plaintext highlighter-rouge">interface{}</code>s makes your application slow.</p>
</blockquote>
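<p>To make the difference concrete, here is a toy encoder that appends typed fields straight into a byte buffer and only falls back to <code>encoding/json</code> for arbitrary values. The method names loosely echo the <code>AddString</code> and <code>AddReflected</code> frames from the profile, but this is a simplified sketch under stated assumptions (no string escaping, no buffer pooling), not Zap’s actual encoder:</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// encoder is a toy that builds one JSON object by appending to a byte slice.
// Simplification: keys and string values are assumed to need no JSON
// escaping, which a real encoder must handle.
type encoder struct {
	buf []byte
}

func newEncoder() *encoder { return &encoder{buf: []byte{'{'}} }

// addKey writes a comma separator (if needed) and a quoted key.
func (e *encoder) addKey(key string) {
	if len(e.buf) > 1 {
		e.buf = append(e.buf, ',')
	}
	e.buf = append(e.buf, '"')
	e.buf = append(e.buf, key...)
	e.buf = append(e.buf, '"', ':')
}

// AddString writes a string value directly, without reflection.
func (e *encoder) AddString(key, value string) {
	e.addKey(key)
	e.buf = append(e.buf, '"')
	e.buf = append(e.buf, value...)
	e.buf = append(e.buf, '"')
}

// AddInt writes an integer value directly, without reflection.
func (e *encoder) AddInt(key string, value int) {
	e.addKey(key)
	e.buf = strconv.AppendInt(e.buf, int64(value), 10)
}

// AddReflected handles arbitrary values via encoding/json, which relies on
// reflection: this is the slow path.
func (e *encoder) AddReflected(key string, value any) error {
	b, err := json.Marshal(value)
	if err != nil {
		return err
	}
	e.addKey(key)
	e.buf = append(e.buf, b...)
	return nil
}

func (e *encoder) String() string { return string(e.buf) + "}" }

func main() {
	e := newEncoder()
	e.AddString("msg", "message")
	e.AddString("4", "foobar")
	e.AddInt("5", 123)
	if err := e.AddReflected("6", struct{ Field string }{"field2"}); err != nil {
		panic(err)
	}
	fmt.Println(e.String())
	// {"msg":"message","4":"foobar","5":123,"6":{"Field":"field2"}}
}
```

<p>Only the last field pays for reflection; the string and integer fields are written with plain appends, which is the general shape of the split visible in the Zap profile.</p>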
<p>As a more scientific approach to demonstrating the benefits of Zap’s implementation, here’s a snapshot of the <a href="https://github.com/uber-go/zap/tree/v1.21.0#performance">advertised benchmarks against some other popular libraries (as of v1.21.0)</a>, emphasis mine:</p>
<blockquote>
<p>Log a message and 10 fields:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Package</th>
<th style="text-align: center">Time</th>
<th style="text-align: center">Time % to zap</th>
<th style="text-align: center">Objects Allocated</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">:zap: zap</td>
<td style="text-align: center">2900 ns/op</td>
<td style="text-align: center">+0%</td>
<td style="text-align: center">5 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">:zap: zap (sugared)</td>
<td style="text-align: center">3475 ns/op</td>
<td style="text-align: center">+20%</td>
<td style="text-align: center">10 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">zerolog</td>
<td style="text-align: center">10639 ns/op</td>
<td style="text-align: center">+267%</td>
<td style="text-align: center">32 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">go-kit</td>
<td style="text-align: center">14434 ns/op</td>
<td style="text-align: center">+398%</td>
<td style="text-align: center">59 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">logrus</td>
<td style="text-align: center">17104 ns/op</td>
<td style="text-align: center">+490%</td>
<td style="text-align: center">81 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">apex/log</td>
<td style="text-align: center">32424 ns/op</td>
<td style="text-align: center">+1018%</td>
<td style="text-align: center">66 allocs/op</td>
</tr>
<tr>
<td style="text-align: left"><strong>log15</strong></td>
<td style="text-align: center"><strong>33579 ns/op</strong></td>
<td style="text-align: center"><strong>+1058%</strong></td>
<td style="text-align: center"><strong>76 allocs/op</strong></td>
</tr>
</tbody>
</table>
<p>Log a message with a logger that already has 10 fields of context:</p>
<table>
<thead>
<tr>
<th style="text-align: left">Package</th>
<th style="text-align: center">Time</th>
<th style="text-align: center">Time % to zap</th>
<th style="text-align: center">Objects Allocated</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">:zap: zap</td>
<td style="text-align: center">373 ns/op</td>
<td style="text-align: center">+0%</td>
<td style="text-align: center">0 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">:zap: zap (sugared)</td>
<td style="text-align: center">452 ns/op</td>
<td style="text-align: center">+21%</td>
<td style="text-align: center">1 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">zerolog</td>
<td style="text-align: center">288 ns/op</td>
<td style="text-align: center">-23%</td>
<td style="text-align: center">0 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">go-kit</td>
<td style="text-align: center">11785 ns/op</td>
<td style="text-align: center">+3060%</td>
<td style="text-align: center">58 allocs/op</td>
</tr>
<tr>
<td style="text-align: left">logrus</td>
<td style="text-align: center">19629 ns/op</td>
<td style="text-align: center">+5162%</td>
<td style="text-align: center">70 allocs/op</td>
</tr>
<tr>
<td style="text-align: left"><strong>log15</strong></td>
<td style="text-align: center"><strong>21866 ns/op</strong></td>
<td style="text-align: center"><strong>+5762%</strong></td>
<td style="text-align: center"><strong>72 allocs/op</strong></td>
</tr>
<tr>
<td style="text-align: left">apex/log</td>
<td style="text-align: center">30890 ns/op</td>
<td style="text-align: center">+8182%</td>
<td style="text-align: center">55 allocs/op</td>
</tr>
</tbody>
</table>
</blockquote>
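<p>The “Time % to zap” column reads more intuitively as a multiple: a package’s ns/op divided by Zap’s, which is the same as one plus the percentage over 100. A quick sketch of that arithmetic for <code class="language-plaintext highlighter-rouge">log15</code>, using the figures from the two tables above:</p>

```go
package main

import "fmt"

// slowdown returns how many times longer otherNs takes than baseNs,
// with both timings given in ns/op.
func slowdown(otherNs, baseNs float64) float64 {
	return otherNs / baseNs
}

func main() {
	// Log a message and 10 fields: log15 vs zap.
	fmt.Printf("%.1fx\n", slowdown(33579, 2900)) // 11.6x
	// Logger that already has 10 fields of context: log15 vs zap.
	fmt.Printf("%.1fx\n", slowdown(21866, 373)) // 58.6x
}
```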
<p>In these scenarios, <code class="language-plaintext highlighter-rouge">log15</code> is a whopping <strong>11 to 58 times slower</strong> than Zap - very cool! Evidently Zap’s approach has impressive results, and we know roughly what it <em>doesn’t</em> do to achieve this performance - but how does it work in practice?</p>
<h2 id="a-writer-for-log-entries">A writer for log entries</h2>
<p>The README suggests the following as the preferred way to create and start using a Zap logger, which is very similar to what I did when profiling logging calls earlier:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">zap</span><span class="o">.</span><span class="n">NewProduction</span><span class="p">()</span>
<span class="k">defer</span> <span class="n">logger</span><span class="o">.</span><span class="n">Sync</span><span class="p">()</span>
</code></pre></div></div>
<p>Internally, this takes <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/config.go?L115-133">a default, high-level configuration</a> and <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/config.go?L172-196">builds a logger from it</a> using the following components:</p>
<ul>
<li>a <code class="language-plaintext highlighter-rouge">zapcore.Core</code>, which is constructed from:
<ul>
<li>a <code class="language-plaintext highlighter-rouge">zapcore.Encoder</code></li>
<li>a <code class="language-plaintext highlighter-rouge">zapcore.WriteSyncer</code> (also referred to as a “sink”)</li>
</ul>
</li>
<li>a <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/config.go?L198-247">bunch of <code class="language-plaintext highlighter-rouge">Option</code>s</a></li>
</ul>
<p>For brevity, let’s set the <code class="language-plaintext highlighter-rouge">Option</code>s aside for now and focus on the first component: <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/core.go?L23-45"><code class="language-plaintext highlighter-rouge">zapcore.Core</code></a>. It is described as the real logging interface beneath Zap’s more traditional logging methods like <code class="language-plaintext highlighter-rouge">.Info()</code>, <code class="language-plaintext highlighter-rouge">.Warn()</code>, and so on: the equivalent of an <code class="language-plaintext highlighter-rouge">io.Writer</code> for structured logging instead of generic output.</p>
<p><code class="language-plaintext highlighter-rouge">zapcore.Core</code> splits the logging of a message, such as <code class="language-plaintext highlighter-rouge">.Info("message", fields...)</code>, into the following distinct steps:</p>
<ol>
<li><strong>Check</strong>: <code class="language-plaintext highlighter-rouge">Check(Entry, *CheckedEntry) *CheckedEntry</code> determines whether the message should be logged at all. This is where traditional level filtering comes in (e.g. only logging messages at or above a certain level, such as discarding <code class="language-plaintext highlighter-rouge">.Debug()</code> messages), as well as discarding repeated messages through <a href="https://github.com/uber-go/zap/blob/master/FAQ.md#why-sample-application-logs">sampling</a>.
<ol>
<li>In this interface, we get a read-only <code class="language-plaintext highlighter-rouge">Entry</code> and a mutable <code class="language-plaintext highlighter-rouge">*CheckedEntry</code> that a core registers itself onto if it decides the given <code class="language-plaintext highlighter-rouge">Entry</code> should be logged.</li>
</ol>
</li>
<li><strong>Write</strong>: <code class="language-plaintext highlighter-rouge">Write(Entry, []Field) error</code>, where the rendering of a log entry into the destination occurs.</li>
</ol>
<p>In addition, we have distinct steps for:</p>
<ol>
<li><strong>Adding fields to the logger</strong> (as opposed to just a specific entry): <code class="language-plaintext highlighter-rouge">With([]Field) Core</code> - this allows <code class="language-plaintext highlighter-rouge">Core</code> implementations to render fields once and not repeat that work for subsequent log entries. We’ll get to how this works later!
<ol>
<li>It’s not noted on the interface documentation, but because of the above, the fields provided to <code class="language-plaintext highlighter-rouge">With()</code> are <strong>not</strong> provided to <code class="language-plaintext highlighter-rouge">Write()</code>.</li>
</ol>
</li>
<li><strong>Flushing output</strong>: <code class="language-plaintext highlighter-rouge">Sync() error</code> allows output to be buffered and writes to be batched together, minimising I/O bottlenecks, and also lets <code class="language-plaintext highlighter-rouge">Core</code> implementations handle logs asynchronously.</li>
</ol>
<p>We can see this in action in the default <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/logger.go?L33-40"><code class="language-plaintext highlighter-rouge">*zap.Logger</code></a> implementation. Let’s check out the seemingly innocuous <code class="language-plaintext highlighter-rouge">.Info()</code> function:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">log</span> <span class="o">*</span><span class="n">Logger</span><span class="p">)</span> <span class="n">Info</span><span class="p">(</span><span class="n">msg</span> <span class="kt">string</span><span class="p">,</span> <span class="n">fields</span> <span class="o">...</span><span class="n">Field</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">ce</span> <span class="o">:=</span> <span class="n">log</span><span class="o">.</span><span class="n">check</span><span class="p">(</span><span class="n">InfoLevel</span><span class="p">,</span> <span class="n">msg</span><span class="p">);</span> <span class="n">ce</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="n">ce</span><span class="o">.</span><span class="n">Write</span><span class="p">(</span><span class="n">fields</span><span class="o">...</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<h3 id="check">Check</h3>
<p>First up we have <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/logger.go?L261"><code class="language-plaintext highlighter-rouge">log.check</code></a>, a whopping 102-line function that implements the <strong>check</strong> step of writing a log entry. It constructs a <code class="language-plaintext highlighter-rouge">zapcore.Entry</code> and calls the <code class="language-plaintext highlighter-rouge">core.Check</code> function:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">log</span> <span class="o">*</span><span class="n">Logger</span><span class="p">)</span> <span class="n">check</span><span class="p">(</span><span class="n">lvl</span> <span class="n">zapcore</span><span class="o">.</span><span class="n">Level</span><span class="p">,</span> <span class="n">msg</span> <span class="kt">string</span><span class="p">)</span> <span class="o">*</span><span class="n">zapcore</span><span class="o">.</span><span class="n">CheckedEntry</span> <span class="p">{</span>
<span class="c">// ... omitted for brevity</span>
<span class="c">// Create basic checked entry thru the core; this will be non-nil if the</span>
<span class="c">// log message will actually be written somewhere.</span>
<span class="n">ent</span> <span class="o">:=</span> <span class="n">zapcore</span><span class="o">.</span><span class="n">Entry</span><span class="p">{</span>
<span class="n">LoggerName</span><span class="o">:</span> <span class="n">log</span><span class="o">.</span><span class="n">name</span><span class="p">,</span>
<span class="n">Time</span><span class="o">:</span> <span class="n">log</span><span class="o">.</span><span class="n">clock</span><span class="o">.</span><span class="n">Now</span><span class="p">(),</span>
<span class="n">Level</span><span class="o">:</span> <span class="n">lvl</span><span class="p">,</span>
<span class="n">Message</span><span class="o">:</span> <span class="n">msg</span><span class="p">,</span>
<span class="p">}</span>
<span class="n">ce</span> <span class="o">:=</span> <span class="n">log</span><span class="o">.</span><span class="n">core</span><span class="o">.</span><span class="n">Check</span><span class="p">(</span><span class="n">ent</span><span class="p">,</span> <span class="no">nil</span><span class="p">)</span>
<span class="c">// ...</span>
<span class="k">return</span> <span class="n">ce</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note that the call <code class="language-plaintext highlighter-rouge">log.core.Check(ent, nil)</code> is a little curious here: we noted previously that in this function, <code class="language-plaintext highlighter-rouge">Core</code> implementations should register themselves on the second argument, a <code class="language-plaintext highlighter-rouge">CheckedEntry</code>. How does that work if that argument is a <code class="language-plaintext highlighter-rouge">nil</code> pointer? Taking a look at how <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/entry.go?L179:2"><code class="language-plaintext highlighter-rouge">CheckedEntry</code></a> is implemented, specifically its <code class="language-plaintext highlighter-rouge">AddCore()</code> method, we can see the first hints of some pretty aggressive optimization:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// AddCore adds a Core that has agreed to log this CheckedEntry. It's intended to be</span>
<span class="c">// used by Core.Check implementations, and is safe to call on nil CheckedEntry</span>
<span class="c">// references.</span>
<span class="k">func</span> <span class="p">(</span><span class="n">ce</span> <span class="o">*</span><span class="n">CheckedEntry</span><span class="p">)</span> <span class="n">AddCore</span><span class="p">(</span><span class="n">ent</span> <span class="n">Entry</span><span class="p">,</span> <span class="n">core</span> <span class="n">Core</span><span class="p">)</span> <span class="o">*</span><span class="n">CheckedEntry</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">ce</span> <span class="o">==</span> <span class="no">nil</span> <span class="p">{</span>
<span class="n">ce</span> <span class="o">=</span> <span class="n">getCheckedEntry</span><span class="p">()</span>
<span class="n">ce</span><span class="o">.</span><span class="n">Entry</span> <span class="o">=</span> <span class="n">ent</span>
<span class="p">}</span>
<span class="n">ce</span><span class="o">.</span><span class="n">cores</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">ce</span><span class="o">.</span><span class="n">cores</span><span class="p">,</span> <span class="n">core</span><span class="p">)</span>
<span class="k">return</span> <span class="n">ce</span>
<span class="p">}</span>
<span class="k">var</span> <span class="n">_cePool</span> <span class="o">=</span> <span class="n">sync</span><span class="o">.</span><span class="n">Pool</span><span class="p">{</span><span class="n">New</span><span class="o">:</span> <span class="k">func</span><span class="p">()</span> <span class="k">interface</span><span class="p">{}</span> <span class="p">{</span>
<span class="c">// Pre-allocate some space for cores.</span>
<span class="k">return</span> <span class="o">&</span><span class="n">CheckedEntry</span><span class="p">{</span>
<span class="n">cores</span><span class="o">:</span> <span class="nb">make</span><span class="p">([]</span><span class="n">Core</span><span class="p">,</span> <span class="m">4</span><span class="p">),</span>
<span class="p">}</span>
<span class="p">}}</span>
<span class="k">func</span> <span class="n">getCheckedEntry</span><span class="p">()</span> <span class="o">*</span><span class="n">CheckedEntry</span> <span class="p">{</span>
<span class="n">ce</span> <span class="o">:=</span> <span class="n">_cePool</span><span class="o">.</span><span class="n">Get</span><span class="p">()</span><span class="o">.</span><span class="p">(</span><span class="o">*</span><span class="n">CheckedEntry</span><span class="p">)</span>
<span class="n">ce</span><span class="o">.</span><span class="n">reset</span><span class="p">()</span>
<span class="k">return</span> <span class="n">ce</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In short, <code class="language-plaintext highlighter-rouge">CheckedEntry</code> instances are created <em>or reused</em> on demand from a global <a href="https://pkg.go.dev/sync#Pool"><code class="language-plaintext highlighter-rouge">sync.Pool</code></a>, so if no cores register themselves to write an <code class="language-plaintext highlighter-rouge">Entry</code>, no <code class="language-plaintext highlighter-rouge">CheckedEntry</code> is ever retrieved. As the <code class="language-plaintext highlighter-rouge">sync.Pool</code> documentation puts it:</p>
<blockquote>
<p>A Pool is a set of temporary objects that may be individually saved and retrieved […] Pool’s purpose is to cache allocated but unused items for later reuse, relieving pressure on the garbage collector. […] Pool provides a way to amortise allocation overhead across many clients.</p>
</blockquote>
<p>If many log entries are written in a short time, allocated memory can be recycled through the <code class="language-plaintext highlighter-rouge">Pool</code>, which is faster than having the Go runtime always allocate new memory and garbage-collect unused <code class="language-plaintext highlighter-rouge">CheckedEntry</code> instances.</p>
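<p>The same two tricks can be reproduced in a few lines: a method that is safe to call on a <code class="language-plaintext highlighter-rouge">nil</code> pointer and only fetches an object from a <code class="language-plaintext highlighter-rouge">sync.Pool</code> once something actually needs it, and a helper that resets the object and hands it back afterwards. This is a hypothetical sketch (<code class="language-plaintext highlighter-rouge">checked</code>, <code class="language-plaintext highlighter-rouge">addCore</code>, and <code class="language-plaintext highlighter-rouge">put</code> are illustrative names); unlike Zap’s code, it resets objects when returning them rather than when fetching them. Note that <code class="language-plaintext highlighter-rouge">sync.Pool</code> makes no promise that a <code class="language-plaintext highlighter-rouge">Get</code> returns a previously <code class="language-plaintext highlighter-rouge">Put</code> object, so correctness must never depend on reuse.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// checked is a hypothetical stand-in for zapcore.CheckedEntry.
type checked struct {
	msg   string
	cores []string
}

var pool = sync.Pool{New: func() interface{} {
	// Pre-allocate room for a few cores.
	return &checked{cores: make([]string, 0, 4)}
}}

// addCore is safe on a nil receiver: the first core to accept the entry
// pulls an object from the pool; if no core accepts, nothing is fetched.
func (c *checked) addCore(msg, core string) *checked {
	if c == nil {
		c = pool.Get().(*checked)
		c.msg = msg
	}
	c.cores = append(c.cores, core)
	return c
}

// put clears the object and returns it to the pool for reuse.
func put(c *checked) {
	if c == nil {
		return
	}
	c.msg = ""
	c.cores = c.cores[:0] // keep the capacity, drop the contents
	pool.Put(c)
}

func main() {
	var ce *checked // nil: no core has accepted the entry yet
	ce = ce.addCore("hello", "stderr-core")
	ce = ce.addCore("hello", "file-core")
	fmt.Println(ce.msg, ce.cores) // hello [stderr-core file-core]
	put(ce)
}
```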
<h3 id="write">Write</h3>
<p>Then we move on to the <strong>write</strong> step, done in <code class="language-plaintext highlighter-rouge">ce.Write</code>. Here, the <code class="language-plaintext highlighter-rouge">*zapcore.CheckedEntry</code> we saw earlier performs a write on each of its registered cores:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">ce</span> <span class="o">*</span><span class="n">CheckedEntry</span><span class="p">)</span> <span class="n">Write</span><span class="p">(</span><span class="n">fields</span> <span class="o">...</span><span class="n">Field</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">ce</span> <span class="o">==</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="c">// ... omitted for brevity</span>
<span class="k">var</span> <span class="n">err</span> <span class="kt">error</span>
<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">ce</span><span class="o">.</span><span class="n">cores</span> <span class="p">{</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">multierr</span><span class="o">.</span><span class="n">Append</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="n">ce</span><span class="o">.</span><span class="n">cores</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">Write</span><span class="p">(</span><span class="n">ce</span><span class="o">.</span><span class="n">Entry</span><span class="p">,</span> <span class="n">fields</span><span class="p">))</span>
<span class="p">}</span>
<span class="c">// ...</span>
<span class="n">putCheckedEntry</span><span class="p">(</span><span class="n">ce</span><span class="p">)</span>
<span class="c">// ...</span>
<span class="p">}</span>
<span class="k">func</span> <span class="n">putCheckedEntry</span><span class="p">(</span><span class="n">ce</span> <span class="o">*</span><span class="n">CheckedEntry</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">ce</span> <span class="o">==</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="n">_cePool</span><span class="o">.</span><span class="n">Put</span><span class="p">(</span><span class="n">ce</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note the call to <code class="language-plaintext highlighter-rouge">putCheckedEntry</code>: once the entry has been written it is no longer needed, so this call places it back into the pool for reuse. Nifty!</p>
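<p>Appending each core’s error rather than returning at the first failure means one broken sink cannot stop the others from receiving the entry. A minimal sketch of that loop, assuming a hypothetical <code class="language-plaintext highlighter-rouge">writer</code> interface and swapping in the standard library’s <code class="language-plaintext highlighter-rouge">errors.Join</code> (Go 1.20 and later) for the <code class="language-plaintext highlighter-rouge">multierr.Append</code> used by Zap:</p>

```go
package main

import (
	"errors"
	"fmt"
)

// writer is a hypothetical, minimal stand-in for a core's Write method.
type writer interface {
	Write(msg string) error
}

// okWriter records every message it receives.
type okWriter struct{ got []string }

func (w *okWriter) Write(msg string) error {
	w.got = append(w.got, msg)
	return nil
}

// failingWriter always fails, simulating a broken sink.
type failingWriter struct{}

func (failingWriter) Write(string) error { return errors.New("disk full") }

// writeAll attempts every writer and joins all errors instead of
// returning early on the first failure.
func writeAll(msg string, writers []writer) error {
	var err error
	for _, w := range writers {
		err = errors.Join(err, w.Write(msg))
	}
	return err
}

func main() {
	good := &okWriter{}
	err := writeAll("hello", []writer{failingWriter{}, good})
	fmt.Println(err)      // disk full
	fmt.Println(good.got) // [hello]
}
```

<p>Even though the failing writer comes first, the second writer still receives <code class="language-plaintext highlighter-rouge">hello</code>, and the returned error still reports the failure.</p>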
<p>What gets passed into <code class="language-plaintext highlighter-rouge">Write</code>, however, is still just an <code class="language-plaintext highlighter-rouge">Entry</code> and some <code class="language-plaintext highlighter-rouge">Field</code>s - we’ve yet to see how our message ends up as text, which is where the performance gains are supposed to come from.</p>
<h2 id="encoding-and-writing-output">Encoding and writing output</h2>
<p>Looking back, two components were used to create the <code class="language-plaintext highlighter-rouge">Core</code> earlier on: <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/encoder.go?L429-448#tab=references"><code class="language-plaintext highlighter-rouge">zapcore.Encoder</code></a> and <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/write_syncer.go?L32:6"><code class="language-plaintext highlighter-rouge">zapcore.WriteSyncer</code></a>.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">log</span> <span class="o">:=</span> <span class="n">New</span><span class="p">(</span>
<span class="n">zapcore</span><span class="o">.</span><span class="n">NewCore</span><span class="p">(</span><span class="n">enc</span><span class="p">,</span> <span class="n">sink</span><span class="p">,</span> <span class="n">cfg</span><span class="o">.</span><span class="n">Level</span><span class="p">),</span>
<span class="n">cfg</span><span class="o">.</span><span class="n">buildOptions</span><span class="p">(</span><span class="n">errSink</span><span class="p">)</span><span class="o">...</span><span class="p">,</span>
<span class="p">)</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">Encoder</code> exports a function, <code class="language-plaintext highlighter-rouge">EncodeEntry</code>, that seems to mirror the signature of <code class="language-plaintext highlighter-rouge">Core.Write</code>, and also embeds the <code class="language-plaintext highlighter-rouge">ObjectEncoder</code> interface:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Encoder is a format-agnostic interface for all log entry marshalers. Since</span>
<span class="c">// log encoders don't need to support the same wide range of use cases as</span>
<span class="c">// general-purpose marshalers, it's possible to make them faster and</span>
<span class="c">// lower-allocation.</span>
<span class="k">type</span> <span class="n">Encoder</span> <span class="k">interface</span> <span class="p">{</span>
<span class="n">ObjectEncoder</span>
<span class="c">// EncodeEntry encodes an entry and fields, along with any accumulated</span>
<span class="c">// context, into a byte buffer and returns it. Any fields that are empty,</span>
<span class="c">// including fields on the `Entry` type, should be omitted.</span>
<span class="n">EncodeEntry</span><span class="p">(</span><span class="n">Entry</span><span class="p">,</span> <span class="p">[]</span><span class="n">Field</span><span class="p">)</span> <span class="p">(</span><span class="o">*</span><span class="n">buffer</span><span class="o">.</span><span class="n">Buffer</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span>
<span class="c">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/encoder.go?L346:6"><code class="language-plaintext highlighter-rouge">ObjectEncoder</code></a> we see the promise of a “reflection-free, zero-allocation JSON encoder” in the form of a <em>giant</em> interface, shortened for brevity:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// ObjectEncoder is a strongly-typed, encoding-agnostic interface for adding a</span>
<span class="c">// map- or struct-like object to the logging context. Like maps, ObjectEncoders</span>
<span class="c">// aren't safe for concurrent use (though typical use shouldn't require locks).</span>
<span class="k">type</span> <span class="n">ObjectEncoder</span> <span class="k">interface</span> <span class="p">{</span>
<span class="c">// Logging-specific marshalers.</span>
<span class="n">AddObject</span><span class="p">(</span><span class="n">key</span> <span class="kt">string</span><span class="p">,</span> <span class="n">marshaler</span> <span class="n">ObjectMarshaler</span><span class="p">)</span> <span class="kt">error</span>
<span class="c">// Built-in types.</span>
<span class="n">AddBool</span><span class="p">(</span><span class="n">key</span> <span class="kt">string</span><span class="p">,</span> <span class="n">value</span> <span class="kt">bool</span><span class="p">)</span>
<span class="n">AddDuration</span><span class="p">(</span><span class="n">key</span> <span class="kt">string</span><span class="p">,</span> <span class="n">value</span> <span class="n">time</span><span class="o">.</span><span class="n">Duration</span><span class="p">)</span>
<span class="n">AddInt</span><span class="p">(</span><span class="n">key</span> <span class="kt">string</span><span class="p">,</span> <span class="n">value</span> <span class="kt">int</span><span class="p">)</span>
<span class="n">AddString</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span> <span class="kt">string</span><span class="p">)</span>
<span class="n">AddTime</span><span class="p">(</span><span class="n">key</span> <span class="kt">string</span><span class="p">,</span> <span class="n">value</span> <span class="n">time</span><span class="o">.</span><span class="n">Time</span><span class="p">)</span>
<span class="c">// AddReflected uses reflection to serialise arbitrary objects, so it can be</span>
<span class="c">// slow and allocation-heavy.</span>
<span class="n">AddReflected</span><span class="p">(</span><span class="n">key</span> <span class="kt">string</span><span class="p">,</span> <span class="n">value</span> <span class="k">interface</span><span class="p">{})</span> <span class="kt">error</span>
<span class="c">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This seemingly crazy interface allows messages to be incrementally built in the desired format without ever hitting <code class="language-plaintext highlighter-rouge">json.Marshal</code>. For example, we can look at what the <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/json_encoder.go?L58">JSON encoder</a> does to add a string field:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">enc</span> <span class="o">*</span><span class="n">jsonEncoder</span><span class="p">)</span> <span class="n">AddString</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">val</span> <span class="kt">string</span><span class="p">)</span> <span class="p">{</span>
<span class="n">enc</span><span class="o">.</span><span class="n">addKey</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="n">enc</span><span class="o">.</span><span class="n">AppendString</span><span class="p">(</span><span class="n">val</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We start by adding the key, then the value:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">enc</span> <span class="o">*</span><span class="n">jsonEncoder</span><span class="p">)</span> <span class="n">addKey</span><span class="p">(</span><span class="n">key</span> <span class="kt">string</span><span class="p">)</span> <span class="p">{</span>
<span class="n">enc</span><span class="o">.</span><span class="n">addElementSeparator</span><span class="p">()</span>
<span class="n">enc</span><span class="o">.</span><span class="n">buf</span><span class="o">.</span><span class="n">AppendByte</span><span class="p">(</span><span class="sc">'"'</span><span class="p">)</span>
<span class="n">enc</span><span class="o">.</span><span class="n">safeAddString</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="n">enc</span><span class="o">.</span><span class="n">buf</span><span class="o">.</span><span class="n">AppendByte</span><span class="p">(</span><span class="sc">'"'</span><span class="p">)</span>
<span class="n">enc</span><span class="o">.</span><span class="n">buf</span><span class="o">.</span><span class="n">AppendByte</span><span class="p">(</span><span class="sc">':'</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Reading this carefully, given a <code class="language-plaintext highlighter-rouge">key</code> you’ll end up with the following being added to <code class="language-plaintext highlighter-rouge">enc.buf</code> (a bytes buffer):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"key":
^ ^ ^^
| | ||
| | |└ AppendByte(':')
| | └ AppendByte('"')
| └ safeAddString(key)
└ AppendByte('"')
</code></pre></div></div>
<p>Presumably what comes next is a value, for example a string:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">enc</span> <span class="o">*</span><span class="n">jsonEncoder</span><span class="p">)</span> <span class="n">AppendString</span><span class="p">(</span><span class="n">val</span> <span class="kt">string</span><span class="p">)</span> <span class="p">{</span>
<span class="n">enc</span><span class="o">.</span><span class="n">addElementSeparator</span><span class="p">()</span>
<span class="n">enc</span><span class="o">.</span><span class="n">buf</span><span class="o">.</span><span class="n">AppendByte</span><span class="p">(</span><span class="sc">'"'</span><span class="p">)</span>
<span class="n">enc</span><span class="o">.</span><span class="n">safeAddString</span><span class="p">(</span><span class="n">val</span><span class="p">)</span>
<span class="n">enc</span><span class="o">.</span><span class="n">buf</span><span class="o">.</span><span class="n">AppendByte</span><span class="p">(</span><span class="sc">'"'</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>"key":"val"
^ ^ ^
| | |
| | |
| | └ AppendByte('"')
| └ safeAddString(val)
└ AppendByte('"')
</code></pre></div></div>
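<p>Both <code class="language-plaintext highlighter-rouge">addKey</code> and <code class="language-plaintext highlighter-rouge">AppendString</code> call <code class="language-plaintext highlighter-rouge">addElementSeparator</code>, yet no comma appears between <code class="language-plaintext highlighter-rouge">:</code> and <code class="language-plaintext highlighter-rouge">"val"</code>. One way this can work is for the separator to inspect the last byte already in the buffer and only append a comma after a completed value. The sketch below assumes that behaviour (it is a simplified reimplementation with hypothetical names, not Zap’s code) and skips the comma after an opening brace or bracket, a colon, or an existing comma:</p>

```go
package main

import "fmt"

// enc is a hypothetical, stripped-down JSON encoder over a byte slice.
type enc struct{ buf []byte }

// addElementSeparator appends ',' unless the previous byte opens an
// object/array, ends a key, or is already a separator.
func (e *enc) addElementSeparator() {
	if len(e.buf) == 0 {
		return
	}
	switch e.buf[len(e.buf)-1] {
	case '{', '[', ':', ',':
		return
	}
	e.buf = append(e.buf, ',')
}

// appendQuoted writes s in double quotes. Assumption: s needs no escaping.
func (e *enc) appendQuoted(s string) {
	e.buf = append(e.buf, '"')
	e.buf = append(e.buf, s...)
	e.buf = append(e.buf, '"')
}

// AddString mirrors the addKey + AppendString sequence shown above.
func (e *enc) AddString(key, val string) {
	e.addElementSeparator() // as in addKey
	e.appendQuoted(key)
	e.buf = append(e.buf, ':')
	e.addElementSeparator() // as in AppendString: a no-op right after ':'
	e.appendQuoted(val)
}

func main() {
	e := &enc{buf: []byte{'{'}}
	e.AddString("key", "val")
	e.AddString("other", "x")
	e.buf = append(e.buf, '}')
	fmt.Println(string(e.buf)) // {"key":"val","other":"x"}
}
```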
<p>Encoding the entire entry in <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/json_encoder.go?L363:25"><code class="language-plaintext highlighter-rouge">EncodeEntry</code></a> works similarly, with your typical JSON opening and closing braces being written first:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">final</span><span class="o">.</span><span class="n">buf</span><span class="o">.</span><span class="n">AppendByte</span><span class="p">(</span><span class="sc">'{'</span><span class="p">)</span>
<span class="c">// ... render log entry</span>
<span class="n">final</span><span class="o">.</span><span class="n">buf</span><span class="o">.</span><span class="n">AppendByte</span><span class="p">(</span><span class="sc">'}'</span><span class="p">)</span>
<span class="n">final</span><span class="o">.</span><span class="n">buf</span><span class="o">.</span><span class="n">AppendString</span><span class="p">(</span><span class="n">final</span><span class="o">.</span><span class="n">LineEnding</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{"key":"val"}\n
^           ^^
|           |└ AppendString(final.LineEnding)
|           └ AppendByte('}')
└ AppendByte('{')
</code></pre></div></div>
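<p>Here is a minimal, stdlib-only sketch of the same append-only approach. The <code class="language-plaintext highlighter-rouge">entryEncoder</code> type and its methods are hypothetical simplifications. Zap’s real encoder also escapes control characters, pools its buffers, and much more. What the sketch shows is that every byte of output is appended directly to a buffer, with no reflection involved:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"strconv"
)

// entryEncoder is a hypothetical, heavily simplified stand-in for zap's
// JSON encoder: it appends bytes to a buffer and never uses reflection.
type entryEncoder struct {
	buf []byte
}

// addElementSeparator writes a comma unless we are right after an
// opening brace or a key's colon.
func (e *entryEncoder) addElementSeparator() {
	if len(e.buf) == 0 {
		return
	}
	switch e.buf[len(e.buf)-1] {
	case '{', ':':
		return
	}
	e.buf = append(e.buf, ',')
}

// safeAddString escapes only quotes and backslashes; a real encoder
// must also escape control characters and handle invalid UTF-8.
func (e *entryEncoder) safeAddString(s string) {
	for _, c := range []byte(s) {
		if c == '"' || c == '\\' {
			e.buf = append(e.buf, '\\')
		}
		e.buf = append(e.buf, c)
	}
}

func (e *entryEncoder) addKey(key string) {
	e.addElementSeparator()
	e.buf = append(e.buf, '"')
	e.safeAddString(key)
	e.buf = append(e.buf, '"', ':')
}

func (e *entryEncoder) AddString(key, val string) {
	e.addKey(key)
	e.buf = append(e.buf, '"')
	e.safeAddString(val)
	e.buf = append(e.buf, '"')
}

func (e *entryEncoder) AddInt64(key string, val int64) {
	e.addKey(key)
	e.buf = strconv.AppendInt(e.buf, val, 10)
}

// encodeEntry brackets the fields with braces and appends a line ending,
// in the same order as the EncodeEntry excerpt above.
func encodeEntry(msg string, count int64) string {
	e := new(entryEncoder)
	e.buf = append(e.buf, '{')
	e.AddString("msg", msg)
	e.AddInt64("count", count)
	e.buf = append(e.buf, '}')
	e.buf = append(e.buf, '\n')
	return string(e.buf)
}

func main() {
	fmt.Print(encodeEntry(`say "hi"`, 3))
	// {"msg":"say \"hi\"","count":3}
}
</code></pre></div></div>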
<p>Stepping back up a bit, we can now better understand how <code class="language-plaintext highlighter-rouge">zapcore.Field</code> works, again condensed for brevity:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">Field</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">Key</span> <span class="kt">string</span>
<span class="n">Type</span> <span class="n">FieldType</span>
<span class="n">Integer</span> <span class="kt">int64</span>
<span class="n">String</span> <span class="kt">string</span>
<span class="n">Interface</span> <span class="k">interface</span><span class="p">{}</span>
<span class="p">}</span>
<span class="k">func</span> <span class="p">(</span><span class="n">f</span> <span class="n">Field</span><span class="p">)</span> <span class="n">AddTo</span><span class="p">(</span><span class="n">enc</span> <span class="n">ObjectEncoder</span><span class="p">)</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">err</span> <span class="kt">error</span>
<span class="k">switch</span> <span class="n">f</span><span class="o">.</span><span class="n">Type</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">ObjectMarshalerType</span><span class="o">:</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">enc</span><span class="o">.</span><span class="n">AddObject</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">Key</span><span class="p">,</span> <span class="n">f</span><span class="o">.</span><span class="n">Interface</span><span class="o">.</span><span class="p">(</span><span class="n">ObjectMarshaler</span><span class="p">))</span>
<span class="k">case</span> <span class="n">BoolType</span><span class="o">:</span>
<span class="n">enc</span><span class="o">.</span><span class="n">AddBool</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">Key</span><span class="p">,</span> <span class="n">f</span><span class="o">.</span><span class="n">Integer</span> <span class="o">==</span> <span class="m">1</span><span class="p">)</span>
<span class="k">case</span> <span class="n">DurationType</span><span class="o">:</span>
<span class="n">enc</span><span class="o">.</span><span class="n">AddDuration</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">Key</span><span class="p">,</span> <span class="n">time</span><span class="o">.</span><span class="n">Duration</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">Integer</span><span class="p">))</span>
<span class="k">case</span> <span class="n">StringType</span><span class="o">:</span>
<span class="n">enc</span><span class="o">.</span><span class="n">AddString</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">Key</span><span class="p">,</span> <span class="n">f</span><span class="o">.</span><span class="n">String</span><span class="p">)</span>
<span class="k">case</span> <span class="n">ReflectType</span><span class="o">:</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">enc</span><span class="o">.</span><span class="n">AddReflected</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">Key</span><span class="p">,</span> <span class="n">f</span><span class="o">.</span><span class="n">Interface</span><span class="p">)</span>
<span class="c">// ...</span>
<span class="p">}</span>
<span class="c">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Here we can see that in most cases, when one creates a strongly typed field with e.g. <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/field.go?L221:6"><code class="language-plaintext highlighter-rouge">zap.String(key string, val string) Field</code></a>, Zap keeps the type information and passes the <code class="language-plaintext highlighter-rouge">Field</code> directly to the most appropriate method on the underlying encoder. Because the entire log message is also constructed incrementally, most log messages never need reflection or the <code class="language-plaintext highlighter-rouge">json</code> package to be serialised. Nifty! This explains why we spend less time in <code class="language-plaintext highlighter-rouge">json</code> in the profile at the start of this post. Most of the log message can be serialised directly, except for one field:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">l</span><span class="o">.</span><span class="n">Info</span><span class="p">(</span><span class="s">"message"</span><span class="p">,</span>
<span class="n">zap</span><span class="o">.</span><span class="n">String</span><span class="p">(</span><span class="s">"4"</span><span class="p">,</span> <span class="s">"foobar"</span><span class="p">),</span>
<span class="n">zap</span><span class="o">.</span><span class="n">Int</span><span class="p">(</span><span class="s">"5"</span><span class="p">,</span> <span class="m">123</span><span class="p">),</span>
<span class="n">zap</span><span class="o">.</span><span class="n">Any</span><span class="p">(</span><span class="s">"6"</span><span class="p">,</span> <span class="n">thing2</span><span class="p">),</span> <span class="c">// this goes to AddReflected, which uses JSON to marshal the field</span>
<span class="p">)</span>
</code></pre></div></div>
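<p>The type-based dispatch can be sketched without Zap at all. This is a hypothetical, heavily simplified version: the names <code class="language-plaintext highlighter-rouge">field</code>, <code class="language-plaintext highlighter-rouge">fieldType</code> and <code class="language-plaintext highlighter-rouge">appendField</code> are illustrative and not Zap’s. Strings, integers and durations are appended directly, and only the reflected field goes through <code class="language-plaintext highlighter-rouge">encoding/json</code>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"time"
)

// fieldType, field and appendField are hypothetical names that mirror the
// shape of zapcore.Field and Field.AddTo, heavily simplified.
type fieldType int

const (
	stringType fieldType = iota
	int64Type
	durationType
	reflectType
)

type field struct {
	Key       string
	Type      fieldType
	Integer   int64
	String    string
	Interface interface{}
}

// appendField dispatches on the recorded type. Only reflectType falls back
// to encoding/json, which has to inspect the value with reflection.
// strconv.AppendQuote stands in for JSON string escaping; the two agree
// for plain ASCII text like the values used here.
func appendField(buf []byte, f field) ([]byte, error) {
	buf = strconv.AppendQuote(buf, f.Key)
	buf = append(buf, ':')
	switch f.Type {
	case stringType:
		buf = strconv.AppendQuote(buf, f.String)
	case int64Type:
		buf = strconv.AppendInt(buf, f.Integer, 10)
	case durationType:
		buf = strconv.AppendQuote(buf, time.Duration(f.Integer).String())
	case reflectType:
		b, err := json.Marshal(f.Interface)
		if err != nil {
			return buf, err
		}
		buf = append(buf, b...)
	}
	return buf, nil
}

// encodeFields writes all fields as one JSON object.
func encodeFields(fields []field) (string, error) {
	buf := []byte{'{'}
	for i, f := range fields {
		if i != 0 {
			buf = append(buf, ',')
		}
		var err error
		if buf, err = appendField(buf, f); err != nil {
			return "", err
		}
	}
	return string(append(buf, '}')), nil
}

func main() {
	out, err := encodeFields([]field{
		{Key: "4", Type: stringType, String: "foobar"},
		{Key: "5", Type: int64Type, Integer: 123},
		{Key: "6", Type: reflectType, Interface: map[string]int{"a": 1}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"4":"foobar","5":123,"6":{"a":1}}
}
</code></pre></div></div>
<p>Only the field recorded as <code class="language-plaintext highlighter-rouge">reflectType</code> pays for <code class="language-plaintext highlighter-rouge">json.Marshal</code>. This mirrors the single <code class="language-plaintext highlighter-rouge">zap.Any</code> field in the call above.</p>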
<p>To get around this, we can implement <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/marshaler.go?L23-32"><code class="language-plaintext highlighter-rouge">ObjectMarshaler</code></a>, which we saw on the <code class="language-plaintext highlighter-rouge">Encoder</code> interface previously. Once it is implemented, our object is serialised directly through the encoder’s typed methods:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">thing</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">Field</span> <span class="kt">string</span>
<span class="n">Date</span> <span class="n">time</span><span class="o">.</span><span class="n">Time</span>
<span class="p">}</span>
<span class="k">func</span> <span class="p">(</span><span class="n">t</span> <span class="o">*</span><span class="n">thing</span><span class="p">)</span> <span class="n">MarshalLogObject</span><span class="p">(</span><span class="n">enc</span> <span class="n">zapcore</span><span class="o">.</span><span class="n">ObjectEncoder</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
<span class="n">enc</span><span class="o">.</span><span class="n">AddString</span><span class="p">(</span><span class="s">"Field"</span><span class="p">,</span> <span class="n">t</span><span class="o">.</span><span class="n">Field</span><span class="p">)</span>
<span class="n">enc</span><span class="o">.</span><span class="n">AddTime</span><span class="p">(</span><span class="s">"Date"</span><span class="p">,</span> <span class="n">t</span><span class="o">.</span><span class="n">Date</span><span class="p">)</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
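<p>This stdlib-only sketch shows why the approach avoids reflection: the encoder hands itself to the value, and the value writes its own fields through typed methods. <code class="language-plaintext highlighter-rouge">objectEncoder</code>, <code class="language-plaintext highlighter-rouge">objectMarshaler</code> and <code class="language-plaintext highlighter-rouge">bufEncoder</code> are hypothetical, cut-down stand-ins for Zap’s interfaces. The RFC 3339 time format is an arbitrary choice:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"strconv"
	"time"
)

// objectEncoder and objectMarshaler are hypothetical, cut-down analogues of
// zapcore.ObjectEncoder and zapcore.ObjectMarshaler.
type objectEncoder interface {
	AddString(key, val string)
	AddTime(key string, val time.Time)
}

type objectMarshaler interface {
	MarshalLogObject(enc objectEncoder) error
}

// bufEncoder appends JSON to a buffer that always starts with '{'.
type bufEncoder struct{ buf []byte }

func (e *bufEncoder) addKey(k string) {
	if e.buf[len(e.buf)-1] != '{' {
		e.buf = append(e.buf, ',')
	}
	e.buf = strconv.AppendQuote(e.buf, k)
	e.buf = append(e.buf, ':')
}

func (e *bufEncoder) AddString(k, v string) {
	e.addKey(k)
	e.buf = strconv.AppendQuote(e.buf, v)
}

// AddTime uses RFC 3339 purely for readability; zap's time format is
// configurable through its encoder config.
func (e *bufEncoder) AddTime(k string, v time.Time) {
	e.addKey(k)
	e.buf = strconv.AppendQuote(e.buf, v.UTC().Format(time.RFC3339))
}

// AddObject hands the encoder to the value itself, so nested objects are
// written through the typed methods above instead of via reflection.
func (e *bufEncoder) AddObject(k string, m objectMarshaler) error {
	e.addKey(k)
	e.buf = append(e.buf, '{')
	if err := m.MarshalLogObject(e); err != nil {
		return err
	}
	e.buf = append(e.buf, '}')
	return nil
}

type thing struct {
	Field string
	Date  time.Time
}

func (t *thing) MarshalLogObject(enc objectEncoder) error {
	enc.AddString("Field", t.Field)
	enc.AddTime("Date", t.Date)
	return nil
}

// encodeThing wraps one marshaled object in an outer JSON object.
func encodeThing(key string, t *thing) (string, error) {
	e := new(bufEncoder)
	e.buf = append(e.buf, '{')
	if err := e.AddObject(key, t); err != nil {
		return "", err
	}
	return string(append(e.buf, '}')), nil
}

func main() {
	t := new(thing)
	t.Field = "foobar"
	t.Date = time.Date(2022, 5, 1, 0, 0, 0, 0, time.UTC)
	out, err := encodeThing("6", t)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"6":{"Field":"foobar","Date":"2022-05-01T00:00:00Z"}}
}
</code></pre></div></div>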
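<p>Since <code class="language-plaintext highlighter-rouge">zap.Any</code> checks whether a value implements <code class="language-plaintext highlighter-rouge">ObjectMarshaler</code> before falling back to reflection, the <code class="language-plaintext highlighter-rouge">zap.Any("6", thing2)</code> call above picks this up automatically. This holds provided <code class="language-plaintext highlighter-rouge">thing2</code> is a <code class="language-plaintext highlighter-rouge">*thing</code>, because <code class="language-plaintext highlighter-rouge">MarshalLogObject</code> has a pointer receiver. We can re-run the profiling script from the start of the post to see that there’s no more usage of <code class="language-plaintext highlighter-rouge">json</code>!</p>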
<p>Going back a bit, this design also simplifies the encoding of fields added to the logger itself through the <code class="language-plaintext highlighter-rouge">Core.WithFields</code> we saw earlier. The <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/core.go?L72:18"><code class="language-plaintext highlighter-rouge">ioCore.With</code></a> implementation encodes the given fields immediately, into the cloned core’s encoder:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">c</span> <span class="o">*</span><span class="n">ioCore</span><span class="p">)</span> <span class="n">With</span><span class="p">(</span><span class="n">fields</span> <span class="p">[]</span><span class="n">Field</span><span class="p">)</span> <span class="n">Core</span> <span class="p">{</span>
<span class="n">clone</span> <span class="o">:=</span> <span class="n">c</span><span class="o">.</span><span class="n">clone</span><span class="p">()</span>
<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">fields</span> <span class="p">{</span>
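<span class="n">fields</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">AddTo</span><span class="p">(</span><span class="n">clone</span><span class="o">.</span><span class="n">enc</span><span class="p">)</span> <span class="c">// encode once, into the clone's own encoder</span>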
<span class="p">}</span>
<span class="k">return</span> <span class="n">clone</span>
<span class="p">}</span>
</code></pre></div></div>
<p><a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zapcore/json_encoder.go?L418-421"><code class="language-plaintext highlighter-rouge">EncodeEntry</code> checks if there are fields already encoded, and adds the partial JSON into the message directly</a> - no additional work needed.</p>
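<p>The same “encode once, copy many times” idea fits in a few lines. This <code class="language-plaintext highlighter-rouge">logger</code> type is hypothetical and far simpler than Zap’s core. It encodes context fields when the child logger is created, and each entry copies those bytes instead of re-encoding them:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"strconv"
)

// logger is a hypothetical sketch of the idea behind ioCore.With: context
// fields are encoded once, when the child logger is created, and each
// entry then copies those bytes in verbatim.
type logger struct {
	context []byte // pre-encoded `"k":"v"` pairs, comma-separated
}

// appendPair writes "k":"v"; strconv.AppendQuote stands in for proper
// JSON escaping and matches it for plain ASCII.
func appendPair(buf []byte, k, v string) []byte {
	if len(buf) != 0 {
		buf = append(buf, ',')
	}
	buf = strconv.AppendQuote(buf, k)
	buf = append(buf, ':')
	return strconv.AppendQuote(buf, v)
}

// With copies the parent's bytes first, so sibling loggers never append
// into the same backing array.
func (l logger) With(k, v string) logger {
	ctx := append([]byte(nil), l.context...)
	return logger{context: appendPair(ctx, k, v)}
}

// Entry encodes only the per-entry data; the context is a plain copy.
func (l logger) Entry(msg string) string {
	buf := appendPair(nil, "msg", msg)
	if len(l.context) != 0 {
		buf = append(buf, ',')
		buf = append(buf, l.context...)
	}
	return "{" + string(buf) + "}"
}

func main() {
	runLog := logger{}.With("dispatchID", "abc123")
	fmt.Println(runLog.Entry("start run"))
	// {"msg":"start run","dispatchID":"abc123"}
}
</code></pre></div></div>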
<h2 id="tldr">tl;dr</h2>
<p>Turns out, seemingly simple things can be kind of complicated! However, in this case the result is a neat exhibit of a variety of optimization techniques and a logging implementation that can outpace other libraries by an order of magnitude.</p>
<p>Zap’s design also provides some interesting ways to hook into its behaviour - Zap itself offers some examples, such as <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/zaptest/logger.go"><code class="language-plaintext highlighter-rouge">zaptest</code></a>, which creates a logger with a custom <code class="language-plaintext highlighter-rouge">Writer</code> that sends output to Go’s standard testing library.</p>
<p>At Sourcegraph, our <a href="https://github.com/sourcegraph/sourcegraph/issues/33192">new Zap-based logger</a> offers utilities to <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/lib/log/logtest/logtest.go?L118-121">hook into our configured logger</a> using Zap’s <a href="https://sourcegraph.com/github.com/uber-go/zap@v1.21.0/-/blob/options.go?L42:6"><code class="language-plaintext highlighter-rouge">WrapCore</code> API</a> to assert against log output (mostly for testing the log library itself), partly built on the existing <code class="language-plaintext highlighter-rouge">zaptest</code> utilities. We’re also working on custom <code class="language-plaintext highlighter-rouge">Core</code> implementations to <a href="https://github.com/sourcegraph/sourcegraph/pull/35582">automatically send logged errors to Sentry</a>, and we <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/lib/log/fields.go">wrap <code class="language-plaintext highlighter-rouge">Field</code> constructors</a> to define custom behaviours (we disallow importing directly from Zap for this reason). Pretty nifty to still have such a high degree of customizability in an implementation so focused on optimizations!</p>
<p><br /></p>
<h2 id="about-sourcegraph">About Sourcegraph</h2>
<p>Sourcegraph builds universal code search for every developer and company so they can innovate faster. We help developers and companies with billions of lines of code create the software you use every day.
Learn more about Sourcegraph <a href="https://about.sourcegraph.com/">here</a>.</p>
<p>Interested in joining? <a href="https://about.sourcegraph.com/jobs/">We’re hiring</a>!</p>robertZap is a structured logging library from Uber that is built on top of a “reflection-free, zero-allocation JSON encoder” to achieve some very impressive performance compared to other popular logging libraries for Go. As part of developing integrations for it at Sourcegraph, I thought I’d take the time to look at what goes on under the hood.Scaling a CI service with dynamic and stateless Kubernetes Jobs2022-04-18T00:00:00+00:002022-04-18T00:00:00+00:00https://bobheadxi.dev/stateless-ci<p><a href="/experience/sourcegraph">Sourcegraph</a>’s continuous integration infrastructure uses <a href="https://buildkite.com/">Buildkite</a>, a platform for running pipelines on CI agents we operate. After using the default approach of scaling persistent agent deployments for a long time, we’ve recently switched over to completely stateless agents on dynamically dispatched Kubernetes Jobs to improve the stability of our CI pipelines.</p>
<p>In Buildkite, events (such as a push to a repository) trigger “builds” on a “pipeline”. Each build consists of multiple “jobs”, and each job corresponds to a “pipeline step”. All of this is managed by the hosted Buildkite service, which dispatches Buildkite jobs onto any live Buildkite agents on our infrastructure that meet each job’s “queue” requirements.</p>
<p>Previously, our Buildkite agent fleet was operated as a simple <a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/">Kubernetes Deployment</a>:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">apps/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Deployment</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">name</span><span class="pi">:</span> <span class="s">buildkite-agent</span>
<span class="c1"># ...</span>
<span class="na">spec</span><span class="pi">:</span>
<span class="na">replicas</span><span class="pi">:</span> <span class="m">5</span>
<span class="c1"># ...</span>
<span class="na">template</span><span class="pi">:</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="c1"># ...</span>
<span class="na">spec</span><span class="pi">:</span>
<span class="na">containers</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">buildkite-agent</span>
<span class="c1"># ...</span>
</code></pre></div></div>
<p>A separate deployment, running a custom service called <code class="language-plaintext highlighter-rouge">buildkite-autoscaler</code>, would poll the Buildkite API for a list of running and scheduled jobs. It would then scale the fleet accordingly by making a Kubernetes API call to update the <code class="language-plaintext highlighter-rouge">spec.replicas</code> value in the base Deployment:</p>
<pre><code class="language-mermaid">sequenceDiagram
participant ba as buildkite-autoscaler
participant k8s as Kubernetes
participant bk as Buildkite
loop
ba->>bk: list running, pending jobs
activate bk
bk-->>ba: job queue counts
deactivate bk
activate ba
ba->>ba: determine desired agent count
ba->>k8s: get Deployment
deactivate ba
activate k8s
k8s-->>ba: active Deployment
ba->>k8s: list Deployment Pods
k8s-->>ba: active Pods
deactivate k8s
ba->>k8s: set spec.replicas to desired
end
</code></pre>
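<p>The “determine desired agent count” step and the resulting update might look like the sketch below. It is a hypothetical reconstruction, because the post does not describe the autoscaler’s exact formula. The clamp to minimum and maximum bounds is an assumption, and so is using a JSON merge patch on <code class="language-plaintext highlighter-rouge">spec.replicas</code>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"encoding/json"
	"fmt"
)

// desiredAgents is a hypothetical version of the "determine desired agent
// count" step: one agent per running or pending job, clamped to configured
// bounds. The real autoscaler's formula is not described in this post.
func desiredAgents(running, pending, minAgents, maxAgents int) int {
	want := running + pending
	if minAgents > want {
		want = minAgents
	}
	if want > maxAgents {
		want = maxAgents
	}
	return want
}

// replicasPatch builds a JSON merge patch body that sets spec.replicas on
// the Deployment; whether the autoscaler patched or updated the object is
// an assumption here.
func replicasPatch(n int) ([]byte, error) {
	return json.Marshal(map[string]map[string]int{
		"spec": {"replicas": n},
	})
}

func main() {
	n := desiredAgents(3, 10, 5, 8)
	patch, err := replicasPatch(n)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, string(patch)) // 8 {"spec":{"replicas":8}}
}
</code></pre></div></div>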
<p>As long as there are jobs in the Buildkite queue, deployed agent pods remain online until the autoscaler deems it appropriate to scale down. As such, a single agent can process many jobs over its lifetime, and anything one job leaves behind is visible to the next.</p>
<p>While Buildkite has mechanisms for mitigating state issues across jobs, and most Sourcegraph pipelines have cleanup steps and best practices for mitigating them as well, we occasionally still run into “botched” agents. These are particularly prevalent in jobs where tools are installed globally, or where Docker containers are started but not correctly cleaned up (for example, if directories are mounted). We’ve also had issues where certain pods encounter network issues, causing them to fail all the jobs they accept. Some jobs also work “by accident”, especially in some of our more obscure repositories. These jobs rely on tools installed by other jobs, and they suddenly stop working if they land on a “fresh” agent or if those tools get upgraded unexpectedly.</p>
<p>All of these issues eventually led us to build a stateless approach to running our Buildkite agents.</p>
<h2 id="preparing-for-the-switch">Preparing for the switch</h2>
<p>The main Sourcegraph mono-repository, <a href="https://github.com/sourcegraph/sourcegraph"><code class="language-plaintext highlighter-rouge">sourcegraph/sourcegraph</code></a>, uses <a href="/self-documenting-self-updating/#continuous-integration-pipelines">generated pipelines</a>, where a generator creates each build’s Buildkite pipeline on the fly. Thanks to this, we could easily implement a flag within the generator to gradually redirect builds to the new agents.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">FeatureFlags</span> <span class="o">=</span> <span class="n">featureFlags</span><span class="p">{</span>
<span class="n">StatelessBuild</span><span class="o">:</span> <span class="n">os</span><span class="o">.</span><span class="n">Getenv</span><span class="p">(</span><span class="s">"CI_FEATURE_FLAG_STATELESS"</span><span class="p">)</span> <span class="o">==</span> <span class="s">"true"</span> <span class="o">||</span>
<span class="c">// Roll out to 50% of builds</span>
<span class="n">rand</span><span class="o">.</span><span class="n">NewSource</span><span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">Now</span><span class="p">()</span><span class="o">.</span><span class="n">UnixNano</span><span class="p">())</span><span class="o">.</span><span class="n">Int63</span><span class="p">()</span><span class="o">%</span><span class="m">100</span> <span class="o"><</span> <span class="m">50</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This feature flag could be used to apply <code class="language-plaintext highlighter-rouge">queue</code> configuration and environment variables to builds. This allowed us to gradually test larger loads on the new agents and to roll back changes with ease.</p>
<h2 id="static-kubernetes-jobs">Static Kubernetes Jobs</h2>
<p>The initial approach undertaken by the team used a single persistent <a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/">Kubernetes Job</a>. Agents would start up with <a href="https://buildkite.com/docs/agent/v3/cli-start#disconnect-after-job"><code class="language-plaintext highlighter-rouge">--disconnect-after-job</code></a>, indicating that they should consume a single job from the queue and immediately disconnect.</p>
<p>A new autoscaler service, <code class="language-plaintext highlighter-rouge">job-autoscaler</code>, was set up to do essentially the same thing as the old <code class="language-plaintext highlighter-rouge">buildkite-autoscaler</code>. The difference was that it adjusted <code class="language-plaintext highlighter-rouge">spec.parallelism</code> instead of <code class="language-plaintext highlighter-rouge">spec.replicas</code>. It also set <code class="language-plaintext highlighter-rouge">spec.completions</code> and <code class="language-plaintext highlighter-rouge">spec.backoffLimit</code> to arbitrarily large values to prevent the Job from ever completing and shutting down.</p>
<p>This initial approach was used to iterate on some refinements to our pipelines to accommodate stateless agents (namely improved caching of resources). Upon rolling this out on a larger scale, however, we immediately ran into issues resulting in major CI outages, after which I outlined my thoughts in <a href="https://github.com/sourcegraph/sourcegraph/issues/32843">sourcegraph#32843 dev/ci: stateless autoscaler: investigate revamped approach with dynamic jobs</a>. It turns out we probably should not be applying a stateful management approach (scaling a single Job entity up and down) to what is really a stateless queue-processing mechanism. I decided to take point on re-implementing our approach.</p>
<h2 id="dynamic-kubernetes-jobs">Dynamic Kubernetes Jobs</h2>
<p>In <a href="https://github.com/sourcegraph/sourcegraph/issues/32843">sourcegraph#32843</a> I proposed an approach where we dispatch agents by creating new Kubernetes Jobs with <code class="language-plaintext highlighter-rouge">spec.parallelism</code> and <code class="language-plaintext highlighter-rouge">spec.completions</code> set to roughly the number of agents needed to process all the jobs within the Buildkite jobs queue. As soon as all the agents within a dispatched Job are “consumed” (have processed a Buildkite job and exited), <a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs">Kubernetes can clean up the Job and related resources</a>, and that would be that. If more agents are needed, we simply keep dispatching more Jobs. This is done by a new service called <code class="language-plaintext highlighter-rouge">buildkite-job-dispatcher</code>.</p>
<p>Luckily, all the setup for stateless agents had already been done with the existing Buildkite Job. The dispatcher therefore works by fetching the deployed Job and resetting a variety of fields used internally by Kubernetes:</p>
<ul>
<li>in <code class="language-plaintext highlighter-rouge">metadata</code>: <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids">UID</a>, resource version, and labels</li>
<li>in the Job spec: <code class="language-plaintext highlighter-rouge">selector</code> and <code class="language-plaintext highlighter-rouge">template.metadata.labels</code></li>
</ul>
<p>Making a few changes:</p>
<ul>
<li>setting <code class="language-plaintext highlighter-rouge">parallelism</code> = <code class="language-plaintext highlighter-rouge">completions</code> = number of jobs in queue + buffer
<ul>
<li>this means that we are dispatching agents to consume the queue, and exit when done</li>
</ul>
</li>
<li>setting <a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#job-termination-and-cleanup"><code class="language-plaintext highlighter-rouge">activeDeadlineSeconds</code></a>, <a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs"><code class="language-plaintext highlighter-rouge">ttlSecondsAfterFinished</code></a> to reasonable values
<ul>
<li><code class="language-plaintext highlighter-rouge">activeDeadlineSeconds</code> prevents stale agents from sitting around for too long in case, for example, a build gets cancelled</li>
<li><code class="language-plaintext highlighter-rouge">ttlSecondsAfterFinished</code> ensures resources are freed after use</li>
</ul>
</li>
<li>adjusting the <a href="https://buildkite.com/docs/agent/v3/cli-start#setting-tags"><code class="language-plaintext highlighter-rouge">BUILDKITE_AGENT_TAGS</code></a> environment variable on the Buildkite agent container</li>
</ul>
<p>And deploying the adjusted spec as a new Job!</p>
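<p>The steps above can be sketched as a pure function over a simplified Job. The struct types are hypothetical stand-ins for the real Kubernetes API types. The name format, deadline, TTL and agent tags are illustrative values rather than the dispatcher’s actual configuration. The selector and Pod template labels are presumably cleared because Kubernetes generates them per Job (for example, a <code class="language-plaintext highlighter-rouge">controller-uid</code> selector), so reusing the template’s values would conflict:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the batch/v1 Job fields mentioned
// above; a real dispatcher would use the Kubernetes API types instead.
type objectMeta struct {
	Name            string
	UID             string
	ResourceVersion string
	Labels          map[string]string
}

type container struct {
	Name string
	Env  map[string]string
}

type jobSpec struct {
	Parallelism             int32
	Completions             int32
	ActiveDeadlineSeconds   int64
	TTLSecondsAfterFinished int32
	Selector                map[string]string
	TemplateLabels          map[string]string
	Containers              []container
}

type job struct {
	Meta objectMeta
	Spec jobSpec
}

// dispatchJob derives a new Job from the deployed template. The name format,
// deadline and TTL values are illustrative, not the dispatcher's settings.
func dispatchJob(tmpl job, dispatchID string, queued, buffer int32, agentTags string) job {
	j := tmpl
	// Start metadata from scratch: UID, resource version and labels belong
	// to the template Job, not the one being created.
	j.Meta = objectMeta{
		Name: tmpl.Meta.Name + "-" + dispatchID,
		Labels: map[string]string{
			"app":         tmpl.Meta.Labels["app"],
			"dispatch.id": dispatchID,
		},
	}
	// Let Kubernetes generate a fresh selector and matching Pod labels.
	j.Spec.Selector = nil
	j.Spec.TemplateLabels = nil
	// One single-use agent per queued job, plus some headroom.
	j.Spec.Parallelism = queued + buffer
	j.Spec.Completions = queued + buffer
	j.Spec.ActiveDeadlineSeconds = 60 * 60
	j.Spec.TTLSecondsAfterFinished = 10 * 60
	// Copy containers and env so the template itself is never mutated.
	j.Spec.Containers = make([]container, len(tmpl.Spec.Containers))
	for i, c := range tmpl.Spec.Containers {
		env := make(map[string]string, len(c.Env)+1)
		for k, v := range c.Env {
			env[k] = v
		}
		if c.Name == "buildkite-agent" {
			env["BUILDKITE_AGENT_TAGS"] = agentTags
		}
		j.Spec.Containers[i] = container{Name: c.Name, Env: env}
	}
	return j
}

func main() {
	tmpl := job{
		Meta: objectMeta{Name: "buildkite-agent-stateless", UID: "1234", Labels: map[string]string{"app": "buildkite-agent-stateless"}},
		Spec: jobSpec{Containers: []container{{Name: "buildkite-agent", Env: map[string]string{}}}},
	}
	j := dispatchJob(tmpl, "3506b2ad", 10, 2, "queue=example-queue")
	fmt.Println(j.Meta.Name, j.Spec.Parallelism, j.Spec.Containers[0].Env["BUILDKITE_AGENT_TAGS"])
	// buildkite-agent-stateless-3506b2ad 12 queue=example-queue
}
</code></pre></div></div>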
<pre><code class="language-mermaid">sequenceDiagram
participant ba as buildkite-job-dispatcher
participant k8s as Kubernetes
participant bk as Buildkite
participant gh as GitHub
loop
gh->>bk: enqueue jobs
activate bk
ba->>bk: list queued jobs and total agents
bk-->>ba: queued jobs, total agents
activate ba
ba->>ba: determine required agents
alt queue needs agents
ba->>k8s: get template Job
activate k8s
k8s-->>ba: template Job
deactivate k8s
ba->>ba: modify Job template
ba->>k8s: dispatch new Job
activate k8s
k8s->>bk: register agents
bk-->>k8s: assign jobs to agents
loop while % of Pods not online or completed
par deployed agents process jobs
k8s-->>bk: report completed jobs
bk-->>gh: report pipeline status
deactivate bk
and check previous dispatch
ba->>k8s: list Pods from dispatched Job
k8s-->>ba: Pods states
end
end
end
deactivate ba
k8s->>k8s: Clean up completed Jobs
deactivate k8s
end
</code></pre>
<p>As noted in the diagram above, there’s also a “cooldown” mechanism. To account for delays in our infrastructure, the dispatcher waits for the previous dispatch to roll out at least partially before dispatching a new Job. Without it, the dispatcher could keep creating new agents while the previous batch is still starting up and the visible agent count still appears low, leading to overprovisioning. We do this by simply listing the Pods associated with the most recently dispatched Job, which is easy enough to track within the dispatcher.</p>
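<p>The cooldown check might look something like the hypothetical sketch below. The Pod phases are the standard Kubernetes phase names. Two details are assumptions about how “online or completed” is measured: <code class="language-plaintext highlighter-rouge">Failed</code> Pods count as finished, and the ratio divides by the number of agents requested rather than the number of Pods that exist so far:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "fmt"

// podPhase holds Kubernetes Pod phase strings; the helper below is a
// hypothetical sketch, not the dispatcher's actual code.
type podPhase string

const (
	podPending   podPhase = "Pending"
	podRunning   podPhase = "Running"
	podSucceeded podPhase = "Succeeded"
	podFailed    podPhase = "Failed"
)

// readyForNextDispatch reports whether at least `ratio` of the agents
// requested by the previous dispatch are online (Running) or finished.
// Dividing by the requested count, not len(pods), means Pods that have not
// been created yet still count as "not ready".
func readyForNextDispatch(pods []podPhase, requested int, ratio float64) bool {
	if requested == 0 {
		return true
	}
	settled := 0
	for _, p := range pods {
		switch p {
		case podRunning, podSucceeded, podFailed:
			settled++
		}
	}
	return float64(settled)/float64(requested) >= ratio
}

func main() {
	prev := []podPhase{podRunning, podRunning, podPending, podPending}
	fmt.Println(readyForNextDispatch(prev, 4, 0.5))  // true
	fmt.Println(readyForNextDispatch(prev, 4, 0.75)) // false
	fmt.Println(readyForNextDispatch(prev, 8, 0.5))  // false
}
</code></pre></div></div>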
<h2 id="observability">Observability</h2>
<p><code class="language-plaintext highlighter-rouge">buildkite-job-dispatcher</code> runs on a loop, with each run associated with a <code class="language-plaintext highlighter-rouge">dispatchID</code>, a simplified <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a> with all special characters removed. Everything that happens within a dispatch iteration is associated with this ID, starting with log entries, which are built on <a href="https://github.com/uber-go/zap"><code class="language-plaintext highlighter-rouge">go.uber.org/zap</code></a>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="s">"go.uber.org/zap"</span>
<span class="k">func</span> <span class="p">(</span><span class="n">d</span> <span class="o">*</span><span class="n">Dispatcher</span><span class="p">)</span> <span class="n">run</span><span class="p">(</span><span class="n">ctx</span> <span class="n">context</span><span class="o">.</span><span class="n">Context</span><span class="p">,</span> <span class="n">k8sClient</span> <span class="o">*</span><span class="n">k8s</span><span class="o">.</span><span class="n">Client</span><span class="p">,</span> <span class="n">dispatchID</span> <span class="kt">string</span><span class="p">)</span> <span class="kt">error</span> <span class="p">{</span>
<span class="c">// Allows us to key in on a specific dispatch run when looking at logs</span>
<span class="n">runLog</span> <span class="o">:=</span> <span class="n">d</span><span class="o">.</span><span class="n">log</span><span class="o">.</span><span class="n">With</span><span class="p">(</span><span class="n">zap</span><span class="o">.</span><span class="n">String</span><span class="p">(</span><span class="s">"dispatchID"</span><span class="p">,</span> <span class="n">dispatchID</span><span class="p">))</span>
<span class="n">runLog</span><span class="o">.</span><span class="n">Debug</span><span class="p">(</span><span class="s">"start run"</span><span class="p">,</span> <span class="n">zap</span><span class="o">.</span><span class="n">Any</span><span class="p">(</span><span class="s">"config"</span><span class="p">,</span> <span class="n">config</span><span class="p">))</span>
<span class="c">// {"msg":"start run","dispatchID":"...","config":{...}}</span>
<span class="c">// ...</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
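<p>A dispatch ID of this shape can be generated with only the standard library. It has 32 hex characters, like the <code class="language-plaintext highlighter-rouge">dispatch.id</code> label in the Job below. This helper is a hypothetical sketch; the dispatcher may well use a UUID library and strip the dashes instead:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newDispatchID returns a random version 4 UUID rendered as 32 lowercase hex
// characters with the dashes left out. This is a hypothetical helper; the
// dispatcher may simply use a UUID library and strip the dashes instead.
func newDispatchID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = 0x40 | b[6]%16 // version 4: high nibble of byte 6 becomes 0100
	b[8] = 0x80 | b[8]%64 // RFC 4122 variant: top two bits of byte 8 become 10
	return hex.EncodeToString(b[:]), nil
}

func main() {
	id, err := newDispatchID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
</code></pre></div></div>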
<p>Dispatched agents have the dispatch ID attached to their name and labels as well:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">batch/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Job</span>
<span class="na">metadata</span><span class="pi">:</span>
<span class="na">annotations</span><span class="pi">:</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Stateless Buildkite agents for running CI builds.</span>
<span class="s">kubectl.kubernetes.io/last-applied-configuration</span><span class="pi">:</span> <span class="c1"># ...</span>
<span class="na">creationTimestamp</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2022-04-18T00:04:34Z"</span>
<span class="na">labels</span><span class="pi">:</span>
<span class="na">app</span><span class="pi">:</span> <span class="s">buildkite-agent-stateless</span>
<span class="s">dispatch.id</span><span class="pi">:</span> <span class="s">3506b2adb17945d7b690bd5f9e6a6fb0</span>
<span class="s">dispatch.queues</span><span class="pi">:</span> <span class="s">stateless_standard_default_job</span>
</code></pre></div></div>
<p>This means that when something unexpected happens (for example, when agents are underprovisioned or overprovisioned), we can easily look at the dispatched Jobs and link them back to the log entries associated with their creation:</p>
<figure>
<img src="/assets/images/posts/stateless-ci/logs.png" />
</figure>
<p>The dispatcher’s structured logs also allow us to leverage <a href="https://cloud.google.com/logging/docs/logs-based-metrics">Google Cloud’s log-based metrics</a> by generating metrics from numeric fields within log entries. These metrics form the basis for our at-a-glance overview dashboard of the state of our Buildkite agent fleet and how the dispatcher is responding to demand, as well as alerting for potential issues (for example, if Jobs are taking too long to roll out).</p>
<figure>
<img src="/assets/images/posts/stateless-ci/dashboard.png" />
</figure>
<p>Based on these metrics, we can make adjustments to the numerous knobs available for fine-tuning the behaviour of the dispatcher: target minimum and maximum agents, the frequency of polling, the ratio of agents to require to come online before starting a new dispatch, agent TTLs, and more.</p>
<h2 id="git-mirror-caches">Git mirror caches</h2>
<p>During the initial stateless agent implementation, my teammates <a href="https://github.com/jhchabran/">@jhchabran</a> and <a href="https://github.com/davejrt">@davejrt</a> developed some nifty mechanisms for caching <a href="https://asdf-vm.com/">asdf</a> (a tool version manager) and <a href="https://yarnpkg.com/">Yarn</a> dependencies. These use <a href="https://github.com/gencer/cache-buildkite-plugin">a Buildkite plugin for caching</a> under the hood, and expose a simple API for use with Sourcegraph’s <a href="/self-documenting-self-updating/#continuous-integration-pipelines">generated pipelines</a>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">withYarnCache</span><span class="p">()</span> <span class="n">buildkite</span><span class="o">.</span><span class="n">StepOpt</span> <span class="p">{</span>
	<span class="k">return</span> <span class="n">buildkite</span><span class="o">.</span><span class="n">Cache</span><span class="p">(</span><span class="o">&</span><span class="n">buildkite</span><span class="o">.</span><span class="n">CacheOptions</span><span class="p">{</span>
		<span class="n">ID</span><span class="o">:</span> <span class="s">"node_modules"</span><span class="p">,</span>
		<span class="n">Key</span><span class="o">:</span> <span class="s">"cache-node_modules-{{ checksum 'yarn.lock' }}"</span><span class="p">,</span>
		<span class="n">RestoreKeys</span><span class="o">:</span> <span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"cache-node_modules-{{ checksum 'yarn.lock' }}"</span><span class="p">},</span>
		<span class="n">Paths</span><span class="o">:</span> <span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"node_modules"</span><span class="p">,</span> <span class="c">/* ... */</span><span class="p">},</span>
		<span class="n">Compress</span><span class="o">:</span> <span class="no">false</span><span class="p">,</span>
	<span class="p">})</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">addPrettier</span><span class="p">(</span><span class="n">pipeline</span> <span class="o">*</span><span class="n">bk</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">)</span> <span class="p">{</span>
	<span class="n">pipeline</span><span class="o">.</span><span class="n">AddStep</span><span class="p">(</span><span class="s">":lipstick: Prettier"</span><span class="p">,</span>
		<span class="n">withYarnCache</span><span class="p">(),</span>
		<span class="n">bk</span><span class="o">.</span><span class="n">Cmd</span><span class="p">(</span><span class="s">"dev/ci/yarn-run.sh format:check"</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div></div>
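<p>The <code class="language-plaintext highlighter-rouge">{{ checksum 'yarn.lock' }}</code> part of the cache key is what makes a shared cache safe: the key only changes when the lockfile does. A minimal Go sketch of that idea follows. It assumes SHA-256 as the checksum (the plugin’s actual hash function is not covered here), and the helpers <code class="language-plaintext highlighter-rouge">cacheKey</code> and <code class="language-plaintext highlighter-rouge">cacheKeyForFile</code> are hypothetical:</p>
<pre><code class="language-go">package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// cacheKey derives a cache key from a lockfile's contents, in the spirit of
// "cache-node_modules-{{ checksum 'yarn.lock' }}": identical lockfiles always
// produce identical keys. SHA-256 is an assumption, not the plugin's hash.
func cacheKey(prefix string, lockfile []byte) string {
	sum := sha256.Sum256(lockfile)
	return fmt.Sprintf("%s-%x", prefix, sum)
}

// cacheKeyForFile reads a lockfile from disk and returns its cache key.
func cacheKeyForFile(prefix, path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("reading %s: %w", path, err)
	}
	return cacheKey(prefix, data), nil
}
</code></pre>
<p>Because the key depends only on file contents, any agent that sees the same <code class="language-plaintext highlighter-rouge">yarn.lock</code> restores the same archive, no matter which host originally built it.</p>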
<p>A lingering problem, however, was the initial clone step, especially in the main <a href="https://github.com/sourcegraph/sourcegraph"><code class="language-plaintext highlighter-rouge">sourcegraph/sourcegraph</code> monorepo</a>, where even a shallow clone can take upwards of 30 seconds. We can’t rely entirely on shallow clones either, since our pipeline generator diffs against our <code class="language-plaintext highlighter-rouge">main</code> branch to determine how to construct a pipeline. This is especially painful for short steps, where running a linter check might take about as long as the clone itself.</p>
<p>Buildkite supports a feature that <a href="https://buildkite.com/changelog/107-share-git-checkouts-with-the-git-mirrors-agent-experiment">allows all jobs on a single host to share a single git clone</a>, using <a href="https://git-scm.com/docs/git-clone/2.36.0#Documentation/git-clone.txt---mirror"><code class="language-plaintext highlighter-rouge">git clone --mirror</code></a>. Subsequent clones after the initial clone can leverage the mirror repository with <a href="https://git-scm.com/docs/git-clone/2.36.0#Documentation/git-clone.txt---reference-if-ableltrepositorygt"><code class="language-plaintext highlighter-rouge">git clone --reference</code></a>:</p>
<blockquote>
<p>If the reference repository is on the local machine, […] obtain objects from the reference repository. Using an already existing repository as an alternate will require fewer objects to be copied from the repository being cloned, reducing network and local storage costs.</p>
</blockquote>
<p>On our old persistent agents, this means that while some jobs can take the same 30 seconds to clone the repository, most jobs that land on “warm” agents will have a much faster clone time - roughly 5 seconds.</p>
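<p>On the agent side, a clone that borrows objects from a local mirror might be built like this. It is a hypothetical helper, not how Buildkite or our agents actually invoke git; it assumes the mirror lives at <code class="language-plaintext highlighter-rouge">/buildkite-git-references/sourcegraph.reference</code> (the path the population Job in this post writes to) and uses <code class="language-plaintext highlighter-rouge">--reference-if-able</code>, so git only warns and performs a regular clone when the mirror is missing:</p>
<pre><code class="language-go">package main

import (
	"os/exec"
	"path/filepath"
)

// referenceRoot is where the mirror disk is mounted in agent containers.
const referenceRoot = "/buildkite-git-references"

// cloneArgs returns git arguments for a clone that borrows objects from a
// local reference repository named REPO.reference under referenceRoot.
// With --reference-if-able, git warns and falls back to a regular clone if
// the reference repository does not exist (for example, no disk mounted).
func cloneArgs(repoURL, repoName, dest string) []string {
	ref := filepath.Join(referenceRoot, repoName+".reference")
	return []string{"clone", "--reference-if-able", ref, repoURL, dest}
}

// cloneCmd wraps cloneArgs in an exec.Cmd; the caller decides when to run it.
func cloneCmd(repoURL, repoName, dest string) *exec.Cmd {
	return exec.Command("git", cloneArgs(repoURL, repoName, dest)...)
}
</code></pre>
<p>Since an agent pod keeps the disk mounted for its whole lifetime, the new clone can keep borrowing objects from the mirror; git’s <code class="language-plaintext highlighter-rouge">--dissociate</code> flag would copy them instead, which matters only if the mirror could disappear while the job is still running.</p>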
<p>To recreate this feature on our stateless agents, I created a daily cron job that:</p>
<ol>
<li>Creates a disk in Google Cloud, with <code class="language-plaintext highlighter-rouge">gcloud compute disks create buildkite-git-references-"$BUILDKITE_BUILD_NUMBER"</code></li>
<li>Deploys a Kubernetes <a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/">PersistentVolume and PersistentVolumeClaim</a> corresponding to the new disk</li>
<li>Deploys a Kubernetes Job that mounts the generated PersistentVolumeClaim and creates a clone mirror</li>
<li>Updates the PersistentVolumeClaim to be labelled <code class="language-plaintext highlighter-rouge">state: ready</code></li>
</ol>
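<p>As a rough illustration, one run of such a cron job could issue commands like the ones below. This sketch is hypothetical: the <code class="language-plaintext highlighter-rouge">gcloud</code> and <code class="language-plaintext highlighter-rouge">kubectl</code> subcommands and flags are real, but the <code class="language-plaintext highlighter-rouge">generated/</code> file names, the <code class="language-plaintext highlighter-rouge">kubectl wait</code> timeout, and the omitted zone and project flags are assumptions, and the helper name <code class="language-plaintext highlighter-rouge">cronCommands</code> is invented for this example:</p>
<pre><code class="language-go">package main

import "fmt"

// cronCommands lists, in order, the commands one run of the daily job might
// execute for a given build number. File paths under generated/ are
// hypothetical outputs of the envsubst step.
func cronCommands(buildNumber int) [][]string {
	name := fmt.Sprintf("buildkite-git-references-%d", buildNumber)
	return [][]string{
		// 1. Create the backing disk (zone and project flags omitted).
		{"gcloud", "compute", "disks", "create", name, "--size=16GB"},
		// 2. Register the disk as a PersistentVolume and PersistentVolumeClaim.
		{"kubectl", "-n", "buildkite", "apply", "-f", "generated/pv.yaml", "-f", "generated/pvc.yaml"},
		// 3. Populate the disk with a clone, then wait for the Job to finish.
		{"kubectl", "-n", "buildkite", "apply", "-f", "generated/populate-job.yaml"},
		{"kubectl", "-n", "buildkite", "wait", "--for=condition=complete", "--timeout=30m", "job/buildkite-git-references-populate"},
		// 4. Mark the claim as ready so the dispatcher starts handing it out.
		{"kubectl", "-n", "buildkite", "label", "pvc", name, "state=ready", "--overwrite"},
	}
}
</code></pre>
<p>Labelling the claim only after the Job completes means agents never mount a half-populated disk: until then, the claim simply doesn’t match the dispatcher’s <code class="language-plaintext highlighter-rouge">state=ready</code> selector.</p>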
<p>We generate the resources to deploy by running <a href="https://www.gnu.org/software/gettext/manual/html_node/envsubst-Invocation.html"><code class="language-plaintext highlighter-rouge">envsubst &lt;$TEMPLATE &gt;$GENERATED</code></a> on a template spec. For example, the PersistentVolume template spec looks like:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">PersistentVolume</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">buildkite-git-references-$BUILDKITE_BUILD_NUMBER</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">buildkite</span>
  <span class="na">labels</span><span class="pi">:</span>
    <span class="na">deploy</span><span class="pi">:</span> <span class="s">buildkite</span>
    <span class="na">for</span><span class="pi">:</span> <span class="s">buildkite-git-references</span>
    <span class="na">state</span><span class="pi">:</span> <span class="s">$PV_STATE</span>
    <span class="na">id</span><span class="pi">:</span> <span class="s1">'</span><span class="s">$BUILDKITE_BUILD_NUMBER'</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">accessModes</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">ReadWriteOnce</span>
    <span class="pi">-</span> <span class="s">ReadOnlyMany</span>
  <span class="na">claimRef</span><span class="pi">:</span>
    <span class="na">name</span><span class="pi">:</span> <span class="s">buildkite-git-references-$BUILDKITE_BUILD_NUMBER</span>
    <span class="na">namespace</span><span class="pi">:</span> <span class="s">buildkite</span>
  <span class="na">gcePersistentDisk</span><span class="pi">:</span>
    <span class="na">fsType</span><span class="pi">:</span> <span class="s">ext4</span>
    <span class="c1"># the disk we created with 'gcloud compute disks create'</span>
    <span class="na">pdName</span><span class="pi">:</span> <span class="s">buildkite-git-references-$BUILDKITE_BUILD_NUMBER</span>
  <span class="na">capacity</span><span class="pi">:</span>
    <span class="na">storage</span><span class="pi">:</span> <span class="s">16G</span>
  <span class="na">persistentVolumeReclaimPolicy</span><span class="pi">:</span> <span class="s">Delete</span>
  <span class="na">storageClassName</span><span class="pi">:</span> <span class="s">buildkite-git-references</span>
</code></pre></div></div>
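<p>The <code class="language-plaintext highlighter-rouge">envsubst</code> step is simple enough to express in Go with the standard library’s <code class="language-plaintext highlighter-rouge">os.Expand</code>, which replaces <code class="language-plaintext highlighter-rouge">$VAR</code> and <code class="language-plaintext highlighter-rouge">${VAR}</code> references using a lookup function. Here is a minimal sketch with hypothetical helper names; like <code class="language-plaintext highlighter-rouge">envsubst</code>, unset variables become empty strings, though a few edge cases differ (for example, <code class="language-plaintext highlighter-rouge">os.Expand</code> also treats shell special parameters such as <code class="language-plaintext highlighter-rouge">$1</code> as variables):</p>
<pre><code class="language-go">package main

import "os"

// render substitutes $VAR and ${VAR} references in a template using vars,
// similar to envsubst: variables missing from vars become empty strings.
func render(template string, vars map[string]string) string {
	return os.Expand(template, func(key string) string {
		return vars[key]
	})
}

// renderFromEnv reads values from the process environment instead, which is
// what running envsubst in the cron job's shell does.
func renderFromEnv(template string) string {
	return os.ExpandEnv(template)
}
</code></pre>
<p>Passing an explicit map instead of the whole environment keeps unrelated environment variables from leaking into a generated spec by accident.</p>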
<p>PersistentVolumes are created with <a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes"><code class="language-plaintext highlighter-rouge">accessModes: [ReadWriteOnce, ReadOnlyMany]</code></a> - the idea is that we mount the disk as <code class="language-plaintext highlighter-rouge">ReadWriteOnce</code> to populate it with a mirror repository, before allowing all our agents to mount it as <code class="language-plaintext highlighter-rouge">ReadOnlyMany</code>:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">batch/v1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">Job</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">buildkite-git-references-populate</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">buildkite</span>
  <span class="na">annotations</span><span class="pi">:</span>
    <span class="na">description</span><span class="pi">:</span> <span class="s">Populates the latest buildkite-git-references disk with data.</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">parallelism</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">completions</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">ttlSecondsAfterFinished</span><span class="pi">:</span> <span class="m">240</span> <span class="c1"># allow us to fetch logs</span>
  <span class="na">template</span><span class="pi">:</span>
    <span class="na">metadata</span><span class="pi">:</span>
      <span class="na">labels</span><span class="pi">:</span>
        <span class="na">app</span><span class="pi">:</span> <span class="s">buildkite-git-references-populate</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">containers</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">populate-references</span>
          <span class="na">image</span><span class="pi">:</span> <span class="s">alpine/git:v2.32.0</span>
          <span class="na">imagePullPolicy</span><span class="pi">:</span> <span class="s">IfNotPresent</span>
          <span class="na">command</span><span class="pi">:</span> <span class="pi">[</span><span class="s1">'</span><span class="s">/bin/sh'</span><span class="pi">]</span>
          <span class="na">args</span><span class="pi">:</span>
            <span class="pi">-</span> <span class="s1">'</span><span class="s">-c'</span>
            <span class="c1"># Format:</span>
            <span class="c1"># git clone git@github.com:sourcegraph/$REPO /buildkite-git-references/$REPO.reference;</span>
            <span class="pi">-</span> <span class="pi">|</span>
              <span class="s">mkdir /root/.ssh; cp /buildkite/.ssh/* /root/.ssh/;</span>
              <span class="s">git clone git@github.com:sourcegraph/sourcegraph.git \</span>
                <span class="s">/buildkite-git-references/sourcegraph.reference;</span>
              <span class="s">echo 'Done';</span>
          <span class="na">volumeMounts</span><span class="pi">:</span>
            <span class="pi">-</span> <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/buildkite-git-references</span>
              <span class="na">name</span><span class="pi">:</span> <span class="s">buildkite-git-references</span>
      <span class="na">restartPolicy</span><span class="pi">:</span> <span class="s">OnFailure</span>
      <span class="na">volumes</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">buildkite-git-references</span>
          <span class="na">persistentVolumeClaim</span><span class="pi">:</span>
            <span class="na">claimName</span><span class="pi">:</span> <span class="s">buildkite-git-references-$BUILDKITE_BUILD_NUMBER</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">buildkite-job-dispatcher</code> can now simply list all the available PersistentVolumeClaims that are ready:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">gitReferencesPVC</span> <span class="o">*</span><span class="n">corev1</span><span class="o">.</span><span class="n">PersistentVolumeClaim</span>
<span class="k">var</span> <span class="n">listGitReferencesPVCs</span> <span class="n">corev1</span><span class="o">.</span><span class="n">PersistentVolumeClaimList</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">k8sClient</span><span class="o">.</span><span class="n">List</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">config</span><span class="o">.</span><span class="n">TemplateJobNamespace</span><span class="p">,</span> <span class="o">&</span><span class="n">listGitReferencesPVCs</span><span class="p">,</span>
	<span class="n">k8s</span><span class="o">.</span><span class="n">QueryParam</span><span class="p">(</span><span class="s">"labelSelector"</span><span class="p">,</span> <span class="s">"state=ready,for=buildkite-git-references"</span><span class="p">),</span>
<span class="p">);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
	<span class="n">runLog</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="s">"failed to fetch buildkite-git-references PVCs"</span><span class="p">,</span> <span class="n">zap</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">err</span><span class="p">))</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
	<span class="n">gitReferencesPVCs</span> <span class="o">:=</span> <span class="n">PersistentVolumeClaims</span><span class="p">(</span><span class="n">listGitReferencesPVCs</span><span class="o">.</span><span class="n">GetItems</span><span class="p">())</span>
	<span class="n">pvcCount</span> <span class="o">:=</span> <span class="n">zapMetric</span><span class="p">(</span><span class="s">"pvcs"</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">gitReferencesPVCs</span><span class="p">))</span>
	<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">gitReferencesPVCs</span><span class="p">)</span> <span class="o">></span> <span class="m">0</span> <span class="p">{</span>
		<span class="n">sort</span><span class="o">.</span><span class="n">Sort</span><span class="p">(</span><span class="n">gitReferencesPVCs</span><span class="p">)</span>
		<span class="n">gitReferencesPVC</span> <span class="o">=</span> <span class="n">gitReferencesPVCs</span><span class="p">[</span><span class="m">0</span><span class="p">]</span>
	<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
		<span class="n">runLog</span><span class="o">.</span><span class="n">Warn</span><span class="p">(</span><span class="s">"no buildkite-git-references PVCs found"</span><span class="p">,</span> <span class="n">pvcCount</span><span class="p">)</span>
	<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And apply it to the agent Jobs we dispatch:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">gitReferencePVC</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="n">job</span><span class="o">.</span><span class="n">Spec</span><span class="o">.</span><span class="n">Template</span><span class="o">.</span><span class="n">GetSpec</span><span class="p">()</span><span class="o">.</span><span class="n">Volumes</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">job</span><span class="o">.</span><span class="n">Spec</span><span class="o">.</span><span class="n">Template</span><span class="o">.</span><span class="n">GetSpec</span><span class="p">()</span><span class="o">.</span><span class="n">GetVolumes</span><span class="p">(),</span>
<span class="o">&</span><span class="n">corev1</span><span class="o">.</span><span class="n">Volume</span><span class="p">{</span>
<span class="n">Name</span><span class="o">:</span> <span class="n">stringPtr</span><span class="p">(</span><span class="s">"buildkite-git-references"</span><span class="p">),</span>
<span class="n">VolumeSource</span><span class="o">:</span> <span class="o">&</span><span class="n">corev1</span><span class="o">.</span><span class="n">VolumeSource</span><span class="p">{</span>
<span class="n">PersistentVolumeClaim</span><span class="o">:</span> <span class="o">&</span><span class="n">corev1</span><span class="o">.</span><span class="n">PersistentVolumeClaimVolumeSource</span><span class="p">{</span>
<span class="n">ClaimName</span><span class="o">:</span> <span class="n">gitReferencePVC</span><span class="o">.</span><span class="n">GetMetadata</span><span class="p">()</span><span class="o">.</span><span class="n">Name</span><span class="p">,</span>
<span class="n">ReadOnly</span><span class="o">:</span> <span class="n">boolPtr</span><span class="p">(</span><span class="no">true</span><span class="p">),</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="p">})</span>
<span class="n">agentContainer</span><span class="o">.</span><span class="n">VolumeMounts</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">agentContainer</span><span class="o">.</span><span class="n">GetVolumeMounts</span><span class="p">(),</span>
<span class="o">&</span><span class="n">corev1</span><span class="o">.</span><span class="n">VolumeMount</span><span class="p">{</span>
<span class="n">Name</span><span class="o">:</span> <span class="n">stringPtr</span><span class="p">(</span><span class="s">"buildkite-git-references"</span><span class="p">),</span>
<span class="n">ReadOnly</span><span class="o">:</span> <span class="n">boolPtr</span><span class="p">(</span><span class="no">true</span><span class="p">),</span>
<span class="n">MountPath</span><span class="o">:</span> <span class="n">stringPtr</span><span class="p">(</span><span class="s">"/buildkite-git-references"</span><span class="p">),</span>
<span class="p">})</span>
<span class="p">}</span>
</code></pre></div></div>
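<p>The dispatcher snippet sorts <code class="language-plaintext highlighter-rouge">PersistentVolumeClaims</code> and takes the first element, which only selects the most recent mirror if the ordering puts the newest claim first. That type’s implementation isn’t part of this post, so here is a hypothetical stand-in that orders claims by their <code class="language-plaintext highlighter-rouge">id</code> label (the build number that created the disk), newest first. It uses a simplified <code class="language-plaintext highlighter-rouge">claim</code> struct instead of the Kubernetes types, assumes the label has already been parsed into an integer, and requires Go 1.21 for the <code class="language-plaintext highlighter-rouge">cmp</code> package:</p>
<pre><code class="language-go">package main

import (
	"cmp"
	"sort"
)

// claim is a simplified stand-in for a PersistentVolumeClaim: its name and
// the build number from its "id" label.
type claim struct {
	Name string
	ID   int
}

// claims implements sort.Interface, ordering the newest claim (highest
// build number) first.
type claims []claim

func (c claims) Len() int           { return len(c) }
func (c claims) Swap(i, j int)      { c[i], c[j] = c[j], c[i] }
func (c claims) Less(i, j int) bool { return cmp.Less(c[j].ID, c[i].ID) }

// newest returns the most recent claim, or false if ready is empty. It
// sorts a copy so the caller's slice is left untouched.
func newest(ready []claim) (claim, bool) {
	if len(ready) == 0 {
		return claim{}, false
	}
	sorted := append(claims(nil), ready...)
	sort.Sort(sorted)
	return sorted[0], true
}
</code></pre>
<p>A single linear scan would also find the maximum, but with only a handful of ready claims at any time, sorting keeps the code close to the shape of the dispatcher snippet.</p>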
<p>And that’s it! We now have repository clone times that are consistently in the 3-7 second range, depending on how much your branch has diverged from <code class="language-plaintext highlighter-rouge">main</code>. As new disks become available, newly dispatched agents will automatically leverage more up-to-date mirror repositories.</p>
<figure>
<img src="/assets/images/posts/stateless-ci/git-clone-reference.png" />
</figure>
<p>Within the same daily cron job that deploys these disks, we can also prune disks that are no longer used by any agents:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl describe pvc <span class="nt">-l</span> <span class="k">for</span><span class="o">=</span>buildkite-git-references,id!<span class="o">=</span><span class="s2">"</span><span class="nv">$BUILDKITE_BUILD_NUMBER</span><span class="s2">"</span> |
<span class="nb">grep</span> <span class="nt">-E</span> <span class="s2">"^Name:.*</span><span class="nv">$|</span><span class="s2">^Used By:.*$"</span> | <span class="nb">grep</span> <span class="nt">-B</span> 2 <span class="s2">"&lt;none&gt;"</span> | <span class="nb">grep</span> <span class="nt">-E</span> <span class="s2">"^Name:.*$"</span> |
<span class="nb">awk</span> <span class="s1">'$2 {print$2}'</span> |
<span class="k">while </span><span class="nb">read</span> <span class="nt">-r</span> vol<span class="p">;</span> <span class="k">do </span>kubectl delete pvc/<span class="s2">"</span><span class="k">${</span><span class="nv">vol</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--wait</span><span class="o">=</span><span class="nb">false</span><span class="p">;</span> <span class="k">done</span>
</code></pre></div></div>
<p>Interestingly enough, there is no easy way to detect whether a PersistentVolumeClaim is completely unused. We can easily detect <a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/#phase"><em>unbound</em></a> disks, but that isn’t the same thing: in this setup, each PersistentVolume is always bound to its PersistentVolumeClaim, whether or not any pod is actually using that claim. <code class="language-plaintext highlighter-rouge">kubectl describe</code> does have this information<sup id="fnref:kubectl" role="doc-noteref"><a href="#fn:kubectl" class="footnote" rel="footnote">1</a></sup>, which is what the above script (based on <a href="https://stackoverflow.com/a/59758937">this StackOverflow answer</a>) relies on.</p>
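<p>For comparison, the same filtering can be sketched in Go by scanning the <code class="language-plaintext highlighter-rouge">kubectl describe pvc</code> output line by line, remembering the most recent <code class="language-plaintext highlighter-rouge">Name:</code> field and recording it when the matching <code class="language-plaintext highlighter-rouge">Used By:</code> field is <code class="language-plaintext highlighter-rouge">&lt;none&gt;</code>. This is a hypothetical helper (Go 1.20 or later for <code class="language-plaintext highlighter-rouge">strings.CutPrefix</code>), and like the shell version it depends on the human-readable describe format, which is not a stable interface:</p>
<pre><code class="language-go">package main

import (
	"bufio"
	"strings"
)

// noneMarker is the "Used By" value kubectl prints for a claim that no pod
// mounts: the word none wrapped in angle brackets.
const noneMarker = "\u003cnone\u003e"

// unusedClaims returns the names of claims in kubectl describe pvc output
// whose "Used By" field is empty.
func unusedClaims(describe string) ([]string, error) {
	var unused []string
	current := ""
	scanner := bufio.NewScanner(strings.NewReader(describe))
	for scanner.Scan() {
		line := scanner.Text()
		if name, ok := strings.CutPrefix(line, "Name:"); ok {
			// Each claim's section starts with its Name field.
			current = strings.TrimSpace(name)
			continue
		}
		if usedBy, ok := strings.CutPrefix(line, "Used By:"); ok {
			if strings.TrimSpace(usedBy) == noneMarker {
				unused = append(unused, current)
			}
		}
	}
	return unused, scanner.Err()
}
</code></pre>
<p>Tracking the current <code class="language-plaintext highlighter-rouge">Name:</code> explicitly avoids the <code class="language-plaintext highlighter-rouge">grep -B 2</code> context trick, at the cost of a few more lines than the pipeline.</p>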
<h2 id="stateless-agents">Stateless agents</h2>
<p>So far, we have already seen a drastic reduction in tool-related flakes in CI, and the switch to stateless agents has helped us stay confident that failures are not caused by leftover state or poor isolation between builds. There are probably other mechanisms for maintaining isolation between builds, but for our case this seemed to have the easiest migration path.</p>
<p><br /></p>
<h2 id="about-sourcegraph">About Sourcegraph</h2>
<p>Sourcegraph builds universal code search for every developer and company so they can innovate faster. We help developers and companies with billions of lines of code create the software you use every day.
Learn more about Sourcegraph <a href="https://about.sourcegraph.com/">here</a>.</p>
<p>Interested in joining? <a href="https://about.sourcegraph.com/jobs/">We’re hiring</a>!</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:kubectl" role="doc-endnote">
<p>A quick Sourcegraph search for <code class="language-plaintext highlighter-rouge">"Used By"</code> reveals <a href="https://sourcegraph.com/github.com/kubernetes/kubectl@18a5313a74f7d83f6b54377d72b421b5ebfa66c9/-/blob/pkg/describe/describe.go?L1616:25">this line</a> as the source of the output. A <a href="https://sourcegraph.com/github.com/kubernetes/kubectl@18a5313a74f7d83f6b54377d72b421b5ebfa66c9/-/blob/pkg/describe/describe.go?L1583-1586">custom <code class="language-plaintext highlighter-rouge">getPodsForPVC</code></a> is the source of the pods listed there, and looking for references shows that no <code class="language-plaintext highlighter-rouge">kubectl</code> command exposes this functionality except <code class="language-plaintext highlighter-rouge">kubectl describe</code>, so lengthy script it is! <a href="#fnref:kubectl" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p>robert</p>
<p>Sourcegraph’s continuous integration infrastructure uses Buildkite, a platform for running pipelines on CI agents we operate. After using the default approach of scaling persistent agent deployments for a long time, we’ve recently switched over to completely stateless agents on dynamically dispatched Kubernetes Jobs to improve the stability of our CI pipelines.</p>
<hr />
<p><strong>Extending Sourcegraph search</strong> (2022-04-10, <a href="https://bobheadxi.dev/extending-search">https://bobheadxi.dev/extending-search</a>)</p>
<p><a href="/experience/sourcegraph">Sourcegraph</a> recently held a brief internal hackathon where we got to work on a variety of ideas related to our <a href="https://about.sourcegraph.com/use-cases">freshly minted “Sourcegraph use cases”</a>. One idea was to extend Sourcegraph’s <a href="https://about.sourcegraph.com/code-search">core code search functionality</a> to allow queries over <a href="https://docs.sourcegraph.com/notebooks">search notebooks</a> (a new product that enables live, persistent documentation based on code search), making their content easier to discover during onboarding.</p>
<p>The minimum viable product for this project was to support the following search in the Sourcegraph search language:</p>
<pre><code class="language-none">type:notebook my notebook query select:notebook.block.md
_____________ _________________ ________________________
      |               |                    └ render Markdown sections of the notebook match
      |               └ query string
      └ type filter
</code></pre>
<p>Matching search notebooks (and/or selected “blocks”, or sections) should then render within search results! For some context, this is what Sourcegraph’s code search results usually look like:</p>
<figure>
<img src="/assets/images/posts/extending-search/search.png" />
</figure>
<p>And this is what search notebooks look like, with each section being a separate notebook block:</p>
<figure>
<video autoplay="" loop="" muted="" playsinline="">
<source src="https://storage.googleapis.com/sourcegraph-assets/notebooks/notebooks_overview_v3_dark.webm" type="video/webm" />
<source src="https://storage.googleapis.com/sourcegraph-assets/notebooks/notebooks_overview_v3_dark.mp4" type="video/mp4" />
</video>
</figure>
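<p>To make the shape of the MVP query concrete, here is a toy Go parser that separates the <code class="language-plaintext highlighter-rouge">type:</code> and <code class="language-plaintext highlighter-rouge">select:</code> filters from the free-text pattern. It is hypothetical and deliberately naive; Sourcegraph’s real query parser also handles quoting, regular expressions, and <code class="language-plaintext highlighter-rouge">AND</code>/<code class="language-plaintext highlighter-rouge">OR</code> expressions:</p>
<pre><code class="language-go">package main

import "strings"

// parsedQuery holds the filters used by the notebook search MVP and the
// remaining free-text pattern.
type parsedQuery struct {
	Type    string
	Select  string
	Pattern string
}

// parseQuery splits a query on whitespace, pulling out type: and select:
// filters and joining every other token back into the pattern.
func parseQuery(q string) parsedQuery {
	var p parsedQuery
	var pattern []string
	for _, tok := range strings.Fields(q) {
		if key, value, ok := strings.Cut(tok, ":"); ok {
			switch key {
			case "type":
				p.Type = value
				continue
			case "select":
				p.Select = value
				continue
			}
		}
		pattern = append(pattern, tok)
	}
	p.Pattern = strings.Join(pattern, " ")
	return p
}
</code></pre>
<p>For the example query, <code class="language-plaintext highlighter-rouge">parseQuery("type:notebook my notebook query select:notebook.block.md")</code> yields a <code class="language-plaintext highlighter-rouge">Type</code> of <code class="language-plaintext highlighter-rouge">notebook</code>, a <code class="language-plaintext highlighter-rouge">Select</code> of <code class="language-plaintext highlighter-rouge">notebook.block.md</code>, and the pattern <code class="language-plaintext highlighter-rouge">my notebook query</code>.</p>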
<p>In this post, I’ll walk through a brief overview of what I learned about how Sourcegraph search works and what we did to implement an additional search and search result type!</p>
<ul>
<li><a href="#introducing-a-search-job">Introducing a search job</a></li>
<li><a href="#sending-results-over-the-wire">Sending results over the wire</a></li>
<li><a href="#querying-the-database-for-real-results">Querying the database for real results</a></li>
<li><a href="#implementing-notebook-blocks-results">Implementing notebook blocks results</a></li>
<li><a href="#rendering-search-notebook-results">Rendering search notebook results</a></li>
</ul>
<p>A sneak peek of the end result:</p>
<figure>
<img src="/assets/images/posts/extending-search/block-search.png" />
<figcaption>
End-to-end notebook block search!
</figcaption>
</figure>
<p>Note that all the code internals mentioned in this post may change - you can view the Sourcegraph repository at <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e"><code class="language-plaintext highlighter-rouge">73a484e</code></a> for an accurate picture of what the codebase looked like at the time! I’d also like to thank <a href="https://github.com/tsenart">@tsenart</a>, who both proposed the original idea and worked with me through several brainstorming sessions to discuss the implementation.</p>
<p>Additionally, I am basically a complete outsider when it comes to our search internals, and the search code I interact with in this post was built by <a href="https://handbook.sourcegraph.com/departments/product-engineering/engineering/code-graph/search/">Sourcegraph’s fantastic search teams</a>, so kudos<sup id="fnref:kudos" role="doc-noteref"><a href="#fn:kudos" class="footnote" rel="footnote">1</a></sup> to the teams for making this hack possible in the first place!</p>
<h2 id="introducing-a-search-job">Introducing a search job</h2>
<p>The Sourcegraph docs page <a href="https://docs.sourcegraph.com/dev/background-information/architecture/life-of-a-search-query">Life of a search query</a> briefly goes over what happens when, for example, you enter a query into <a href="https://sourcegraph.com/search">sourcegraph.com/search</a>:</p>
<ol>
<li>A client makes a request to (typically) the <code class="language-plaintext highlighter-rouge">/.api/search/stream</code> endpoint - see <a href="https://sourcegraph.com/github.com/bobheadxi/raycast-sourcegraph@7bc6dd80ffcb714b46b1911bc8368139d061dd82/-/blob/src/sourcegraph/stream-search/index.ts?L53-56">how it is done in the <code class="language-plaintext highlighter-rouge">raycast-sourcegraph</code> extension for a simplified example</a>.</li>
<li>The query makes its way to <code class="language-plaintext highlighter-rouge">sourcegraph-frontend</code>, which converts the query text into a search plan composed of search jobs to execute against various backends (such as <a href="https://github.com/sourcegraph/zoekt">Zoekt</a>).</li>
<li>Jobs get executed and the results get streamed back over the wire to the client.</li>
</ol>
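<p>Sourcegraph’s streaming search API delivers results as server-sent events. A minimal Go reader for that format might look like the sketch below; it is hypothetical, implements only the <code class="language-plaintext highlighter-rouge">event:</code> and <code class="language-plaintext highlighter-rouge">data:</code> fields and blank-line terminators of the SSE format, and leaves interpreting specific event names and payloads to the caller:</p>
<pre><code class="language-go">package main

import (
	"bufio"
	"io"
	"strings"
)

// event is one server-sent event: its name and (possibly multi-line) data.
type event struct {
	Name string
	Data string
}

// readEvents parses a text/event-stream body, calling handle once per
// complete event. Only the "event" and "data" fields are supported.
func readEvents(r io.Reader, handle func(event)) error {
	scanner := bufio.NewScanner(r)
	// Result payloads can be large; raise the default 64 KiB line limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	var name string
	var data []string
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case line == "":
			// A blank line ends the event; events without data are dropped.
			if len(data) != 0 {
				handle(event{Name: name, Data: strings.Join(data, "\n")})
			}
			name, data = "", nil
		case strings.HasPrefix(line, "event:"):
			name = strings.TrimPrefix(strings.TrimPrefix(line, "event:"), " ")
		case strings.HasPrefix(line, "data:"):
			data = append(data, strings.TrimPrefix(strings.TrimPrefix(line, "data:"), " "))
		}
	}
	return scanner.Err()
}
</code></pre>
<p>Handling events through a callback lets a client render each batch of results as soon as it arrives, which is the point of streaming rather than waiting for the whole search plan to finish.</p>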
<p>For example, a typical query <code class="language-plaintext highlighter-rouge">foobar</code> will evaluate to a plan of jobs like the following, calling out to a variety of search backends (<code class="language-plaintext highlighter-rouge">ZoektGlobalSearch</code>, <code class="language-plaintext highlighter-rouge">RepoSearch</code>, <code class="language-plaintext highlighter-rouge">ComputeExcludedRepos</code>) within certain limits<sup id="fnref:timeout" role="doc-noteref"><a href="#fn:timeout" class="footnote" rel="footnote">2</a></sup>, which are enforced on child jobs by wrapper jobs (<code class="language-plaintext highlighter-rouge">TIMEOUT</code> and <code class="language-plaintext highlighter-rouge">LIMIT</code>).</p>
<pre><code class="language-mermaid">flowchart TB
0([TIMEOUT])
0---1
1[20s]
0---2
2([LIMIT])
2---3
3[500]
2---4
4([PARALLEL])
4---5
5([ZoektGlobalSearch])
4---6
6([RepoSearch])
4---7
7([ComputeExcludedRepos])
</code></pre>
<p>The typical example here is a search job that reaches out to our <a href="https://github.com/sourcegraph/zoekt">Zoekt backends</a>. A <code class="language-plaintext highlighter-rouge">Job</code> could also combine multiple search jobs, such as to <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/internal/search/job/combinators.go?L104-121">run a set of jobs in parallel</a> or to <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/internal/search/job/combinators.go?L38-81">prioritise results from certain jobs before others</a>.</p>
<p>The evaluated search job varies based on your search query - an <a href="https://docs.sourcegraph.com/code_search/how-to/exhaustive">exhaustive</a> commit search (<code class="language-plaintext highlighter-rouge">foo type:commit count:all</code>) will create the following job instead, with a longer timeout and higher limit:</p>
<pre><code class="language-mermaid">flowchart TB
0([TIMEOUT])
0---1
1[1m0s]
0---2
2([LIMIT])
2---3
3[99999999]
2---4
4([PARALLEL])
4---5
5([Commit])
4---6
6([ComputeExcludedRepos])
</code></pre>
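<p>Besides the leaf jobs, the two plans differ in their wrapper settings. A hypothetical helper that reproduces just the values shown in these diagrams (20s and 500 results by default, 1m0s and 99999999 results with <code class="language-plaintext highlighter-rouge">count:all</code>) could look like this; Sourcegraph’s real defaults are computed elsewhere and depend on more than a single token:</p>
<pre><code class="language-go">package main

import (
	"strings"
	"time"
)

// planLimits holds the settings of the TIMEOUT and LIMIT wrapper jobs.
type planLimits struct {
	Timeout time.Duration
	Limit   int
}

// limitsFor returns the values shown in the two example plans: 20s and 500
// results by default, or 1m0s and 99999999 results when the query contains
// count:all to request exhaustive results.
func limitsFor(query string) planLimits {
	for _, tok := range strings.Fields(query) {
		if tok == "count:all" {
			return planLimits{Timeout: time.Minute, Limit: 99999999}
		}
	}
	return planLimits{Timeout: 20 * time.Second, Limit: 500}
}
</code></pre>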
<p>Each search job within these plans is implemented behind the <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/internal/search/job/types.go?L23:6#tab=references"><code class="language-plaintext highlighter-rouge">Job</code> interface</a>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Job is an interface shared by all individual search operations in the</span>
<span class="c">// backend (e.g., text vs commit vs symbol search are represented as different</span>
<span class="c">// jobs) as well as combinations over those searches (run a set in parallel,</span>
<span class="c">// timeout). Calling Run on a job object runs a search.</span>
<span class="k">type</span> <span class="n">Job</span> <span class="k">interface</span> <span class="p">{</span>
	<span class="n">Run</span><span class="p">(</span><span class="n">context</span><span class="o">.</span><span class="n">Context</span><span class="p">,</span> <span class="n">database</span><span class="o">.</span><span class="n">DB</span><span class="p">,</span> <span class="n">streaming</span><span class="o">.</span><span class="n">Sender</span><span class="p">)</span> <span class="p">(</span><span class="o">*</span><span class="n">search</span><span class="o">.</span><span class="n">Alert</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span>
	<span class="n">Name</span><span class="p">()</span> <span class="kt">string</span>
<span class="p">}</span>
</code></pre></div></div>
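<p>To get a feel for how combinators compose behind an interface like this, here is a simplified, hypothetical version rather than Sourcegraph’s implementation: results are plain strings handed to a <code class="language-plaintext highlighter-rouge">send</code> callback instead of a <code class="language-plaintext highlighter-rouge">streaming.Sender</code>, and there is no database handle or alert. It sketches a leaf job plus <code class="language-plaintext highlighter-rouge">TIMEOUT</code>, <code class="language-plaintext highlighter-rouge">LIMIT</code>, and <code class="language-plaintext highlighter-rouge">PARALLEL</code> wrappers mirroring the first plan above:</p>
<pre><code class="language-go">package main

import (
	"context"
	"sync"
	"time"
)

// job is a simplified, hypothetical take on the Job interface.
type job interface {
	Run(ctx context.Context, send func(result string)) error
	Name() string
}

// leaf stands in for a backend search that produces fixed results.
type leaf struct {
	name    string
	results []string
}

func (l leaf) Name() string { return l.name }
func (l leaf) Run(ctx context.Context, send func(string)) error {
	for _, r := range l.results {
		if err := ctx.Err(); err != nil {
			return err // a parent cancelled or timed out, so stop early
		}
		send(r)
	}
	return nil
}

// timeout bounds how long its child may run (TIMEOUT).
type timeout struct {
	d     time.Duration
	child job
}

func (t timeout) Name() string { return "TIMEOUT" }
func (t timeout) Run(ctx context.Context, send func(string)) error {
	ctx, cancel := context.WithTimeout(ctx, t.d)
	defer cancel()
	return t.child.Run(ctx, send)
}

// limit forwards at most n results and then cancels its child (LIMIT).
type limit struct {
	n     int
	child job
}

func (l limit) Name() string { return "LIMIT" }
func (l limit) Run(ctx context.Context, send func(string)) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	var mu sync.Mutex
	sent := 0
	err := l.child.Run(ctx, func(r string) {
		mu.Lock()
		defer mu.Unlock()
		if sent == l.n {
			return // drop anything that arrives after the limit is hit
		}
		send(r)
		sent++
		if sent == l.n {
			cancel()
		}
	})
	mu.Lock()
	reached := sent == l.n
	mu.Unlock()
	if reached {
		return nil // the cancellation was ours, not a failure
	}
	return err
}

// parallel runs its children concurrently and serialises their sends (PARALLEL).
type parallel []job

func (p parallel) Name() string { return "PARALLEL" }
func (p parallel) Run(ctx context.Context, send func(string)) error {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var firstErr error
	safeSend := func(r string) {
		mu.Lock()
		defer mu.Unlock()
		send(r)
	}
	for _, child := range p {
		wg.Add(1)
		go func(c job) {
			defer wg.Done()
			if err := c.Run(ctx, safeSend); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(child)
	}
	wg.Wait()
	return firstErr
}
</code></pre>
<p>Nesting these as <code class="language-plaintext highlighter-rouge">timeout{d: 20 * time.Second, child: limit{n: 500, child: parallel{...}}}</code> reproduces the shape of the first plan. Once <code class="language-plaintext highlighter-rouge">limit</code> has forwarded its quota it cancels the shared context, and the leaves notice via <code class="language-plaintext highlighter-rouge">ctx.Err()</code> and stop early.</p>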
<p>So how do these jobs in the query plan get created? Poking around for constructors of the <code class="language-plaintext highlighter-rouge">Job</code> interface reveals (I think) the following flow for <code class="language-plaintext highlighter-rouge">Job</code> creation after a <code class="language-plaintext highlighter-rouge">query.Plan</code> is created (primarily with <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/internal/search/query/query.go?L171:6"><code class="language-plaintext highlighter-rouge">query.Pipeline</code></a>, which handles query parsing, validation, transformation, and so on):</p>
<pre><code class="language-mermaid">graph TD
FromExpandedPlan --> ToEvaluateJob
ToEvaluateJob --> ToSearchJob
ToEvaluateJob -- "has pattern (AND or OR)" --> toPatternExpressionJob
toPatternExpressionJob --> ToSearchJob
toPatternExpressionJob --> toOrJob
toPatternExpressionJob --> toAndJob
toOrJob --> toPatternExpressionJob
toAndJob --> toPatternExpressionJob
ToSearchJob --> Job
ToSearchJob -- has pattern --> optimizeJobs
optimizeJobs --> Job
</code></pre>
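<p>The recursion between <code class="language-plaintext highlighter-rouge">toPatternExpressionJob</code>, <code class="language-plaintext highlighter-rouge">toAndJob</code>, and <code class="language-plaintext highlighter-rouge">toOrJob</code> follows the shape of the query’s pattern expression. The sketch below is hypothetical and only captures that shape: leaves become search jobs, and <code class="language-plaintext highlighter-rouge">AND</code>/<code class="language-plaintext highlighter-rouge">OR</code> nodes recurse into their children, producing a textual plan instead of real jobs (real AND and OR jobs also have to combine their children’s results):</p>
<pre><code class="language-go">package main

import "strings"

// node is a hypothetical, simplified pattern expression: either a leaf
// pattern or an AND/OR over child expressions.
type node struct {
	Op       string // "AND", "OR", or "" for a leaf pattern
	Pattern  string
	Children []node
}

// toJob turns leaves into search jobs and recurses through AND/OR nodes,
// returning a textual description of the resulting plan.
func toJob(n node) string {
	switch n.Op {
	case "":
		return "Search(" + n.Pattern + ")"
	case "AND", "OR":
		parts := make([]string, 0, len(n.Children))
		for _, c := range n.Children {
			parts = append(parts, toJob(c))
		}
		return n.Op + "(" + strings.Join(parts, ", ") + ")"
	default:
		return "Unknown(" + n.Op + ")"
	}
}
</code></pre>
<p>For example, an expression equivalent to <code class="language-plaintext highlighter-rouge">foo OR (bar AND baz)</code> becomes <code class="language-plaintext highlighter-rouge">OR(Search(foo), AND(Search(bar), Search(baz)))</code>.</p>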
<p>The <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/internal/search/job/job.go?L51:6"><code class="language-plaintext highlighter-rouge">ToSearchJob</code> function</a> appears to handle the bulk of search job creation, with the additional layers applying a variety of processing on top.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// ToSearchJob converts a query parse tree to the _internal_ representation</span>
<span class="c">// needed to run a search routine. To understand why this conversion matters, think</span>
<span class="c">// about the fact that the query parse tree doesn't know anything about our</span>
<span class="c">// backends or architecture. It doesn't decide certain defaults, like whether we</span>
<span class="c">// should return multiple result types (pattern matches content, or a file name,</span>
<span class="c">// or a repo name). If we want to optimise a Sourcegraph query parse tree for a</span>
<span class="c">// particular backend (e.g., skip repository resolution and just run a Zoekt</span>
<span class="c">// query on all indexed repositories) then we need to convert our tree to</span>
<span class="c">// Zoekt's internal inputs and representation. These concerns are all handled by</span>
<span class="c">// toSearchJob.</span>
<span class="k">func</span> <span class="n">ToSearchJob</span><span class="p">(</span><span class="n">jargs</span> <span class="o">*</span><span class="n">Args</span><span class="p">,</span> <span class="n">q</span> <span class="n">query</span><span class="o">.</span><span class="n">Q</span><span class="p">,</span> <span class="n">db</span> <span class="n">database</span><span class="o">.</span><span class="n">DB</span><span class="p">)</span> <span class="p">(</span><span class="n">Job</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
<span class="n">b</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">query</span><span class="o">.</span><span class="n">ToBasicQuery</span><span class="p">(</span><span class="n">q</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="no">nil</span><span class="p">,</span> <span class="n">err</span>
<span class="p">}</span>
<span class="n">types</span><span class="p">,</span> <span class="n">_</span> <span class="o">:=</span> <span class="n">q</span><span class="o">.</span><span class="n">StringValues</span><span class="p">(</span><span class="n">query</span><span class="o">.</span><span class="n">FieldType</span><span class="p">)</span>
<span class="n">resultTypes</span> <span class="o">:=</span> <span class="n">search</span><span class="o">.</span><span class="n">ComputeResultTypes</span><span class="p">(</span><span class="n">types</span><span class="p">,</span> <span class="n">b</span><span class="o">.</span><span class="n">PatternString</span><span class="p">(),</span> <span class="n">jargs</span><span class="o">.</span><span class="n">SearchInputs</span><span class="o">.</span><span class="n">PatternType</span><span class="p">)</span>
<span class="c">// ...</span>
<span class="k">var</span> <span class="n">requiredJobs</span><span class="p">,</span> <span class="n">optionalJobs</span> <span class="p">[]</span><span class="n">Job</span>
<span class="n">addJob</span> <span class="o">:=</span> <span class="k">func</span><span class="p">(</span><span class="n">required</span> <span class="kt">bool</span><span class="p">,</span> <span class="n">job</span> <span class="n">Job</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">required</span> <span class="p">{</span>
<span class="n">requiredJobs</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">requiredJobs</span><span class="p">,</span> <span class="n">job</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">optionalJobs</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">optionalJobs</span><span class="p">,</span> <span class="n">job</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">// ... various conditional calls to addJob</span>
<span class="p">}</span>
</code></pre></div></div>
<p>So to start off, we add a new result type, <code class="language-plaintext highlighter-rouge">result.TypeNotebook = "notebook"</code>, and attach a new <code class="language-plaintext highlighter-rouge">Job</code> when a query includes <code class="language-plaintext highlighter-rouge">type:notebook</code>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">resultTypes</span><span class="o">.</span><span class="n">Has</span><span class="p">(</span><span class="n">result</span><span class="o">.</span><span class="n">TypeNotebook</span><span class="p">)</span> <span class="p">{</span>
<span class="n">notebookSearchJob</span> <span class="o">:=</span> <span class="o">&</span><span class="n">notebook</span><span class="o">.</span><span class="n">SearchJob</span><span class="p">{</span>
<span class="n">PatternString</span><span class="o">:</span> <span class="n">b</span><span class="o">.</span><span class="n">PatternString</span><span class="p">(),</span>
<span class="p">}</span>
<span class="n">addJob</span><span class="p">(</span><span class="no">true</span><span class="p">,</span> <span class="n">notebookSearchJob</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
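<p>A check like <code class="language-plaintext highlighter-rouge">resultTypes.Has(result.TypeNotebook)</code> implies a set of result types. A common way to represent such a set in Go is a bitmask; the sketch below assumes that representation, and its <code class="language-plaintext highlighter-rouge">Types</code>, <code class="language-plaintext highlighter-rouge">TypeRepo</code> and <code class="language-plaintext highlighter-rouge">ParseTypes</code> names are hypothetical rather than copied from Sourcegraph.</p>

```go
package main

import "strings"

// Types is a hypothetical bitmask of result types.
type Types uint8

const (
	TypeRepo Types = 1 << iota
	TypeFile
	TypeCommit
	TypeNotebook // the new result type for notebooks
)

var typeNames = map[string]Types{
	"repo":     TypeRepo,
	"file":     TypeFile,
	"commit":   TypeCommit,
	"notebook": TypeNotebook,
}

// Has reports whether every bit in t is also set in r.
func (r Types) Has(t Types) bool { return r&t == t }

// ParseTypes converts the values of `type:` fields into a Types set.
// Unknown values map to 0 and therefore add nothing.
func ParseTypes(values []string) Types {
	var r Types
	for _, v := range values {
		r |= typeNames[strings.ToLower(v)]
	}
	return r
}
```

<p>With a bitmask, adding a result type is just one more constant, and existing <code class="language-plaintext highlighter-rouge">Has</code> checks for other types are unaffected.</p>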
<p>For now, we want a stub implementation that sends a few hard-coded notebook results to the <code class="language-plaintext highlighter-rouge">streaming.Sender</code> provided to <code class="language-plaintext highlighter-rouge">(Job).Run</code>. This requires implementing the <code class="language-plaintext highlighter-rouge">result.Match</code> interface:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">Match</span> <span class="k">interface</span> <span class="p">{</span>
<span class="n">ResultCount</span><span class="p">()</span> <span class="kt">int</span>
<span class="c">// Limit truncates the match such that, after limiting,</span>
<span class="c">// `Match.ResultCount() == limit`. It should never be called with</span>
<span class="c">// `limit <= 0`, since a single match cannot be truncated to zero results.</span>
<span class="n">Limit</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="kt">int</span>
<span class="n">Select</span><span class="p">(</span><span class="n">filter</span><span class="o">.</span><span class="n">SelectPath</span><span class="p">)</span> <span class="n">Match</span>
<span class="n">RepoName</span><span class="p">()</span> <span class="n">types</span><span class="o">.</span><span class="n">MinimalRepo</span>
<span class="c">// Key returns a key which uniquely identifies this match.</span>
<span class="n">Key</span><span class="p">()</span> <span class="n">Key</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Right off the bat, it becomes clear that Sourcegraph’s search internals are heavily geared towards repository-oriented results, with the top-level <code class="language-plaintext highlighter-rouge">RepoName</code> being part of the <code class="language-plaintext highlighter-rouge">Match</code> interface. Repository matches, file content results, symbols, commits, diffs, and so on all return results that are part of a repository. Notebooks, on the other hand, are an entirely separate entity within the Sourcegraph application, and notebooks that are tracked in the database (it is also possible to create notebooks with <code class="language-plaintext highlighter-rouge">.snb.md</code> files within repositories, but we ignore that case for now) are not strictly associated with any repository.</p>
<p>This is even more evident in the <code class="language-plaintext highlighter-rouge">Key</code> type, which requires a unique combination of <code class="language-plaintext highlighter-rouge">Repo</code>, <code class="language-plaintext highlighter-rouge">Rev</code>, <code class="language-plaintext highlighter-rouge">Path</code>, <code class="language-plaintext highlighter-rouge">AuthorDate</code>, <code class="language-plaintext highlighter-rouge">Commit</code>, and <code class="language-plaintext highlighter-rouge">TypeRank</code>, none of which we can use to uniquely identify a search notebook. We could use <code class="language-plaintext highlighter-rouge">Path</code> as the notebook name, but notebook names are not strictly unique either.</p>
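<p>Uniqueness matters because keys are what let results be told apart, presumably for things like deduplicating streamed matches. A minimal sketch, assuming a comparable key struct is used directly as a map key; the fields are a simplification, the <code class="language-plaintext highlighter-rouge">ID</code> field mirrors the one this post adds, and <code class="language-plaintext highlighter-rouge">dedupe</code> is a hypothetical helper:</p>

```go
package main

// Key is a simplified, comparable result key. Because every field is
// comparable, Key can be used directly as a map key.
type Key struct {
	Repo     string
	Path     string
	ID       string // for results not tied to a repository, e.g. notebooks
	TypeRank int
}

// Match is a hypothetical minimal match carrying its key.
type Match struct {
	Key   Key
	Title string
}

// dedupe keeps the first match seen for each Key, preserving order.
func dedupe(matches []Match) []Match {
	seen := make(map[Key]bool, len(matches))
	out := make([]Match, 0, len(matches))
	for _, m := range matches {
		if seen[m.Key] {
			continue
		}
		seen[m.Key] = true
		out = append(out, m)
	}
	return out
}
```

<p>Without an <code class="language-plaintext highlighter-rouge">ID</code>, two repository-less notebooks would share the same zero-value key and one of them would be silently dropped.</p>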
<p>To work around these issues for now, we just return a zero-value <code class="language-plaintext highlighter-rouge">RepoName</code> and add a new field <code class="language-plaintext highlighter-rouge">ID</code> to the <code class="language-plaintext highlighter-rouge">Key</code> type:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">Key</span> <span class="k">struct</span> <span class="p">{</span>
<span class="c">// ...</span>
<span class="c">// ID is an arbitrary identifier that can be used to distinguish this result,</span>
<span class="c">// e.g. if the result type is not associated with a repository.</span>
<span class="n">ID</span> <span class="kt">string</span>
<span class="c">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">NotebookMatch</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">ID</span> <span class="kt">int64</span>
<span class="n">Title</span> <span class="kt">string</span>
<span class="n">Namespace</span> <span class="kt">string</span>
<span class="n">Private</span> <span class="kt">bool</span>
<span class="n">Stars</span> <span class="kt">int</span>
<span class="p">}</span>
<span class="k">func</span> <span class="p">(</span><span class="n">n</span> <span class="o">*</span><span class="n">NotebookMatch</span><span class="p">)</span> <span class="n">RepoName</span><span class="p">()</span> <span class="n">types</span><span class="o">.</span><span class="n">MinimalRepo</span> <span class="p">{</span>
<span class="c">// This result type is not associated with any repository.</span>
<span class="k">return</span> <span class="n">types</span><span class="o">.</span><span class="n">MinimalRepo</span><span class="p">{}</span>
<span class="p">}</span>
<span class="k">func</span> <span class="p">(</span><span class="n">n</span> <span class="o">*</span><span class="n">NotebookMatch</span><span class="p">)</span> <span class="n">Limit</span><span class="p">(</span><span class="n">limit</span> <span class="kt">int</span><span class="p">)</span> <span class="kt">int</span> <span class="p">{</span>
<span class="c">// Always represents one result and limit > 0 so we just return limit - 1.</span>
<span class="k">return</span> <span class="n">limit</span> <span class="o">-</span> <span class="m">1</span>
<span class="p">}</span>
<span class="k">func</span> <span class="p">(</span><span class="n">n</span> <span class="o">*</span><span class="n">NotebookMatch</span><span class="p">)</span> <span class="n">URL</span><span class="p">()</span> <span class="o">*</span><span class="n">url</span><span class="o">.</span><span class="n">URL</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">&</span><span class="n">url</span><span class="o">.</span><span class="n">URL</span><span class="p">{</span><span class="n">Path</span><span class="o">:</span> <span class="s">"/notebooks/"</span> <span class="o">+</span> <span class="n">n</span><span class="o">.</span><span class="n">marshalNotebookID</span><span class="p">()}</span>
<span class="p">}</span>
<span class="k">func</span> <span class="p">(</span><span class="n">n</span> <span class="o">*</span><span class="n">NotebookMatch</span><span class="p">)</span> <span class="n">Key</span><span class="p">()</span> <span class="n">Key</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">Key</span><span class="p">{</span>
<span class="n">ID</span><span class="o">:</span> <span class="n">n</span><span class="o">.</span><span class="n">marshalNotebookID</span><span class="p">(),</span>
<span class="n">TypeRank</span><span class="o">:</span> <span class="n">rankRepoMatch</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c">// other interface functions no-op for now</span>
</code></pre></div></div>
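<p><code class="language-plaintext highlighter-rouge">marshalNotebookID</code> isn't shown above. Judging by the notebook URL linked later in this post (<code class="language-plaintext highlighter-rouge">/notebooks/Tm90ZWJvb2s6OTI=</code>, which base64-decodes to <code class="language-plaintext highlighter-rouge">Notebook:92</code>), notebook IDs appear to be GraphQL-style opaque IDs. A sketch under that assumption; the implementation is hypothetical, and the URL-safe alphabet is my guess at what a relay-style ID encoder uses:</p>

```go
package main

import (
	"encoding/base64"
	"net/url"
	"strconv"
)

// NotebookMatch is trimmed down to the field this sketch needs.
type NotebookMatch struct {
	ID int64
}

// marshalNotebookID is a hypothetical implementation that encodes the numeric
// database ID as an opaque ID: base64("Notebook:<id>"). The URL-safe alphabet
// avoids '/' and '+', so the ID can be placed in a URL path.
func (n *NotebookMatch) marshalNotebookID() string {
	raw := "Notebook:" + strconv.FormatInt(n.ID, 10)
	return base64.URLEncoding.EncodeToString([]byte(raw))
}

// URL links to the notebook page using the opaque ID.
func (n *NotebookMatch) URL() *url.URL {
	return &url.URL{Path: "/notebooks/" + n.marshalNotebookID()}
}
```

<p>Because the same opaque ID feeds both <code class="language-plaintext highlighter-rouge">Key().ID</code> and the URL, a result's key and its link stay consistent.</p>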
<p>With our new types, we can create a stub search job for notebooks:</p>
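<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">SearchJob</span> <span class="k">struct</span> <span class="p">{</span>
<span class="c">// PatternString is the search pattern, set in ToSearchJob (unused by this stub).</span>
<span class="n">PatternString</span> <span class="kt">string</span>
<span class="p">}</span>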
<span class="k">func</span> <span class="p">(</span><span class="n">s</span> <span class="o">*</span><span class="n">SearchJob</span><span class="p">)</span> <span class="n">Run</span><span class="p">(</span><span class="n">ctx</span> <span class="n">context</span><span class="o">.</span><span class="n">Context</span><span class="p">,</span> <span class="n">db</span> <span class="n">database</span><span class="o">.</span><span class="n">DB</span><span class="p">,</span> <span class="n">stream</span> <span class="n">streaming</span><span class="o">.</span><span class="n">Sender</span><span class="p">)</span> <span class="p">(</span><span class="o">*</span><span class="n">search</span><span class="o">.</span><span class="n">Alert</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
<span class="n">stream</span><span class="o">.</span><span class="n">Send</span><span class="p">(</span><span class="n">streaming</span><span class="o">.</span><span class="n">SearchEvent</span><span class="p">{</span>
<span class="n">Results</span><span class="o">:</span> <span class="n">result</span><span class="o">.</span><span class="n">Matches</span><span class="p">{</span>
<span class="o">&</span><span class="n">result</span><span class="o">.</span><span class="n">NotebookMatch</span><span class="p">{</span>
<span class="n">Title</span><span class="o">:</span> <span class="s">"FOOBAR"</span><span class="p">,</span>
<span class="n">Namespace</span><span class="o">:</span> <span class="s">"sourcegraph"</span><span class="p">,</span>
<span class="n">ID</span><span class="o">:</span> <span class="m">1</span><span class="p">,</span>
<span class="n">Stars</span><span class="o">:</span> <span class="m">64</span><span class="p">,</span>
<span class="n">Private</span><span class="o">:</span> <span class="no">false</span><span class="p">,</span>
<span class="p">},</span>
<span class="o">&</span><span class="n">result</span><span class="o">.</span><span class="n">NotebookMatch</span><span class="p">{</span>
<span class="n">Title</span><span class="o">:</span> <span class="s">"BAZ"</span><span class="p">,</span>
<span class="n">Namespace</span><span class="o">:</span> <span class="s">"robert"</span><span class="p">,</span>
<span class="n">ID</span><span class="o">:</span> <span class="m">2</span><span class="p">,</span>
<span class="n">Stars</span><span class="o">:</span> <span class="m">0</span><span class="p">,</span>
<span class="n">Private</span><span class="o">:</span> <span class="no">true</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">},</span>
<span class="p">})</span>
<span class="k">return</span> <span class="no">nil</span><span class="p">,</span> <span class="no">nil</span>
<span class="p">}</span>
<span class="k">func</span> <span class="p">(</span><span class="o">*</span><span class="n">SearchJob</span><span class="p">)</span> <span class="n">Name</span><span class="p">()</span> <span class="kt">string</span> <span class="p">{</span> <span class="k">return</span> <span class="s">"NotebookSearch"</span> <span class="p">}</span>
</code></pre></div></div>
<p>The workarounds above caused some funky behaviour. For example, repository permissions post-processing rejected notebook results because they were not associated with a repository the current <a href="https://sourcegraph.com/notebooks/Tm90ZWJvb2s6OTI=">actor (user)</a> has access to, so I hacked in a condition to ignore zero-value <code class="language-plaintext highlighter-rouge">RepoName</code>s in those checks to avoid dropping our notebook results.</p>
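<p>The hack amounts to letting repository-less results bypass the repository permission check. A minimal sketch of the idea; the <code class="language-plaintext highlighter-rouge">MinimalRepo</code> and <code class="language-plaintext highlighter-rouge">Match</code> types, the <code class="language-plaintext highlighter-rouge">canRead</code> callback and <code class="language-plaintext highlighter-rouge">filterByRepoPermissions</code> are hypothetical stand-ins for the real post-processing:</p>

```go
package main

// MinimalRepo is a simplified stand-in for types.MinimalRepo.
type MinimalRepo struct {
	ID   int32
	Name string
}

// Match is a hypothetical minimal result that may or may not belong to a repo.
type Match struct {
	Title string
	Repo  MinimalRepo
}

// filterByRepoPermissions drops matches from repositories the actor cannot
// read, but passes through matches whose repo is the zero value, i.e. results
// such as notebooks that are not associated with any repository.
func filterByRepoPermissions(matches []Match, canRead func(MinimalRepo) bool) []Match {
	out := make([]Match, 0, len(matches))
	for _, m := range matches {
		if m.Repo == (MinimalRepo{}) || canRead(m.Repo) {
			out = append(out, m)
		}
	}
	return out
}
```

<p>Skipping the repository check is only safe because notebook visibility is enforced elsewhere, such as the permission conditions in the database query further down.</p>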
<p>We can test the evaluation of the query <code class="language-plaintext highlighter-rouge">type:notebook select:notebook.block.md foobar</code> to see our new search job type being registered (after implementing the appropriate printers):</p>
<pre><code class="language-mermaid">flowchart TB
0([TIMEOUT])
0---1
1[20s]
0---2
2([LIMIT])
2---3
3[500]
2---4
4([SELECT])
4---5
5[notebook.block.md]
4---6
6([PARALLEL])
6---7
7([NotebookSearch])
6---8
8([ComputeExcludedRepos])
</code></pre>
<p>In this case, the <code class="language-plaintext highlighter-rouge">select:</code> term is just thrown in to demonstrate that selection is a job that runs <em>on top</em> of a child job, which contains the <code class="language-plaintext highlighter-rouge">NotebookSearch</code> job we created. This will be important <a href="#implementing-notebook-blocks-results">later</a>!</p>
<h2 id="sending-results-over-the-wire">Sending results over the wire</h2>
<p>That’s not the end of it! Distinct from plans, jobs, and matches, we also have <em>event</em> types, which are the types that get transmitted over the wire to search clients.</p>
<p>For the most part, this is a very thin layer that simplifies the internal match types for consumption and hydrates events with decorations or with repository metadata from a cache (such as how many stars the associated repository has and when it was last updated). Our new notebook results don't need any of that yet, so we can simply map them more or less directly to a new event type:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">fromNotebook</span><span class="p">(</span><span class="n">notebook</span> <span class="o">*</span><span class="n">result</span><span class="o">.</span><span class="n">NotebookMatch</span><span class="p">)</span> <span class="o">*</span><span class="n">streamhttp</span><span class="o">.</span><span class="n">EventNotebookMatch</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">&</span><span class="n">streamhttp</span><span class="o">.</span><span class="n">EventNotebookMatch</span><span class="p">{</span>
<span class="n">Type</span><span class="o">:</span> <span class="n">streamhttp</span><span class="o">.</span><span class="n">NotebookMatchType</span><span class="p">,</span>
<span class="n">ID</span><span class="o">:</span> <span class="n">notebook</span><span class="o">.</span><span class="n">Key</span><span class="p">()</span><span class="o">.</span><span class="n">ID</span><span class="p">,</span>
<span class="n">Title</span><span class="o">:</span> <span class="n">notebook</span><span class="o">.</span><span class="n">Title</span><span class="p">,</span>
<span class="n">Namespace</span><span class="o">:</span> <span class="n">notebook</span><span class="o">.</span><span class="n">Namespace</span><span class="p">,</span>
<span class="n">URL</span><span class="o">:</span> <span class="n">notebook</span><span class="o">.</span><span class="n">URL</span><span class="p">()</span><span class="o">.</span><span class="n">String</span><span class="p">(),</span>
<span class="n">Stars</span><span class="o">:</span> <span class="n">notebook</span><span class="o">.</span><span class="n">Stars</span><span class="p">,</span>
<span class="n">Private</span><span class="o">:</span> <span class="n">notebook</span><span class="o">.</span><span class="n">Private</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
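<p>To make the wire format concrete, here is a sketch of what an event type like <code class="language-plaintext highlighter-rouge">EventNotebookMatch</code> might look like with JSON tags. The tag names, the <code class="language-plaintext highlighter-rouge">"notebook"</code> type value and the <code class="language-plaintext highlighter-rouge">encodeMatches</code> helper are assumptions for illustration, not copied from Sourcegraph's <code class="language-plaintext highlighter-rouge">streamhttp</code> package.</p>

```go
package main

import "encoding/json"

// EventNotebookMatch is a hypothetical wire representation of a notebook result.
type EventNotebookMatch struct {
	Type      string `json:"type"`
	ID        string `json:"id"`
	Title     string `json:"title"`
	Namespace string `json:"namespace"`
	URL       string `json:"url"`
	Stars     int    `json:"stars,omitempty"`
	Private   bool   `json:"private"`
}

// encodeMatches serializes a batch of matches as a JSON array, the kind of
// payload a streaming endpoint could send to clients.
func encodeMatches(matches []EventNotebookMatch) ([]byte, error) {
	return json.Marshal(matches)
}
```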
<p>At this point, we have everything we need to see our notebooks in the API response! We can confirm this by <a href="https://docs.sourcegraph.com/dev/background-information/sg#sg-start-start-dev-environments">spinning up Sourcegraph locally with <code class="language-plaintext highlighter-rouge">sg start</code></a>, executing a search, and inspecting the response to the <code class="language-plaintext highlighter-rouge">/.api/stream</code> network request in the browser for our placeholder notebook results:</p>
<figure>
<img src="/assets/images/posts/extending-search/notebooks-network-stub.png" />
<figcaption>
Look closely at the '<code>matches</code>' entry for our hard-coded notebooks!
</figcaption>
</figure>
<h2 id="querying-the-database-for-real-results">Querying the database for real results</h2>
<p>Notebooks live in the Sourcegraph database, so to replace our stub results we can query the database for notebooks that match the provided query string:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
<span class="n">notebooks</span><span class="p">.</span><span class="n">id</span><span class="p">,</span>
<span class="n">notebooks</span><span class="p">.</span><span class="n">title</span><span class="p">,</span>
<span class="k">NOT</span> <span class="k">public</span> <span class="k">as</span> <span class="n">private</span><span class="p">,</span> <span class="c1">-- invert for consistency with other match types</span>
<span class="c1">-- apply post-processing after query to merge namespace_user and namespace_org into a</span>
<span class="c1">-- single 'Namespace' field (only one can be set at a time)</span>
<span class="n">users</span><span class="p">.</span><span class="n">username</span> <span class="k">as</span> <span class="n">namespace_user</span><span class="p">,</span>
<span class="n">orgs</span><span class="p">.</span><span class="n">name</span> <span class="k">as</span> <span class="n">namespace_org</span><span class="p">,</span>
<span class="p">(</span>
<span class="k">SELECT</span> <span class="k">COUNT</span><span class="p">(</span><span class="o">*</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">notebook_stars</span>
<span class="k">WHERE</span> <span class="n">notebook_id</span> <span class="o">=</span> <span class="n">notebooks</span><span class="p">.</span><span class="n">id</span>
<span class="p">)</span> <span class="k">as</span> <span class="n">stars</span>
<span class="k">FROM</span>
<span class="n">notebooks</span>
<span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">users</span> <span class="k">on</span> <span class="n">users</span><span class="p">.</span><span class="n">id</span> <span class="o">=</span> <span class="n">notebooks</span><span class="p">.</span><span class="n">namespace_user_id</span>
<span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">orgs</span> <span class="k">on</span> <span class="n">orgs</span><span class="p">.</span><span class="n">id</span> <span class="o">=</span> <span class="n">notebooks</span><span class="p">.</span><span class="n">namespace_org_id</span>
<span class="k">WHERE</span>
<span class="p">(</span><span class="o">%</span><span class="n">s</span><span class="p">)</span> <span class="c1">-- permission conditions</span>
<span class="k">AND</span> <span class="p">(</span><span class="o">%</span><span class="n">s</span><span class="p">)</span> <span class="c1">-- query conditions</span>
<span class="k">ORDER</span> <span class="k">BY</span>
<span class="n">stars</span> <span class="k">DESC</span>
<span class="k">LIMIT</span>
<span class="mi">25</span>
</code></pre></div></div>
<p>To generate query conditions, we pass the <code class="language-plaintext highlighter-rouge">notebook.SearchJob</code> created in <code class="language-plaintext highlighter-rouge">ToSearchJob</code> as the sole parameter. The idea is to extend <code class="language-plaintext highlighter-rouge">SearchJob</code> with all the parameters that can adjust the generated query, such as the pattern type (e.g. regexp) or additional fields like inclusion and exclusion of notebooks with <code class="language-plaintext highlighter-rouge">notebook:</code> and <code class="language-plaintext highlighter-rouge">-notebook:</code>. For now, we generate simple queries based solely on the <code class="language-plaintext highlighter-rouge">PatternString</code> parameter:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">makeQueryConds</span><span class="p">(</span><span class="n">job</span> <span class="o">*</span><span class="n">SearchJob</span><span class="p">)</span> <span class="o">*</span><span class="n">sqlf</span><span class="o">.</span><span class="n">Query</span> <span class="p">{</span>
<span class="n">conds</span> <span class="o">:=</span> <span class="p">[]</span><span class="o">*</span><span class="n">sqlf</span><span class="o">.</span><span class="n">Query</span><span class="p">{}</span>
<span class="c">// Allow querying against the 'full title'</span>
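<span class="c">// CONCAT_WS joins the parts with spaces and skips NULLs, so "$namespace $topic" can match.</span>
<span class="k">const</span> <span class="n">concatTitleQuery</span> <span class="o">=</span> <span class="s">"CONCAT_WS(' ', users.username, orgs.name, notebooks.title)"</span>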
<span class="k">if</span> <span class="n">job</span><span class="o">.</span><span class="n">PatternString</span> <span class="o">!=</span> <span class="s">""</span> <span class="p">{</span>
<span class="c">// Match the pattern anywhere within the full title. The constant is wrapped in</span>
<span class="c">// sqlf.Sprintf so it is embedded as SQL instead of being bound as a string argument.</span>
<span class="n">titleQuery</span> <span class="o">:=</span> <span class="s">"%"</span> <span class="o">+</span> <span class="n">job</span><span class="o">.</span><span class="n">PatternString</span> <span class="o">+</span> <span class="s">"%"</span>
<span class="n">conds</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">conds</span><span class="p">,</span> <span class="n">sqlf</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"%s ILIKE %s"</span><span class="p">,</span>
<span class="n">sqlf</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="n">concatTitleQuery</span><span class="p">),</span> <span class="n">titleQuery</span><span class="p">))</span>
<span class="c">// Query against notebook contents, embedded as a tsvector field.</span>
<span class="n">conds</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">conds</span><span class="p">,</span> <span class="n">sqlf</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"notebooks.blocks_tsvector @@ to_tsquery('english', %s)"</span><span class="p">,</span>
<span class="n">toPostgresTextSearchQuery</span><span class="p">(</span><span class="n">job</span><span class="o">.</span><span class="n">PatternString</span><span class="p">)))</span>
<span class="p">}</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">conds</span><span class="p">)</span> <span class="o">==</span> <span class="m">0</span> <span class="p">{</span>
<span class="c">// If no conditions are present, append a catch-all condition to avoid a SQL syntax error</span>
<span class="n">conds</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">conds</span><span class="p">,</span> <span class="n">sqlf</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"1 = 1"</span><span class="p">))</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">sqlf</span><span class="o">.</span><span class="n">Join</span><span class="p">(</span><span class="n">conds</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s"> OR"</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
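<p><code class="language-plaintext highlighter-rouge">toPostgresTextSearchQuery</code> isn't shown. <code class="language-plaintext highlighter-rouge">to_tsquery</code> expects operator syntax (terms combined with <code class="language-plaintext highlighter-rouge">&amp;</code>, <code class="language-plaintext highlighter-rouge">|</code>, <code class="language-plaintext highlighter-rouge">!</code>, and <code class="language-plaintext highlighter-rouge">:*</code> for prefix matching), so raw user input needs sanitizing first. Here is one hypothetical way to write it, which ANDs together a prefix match for each word:</p>

```go
package main

import (
	"strings"
	"unicode"
)

// toPostgresTextSearchQuery is a hypothetical helper that turns free text into
// a to_tsquery expression, e.g. "search notebooks" -> "search:* & notebooks:*".
// Characters other than letters and digits are dropped so that tsquery
// operators in user input cannot cause a syntax error.
func toPostgresTextSearchQuery(pattern string) string {
	var terms []string
	for _, word := range strings.Fields(pattern) {
		cleaned := strings.Map(func(r rune) rune {
			if unicode.IsLetter(r) || unicode.IsDigit(r) {
				return r
			}
			return -1 // a negative value drops the rune
		}, word)
		if cleaned != "" {
			terms = append(terms, cleaned+":*")
		}
	}
	return strings.Join(terms, " & ")
}
```

<p>If every character is stripped, the helper returns an empty string; the caller should probably skip the tsvector condition in that case, since I believe Postgres treats an empty tsquery as matching nothing (and emits a notice). Postgres's built-in <code class="language-plaintext highlighter-rouge">plainto_tsquery</code> and <code class="language-plaintext highlighter-rouge">websearch_to_tsquery</code> (PostgreSQL 11+) also parse free text safely, though without prefix matching.</p>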
<p>Filtering on the concatenated title means the query cannot use an index to speed things up, but this is a hackathon, so oh well. I kept it in because <code class="language-plaintext highlighter-rouge">$namespace $topic</code> felt like a very natural query to make, and I wanted the demo to support that.</p>
<p>After writing a bit more boilerplate to execute the database query and scan the resulting rows, we can update our search job to return real results instead:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">s</span> <span class="o">*</span><span class="n">SearchJob</span><span class="p">)</span> <span class="n">Run</span><span class="p">(</span><span class="n">ctx</span> <span class="n">context</span><span class="o">.</span><span class="n">Context</span><span class="p">,</span> <span class="n">db</span> <span class="n">database</span><span class="o">.</span><span class="n">DB</span><span class="p">,</span> <span class="n">stream</span> <span class="n">streaming</span><span class="o">.</span><span class="n">Sender</span><span class="p">)</span> <span class="p">(</span><span class="o">*</span><span class="n">search</span><span class="o">.</span><span class="n">Alert</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
<span class="n">store</span> <span class="o">:=</span> <span class="n">Search</span><span class="p">(</span><span class="n">db</span><span class="p">)</span>
<span class="n">notebooks</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">store</span><span class="o">.</span><span class="n">SearchNotebooks</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="no">nil</span><span class="p">,</span> <span class="n">errors</span><span class="o">.</span><span class="n">Wrap</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="s">"NotebookSearch"</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">matches</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="n">result</span><span class="o">.</span><span class="n">Match</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">notebooks</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">n</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">notebooks</span> <span class="p">{</span>
<span class="n">matches</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">n</span>
<span class="p">}</span>
<span class="n">stream</span><span class="o">.</span><span class="n">Send</span><span class="p">(</span><span class="n">streaming</span><span class="o">.</span><span class="n">SearchEvent</span><span class="p">{</span>
<span class="n">Results</span><span class="o">:</span> <span class="n">matches</span><span class="p">,</span>
<span class="p">})</span>
<span class="k">return</span> <span class="no">nil</span><span class="p">,</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
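<p>The loop that copies notebooks into <code class="language-plaintext highlighter-rouge">matches</code> is needed because Go won't convert a slice of concrete pointers directly into a slice of interfaces: each element has to be converted individually. With Go 1.18+ generics that copy can be factored into a small helper. A sketch with a simplified, hypothetical <code class="language-plaintext highlighter-rouge">Match</code> interface:</p>

```go
package main

// Match is a simplified stand-in for result.Match.
type Match interface {
	ResultCount() int
}

type NotebookMatch struct{ Title string }

func (n *NotebookMatch) ResultCount() int { return 1 }

// toMatches copies a slice of any concrete match type into a []Match,
// converting each element to the interface type individually.
func toMatches[T Match](items []T) []Match {
	out := make([]Match, len(items))
	for i, item := range items {
		out[i] = item
	}
	return out
}
```

<p>With this helper, the body of <code class="language-plaintext highlighter-rouge">Run</code> could send <code class="language-plaintext highlighter-rouge">toMatches(notebooks)</code> directly; the explicit loop in the post is equally fine.</p>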
<p>We can test this out by creating a few notebooks in our local Sourcegraph instance and inspecting the network requests in-browser again to see real notebooks being returned!</p>
<h2 id="implementing-notebook-blocks-results">Implementing notebook blocks results</h2>
<p>Seeing the notebook titles that match your query is great and all, but to demonstrate the potential of this capability we wanted to make sure users can also see notebook <em>content</em> results - in other words, the matching notebook blocks - for their query.</p>
<p>For now, we decided to return notebook blocks only when the query includes the <code class="language-plaintext highlighter-rouge">select:notebook.block</code> parameter. The Sourcegraph query language already features selections like <code class="language-plaintext highlighter-rouge">select:repo</code> or <code class="language-plaintext highlighter-rouge">select:commit.diff.added</code>, so this approach felt like a natural fit with how other search types are implemented.</p>
<p>Selections are part of the <code class="language-plaintext highlighter-rouge">Match</code> interface we previously implemented, and they work via <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/internal/search/job/select.go?L24-30"><code class="language-plaintext highlighter-rouge">selectJob</code>, which wraps the <code class="language-plaintext highlighter-rouge">streaming.Sender</code> with another <code class="language-plaintext highlighter-rouge">streaming.Sender</code></a> that <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/internal/search/streaming/stream.go?L94">calls <code class="language-plaintext highlighter-rouge">Select</code> on each result it receives</a> before passing it to the underlying stream.</p>
<p>This means that all we have to do is also fetch blocks in our notebooks database query, and expose them only through the <code class="language-plaintext highlighter-rouge">Select</code> implementation. To start off, we extend our <code class="language-plaintext highlighter-rouge">NotebookMatch</code> with a <code class="language-plaintext highlighter-rouge">Blocks</code> field, and implement <code class="language-plaintext highlighter-rouge">Select</code> such that it generates a new <code class="language-plaintext highlighter-rouge">NotebookBlocksMatch</code> type:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">NotebookMatch</span> <span class="k">struct</span> <span class="p">{</span>
<span class="c">// ... as before</span>
<span class="n">Blocks</span> <span class="n">NotebookBlocks</span> <span class="s">`json:"-"`</span>
<span class="p">}</span>
<span class="c">/// ... as before</span>
<span class="k">func</span> <span class="p">(</span><span class="n">n</span> <span class="o">*</span><span class="n">NotebookMatch</span><span class="p">)</span> <span class="n">Select</span><span class="p">(</span><span class="n">path</span> <span class="n">filter</span><span class="o">.</span><span class="n">SelectPath</span><span class="p">)</span> <span class="n">Match</span> <span class="p">{</span>
<span class="c">// Only support 'select:notebook.*' on this result type</span>
<span class="k">if</span> <span class="n">path</span><span class="o">.</span><span class="n">Root</span><span class="p">()</span> <span class="o">!=</span> <span class="n">filter</span><span class="o">.</span><span class="n">Notebook</span> <span class="p">{</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
<span class="k">switch</span> <span class="nb">len</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="m">1</span><span class="o">:</span>
<span class="k">return</span> <span class="n">n</span> <span class="c">// This is just 'select:notebook', so return self</span>
<span class="k">case</span> <span class="m">2</span><span class="p">,</span> <span class="m">3</span><span class="o">:</span> <span class="c">// Support 'select:notebook.block' and 'select:notebook.block.*'</span>
<span class="k">if</span> <span class="n">path</span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="o">==</span> <span class="s">"block"</span> <span class="p">{</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">n</span><span class="o">.</span><span class="n">Blocks</span><span class="p">)</span> <span class="o">==</span> <span class="m">0</span> <span class="p">{</span>
<span class="k">return</span> <span class="no">nil</span> <span class="c">// No results!</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">(</span><span class="o">&</span><span class="n">NotebookBlocksMatch</span><span class="p">{</span>
<span class="n">Notebook</span><span class="o">:</span> <span class="o">*</span><span class="n">n</span><span class="p">,</span>
<span class="n">Blocks</span><span class="o">:</span> <span class="n">n</span><span class="o">.</span><span class="n">Blocks</span><span class="p">,</span>
<span class="p">})</span><span class="o">.</span><span class="n">Select</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="c">// Allow blocks to continue selecting for 'select:notebook.block.*'</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
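<p>The <code class="language-plaintext highlighter-rouge">switch</code> above branches on how many dotted segments the <code class="language-plaintext highlighter-rouge">select:</code> value has. The sketch below shows which branch a few example values would take. <code class="language-plaintext highlighter-rouge">parseSelectPath</code> and <code class="language-plaintext highlighter-rouge">describe</code> are hypothetical helpers written for illustration, assuming the path is simply the value split on dots.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"strings"
)

// SelectPath is a hypothetical stand-in for filter.SelectPath.
type SelectPath []string

// parseSelectPath is a hypothetical helper that splits a select: value such as
// "notebook.block.md" into its dotted segments.
func parseSelectPath(value string) SelectPath {
	return SelectPath(strings.Split(value, "."))
}

// describe mirrors the branches of NotebookMatch.Select and reports which one
// a given path would take.
func describe(path SelectPath) string {
	if len(path) == 0 || path[0] != "notebook" {
		return "not a notebook selector"
	}
	switch len(path) {
	case 1:
		return "return the notebook itself"
	case 2, 3:
		if path[1] == "block" {
			return "convert to NotebookBlocksMatch and keep selecting"
		}
	}
	return "no result"
}

func main() {
	for _, v := range []string{"notebook", "notebook.block", "notebook.block.md", "notebook.title", "repo"} {
		fmt.Printf("select:%s: %s\n", v, describe(parseSelectPath(v)))
	}
}
</code></pre></div></div>
<p>Any other shape, such as <code class="language-plaintext highlighter-rouge">select:notebook.title</code>, falls through to the final <code class="language-plaintext highlighter-rouge">return nil</code>, so the result is dropped from the stream.</p>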
<p>To support <code class="language-plaintext highlighter-rouge">select:notebook.blocks.$TYPE</code>, where <code class="language-plaintext highlighter-rouge">$TYPE</code> is a block type (such as Markdown, query, symbol, and so on), the <code class="language-plaintext highlighter-rouge">NotebookBlocksMatch</code> type must also implement <code class="language-plaintext highlighter-rouge">Select</code> to only provide blocks of the requested type:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">n</span> <span class="o">*</span><span class="n">NotebookBlocksMatch</span><span class="p">)</span> <span class="n">Select</span><span class="p">(</span><span class="n">path</span> <span class="n">filter</span><span class="o">.</span><span class="n">SelectPath</span><span class="p">)</span> <span class="n">Match</span> <span class="p">{</span>
<span class="c">// Only support 'select:notebook.*' on this result type</span>
<span class="k">if</span> <span class="n">path</span><span class="o">.</span><span class="n">Root</span><span class="p">()</span> <span class="o">!=</span> <span class="n">filter</span><span class="o">.</span><span class="n">Notebook</span> <span class="p">{</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
<span class="k">switch</span> <span class="nb">len</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="m">2</span><span class="o">:</span>
<span class="k">if</span> <span class="n">path</span><span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="o">==</span> <span class="s">"block"</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">n</span> <span class="c">// This is just 'select:notebook.block', so return self</span>
<span class="p">}</span>
<span class="k">case</span> <span class="m">3</span><span class="o">:</span>
<span class="c">// Filter by the requested block type, which is the third path parameter. For example,</span>
<span class="c">// 'select:notebook.block.md' will filter for blocks of type 'md'.</span>
<span class="n">blockType</span> <span class="o">:=</span> <span class="n">path</span><span class="p">[</span><span class="m">2</span><span class="p">]</span>
<span class="k">var</span> <span class="n">blocks</span> <span class="n">NotebookBlocks</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">b</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">n</span><span class="o">.</span><span class="n">Blocks</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">b</span><span class="p">[</span><span class="s">"type"</span><span class="p">]</span> <span class="o">==</span> <span class="n">blockType</span> <span class="p">{</span>
<span class="n">blocks</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">blocks</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">blocks</span><span class="p">)</span> <span class="o">==</span> <span class="m">0</span> <span class="p">{</span>
<span class="k">return</span> <span class="no">nil</span> <span class="c">// No results!</span>
<span class="p">}</span>
<span class="k">return</span> <span class="o">&</span><span class="n">NotebookBlocksMatch</span><span class="p">{</span>
<span class="n">Notebook</span><span class="o">:</span> <span class="n">n</span><span class="o">.</span><span class="n">Notebook</span><span class="p">,</span>
<span class="n">Blocks</span><span class="o">:</span> <span class="n">blocks</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
<p>As before, we also need to implement an event type, <code class="language-plaintext highlighter-rouge">EventNotebookBlockMatch</code>, along with the relevant adapter:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">fromNotebookBlocks</span><span class="p">(</span><span class="n">blocks</span> <span class="o">*</span><span class="n">result</span><span class="o">.</span><span class="n">NotebookBlocksMatch</span><span class="p">)</span> <span class="o">*</span><span class="n">streamhttp</span><span class="o">.</span><span class="n">EventNotebookBlockMatch</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">&</span><span class="n">streamhttp</span><span class="o">.</span><span class="n">EventNotebookBlockMatch</span><span class="p">{</span>
<span class="n">Type</span><span class="o">:</span> <span class="n">streamhttp</span><span class="o">.</span><span class="n">NotebookBlockMatchType</span><span class="p">,</span>
<span class="n">Notebook</span><span class="o">:</span> <span class="o">*</span><span class="n">fromNotebook</span><span class="p">(</span><span class="o">&</span><span class="n">blocks</span><span class="o">.</span><span class="n">Notebook</span><span class="p">),</span>
<span class="n">Blocks</span><span class="o">:</span> <span class="n">blocks</span><span class="o">.</span><span class="n">Blocks</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
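<p>The adapter’s job is to turn the internal match into the JSON event streamed to the browser. Here is a minimal sketch of what that payload could look like, assuming the event types carry <code class="language-plaintext highlighter-rouge">json</code> tags that line up with the web app’s TypeScript <code class="language-plaintext highlighter-rouge">NotebookMatch</code> and <code class="language-plaintext highlighter-rouge">NotebookBlocksMatch</code> interfaces. The struct definitions and example values (notebook ID, namespace, URL, block contents) are hypothetical, not copied from <code class="language-plaintext highlighter-rouge">streamhttp</code>.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"encoding/json"
	"fmt"
)

// EventNotebookMatch is a hypothetical stand-in for the notebook event type;
// the json tags are assumed to mirror the TypeScript NotebookMatch interface.
type EventNotebookMatch struct {
	Type      string `json:"type"`
	ID        string `json:"id"`
	Title     string `json:"title"`
	Namespace string `json:"namespace"`
	URL       string `json:"url"`
	Stars     int    `json:"stars,omitempty"`
	Private   bool   `json:"private"`
}

// EventNotebookBlockMatch is a hypothetical stand-in for the block event type.
// Blocks are passed through as generic JSON objects.
type EventNotebookBlockMatch struct {
	Type     string             `json:"type"`
	Notebook EventNotebookMatch `json:"notebook"`
	Blocks   []map[string]any   `json:"blocks"`
}

func main() {
	event := EventNotebookBlockMatch{
		Type: "notebook.block",
		Notebook: EventNotebookMatch{
			Type: "notebook", ID: "1", Title: "Onboarding",
			Namespace: "alice", URL: "/notebooks/1",
		},
		Blocks: []map[string]any{{"id": "b1", "type": "md", "markdownInput": "# Hello"}},
	}
	payload, err := json.Marshal(event)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))
}
</code></pre></div></div>
<p>Note that <code class="language-plaintext highlighter-rouge">stars</code> is dropped when zero because of <code class="language-plaintext highlighter-rouge">omitempty</code>, which lines up with the optional <code class="language-plaintext highlighter-rouge">stars?</code> field on the TypeScript side.</p>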
<p>For the database layer, we now need to add blocks to our result type. Blocks are currently stored as a JSON blob in the <code class="language-plaintext highlighter-rouge">notebooks.blocks</code> column, so adding that column to our <code class="language-plaintext highlighter-rouge">SELECT</code> and including it in the result scan is fairly straightforward.</p>
<p>However, this does mean that we can’t select only the relevant blocks within the database query. A better long-term solution is likely to split <code class="language-plaintext highlighter-rouge">notebooks.blocks</code> out into a separate table and join it at query time, but that’s a lot of work for a hackathon, so I decided to go for a cheap hack: post-filtering! This isn’t too bad for now because the <code class="language-plaintext highlighter-rouge">notebooks.blocks_tsvector @@ to_tsquery</code> condition in our query means that each returned notebook likely has at least one matching block, but it definitely isn’t very pretty.</p>
<p>Even worse, blocks of different types have different shapes (i.e. there’s no single <code class="language-plaintext highlighter-rouge">block.text</code> field we can filter on), and I didn’t want to special-case each block type for now. A closer look at <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/migrations/frontend/1528395957/up.sql?L1-2"><code class="language-plaintext highlighter-rouge">notebooks.blocks_tsvector</code></a> reveals that it is backed by <a href="https://www.postgresql.org/docs/current/functions-textsearch.html">a magic Postgres feature</a>, <code class="language-plaintext highlighter-rouge">jsonb_to_tsvector</code>, that indexes every string value within the <code class="language-plaintext highlighter-rouge">notebooks.blocks</code> JSON:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span>
<span class="n">notebooks</span>
<span class="k">ADD</span>
<span class="k">COLUMN</span>
<span class="n">IF</span> <span class="k">NOT</span> <span class="k">EXISTS</span>
<span class="n">blocks_tsvector</span> <span class="n">TSVECTOR</span>
<span class="k">GENERATED</span> <span class="n">ALWAYS</span> <span class="k">AS</span>
<span class="p">(</span><span class="n">jsonb_to_tsvector</span><span class="p">(</span><span class="s1">'english'</span><span class="p">,</span> <span class="n">blocks</span><span class="p">,</span> <span class="s1">'["string"]'</span><span class="p">))</span> <span class="n">STORED</span><span class="p">;</span>
</code></pre></div></div>
<p>It’s a neat implementation that doesn’t require any knowledge of block fields, but sadly there doesn’t seem to be an equivalent Go function we could post-filter with. So I just marshal each block to JSON and run a regexp search over the whole thing:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">s</span> <span class="o">*</span><span class="n">notebooksSearchStore</span><span class="p">)</span> <span class="n">SearchNotebooks</span><span class="p">(</span><span class="n">ctx</span> <span class="n">context</span><span class="o">.</span><span class="n">Context</span><span class="p">,</span> <span class="n">job</span> <span class="o">*</span><span class="n">SearchJob</span><span class="p">)</span> <span class="p">([]</span><span class="o">*</span><span class="n">result</span><span class="o">.</span><span class="n">NotebookMatch</span><span class="p">,</span> <span class="kt">error</span><span class="p">)</span> <span class="p">{</span>
<span class="c">// ... query for notebooks</span>
<span class="c">// do our post-filtering</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">job</span><span class="o">.</span><span class="n">PatternString</span><span class="p">)</span> <span class="o">></span> <span class="m">0</span> <span class="p">{</span>
<span class="n">searchRe</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">regexp</span><span class="o">.</span><span class="n">Compile</span><span class="p">(</span><span class="s">"(?i).*("</span> <span class="o">+</span> <span class="n">job</span><span class="o">.</span><span class="n">PatternString</span> <span class="o">+</span> <span class="s">").*"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="no">nil</span><span class="p">,</span> <span class="n">err</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">n</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">notebooks</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">matchBlocks</span> <span class="n">result</span><span class="o">.</span><span class="n">NotebookBlocks</span>
<span class="c">// filter notebook blocks</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">block</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">n</span><span class="o">.</span><span class="n">Blocks</span> <span class="p">{</span>
<span class="n">b</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">json</span><span class="o">.</span><span class="n">Marshal</span><span class="p">(</span><span class="n">block</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">continue</span>
<span class="p">}</span>
<span class="c">// regexp match over the marshalled block</span>
<span class="k">if</span> <span class="n">searchRe</span><span class="o">.</span><span class="n">Match</span><span class="p">(</span><span class="n">b</span><span class="p">)</span> <span class="p">{</span>
<span class="n">matchBlocks</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">matchBlocks</span><span class="p">,</span> <span class="n">block</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">n</span><span class="o">.</span><span class="n">Blocks</span> <span class="o">=</span> <span class="n">matchBlocks</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">notebooks</span><span class="p">,</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Hey, it’s a hackathon!</p>
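<p>One side effect of matching against the marshalled JSON is that JSON keys are searched too, so a pattern like <code class="language-plaintext highlighter-rouge">input</code> matches a block just because it has a <code class="language-plaintext highlighter-rouge">markdownInput</code> field. A possible refinement, which is not what this hackathon implementation does, is to walk the decoded block and match only its string values, loosely mirroring the <code class="language-plaintext highlighter-rouge">'["string"]'</code> filter passed to <code class="language-plaintext highlighter-rouge">jsonb_to_tsvector</code>. Here is a sketch with hypothetical helpers <code class="language-plaintext highlighter-rouge">collectStrings</code> and <code class="language-plaintext highlighter-rouge">blockMatches</code>, assuming each block has been decoded into a <code class="language-plaintext highlighter-rouge">map[string]any</code>:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"regexp"
)

// collectStrings walks a decoded JSON value and gathers every string leaf,
// loosely mirroring jsonb_to_tsvector with the ["string"] filter, which
// indexes string values but not keys, numbers or booleans.
func collectStrings(v any, out []string) []string {
	switch t := v.(type) {
	case string:
		out = append(out, t)
	case map[string]any:
		for _, child := range t {
			out = collectStrings(child, out)
		}
	case []any:
		for _, child := range t {
			out = collectStrings(child, out)
		}
	}
	return out
}

// blockMatches reports whether any string value in the block matches re.
func blockMatches(block map[string]any, re *regexp.Regexp) bool {
	for _, s := range collectStrings(block, nil) {
		if re.MatchString(s) {
			return true
		}
	}
	return false
}

func main() {
	block := map[string]any{"type": "md", "markdownInput": "# Hello"}
	// Matching the marshalled JSON would hit the "markdownInput" key;
	// matching only string values does not.
	fmt.Println(blockMatches(block, regexp.MustCompile("(?i)input"))) // false
	fmt.Println(blockMatches(block, regexp.MustCompile("(?i)hello"))) // true
}
</code></pre></div></div>
<p>The type switch covers <code class="language-plaintext highlighter-rouge">map[string]any</code> and <code class="language-plaintext highlighter-rouge">[]any</code> because those are the types <code class="language-plaintext highlighter-rouge">encoding/json</code> produces when decoding JSON objects and arrays into an <code class="language-plaintext highlighter-rouge">any</code>.</p>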
<p>As before, we can verify this works end-to-end by running a <code class="language-plaintext highlighter-rouge">type:notebook select:notebook.block</code> query and inspecting the response:</p>
<figure>
<img src="/assets/images/posts/extending-search/blocks-network.png" />
</figure>
<h2 id="rendering-search-notebook-results">Rendering search notebook results</h2>
<p>Seeing results in the network tab is great and all, but we want to demo something pretty as well! We start off by adding types to the web app that correspond to our new event types:</p>
<div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">export</span> <span class="kd">type</span> <span class="nx">SearchType</span> <span class="o">=</span> <span class="cm">/* ... */</span> <span class="o">|</span> <span class="dl">'</span><span class="s1">notebook</span><span class="dl">'</span> <span class="o">|</span> <span class="kc">null</span>
<span class="k">export</span> <span class="kd">type</span> <span class="nx">SearchMatch</span> <span class="o">=</span> <span class="cm">/* ... */</span> <span class="o">|</span> <span class="nx">NotebookMatch</span> <span class="o">|</span> <span class="nx">NotebookBlocksMatch</span>
<span class="k">export</span> <span class="kr">interface</span> <span class="nx">NotebookMatch</span> <span class="p">{</span>
<span class="nl">type</span><span class="p">:</span> <span class="dl">'</span><span class="s1">notebook</span><span class="dl">'</span>
<span class="nx">id</span><span class="p">:</span> <span class="kr">string</span>
<span class="nx">title</span><span class="p">:</span> <span class="kr">string</span>
<span class="k">namespace</span><span class="p">:</span> <span class="kr">string</span>
<span class="nx">url</span><span class="p">:</span> <span class="kr">string</span>
<span class="nx">stars</span><span class="p">?:</span> <span class="kr">number</span>
<span class="k">private</span><span class="p">:</span> <span class="nx">boolean</span>
<span class="p">}</span>
<span class="k">export</span> <span class="kr">interface</span> <span class="nx">NotebookBlocksMatch</span> <span class="p">{</span>
<span class="nl">type</span><span class="p">:</span> <span class="dl">'</span><span class="s1">notebook.block</span><span class="dl">'</span>
<span class="nx">notebook</span><span class="p">:</span> <span class="nx">NotebookMatch</span>
<span class="c1">// TODO lots of variants of these types, leave as any for now and massage the data</span>
<span class="c1">// as needed</span>
<span class="nx">blocks</span><span class="p">:</span> <span class="kr">any</span><span class="p">[]</span>
<span class="p">}</span>
</code></pre></div></div>
<p>To extend <code class="language-plaintext highlighter-rouge">type:</code> completions in the search bar, we update <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/client/shared/src/search/query/filters.ts?L289-292"><code class="language-plaintext highlighter-rouge">FILTERS</code></a>:</p>
<div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">export</span> <span class="kd">const</span> <span class="nx">FILTERS</span><span class="p">:</span> <span class="nb">Record</span><span class="o"><</span><span class="nx">NegatableFilter</span><span class="p">,</span> <span class="nx">NegatableFilterDefinition</span><span class="o">></span> <span class="o">&</span>
<span class="nb">Record</span><span class="o"><</span><span class="nx">Exclude</span><span class="o"><</span><span class="nx">FilterType</span><span class="p">,</span> <span class="nx">NegatableFilter</span><span class="o">></span><span class="p">,</span> <span class="nx">BaseFilterDefinition</span><span class="o">></span> <span class="o">=</span> <span class="p">{</span>
<span class="cm">/* ... */</span>
<span class="p">[</span><span class="nx">FilterType</span><span class="p">.</span><span class="kd">type</span><span class="p">]:</span> <span class="p">{</span>
<span class="na">description</span><span class="p">:</span> <span class="dl">'</span><span class="s1">Limit results to the specified type.</span><span class="dl">'</span><span class="p">,</span>
<span class="na">discreteValues</span><span class="p">:</span> <span class="p">()</span> <span class="o">=></span> <span class="p">[</span><span class="cm">/* ... */</span><span class="p">,</span> <span class="dl">'</span><span class="s1">notebook</span><span class="dl">'</span><span class="p">].</span><span class="nx">map</span><span class="p">(</span><span class="nx">value</span> <span class="o">=></span> <span class="p">({</span> <span class="na">label</span><span class="p">:</span> <span class="nx">value</span> <span class="p">})),</span>
<span class="p">},</span>
<span class="cm">/* ... */</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And similarly for <code class="language-plaintext highlighter-rouge">select:</code> completions, we update <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/client/shared/src/search/query/selectFilter.ts?L8:14"><code class="language-plaintext highlighter-rouge">SELECTORS</code></a>:</p>
<div class="language-ts highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">export</span> <span class="kd">const</span> <span class="nx">SELECTORS</span><span class="p">:</span> <span class="nx">Access</span><span class="p">[]</span> <span class="o">=</span> <span class="p">[</span>
<span class="cm">/* ... */</span>
<span class="p">{</span>
<span class="na">name</span><span class="p">:</span> <span class="dl">'</span><span class="s1">notebook</span><span class="dl">'</span><span class="p">,</span>
<span class="na">fields</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="na">name</span><span class="p">:</span> <span class="dl">'</span><span class="s1">block</span><span class="dl">'</span><span class="p">,</span>
<span class="na">fields</span><span class="p">:</span> <span class="p">[{</span> <span class="na">name</span><span class="p">:</span> <span class="dl">'</span><span class="s1">md</span><span class="dl">'</span> <span class="p">},</span> <span class="p">{</span> <span class="na">name</span><span class="p">:</span> <span class="dl">'</span><span class="s1">query</span><span class="dl">'</span> <span class="p">},</span> <span class="p">{</span> <span class="na">name</span><span class="p">:</span> <span class="dl">'</span><span class="s1">file</span><span class="dl">'</span> <span class="p">},</span> <span class="p">{</span> <span class="na">name</span><span class="p">:</span> <span class="dl">'</span><span class="s1">symbol</span><span class="dl">'</span> <span class="p">}],</span>
<span class="p">},</span>
<span class="p">],</span>
<span class="p">},</span>
<span class="p">]</span>
</code></pre></div></div>
<figure>
<img src="/assets/images/posts/extending-search/select-suggest.png" />
<figcaption>
Suggestions!
</figcaption>
</figure>
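<p>Since <code class="language-plaintext highlighter-rouge">SELECTORS</code> is a nested tree, each path from the root spells out one <code class="language-plaintext highlighter-rouge">select:</code> value. The sketch below uses a hypothetical Go <code class="language-plaintext highlighter-rouge">Access</code> type that mirrors the TypeScript shape, and flattens the notebook entry to list the values the tree describes. The real completion logic in <code class="language-plaintext highlighter-rouge">selectFilter.ts</code> may walk the tree differently.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "fmt"

// Access is a hypothetical Go mirror of the TypeScript Access shape used by
// SELECTORS: a named node with optional child fields.
type Access struct {
	Name   string
	Fields []Access
}

// selectValues flattens the tree, depth first, into every dotted select:
// value it allows, e.g. "notebook", "notebook.block", "notebook.block.md".
func selectValues(nodes []Access, prefix string) []string {
	var out []string
	for _, n := range nodes {
		value := n.Name
		if prefix != "" {
			value = prefix + "." + n.Name
		}
		out = append(out, value)
		out = append(out, selectValues(n.Fields, value)...)
	}
	return out
}

func main() {
	selectors := []Access{{
		Name: "notebook",
		Fields: []Access{{
			Name:   "block",
			Fields: []Access{{Name: "md"}, {Name: "query"}, {Name: "file"}, {Name: "symbol"}},
		}},
	}}
	for _, v := range selectValues(selectors, "") {
		fmt.Println("select:" + v)
	}
}
</code></pre></div></div>
<p>The leaf names (<code class="language-plaintext highlighter-rouge">md</code>, <code class="language-plaintext highlighter-rouge">query</code>, <code class="language-plaintext highlighter-rouge">file</code>, <code class="language-plaintext highlighter-rouge">symbol</code>) are what end up as <code class="language-plaintext highlighter-rouge">path[2]</code> in <code class="language-plaintext highlighter-rouge">NotebookBlocksMatch.Select</code>, so they need to match the <code class="language-plaintext highlighter-rouge">type</code> values stored on each block.</p>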
<p>And now things get a bit hacky. For plain notebook results, we can leverage the same components used for repository matches with reasonable results by extending <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/client/search-ui/src/results/StreamingSearchResultsList.tsx?L67:14">the <code class="language-plaintext highlighter-rouge">StreamingSearchResultsList</code> component</a>:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">export</span> <span class="kd">const</span> <span class="nx">StreamingSearchResultsList</span><span class="p">:</span> <span class="nx">React</span><span class="p">.</span><span class="nx">FunctionComponent</span><span class="o"><</span><span class="nx">StreamingSearchResultsListProps</span><span class="o">></span> <span class="o">=</span> <span class="p">({</span>
<span class="cm">/* ... */</span>
<span class="p">})</span> <span class="o">=></span> <span class="p">{</span>
<span class="cm">/* ... */</span>
<span class="kd">const</span> <span class="nx">renderResult</span> <span class="o">=</span> <span class="nx">useCallback</span><span class="p">(</span>
<span class="p">(</span><span class="na">result</span><span class="p">:</span> <span class="nx">SearchMatch</span><span class="p">,</span> <span class="na">index</span><span class="p">:</span> <span class="kr">number</span><span class="p">):</span> <span class="nx">JSX</span><span class="p">.</span><span class="nx">Element</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">switch</span> <span class="p">(</span><span class="nx">result</span><span class="p">.</span><span class="kd">type</span><span class="p">)</span> <span class="p">{</span>
<span class="cm">/* ... */</span>
<span class="k">case</span> <span class="dl">'</span><span class="s1">notebook</span><span class="dl">'</span><span class="p">:</span>
<span class="k">return</span> <span class="p">(</span>
<span class="p"><</span><span class="nc">SearchResult</span>
<span class="na">icon</span><span class="p">=</span><span class="si">{</span><span class="nx">NotebookIcon</span><span class="si">}</span>
<span class="na">result</span><span class="p">=</span><span class="si">{</span><span class="nx">result</span><span class="si">}</span>
<span class="na">repoName</span><span class="p">=</span><span class="si">{</span><span class="s2">`</span><span class="p">${</span><span class="nx">result</span><span class="p">.</span><span class="k">namespace</span><span class="p">}</span><span class="s2"> / </span><span class="p">${</span><span class="nx">result</span><span class="p">.</span><span class="nx">title</span><span class="p">}</span><span class="s2">`</span><span class="si">}</span>
<span class="na">platformContext</span><span class="p">=</span><span class="si">{</span><span class="nx">platformContext</span><span class="si">}</span>
<span class="na">onSelect</span><span class="p">=</span><span class="si">{</span><span class="p">()</span> <span class="o">=></span> <span class="nx">logSearchResultClicked</span><span class="p">(</span><span class="nx">index</span><span class="p">,</span> <span class="dl">'</span><span class="s1">notebook</span><span class="dl">'</span><span class="p">)</span><span class="si">}</span>
<span class="p">/></span>
<span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="k">return</span> <span class="p">(</span><span class="cm">/* ... */</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<figure>
<img src="/assets/images/posts/extending-search/notebook-results.png" />
</figure>
<p>For notebook blocks, things started to get <em>really</em> hacky. I had originally expected to just render the parameters encoded in the block (for example, the query in a query block). However, <a href="https://github.com/tsenart">@tsenart</a> pointed out that maybe we could render the blocks <em>exactly</em> as they are rendered within a notebook. I thought this would be brilliant! Surely it would be as easy as importing the correct component and passing it the blocks from a block match - how messy could this be?</p>
<p>Well, using <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@73a484e/-/blob/client/web/src/notebooks/notebook/NotebookComponent.tsx?L99:14"><code class="language-plaintext highlighter-rouge">NotebookComponent</code></a> ended up looking like this:</p>
<div class="language-tsx highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">case</span> <span class="dl">'</span><span class="s1">notebook.block</span><span class="dl">'</span><span class="p">:</span>
<span class="k">return</span> <span class="p">(</span>
<span class="p"><</span><span class="nc">ResultContainer</span>
<span class="na">icon</span><span class="p">=</span><span class="si">{</span><span class="nx">NotebookIcon</span><span class="si">}</span>
<span class="na">title</span><span class="p">=</span><span class="si">{</span>
<span class="p"><</span><span class="nc">Link</span> <span class="na">to</span><span class="p">=</span><span class="si">{</span><span class="nx">result</span><span class="p">.</span><span class="nx">notebook</span><span class="p">.</span><span class="nx">url</span><span class="si">}</span><span class="p">></span>
<span class="si">{</span><span class="nx">result</span><span class="p">.</span><span class="nx">notebook</span><span class="p">.</span><span class="k">namespace</span><span class="si">}</span> / <span class="si">{</span><span class="nx">result</span><span class="p">.</span><span class="nx">notebook</span><span class="p">.</span><span class="nx">title</span><span class="si">}</span>
<span class="p"></</span><span class="nc">Link</span><span class="p">></span>
<span class="si">}</span>
<span class="na">collapsible</span><span class="p">=</span><span class="si">{</span><span class="kc">false</span><span class="si">}</span>
<span class="na">defaultExpanded</span><span class="p">=</span><span class="si">{</span><span class="kc">true</span><span class="si">}</span>
<span class="na">resultType</span><span class="p">=</span><span class="si">{</span><span class="nx">result</span><span class="p">.</span><span class="kd">type</span><span class="si">}</span>
<span class="na">onResultClicked</span><span class="p">=</span><span class="si">{</span><span class="nx">noop</span><span class="si">}</span>
<span class="na">expandedChildren</span><span class="p">=</span><span class="si">{</span>
<span class="p"><</span><span class="nt">div</span> <span class="na">className</span><span class="p">=</span><span class="si">{</span><span class="nx">styles</span><span class="p">.</span><span class="nx">notebookBlockResult</span><span class="si">}</span><span class="p">></span>
<span class="p"><</span><span class="nc">NotebookComponent</span>
<span class="na">key</span><span class="p">=</span><span class="si">{</span><span class="s2">`</span><span class="p">${</span><span class="nx">result</span><span class="p">.</span><span class="nx">notebook</span><span class="p">.</span><span class="nx">id</span><span class="p">}</span><span class="s2">-blocks`</span><span class="si">}</span>
<span class="na">isEmbedded</span><span class="p">=</span><span class="si">{</span><span class="kc">true</span><span class="si">}</span>
<span class="na">noRunButton</span><span class="p">=</span><span class="si">{</span><span class="kc">true</span><span class="si">}</span>
<span class="c1">// TODO HACK: DB, component, and GraphQL block types</span>
<span class="c1">// don't align so we need to massage it into a type</span>
<span class="c1">// this component finds acceptable</span>
<span class="na">blocks</span><span class="p">=</span><span class="si">{</span><span class="nx">result</span><span class="p">.</span><span class="nx">blocks</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="nx">b</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">b</span><span class="p">.</span><span class="nx">queryInput</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="p">{</span> <span class="p">...</span><span class="nx">b</span><span class="p">,</span> <span class="na">input</span><span class="p">:</span> <span class="p">{</span> <span class="na">query</span><span class="p">:</span> <span class="nx">b</span><span class="p">.</span><span class="nx">queryInput</span><span class="p">.</span><span class="nx">text</span> <span class="p">}</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">{</span>
<span class="p">...</span><span class="nx">b</span><span class="p">,</span>
<span class="na">input</span><span class="p">:</span>
<span class="nx">b</span><span class="p">.</span><span class="nx">markdownInput</span> <span class="o">||</span> <span class="nx">b</span><span class="p">.</span><span class="nx">fileInput</span> <span class="o">||</span> <span class="nx">b</span><span class="p">.</span><span class="nx">symbolInput</span> <span class="o">||</span> <span class="nx">b</span><span class="p">.</span><span class="nx">computeInput</span><span class="p">,</span>
<span class="p">}</span>
<span class="p">})</span><span class="si">}</span>
<span class="na">authenticatedUser</span><span class="p">=</span><span class="si">{</span><span class="kc">null</span><span class="si">}</span>
<span class="na">globbing</span><span class="p">=</span><span class="si">{</span><span class="kc">false</span><span class="si">}</span>
<span class="na">isReadOnly</span><span class="p">=</span><span class="si">{</span><span class="kc">true</span><span class="si">}</span>
<span class="na">extensionsController</span><span class="p">=</span><span class="si">{</span><span class="nx">extensionsController</span><span class="si">}</span>
<span class="na">hoverifier</span><span class="p">=</span><span class="si">{</span><span class="nx">hoverifier</span><span class="si">}</span>
<span class="na">platformContext</span><span class="p">=</span><span class="si">{</span><span class="nx">platformContext</span><span class="si">}</span>
<span class="na">exportedFileName</span><span class="p">=</span><span class="si">{</span><span class="nx">result</span><span class="p">.</span><span class="nx">notebook</span><span class="p">.</span><span class="nx">title</span><span class="si">}</span>
<span class="na">onSerializeBlocks</span><span class="p">=</span><span class="si">{</span><span class="nx">noop</span><span class="si">}</span>
<span class="na">onCopyNotebook</span><span class="p">=</span><span class="si">{</span><span class="p">()</span> <span class="o">=></span> <span class="nx">NEVER</span><span class="si">}</span>
<span class="na">streamSearch</span><span class="p">=</span><span class="si">{</span><span class="p">()</span> <span class="o">=></span> <span class="nx">NEVER</span><span class="si">}</span> <span class="c1">// TODO make this jump to new search page instead</span>
<span class="na">isLightTheme</span><span class="p">=</span><span class="si">{</span><span class="nx">isLightTheme</span><span class="si">}</span>
<span class="na">telemetryService</span><span class="p">=</span><span class="si">{</span><span class="nx">telemetryService</span><span class="si">}</span>
<span class="na">fetchHighlightedFileLineRanges</span><span class="p">=</span><span class="si">{</span><span class="nx">fetchHighlightedFileLineRanges</span><span class="si">}</span>
<span class="na">searchContextsEnabled</span><span class="p">=</span><span class="si">{</span><span class="nx">searchContextsEnabled</span><span class="si">}</span>
<span class="na">settingsCascade</span><span class="p">=</span><span class="si">{</span><span class="nx">settingsCascade</span><span class="si">}</span>
<span class="na">isSourcegraphDotCom</span><span class="p">=</span><span class="si">{</span><span class="nx">isSourcegraphDotCom</span><span class="si">}</span>
<span class="na">showSearchContext</span><span class="p">=</span><span class="si">{</span><span class="nx">showSearchContext</span><span class="si">}</span>
<span class="p">/></span>
<span class="p"></</span><span class="nt">div</span><span class="p">></span>
<span class="si">}</span>
<span class="p">/></span>
<span class="p">)</span>
</code></pre></div></div>
<p>Gnarly, eh? Getting all of these props meant making all sorts of changes to <code class="language-plaintext highlighter-rouge">StreamingSearchResultsListProps</code>. Full disclaimer: I am far from a professional when it comes to web apps and React, so I’m sure there’s a better way to do this than prop drilling, but oh well. The <code class="language-plaintext highlighter-rouge">NotebookComponent</code> also doesn’t feel like it was designed to be imported and used this way, which makes sense given that notebooks are a pretty new product built with an iterate-fast, polish-later philosophy.</p>
<p>That said, once the compiler stopped complaining, the results were great - everything kind of <em>just worked</em>, and looked pretty good after some CSS adjustments! Even running query blocks worked nicely.</p>
<figure>
<img src="/assets/images/posts/extending-search/block-search.png" />
</figure>
<p>Of course, this raises the question: what if you run a notebook search within a search notebook? Well, that works too!</p>
<figure>
<video autoplay="" loop="" muted="" playsinline="">
<source src="/assets/images/posts/extending-search/recursive-notebook.mp4" type="video/mp4" />
</video>
<figcaption>
Search-notebooks-ception?
</figcaption>
</figure>
<p>Here is a brief demo I recorded at the end of the hackathon, showing how this all ties together:</p>
<p><a href="https://www.loom.com/share/23c8d3f23bf942f3ba24896472047f5b"><img src="https://cdn.loom.com/sessions/thumbnails/23c8d3f23bf942f3ba24896472047f5b-1648802342917-with-play.gif" alt="demo" /></a></p>
<p>The (messy, incomplete) code is available here: <a href="https://github.com/sourcegraph/sourcegraph/pull/33316">sourcegraph#33316</a></p>
<p><br /></p>
<h2 id="wrap-up">Wrap-up</h2>
<p>Thanks for reading! I hope this was an interesting glimpse at how search works at Sourcegraph. I’m not sure if this will ever make it into the product, but regardless, this was a really fun foray into a part of the codebase I’ve only interacted with at a surface level through my <a href="https://github.com/bobheadxi/raycast-sourcegraph">Sourcegraph for Raycast extension project</a>. Learning about the abstractions used to power code search (and more!) was fascinating, and a nice change of pace from <a href="/experience/sourcegraph">my usual work</a>!</p>
<p><br /></p>
<h2 id="about-sourcegraph">About Sourcegraph</h2>
<p>Sourcegraph builds universal code search for every developer and company so they can innovate faster. We help developers and companies with billions of lines of code create the software you use every day.
Learn more about Sourcegraph <a href="https://about.sourcegraph.com/">here</a>.</p>
<p>Interested in joining? <a href="https://about.sourcegraph.com/jobs/">We’re hiring</a>!</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:kudos" role="doc-endnote">
<p>Somewhat embarrassingly, in one of my iterations of this project <a href="https://github.com/sourcegraph/sourcegraph/pull/33161">I complained a bit about the tedium of the many layers in the search backend</a>, at which point I was educated by <a href="https://comby.dev/">Comby (structural search)</a> creator <a href="https://github.com/rvantonder">@rvantonder</a> on how <a href="https://github.com/sourcegraph/sourcegraph/pull/33161#issuecomment-1081441870">cleaning up the search internals is an ongoing effort that has improved things significantly over the past year</a>. One of my biggest takeaways from this project is that search is a very complex system, and that building a suitable abstraction for the myriad types of search that Sourcegraph already features is a monumental undertaking! <a href="#fnref:kudos" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:timeout" role="doc-endnote">
<p>By default, Sourcegraph search is limited to optimise for fast results. The extent of a search is configurable through the <code class="language-plaintext highlighter-rouge">count:</code> and <code class="language-plaintext highlighter-rouge">timeout:</code> parameters, as well as a special <code class="language-plaintext highlighter-rouge">count:all</code> mode, as described in our documentation: <a href="https://docs.sourcegraph.com/code_search/how-to/exhaustive">Exhaustive search</a>. <a href="#fnref:timeout" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>robertSourcegraph recently held a brief internal hackathon where we got to work on a variety of ideas related to our freshly minted “Sourcegraph use cases”. One idea that was raised was extending Sourcegraph’s core code search functionality to allow queries over search notebooks, a new product that enables live and persistent documentation based on code search, to aid in content discovery for onboarding.Self-documenting and self-updating tooling2022-02-20T00:00:00+00:002022-02-20T00:00:00+00:00https://bobheadxi.dev/self-documenting-self-updating<p>In a rapidly moving organization, documentation drift is inevitable as the underlying tools undergo changes to suit changing needs, especially for internal tools where leaning on tribal knowledge can often be more efficient in the short term. As each component grows in complexity, however, this introduces debt that makes for a confusing onboarding process and a poor developer experience, and makes building integrations more difficult.</p>
<p>One approach for keeping documentation debt at bay is to choose tools with built-in documentation generation. You can design your code so that code documentation generators double as user guides (which I explored with <a href="/introducing-new-launch-pad-site/">my rewrite of the UBC Launch Pad website</a>’s generated <a href="https://ubclaunchpad.com/config">configuration documentation</a>), or write specifications that generate both code and documentation (which I tried with <a href="/building-inertia/">Inertia</a>’s <a href="https://inertia.ubclaunchpad.com/api/">API reference</a>). Some libraries, like Cobra, a Go library for building CLIs, can also generate reference documentation for commands (such as <a href="/building-inertia/">Inertia</a>’s <a href="https://inertia.ubclaunchpad.com/cli/inertia_$%7Bremote_name%7D.html">CLI reference</a>). This allows you to meet your users where they are - for example, the less technically oriented can check out a website while the more hands-on users can find what they need within the code or in the command line - while maintaining a single source of truth that keeps everything up to date.</p>
<p>Of course, in addition to generated documentation you still need to write documentation that ties the pieces together - for example, the <a href="https://github.com/ubclaunchpad/ubclaunchpad.com/blob/master/README.md">UBC Launch Pad website still had a brief intro guide</a> and we did put together a <a href="https://inertia.ubclaunchpad.com/">usage guide for Inertia</a> - but generated documentation ensures the nitty-gritty stays up to date and lets you focus on high-level guidance in your handcrafted writing.</p>
<p>At <a href="/experience/sourcegraph">Sourcegraph</a>, I’ve been exploring avenues for taking this even further. Once you move away from off-the-shelf generators and invest in leveraging your code to generate exactly what you need, you can build a pretty neat ecosystem of not just documentation generators, but also interesting integrations and tooling that is always up to date by design. In this article, I’ll talk about some of the things we’ve built with this approach in mind: Sourcegraph’s <a href="#observability-ecosystem">observability ecosystem</a> and <a href="#continuous-integration-pipelines">continuous integration pipelines</a>.</p>
<p><br /></p>
<h2 id="observability-ecosystem">Observability ecosystem</h2>
<p>The Sourcegraph product has shipped with Prometheus metrics and Grafana dashboards for quite a while, used both by Sourcegraph for <a href="https://sourcegraph.com">Sourcegraph Cloud</a> and by self-hosted customers to operate Sourcegraph instances. These have been created from our own Go-based specification since before I started working here. The spec would look something like this (truncated for brevity):</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">GitServer</span><span class="p">()</span> <span class="o">*</span><span class="n">Container</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">&</span><span class="n">Container</span><span class="p">{</span>
<span class="n">Name</span><span class="o">:</span> <span class="s">"gitserver"</span><span class="p">,</span>
<span class="n">Title</span><span class="o">:</span> <span class="s">"Git Server"</span><span class="p">,</span>
<span class="n">Description</span><span class="o">:</span> <span class="s">"Stores, manages, and operates Git repositories."</span><span class="p">,</span>
<span class="n">Groups</span><span class="o">:</span> <span class="p">[]</span><span class="n">Group</span><span class="p">{{</span>
<span class="n">Title</span><span class="o">:</span> <span class="s">"General"</span><span class="p">,</span>
<span class="n">Rows</span><span class="o">:</span> <span class="p">[]</span><span class="n">Row</span><span class="p">{{</span>
<span class="c">// Each dashboard panel and alert is associated with an "observable"</span>
<span class="n">Observable</span><span class="p">{</span>
<span class="n">Name</span><span class="o">:</span> <span class="s">"disk_space_remaining"</span><span class="p">,</span>
<span class="n">Description</span><span class="o">:</span> <span class="s">"disk space remaining by instance"</span><span class="p">,</span>
<span class="n">Query</span><span class="o">:</span> <span class="s">`(src_gitserver_disk_space_available / src_gitserver_disk_space_total)*100`</span><span class="p">,</span>
<span class="c">// Configure Prometheus alerts</span>
<span class="n">Warning</span><span class="o">:</span> <span class="n">Alert</span><span class="p">{</span><span class="n">LessOrEqual</span><span class="o">:</span> <span class="m">25</span><span class="p">},</span>
<span class="c">// Configure Grafana panel</span>
<span class="n">PanelOptions</span><span class="o">:</span> <span class="n">PanelOptions</span><span class="p">()</span><span class="o">.</span><span class="n">LegendFormat</span><span class="p">(</span><span class="s">"{{instance}}"</span><span class="p">)</span><span class="o">.</span><span class="n">Unit</span><span class="p">(</span><span class="n">Percentage</span><span class="p">),</span>
<span class="c">// Some options, like this one, makes changes to both how the panel</span>
<span class="c">// is rendered as well as when the alert fires</span>
<span class="n">DataMayNotExist</span><span class="o">:</span> <span class="no">true</span><span class="p">,</span>
<span class="c">// Configure documentation about possible solutions if the alert fires</span>
<span class="n">PossibleSolutions</span><span class="o">:</span> <span class="s">`
- **Provision more disk space:** Sourcegraph will begin deleting...
`</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">}},</span>
<span class="p">}},</span>
<span class="p">},</span>
<span class="p">}</span>
</code></pre></div></div>
<figure>
<figcaption>
Explore
<a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@3.17/-/blob/monitoring/git_server.go">what our monitoring generator looked like in Sourcegraph 3.17</a>
(circa mid-2020)
</figcaption>
</figure>
<p>From here, a program will import the definitions and generate the appropriate Prometheus <a href="https://prometheus.io/docs/prometheus/latest/configuration/recording_rules/">recording rules</a>, Grafana <a href="https://grafana.com/docs/grafana/latest/dashboards/json-model/">dashboard specs</a>, and a simple customer-facing “alert solutions” page. Any changes that engineers made to their monitoring definitions using the specification would automatically update everything that needed to be updated, no additional work needed.</p>
<p>For example, the Grafana dashboard spec generation automatically calculates appropriate widths and heights for each panel you add so that panels are evenly distributed, draws lines indicating Prometheus alert thresholds, and gives every dashboard a uniform look and feel, among other things.</p>
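<p>Grafana lays dashboards out on a grid that is 24 columns wide, with each panel's position stored in a <code class="language-plaintext highlighter-rouge">gridPos</code> object. A minimal sketch of even width distribution for a single row might look like the following; the <code class="language-plaintext highlighter-rouge">LayoutRow</code> function and its remainder-handling rule are assumptions for illustration, and Sourcegraph's generator may size panels differently.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gridWidth is the number of columns in Grafana's dashboard grid.
const gridWidth = 24

// GridPos mirrors the gridPos object in Grafana's dashboard JSON model.
type GridPos struct {
	H int `json:"h"`
	W int `json:"w"`
	X int `json:"x"`
	Y int `json:"y"`
}

// LayoutRow positions n panels side by side in a row starting at y. The
// 24-column grid is split as evenly as integer division allows, and leftover
// columns are handed out one at a time to the leftmost panels, so the row
// always spans the full width. It assumes 1 <= n <= 24.
func LayoutRow(n, y, height int) []GridPos {
	if n <= 0 {
		return nil
	}
	base, rem := gridWidth/n, gridWidth%n
	positions := make([]GridPos, 0, n)
	x := 0
	for i := 0; i < n; i++ {
		w := base
		if i < rem {
			w++ // spread the remainder across the first panels
		}
		positions = append(positions, GridPos{H: height, W: w, X: x, Y: y})
		x += w
	}
	return positions
}

func main() {
	// Five panels get widths 5, 5, 5, 5, 4.
	out, err := json.Marshal(LayoutRow(5, 0, 6))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```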
<p>I loved this idea, so I ran with it and worked on a series of changes that expanded the capabilities of this system significantly. Today, our monitoring specification powers:</p>
<ul>
<li>Multiple reference pages: a <a href="https://docs.sourcegraph.com/admin/observability/alert_solutions">revamped alerts reference</a> and a page that <a href="https://docs.sourcegraph.com/admin/observability/dashboards">focuses on background information about each dashboard panel</a>, both of which customers and engineers at Sourcegraph can reference. The generated reference now also includes information about which teams own which dashboards and alerts, to help customer support triage requests, as well as how to easily silence alerts through our new integration with Alertmanager.</li>
</ul>
<p><img src="/assets/images/posts/self-documenting/alert-reference.png" /></p>
<ul>
<li>Grafana dashboards that now automatically include links to the generated documentation, annotation layers for generated alerts, improved alert overview graphs, and more.</li>
</ul>
<figure>
<video autoplay="" loop="" muted="" playsinline="">
<source src="/assets/images/posts/self-documenting/dashboard-annotations.webm" type="video/webm" />
<source src="/assets/images/posts/self-documenting/dashboard-annotations.mp4" type="video/mp4" />
</video>
<figcaption>
Version and alert annotations in Sourcegraph's generated dashboards. Dashboards like these are automatically provided by defining observables using our monitoring specification, alongside everything else mentioned previously.
</figcaption>
</figure>
<ul>
<li>Prometheus integration that now generates more granular <a href="https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/">alert rules</a> that include additional metadata such as the ID of the associated generated dashboard panel, the team that owns the alert, and more.</li>
<li>An entirely new Alertmanager integration (<a href="/docker-sidecar/">related blog post</a>) that allows you to <a href="https://docs.sourcegraph.com/admin/observability/alerting#setting-up-alerting">easily configure alert notifications via the Sourcegraph application</a>, which automatically sets up the appropriate routes and configures messages to include relevant information for triaging alerts: a helpful summary, links to documentation, and links to the relevant dashboard panel in the time window of the alert. This leverages the metadata in the aforementioned generated Prometheus alert rules!</li>
</ul>
<figure>
<img src="/assets/images/posts/self-documenting/alert-notification.png" />
<figcaption>
Automatically configured alert notification messages feature a helpful summary and links to diagnose the issue further for a variety of supported notification services, such as Slack and OpsGenie.
</figcaption>
</figure>
<p>The API has changed as well to improve its flexibility and enable many of the features listed above. Nowadays, a monitoring specification might look like this (also truncated for brevity):</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Definitions are separated from the API so everything is imported from 'monitoring' now,</span>
<span class="c">// which allows for a more tightly controlled API.</span>
<span class="k">func</span> <span class="n">GitServer</span><span class="p">()</span> <span class="o">*</span><span class="n">monitoring</span><span class="o">.</span><span class="n">Container</span> <span class="p">{</span>
<span class="k">return</span> <span class="o">&</span><span class="n">monitoring</span><span class="o">.</span><span class="n">Container</span><span class="p">{</span>
<span class="n">Name</span><span class="o">:</span> <span class="s">"gitserver"</span><span class="p">,</span>
<span class="n">Title</span><span class="o">:</span> <span class="s">"Git Server"</span><span class="p">,</span>
<span class="n">Description</span><span class="o">:</span> <span class="s">"Stores, manages, and operates Git repositories."</span><span class="p">,</span>
<span class="c">// Easily create template variables without diving into the underlying JSON spec</span>
<span class="n">Variables</span><span class="o">:</span> <span class="p">[]</span><span class="n">monitoring</span><span class="o">.</span><span class="n">ContainerVariable</span><span class="p">{{</span>
<span class="n">Label</span><span class="o">:</span> <span class="s">"Shard"</span><span class="p">,</span>
<span class="n">Name</span><span class="o">:</span> <span class="s">"shard"</span><span class="p">,</span>
<span class="n">OptionsQuery</span><span class="o">:</span> <span class="s">"label_values(src_gitserver_exec_running, instance)"</span><span class="p">,</span>
<span class="n">Multi</span><span class="o">:</span> <span class="no">true</span><span class="p">,</span>
<span class="p">}},</span>
<span class="n">Groups</span><span class="o">:</span> <span class="p">[]</span><span class="n">monitoring</span><span class="o">.</span><span class="n">Group</span><span class="p">{{</span>
<span class="n">Title</span><span class="o">:</span> <span class="s">"General"</span><span class="p">,</span>
<span class="n">Rows</span><span class="o">:</span> <span class="p">[]</span><span class="n">monitoring</span><span class="o">.</span><span class="n">Row</span><span class="p">{{</span>
<span class="p">{</span>
<span class="n">Name</span><span class="o">:</span> <span class="s">"disk_space_remaining"</span><span class="p">,</span>
<span class="n">Description</span><span class="o">:</span> <span class="s">"disk space remaining by instance"</span><span class="p">,</span>
<span class="n">Query</span><span class="o">:</span> <span class="s">`(src_gitserver_disk_space_available / src_gitserver_disk_space_total)*100`</span><span class="p">,</span>
<span class="c">// Alerting API expanded with additional options to leverage more</span>
<span class="c">// Prometheus features</span>
<span class="n">Warning</span><span class="o">:</span> <span class="n">monitoring</span><span class="o">.</span><span class="n">Alert</span><span class="p">()</span><span class="o">.</span><span class="n">LessOrEqual</span><span class="p">(</span><span class="m">25</span><span class="p">)</span><span class="o">.</span><span class="n">For</span><span class="p">(</span><span class="n">time</span><span class="o">.</span><span class="n">Minute</span><span class="p">),</span>
<span class="n">Panel</span><span class="o">:</span> <span class="n">monitoring</span><span class="o">.</span><span class="n">Panel</span><span class="p">()</span><span class="o">.</span><span class="n">LegendFormat</span><span class="p">(</span><span class="s">"{{instance}}"</span><span class="p">)</span><span class="o">.</span>
<span class="n">Unit</span><span class="p">(</span><span class="n">monitoring</span><span class="o">.</span><span class="n">Percentage</span><span class="p">)</span><span class="o">.</span>
<span class="c">// Functional configuration API that allows you to provide a</span>
<span class="c">// callback to configure the underlying Grafana panel further, or</span>
<span class="c">// use one of the shared options to share common options</span>
<span class="n">With</span><span class="p">(</span><span class="n">monitoring</span><span class="o">.</span><span class="n">PanelOptions</span><span class="o">.</span><span class="n">LegendOnRight</span><span class="p">()),</span>
<span class="c">// Owners can now be defined on observables, which allows support</span>
<span class="c">// to help triage customer queries and is used internally to route</span>
<span class="c">// pager alerts</span>
<span class="n">Owner</span><span class="o">:</span> <span class="n">monitoring</span><span class="o">.</span><span class="n">ObservableOwnerCoreApplication</span><span class="p">,</span>
<span class="c">// Documentation fields are still around, but an 'Interpretation' can</span>
<span class="c">// now also be provided for more obscure background on observables,</span>
<span class="c">// especially if they aren't tied to an alert</span>
<span class="n">PossibleSolutions</span><span class="o">:</span> <span class="s">`
- **Provision more disk space:** Sourcegraph will begin deleting...
`</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">}},</span>
<span class="p">}},</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<figure>
<figcaption>
Explore
<a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/monitoring/definitions/git_server.go">what our monitoring generator looks like today</a>!
</figcaption>
</figure>
<p>Since the specification is built on a typed language, the API itself is self-documenting in that authors of monitoring definitions can easily access what options are available and what each does through <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/docs/monitoring/monitoring">generated API docs</a> or code intelligence available in Sourcegraph or in your IDE, making it very easy to pick up and work with.</p>
<p><img src="../../assets/images/posts/self-documenting/monitoring-api-hover.png" alt="" /></p>
<figure>
<img src="/assets/images/posts/self-documenting/monitoring-api-docs.png" />
<figcaption>
Example <a href="https://about.sourcegraph.com/blog/api-documentation-for-all-your-code/">Sourcegraph API docs</a> of the monitoring API, though similar docs can also be generated by other language-specific tools.
</figcaption>
</figure>
<p>We also now have a tool, <a href="https://docs.sourcegraph.com/dev/background-information/sg"><code class="language-plaintext highlighter-rouge">sg</code></a>, that lets us spin up just the monitoring stack, complete with hot-reloading of Grafana dashboards and Prometheus configuration, with a single command: <code class="language-plaintext highlighter-rouge">sg start monitoring</code>. You can even easily <a href="https://docs.sourcegraph.com/dev/how-to/monitoring_local_dev#grafana">test your dashboards against production metrics</a>! This is all possible because a single tool and a single set of specifications serve as the source of truth for all our monitoring integrations.</p>
<p>This all comes together to form a cohesive ecosystem for developing and using monitoring that is tightly integrated, encodes best practices, is self-documenting (both in the content it generates and in the APIs it exposes), and is easy to extend.</p>
<p>Learn more about our observability ecosystem in our <a href="https://docs.sourcegraph.com/dev/background-information/observability">developer documentation</a>, and check out the <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/monitoring/monitoring">monitoring generator source code here</a>.</p>
<p><br /></p>
<h2 id="continuous-integration-pipelines">Continuous integration pipelines</h2>
<p>At Sourcegraph, our core continuous integration pipelines are - you guessed it - generated! Our <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/enterprise/dev/ci">pipeline generator program</a> analyses a build’s variables (changes, branch names, commit messages, environment variables, and more) to create a pipeline to run on our <a href="https://buildkite.com/">Buildkite</a> agent fleet.</p>
<p>Typically, <a href="https://buildkite.com/docs/pipelines/defining-steps">Buildkite pipelines</a> are specified similarly to <a href="https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions">GitHub Action workflows</a> - by committing a YAML file to your repository that build agents pick up and run. This YAML file will specify what commands should get run over your codebase, and will usually support some simple conditions.</p>
<p>These conditions are not very ergonomic to specify, however, and will often be limited in functionality - so instead, we generate the entire pipeline on the fly:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">steps</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">group</span><span class="pi">:</span> <span class="s2">"</span><span class="s">Pipeline</span><span class="nv"> </span><span class="s">setup"</span>
<span class="na">steps</span><span class="pi">:</span>
<span class="pi">-</span> <span class="na">label</span><span class="pi">:</span> <span class="s1">'</span><span class="s">:hammer_and_wrench:</span><span class="nv"> </span><span class="s">:pipeline:</span><span class="nv"> </span><span class="s">Generate</span><span class="nv"> </span><span class="s">pipeline'</span>
<span class="c1"># Prioritise generating pipelines so that jobs can get generated and queued up as soon</span>
<span class="c1"># as possible, so as to better assess pipeline load e.g. to scale the Buildkite fleet.</span>
<span class="na">priority</span><span class="pi">:</span> <span class="m">10</span>
<span class="na">command</span><span class="pi">:</span> <span class="pi">|</span>
<span class="s">echo "--- generate pipeline"</span>
<span class="s">go run ./enterprise/dev/ci/gen-pipeline.go | tee generated-pipeline.yml</span>
<span class="s">echo "--- upload pipeline"</span>
<span class="s">buildkite-agent pipeline upload generated-pipeline.yml</span>
</code></pre></div></div>
<p>The pipeline generator has also been around at Sourcegraph since long before I joined, but I’ve since made some significant changes to it, including refactoring some of its core functionality: what we call “run types” and “diff types”, which are used to determine the appropriate pipeline to generate for any given build. This allows us to do a <em>ton</em> of cool things.</p>
<p>First, some background on the technical details. A run type is specified as follows:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// RunTypeMatcher defines the requirements for any given build to be considered a build of</span>
<span class="c">// this RunType.</span>
<span class="k">type</span> <span class="n">RunTypeMatcher</span> <span class="k">struct</span> <span class="p">{</span>
<span class="c">// Branch loosely matches branches that begin with this value, unless a different type</span>
<span class="c">// of match is indicated (e.g. BranchExact, BranchRegexp)</span>
<span class="n">Branch</span> <span class="kt">string</span>
<span class="n">BranchExact</span> <span class="kt">bool</span>
<span class="n">BranchRegexp</span> <span class="kt">bool</span>
<span class="c">// BranchArgumentRequired indicates the path segment following the branch prefix match is</span>
<span class="c">// expected to be an argument (does not work in conjunction with BranchExact)</span>
<span class="n">BranchArgumentRequired</span> <span class="kt">bool</span>
<span class="c">// TagPrefix matches tags that begin with this value.</span>
<span class="n">TagPrefix</span> <span class="kt">string</span>
<span class="c">// EnvIncludes validates if these key-value pairs are configured in environment.</span>
<span class="n">EnvIncludes</span> <span class="k">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">string</span>
<span class="p">}</span>
</code></pre></div></div>
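<p>The post doesn’t show how these fields are evaluated, so here is a minimal sketch of one plausible reading of the comments above. The <code class="language-plaintext highlighter-rouge">Matches</code> method, its parameters, and the rule that a matcher with no conditions matches nothing are our own assumptions, not Sourcegraph’s actual implementation. The sketch also assumes that when <code class="language-plaintext highlighter-rouge">BranchArgumentRequired</code> is set, the <code class="language-plaintext highlighter-rouge">Branch</code> prefix ends in a slash.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"regexp"
	"strings"
)

// RunTypeMatcher mirrors the fields shown above.
type RunTypeMatcher struct {
	Branch                 string
	BranchExact            bool
	BranchRegexp           bool
	BranchArgumentRequired bool
	TagPrefix              string
	EnvIncludes            map[string]string
}

// Matches reports whether a build of the given branch or tag, running with the
// given environment, satisfies every condition set on m. Hypothetical helper.
func (m RunTypeMatcher) Matches(branch, tag string, env map[string]string) bool {
	// Every configured key-value pair must be present in the environment.
	for k, v := range m.EnvIncludes {
		if env[k] != v {
			return false
		}
	}
	if m.Branch != "" {
		switch {
		case m.BranchRegexp:
			ok, err := regexp.MatchString(m.Branch, branch)
			if err != nil || !ok {
				return false
			}
		case m.BranchExact:
			if branch != m.Branch {
				return false
			}
		default:
			// Loose match: the branch only needs to start with the prefix.
			if !strings.HasPrefix(branch, m.Branch) {
				return false
			}
			// Assumes the prefix ends in "/", e.g. "docker-images-patch/ARG/branch".
			if m.BranchArgumentRequired {
				rest := strings.TrimPrefix(branch, m.Branch)
				if arg, _, _ := strings.Cut(rest, "/"); arg == "" {
					return false
				}
			}
		}
	}
	if m.TagPrefix != "" && !strings.HasPrefix(tag, m.TagPrefix) {
		return false
	}
	// A matcher with no conditions at all matches nothing.
	return m.Branch != "" || m.TagPrefix != "" || len(m.EnvIncludes) > 0
}
</code></pre></div></div>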
<p>When a build matches, the corresponding <code class="language-plaintext highlighter-rouge">RunType</code> (an enum declared with <code class="language-plaintext highlighter-rouge">iota</code>) is associated with the build, which can then be used to determine what kinds of steps to include. For example:</p>
<ul>
<li>Pull requests run a bare-bones pipeline generated from what has changed in the pull request (read on to learn more) - this keeps feedback loops on pull requests short.</li>
<li>Tagged release builds run our full suite of tests and publish finalised images to our public Docker registries.</li>
<li>The <code class="language-plaintext highlighter-rouge">main</code> branch runs our full suite of tests and publishes preview versions of our images to internal Docker registries. It also notifies build authors if their builds fail on <code class="language-plaintext highlighter-rouge">main</code>.</li>
<li>Similarly, a “main dry run” run type is available by pushing to a branch prefixed with <code class="language-plaintext highlighter-rouge">main-dry-run/</code> - this runs <em>almost</em> everything that gets run on <code class="language-plaintext highlighter-rouge">main</code>, which is useful for double-checking that your changes will pass when merged.</li>
<li>Scheduled builds are run with specific environment variables for browser extension releases and release branch health checks.</li>
</ul>
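<p>To illustrate how a build could be resolved to one of these run types, here is a sketch that declares a <code class="language-plaintext highlighter-rouge">RunType</code> enum with <code class="language-plaintext highlighter-rouge">iota</code> and returns the first run type whose matcher accepts the build, falling back to pull requests. The constant names and order, the branch and tag values, the cut-down <code class="language-plaintext highlighter-rouge">matcher</code> type and the two-argument <code class="language-plaintext highlighter-rouge">Compute</code> are all hypothetical (the real <code class="language-plaintext highlighter-rouge">runtype.Compute</code> used later in this post takes a third argument); only the general shape follows the post.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "strings"

// RunType enumerates kinds of builds. Names and order are illustrative.
type RunType int

const (
	PullRequest RunType = iota // fallback for any build that matches nothing else
	TaggedRelease
	MainBranch
	MainDryRun
	None // sentinel: keep last so callers can iterate up to it
)

// matcher is a cut-down stand-in for RunTypeMatcher.
type matcher struct {
	branch      string
	branchExact bool
	tagPrefix   string
}

// Matcher returns the (hypothetical) conditions for each run type.
func (rt RunType) Matcher() matcher {
	switch rt {
	case TaggedRelease:
		return matcher{tagPrefix: "v"}
	case MainBranch:
		return matcher{branch: "main", branchExact: true}
	case MainDryRun:
		return matcher{branch: "main-dry-run/"}
	}
	return matcher{}
}

func (m matcher) matches(branch, tag string) bool {
	switch {
	case m.tagPrefix != "":
		return tag != "" && strings.HasPrefix(tag, m.tagPrefix)
	case m.branchExact:
		return branch == m.branch
	case m.branch != "":
		return strings.HasPrefix(branch, m.branch)
	}
	return false
}

// Compute returns the first run type after PullRequest whose matcher accepts
// the build, falling back to PullRequest.
func Compute(tag, branch string) RunType {
	for rt := PullRequest + 1; rt < None; rt++ {
		if rt.Matcher().matches(branch, tag) {
			return rt
		}
	}
	return PullRequest
}
</code></pre></div></div>
<p>Because the checks run in declaration order, an exact matcher like <code class="language-plaintext highlighter-rouge">main</code> must not be a loose prefix match, or it would also capture <code class="language-plaintext highlighter-rouge">main-dry-run/</code> branches.</p>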
<figure>
<div class="embed search-notebook">
<iframe src="https://sourcegraph.com/embed/notebooks/Tm90ZWJvb2s6MTU5" frameborder="0" sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
</div>
<figcaption>
A <a href="https://sourcegraph.com/notebooks/Tm90ZWJvb2s6MTU5">search notebook walkthrough of how run types are used</a>!
</figcaption>
</figure>
<p>A “diff type” is generated by a diff detector that works similarly to GitHub Actions’ <code class="language-plaintext highlighter-rouge">on.paths</code>, but is far more flexible. For example, we detect basic “Go” diffs like so:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">strings</span><span class="o">.</span><span class="n">HasSuffix</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="s">".go"</span><span class="p">)</span> <span class="o">||</span> <span class="n">p</span> <span class="o">==</span> <span class="s">"go.sum"</span> <span class="o">||</span> <span class="n">p</span> <span class="o">==</span> <span class="s">"go.mod"</span> <span class="p">{</span>
<span class="n">diff</span> <span class="o">|=</span> <span class="n">Go</span>
<span class="p">}</span>
</code></pre></div></div>
<p>However, engineers can also define database migrations that might not change Go code - in these situations, we still want to run Go tests, and we also want to run migration tests. We can centralise this detection like this:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">strings</span><span class="o">.</span><span class="n">HasPrefix</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="s">"migrations/"</span><span class="p">)</span> <span class="p">{</span>
<span class="n">diff</span> <span class="o">|=</span> <span class="p">(</span><span class="n">DatabaseSchema</span> <span class="o">|</span> <span class="n">Go</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Our <code class="language-plaintext highlighter-rouge">Diff = 1 << iota</code> type is constructed by bit-shifting an <code class="language-plaintext highlighter-rouge">iota</code> type, so we can easily check for what diffs have been detected with <code class="language-plaintext highlighter-rouge">diff&target != 0</code>, which is done by a helper function, <code class="language-plaintext highlighter-rouge">(*DiffType).Has</code>.</p>
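<p>Putting these pieces together, a minimal sketch of diff detection with bit flags might look like the following. The <code class="language-plaintext highlighter-rouge">Docs</code> flag, the <code class="language-plaintext highlighter-rouge">ParseDiff</code> function and its <code class="language-plaintext highlighter-rouge">.md</code> rule are hypothetical additions for illustration; the Go and migration rules mirror the snippets above.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "strings"

// Diff is a set of bit flags, one per kind of change.
type Diff uint32

const (
	Go Diff = 1 << iota
	DatabaseSchema
	Docs // hypothetical extra flag
)

// Has reports whether any of the flags in target are set in d.
func (d Diff) Has(target Diff) bool {
	return d&target != 0
}

// ParseDiff accumulates flags for every changed path.
func ParseDiff(paths []string) Diff {
	var diff Diff
	for _, p := range paths {
		if strings.HasSuffix(p, ".go") || p == "go.sum" || p == "go.mod" {
			diff |= Go
		}
		// Migrations may not touch Go code, but still warrant Go and migration tests.
		if strings.HasPrefix(p, "migrations/") {
			diff |= (DatabaseSchema | Go)
		}
		if strings.HasSuffix(p, ".md") {
			diff |= Docs
		}
	}
	return diff
}
</code></pre></div></div>
<p>Because each flag occupies its own bit, a single migration file sets both <code class="language-plaintext highlighter-rouge">DatabaseSchema</code> and <code class="language-plaintext highlighter-rouge">Go</code>, and <code class="language-plaintext highlighter-rouge">Has</code> can test for either with one bitwise AND.</p>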
<figure>
<div class="embed search-notebook">
<iframe src="https://sourcegraph.com/embed/notebooks/Tm90ZWJvb2s6MTYw" frameborder="0" sandbox="allow-scripts allow-same-origin allow-popups">
</iframe>
</div>
<figcaption>
A <a href="https://sourcegraph.com/notebooks/Tm90ZWJvb2s6MTYw">search notebook walkthrough of how diff types are used</a>!
</figcaption>
</figure>
<p>The programmatic generation approach allows for some complex step generation that would be very tedious to manage by hand. Take this example:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">diff</span><span class="o">.</span><span class="n">Has</span><span class="p">(</span><span class="n">changed</span><span class="o">.</span><span class="n">DatabaseSchema</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ops</span><span class="o">.</span><span class="n">Merge</span><span class="p">(</span><span class="n">operations</span><span class="o">.</span><span class="n">NewNamedSet</span><span class="p">(</span><span class="s">"DB backcompat tests"</span><span class="p">,</span>
<span class="n">addGoTestsBackcompat</span><span class="p">(</span><span class="n">opts</span><span class="o">.</span><span class="n">MinimumUpgradeableVersion</span><span class="p">)))</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In this scenario, a group of checks (<code class="language-plaintext highlighter-rouge">operations.NewNamedSet</code>) is created to verify that newly introduced migrations are backwards-compatible. To do so, we provide it <code class="language-plaintext highlighter-rouge">MinimumUpgradeableVersion</code> - a variable that is updated automatically by the <a href="https://handbook.sourcegraph.com/departments/product-engineering/engineering/tools/release/">Sourcegraph release tool</a> to indicate which version of Sourcegraph all changes should be compatible with. The tests being added look like this:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="n">addGoTestsBackcompat</span><span class="p">(</span><span class="n">minimumUpgradeableVersion</span> <span class="kt">string</span><span class="p">)</span> <span class="k">func</span><span class="p">(</span><span class="n">pipeline</span> <span class="o">*</span><span class="n">bk</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="k">func</span><span class="p">(</span><span class="n">pipeline</span> <span class="o">*</span><span class="n">bk</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">)</span> <span class="p">{</span>
<span class="n">buildGoTests</span><span class="p">(</span><span class="k">func</span><span class="p">(</span><span class="n">description</span><span class="p">,</span> <span class="n">testSuffix</span> <span class="kt">string</span><span class="p">)</span> <span class="p">{</span>
<span class="n">pipeline</span><span class="o">.</span><span class="n">AddStep</span><span class="p">(</span>
<span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">":go::postgres: Backcompat test (%s)"</span><span class="p">,</span> <span class="n">description</span><span class="p">),</span>
<span class="n">bk</span><span class="o">.</span><span class="n">Env</span><span class="p">(</span><span class="s">"MINIMUM_UPGRADEABLE_VERSION"</span><span class="p">,</span> <span class="n">minimumUpgradeableVersion</span><span class="p">),</span>
<span class="n">bk</span><span class="o">.</span><span class="n">Cmd</span><span class="p">(</span><span class="s">"./dev/ci/go-backcompat/test.sh "</span><span class="o">+</span><span class="n">testSuffix</span><span class="p">),</span>
<span class="p">)</span>
<span class="p">})</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
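<p>The <code class="language-plaintext highlighter-rouge">bk</code> package itself isn’t shown, but the call shape <code class="language-plaintext highlighter-rouge">pipeline.AddStep(label, opts...)</code> suggests a functional-options design. A minimal, hypothetical version of such a builder, with our own simplified <code class="language-plaintext highlighter-rouge">Step</code>, <code class="language-plaintext highlighter-rouge">StepOpt</code>, <code class="language-plaintext highlighter-rouge">Env</code> and <code class="language-plaintext highlighter-rouge">Cmd</code> definitions, might look like this:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

// Step is a simplified Buildkite command step.
type Step struct {
	Label   string
	Command []string
	Env     map[string]string
}

// StepOpt mutates a step while it is being built.
type StepOpt func(*Step)

// Env sets an environment variable on the step.
func Env(name, value string) StepOpt {
	return func(s *Step) {
		if s.Env == nil {
			s.Env = map[string]string{}
		}
		s.Env[name] = value
	}
}

// Cmd appends a command to run in the step.
func Cmd(command string) StepOpt {
	return func(s *Step) { s.Command = append(s.Command, command) }
}

// Pipeline collects generated steps.
type Pipeline struct {
	Steps []*Step
}

// AddStep builds a step from the label and options, then appends it.
func (p *Pipeline) AddStep(label string, opts ...StepOpt) {
	s := &Step{Label: label}
	for _, opt := range opts {
		opt(s)
	}
	p.Steps = append(p.Steps, s)
}
</code></pre></div></div>
<p>With this design, new step behaviour can be added as further <code class="language-plaintext highlighter-rouge">StepOpt</code> functions without changing existing call sites.</p>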
<p><code class="language-plaintext highlighter-rouge">buildGoTests</code> is a helper that generates a set of commands to be run against each of the Sourcegraph repository’s Go packages. It is configured to split out more complex packages into separate jobs so that they can be run in parallel across multiple agents. Right now, the generated commands for <code class="language-plaintext highlighter-rouge">addGoTestsBackcompat</code> look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> • DB backcompat tests
• :go::postgres: Backcompat test (all)
• :go::postgres: Backcompat test (enterprise/internal/codeintel/stores/dbstore)
• :go::postgres: Backcompat test (enterprise/internal/codeintel/stores/lsifstore)
• :go::postgres: Backcompat test (enterprise/internal/insights)
• :go::postgres: Backcompat test (internal/database)
• :go::postgres: Backcompat test (internal/repos)
• :go::postgres: Backcompat test (enterprise/internal/batches)
• :go::postgres: Backcompat test (cmd/frontend)
• :go::postgres: Backcompat test (enterprise/internal/database)
• :go::postgres: Backcompat test (enterprise/cmd/frontend/internal/batches/resolvers)
</code></pre></div></div>
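<p>The package splitting described above could be sketched as follows: each slow package gets its own job, and a catch-all <code class="language-plaintext highlighter-rouge">all</code> job runs everything else. The <code class="language-plaintext highlighter-rouge">slowPackages</code> list, the module prefix and the <code class="language-plaintext highlighter-rouge">only</code>/<code class="language-plaintext highlighter-rouge">exclude</code> suffixes passed to the test script are assumptions for illustration; only the callback shape matches how <code class="language-plaintext highlighter-rouge">addGoTestsBackcompat</code> calls <code class="language-plaintext highlighter-rouge">buildGoTests</code>.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "strings"

// modulePrefix is assumed for illustration.
const modulePrefix = "github.com/sourcegraph/sourcegraph/"

// slowPackages are split out into their own jobs (hypothetical list).
var slowPackages = []string{
	"internal/database",
	"internal/repos",
	"cmd/frontend",
}

// buildGoTests calls addStep once for the catch-all job and once per slow
// package, passing a human-readable description and a test script suffix.
func buildGoTests(addStep func(description, testSuffix string)) {
	// One job for everything except the slow packages...
	excluded := make([]string, 0, len(slowPackages))
	for _, pkg := range slowPackages {
		excluded = append(excluded, modulePrefix+pkg)
	}
	addStep("all", "exclude "+strings.Join(excluded, " "))

	// ...and one job per slow package, so they can run on separate agents.
	for _, pkg := range slowPackages {
		addStep(pkg, "only "+modulePrefix+pkg)
	}
}
</code></pre></div></div>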
<p>From just the minimal configuration above, each step is generated with a lot of baked-in configuration, much of which is applied automatically to every build step we have.</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="pi">-</span> <span class="na">agents</span><span class="pi">:</span>
<span class="na">queue</span><span class="pi">:</span> <span class="s">standard</span>
<span class="na">command</span><span class="pi">:</span>
<span class="pi">-</span> <span class="s">./tr ./dev/ci/go-backcompat/test.sh only github.com/sourcegraph/sourcegraph/internal/database</span>
<span class="na">env</span><span class="pi">:</span>
<span class="na">MINIMUM_UPGRADEABLE_VERSION</span><span class="pi">:</span> <span class="s">3.36.0</span>
<span class="na">key</span><span class="pi">:</span> <span class="s">gopostgresBackcompattestinternaldatabase</span>
<span class="na">label</span><span class="pi">:</span> <span class="s1">'</span><span class="s">:go::postgres:</span><span class="nv"> </span><span class="s">Backcompat</span><span class="nv"> </span><span class="s">test</span><span class="nv"> </span><span class="s">(internal/database)'</span>
<span class="na">timeout_in_minutes</span><span class="pi">:</span> <span class="s2">"</span><span class="s">60"</span>
</code></pre></div></div>
<p>In this snippet, we have:</p>
<ul>
<li>A default queue to run the job on - this can be feature-flagged to run against experimental agents.</li>
<li>The shared <code class="language-plaintext highlighter-rouge">MINIMUM_UPGRADEABLE_VERSION</code> variable that gets used for other steps as well, such as upgrade tests.</li>
<li>A generated key, useful for identifying steps and creating <a href="https://buildkite.com/docs/pipelines/dependencies">step dependencies</a>.</li>
<li>Commands prefixed with <code class="language-plaintext highlighter-rouge">./tr</code>: <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/dev/ci/scripts/trace-command.sh">this script</a> creates and uploads traces for our builds!</li>
</ul>
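<p>As a sketch of how such defaults might be applied, the code below fills in a default queue and timeout, and derives a key from the step label by dropping every character that is not a letter or digit, a rule that happens to reproduce the <code class="language-plaintext highlighter-rouge">gopostgresBackcompattestinternaldatabase</code> key above. The <code class="language-plaintext highlighter-rouge">withDefaults</code> and <code class="language-plaintext highlighter-rouge">stepKey</code> functions and the exact key rule are our guesses, not the generator’s actual code.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"strings"
	"unicode"
)

// Step holds a few of the generated fields shown in the YAML above.
type Step struct {
	Label            string
	Key              string
	Queue            string
	TimeoutInMinutes string
}

// stepKey derives a key by keeping only the letters and digits of the label.
func stepKey(label string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, label)
}

// withDefaults fills in anything the step author did not set explicitly.
func withDefaults(s Step) Step {
	if s.Key == "" {
		s.Key = stepKey(s.Label)
	}
	if s.Queue == "" {
		s.Queue = "standard"
	}
	if s.TimeoutInMinutes == "" {
		s.TimeoutInMinutes = "60"
	}
	return s
}
</code></pre></div></div>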
<figure>
<img src="/assets/images/posts/self-documenting/pipeline-trace.png" />
<figcaption>
Build traces help visualise and track the performance of various pipeline steps.
Uploaded traces are automatically linked from builds via Buildkite annotations for easy reference, and can also be queried directly in <a href="https://www.honeycomb.io/">Honeycomb</a>.
</figcaption>
</figure>
<p>Features like build step traces <a href="https://github.com/sourcegraph/sourcegraph/pull/29444/files">were implemented without sweeping changes to pipeline configuration</a>, thanks to the generated approach - we just had to adjust the generator to inject the appropriate scripting, and now it <em>just works</em> across all commands in the pipeline.</p>
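<p>The idea can be sketched as a single rewrite applied to every command the generator emits. The <code class="language-plaintext highlighter-rouge">tracedCmd</code> name appears in the <code class="language-plaintext highlighter-rouge">AnnotatedCmd</code> excerpt in this post, but its body here (prefixing <code class="language-plaintext highlighter-rouge">./tr</code> unless already present) and the <code class="language-plaintext highlighter-rouge">traceAll</code> helper are assumptions for illustration.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "strings"

// tracedCmd wraps a command with the ./tr tracing script (assumed behaviour).
func tracedCmd(command string) string {
	if strings.HasPrefix(command, "./tr ") {
		return command // already traced; avoid double-wrapping
	}
	return "./tr " + command
}

// traceAll rewrites every command of every step in place.
func traceAll(steps [][]string) {
	for _, cmds := range steps {
		for i, c := range cmds {
			cmds[i] = tracedCmd(c)
		}
	}
}
</code></pre></div></div>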
<p>Additional functions are also available to tweak how a step is created. For example, with <code class="language-plaintext highlighter-rouge">bk.AnnotatedCmd</code> one can indicate that a step will generate annotations by writing to <code class="language-plaintext highlighter-rouge">./annotations</code> - a wrapper script makes sure these annotations get picked up and uploaded via Buildkite’s API:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// AnnotatedCmd runs the given command, picks up files left in the `./annotations`</span>
<span class="c">// directory, and appends them to a shared annotation for this job. For example, to</span>
<span class="c">// generate an annotation file on error:</span>
<span class="c">//</span>
<span class="c">// if [ $EXIT_CODE -ne 0 ]; then</span>
<span class="c">// echo -e "$OUT" >./annotations/shfmt</span>
<span class="c">// fi</span>
<span class="c">//</span>
<span class="c">// Annotations can be formatted based on file extensions, for example:</span>
<span class="c">//</span>
<span class="c">// - './annotations/Job log.md' will have its contents appended as markdown</span>
<span class="c">// - './annotations/shfmt' will have its contents formatted as terminal output on append</span>
<span class="c">//</span>
<span class="c">// Please be considerate about what generating annotations, since they can cause a lot of</span>
<span class="c">// visual clutter in the Buildkite UI. When creating annotations:</span>
<span class="c">//</span>
<span class="c">// - keep them concise and short, to minimze the space they take up</span>
<span class="c">// - ensure they are actionable: an annotation should enable you, the CI user, to know</span>
<span class="c">// where to go and what to do next.</span>
<span class="k">func</span> <span class="n">AnnotatedCmd</span><span class="p">(</span><span class="n">command</span> <span class="kt">string</span><span class="p">,</span> <span class="n">opts</span> <span class="n">AnnotatedCmdOpts</span><span class="p">)</span> <span class="n">StepOpt</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">annotateOpts</span> <span class="kt">string</span>
<span class="c">// ... set up options</span>
<span class="c">// './an' is a script that runs the given command and uploads the exported annotations</span>
<span class="c">// with the given annotation options before exiting.</span>
<span class="n">annotatedCmd</span> <span class="o">:=</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"./an %q %q %q"</span><span class="p">,</span>
<span class="n">tracedCmd</span><span class="p">(</span><span class="n">command</span><span class="p">),</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"%v"</span><span class="p">,</span> <span class="n">opts</span><span class="o">.</span><span class="n">IncludeNames</span><span class="p">),</span> <span class="n">strings</span><span class="o">.</span><span class="n">TrimSpace</span><span class="p">(</span><span class="n">annotateOpts</span><span class="p">))</span>
<span class="k">return</span> <span class="n">RawCmd</span><span class="p">(</span><span class="n">annotatedCmd</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The author of a pipeline step can then opt in to having their annotations uploaded by changing <code class="language-plaintext highlighter-rouge">bk.Cmd(...)</code> to <code class="language-plaintext highlighter-rouge">bk.AnnotatedCmd(...)</code>. This lets any step create annotations simply by writing content to a file, and have them uploaded, formatted, and grouped nicely without having to learn the specifics of the <a href="https://buildkite.com/docs/agent/v3/cli-annotate">Buildkite annotations API</a>:</p>
<figure>
<img src="/assets/images/posts/self-documenting/annotations.png" />
<figcaption>
Annotations can help guide engineers towards fixing build issues.
</figcaption>
</figure>
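<p>The doc comment on <code class="language-plaintext highlighter-rouge">AnnotatedCmd</code> describes formatting by file extension. A rough sketch of that rule might look like the following, assuming Markdown files are appended as-is, other files are wrapped in a <code class="language-plaintext highlighter-rouge">```term</code> block (which Buildkite annotations render as terminal output), and the file name becomes a heading. <code class="language-plaintext highlighter-rouge">formatAnnotation</code> and the heading format are hypothetical.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// formatAnnotation renders one file from ./annotations for appending to a
// Buildkite annotation.
func formatAnnotation(fileName, content string) string {
	ext := filepath.Ext(fileName)
	title := strings.TrimSuffix(filepath.Base(fileName), ext)
	if ext == ".md" {
		// Markdown files are appended as-is under a heading.
		return fmt.Sprintf("#### %s\n\n%s\n", title, content)
	}
	// Everything else is treated as terminal output.
	return fmt.Sprintf("#### %s\n\n```term\n%s\n```\n", title, strings.TrimRight(content, "\n"))
}
</code></pre></div></div>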
<p>Using <code class="language-plaintext highlighter-rouge">iota</code> types for both <code class="language-plaintext highlighter-rouge">RunType</code> and <code class="language-plaintext highlighter-rouge">DiffType</code> lets us iterate over the available types to build some useful features. For example, turning a <code class="language-plaintext highlighter-rouge">DiffType</code> into a string gives a useful summary of what is included in the diff:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">allDiffs</span> <span class="p">[]</span><span class="kt">string</span>
<span class="n">ForEachDiffType</span><span class="p">(</span><span class="k">func</span><span class="p">(</span><span class="n">checkDiff</span> <span class="n">Diff</span><span class="p">)</span> <span class="p">{</span>
<span class="n">diffName</span> <span class="o">:=</span> <span class="n">checkDiff</span><span class="o">.</span><span class="n">String</span><span class="p">()</span>
<span class="k">if</span> <span class="n">diffName</span> <span class="o">!=</span> <span class="s">""</span> <span class="o">&&</span> <span class="n">d</span><span class="o">.</span><span class="n">Has</span><span class="p">(</span><span class="n">checkDiff</span><span class="p">)</span> <span class="p">{</span>
<span class="n">allDiffs</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">allDiffs</span><span class="p">,</span> <span class="n">diffName</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">})</span>
<span class="k">return</span> <span class="n">strings</span><span class="o">.</span><span class="n">Join</span><span class="p">(</span><span class="n">allDiffs</span><span class="p">,</span> <span class="s">", "</span><span class="p">)</span>
</code></pre></div></div>
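<p>The snippet above relies on a <code class="language-plaintext highlighter-rouge">ForEachDiffType</code> helper and per-flag names that aren’t shown. A self-contained sketch, assuming a sentinel constant marks the end of the flag list and single flags are named via a <code class="language-plaintext highlighter-rouge">switch</code>, might look like this; the flag names are illustrative. Naming single flags in a separate <code class="language-plaintext highlighter-rouge">name</code> method keeps <code class="language-plaintext highlighter-rouge">String</code> from calling itself recursively.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "strings"

type Diff uint32

const (
	Go Diff = 1 << iota
	DatabaseSchema
	Docs
	sentinel // not a real diff type; marks the end of the list
)

// ForEachDiffType calls cb with every single-bit diff type in order.
func ForEachDiffType(cb func(d Diff)) {
	for d := Go; d < sentinel; d <<= 1 {
		cb(d)
	}
}

func (d Diff) Has(target Diff) bool { return d&target != 0 }

// name returns the label for a single flag.
func (d Diff) name() string {
	switch d {
	case Go:
		return "Go"
	case DatabaseSchema:
		return "DatabaseSchema"
	case Docs:
		return "Docs"
	}
	return ""
}

// String lists every flag set in d, e.g. "Go, DatabaseSchema".
func (d Diff) String() string {
	var allDiffs []string
	ForEachDiffType(func(checkDiff Diff) {
		diffName := checkDiff.name()
		if diffName != "" && d.Has(checkDiff) {
			allDiffs = append(allDiffs, diffName)
		}
	})
	return strings.Join(allDiffs, ", ")
}
</code></pre></div></div>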
<p>We can take that a bit further, iterating over all our run types and diff types to generate a reference page of what each pipeline does. Since this page gets committed, it is also a good way to visualise how code changes affect the generated pipelines!</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Generate each diff type for pull requests</span>
<span class="n">changed</span><span class="o">.</span><span class="n">ForEachDiffType</span><span class="p">(</span><span class="k">func</span><span class="p">(</span><span class="n">diff</span> <span class="n">changed</span><span class="o">.</span><span class="n">Diff</span><span class="p">)</span> <span class="p">{</span>
<span class="n">pipeline</span><span class="p">,</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">ci</span><span class="o">.</span><span class="n">GeneratePipeline</span><span class="p">(</span><span class="n">ci</span><span class="o">.</span><span class="n">Config</span><span class="p">{</span>
<span class="n">RunType</span><span class="o">:</span> <span class="n">runtype</span><span class="o">.</span><span class="n">PullRequest</span><span class="p">,</span>
<span class="n">Diff</span><span class="o">:</span> <span class="n">diff</span><span class="p">,</span>
<span class="p">})</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="n">log</span><span class="o">.</span><span class="n">Fatalf</span><span class="p">(</span><span class="s">"Generating pipeline for diff type %q: %s"</span><span class="p">,</span> <span class="n">diff</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">fmt</span><span class="o">.</span><span class="n">Fprintf</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">"</span><span class="se">\n</span><span class="s">- Pipeline for `%s` changes:</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">diff</span><span class="p">)</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">raw</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">pipeline</span><span class="o">.</span><span class="n">Steps</span> <span class="p">{</span>
<span class="n">printStepSummary</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="s">" "</span><span class="p">,</span> <span class="n">raw</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">})</span>
<span class="c">// For the other run types, we can also generate detailed information about what</span>
<span class="c">// conditions trigger each run type!</span>
<span class="k">for</span> <span class="n">rt</span> <span class="o">:=</span> <span class="n">runtype</span><span class="o">.</span><span class="n">PullRequest</span> <span class="o">+</span> <span class="m">1</span><span class="p">;</span> <span class="n">rt</span> <span class="o"><</span> <span class="n">runtype</span><span class="o">.</span><span class="n">None</span><span class="p">;</span> <span class="n">rt</span> <span class="o">+=</span> <span class="m">1</span> <span class="p">{</span>
<span class="n">m</span> <span class="o">:=</span> <span class="n">rt</span><span class="o">.</span><span class="n">Matcher</span><span class="p">()</span>
<span class="k">if</span> <span class="n">m</span><span class="o">.</span><span class="n">Branch</span> <span class="o">!=</span> <span class="s">""</span> <span class="p">{</span>
<span class="n">matchName</span> <span class="o">:=</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"`%s`"</span><span class="p">,</span> <span class="n">m</span><span class="o">.</span><span class="n">Branch</span><span class="p">)</span>
<span class="k">if</span> <span class="n">m</span><span class="o">.</span><span class="n">BranchRegexp</span> <span class="p">{</span>
<span class="n">matchName</span> <span class="o">+=</span> <span class="s">" (regexp match)"</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="n">m</span><span class="o">.</span><span class="n">BranchExact</span> <span class="p">{</span>
<span class="n">matchName</span> <span class="o">+=</span> <span class="s">" (exact match)"</span>
<span class="p">}</span>
<span class="n">conditions</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">conditions</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"branches matching %s"</span><span class="p">,</span> <span class="n">matchName</span><span class="p">))</span>
<span class="k">if</span> <span class="n">m</span><span class="o">.</span><span class="n">BranchArgumentRequired</span> <span class="p">{</span>
<span class="n">conditions</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">conditions</span><span class="p">,</span> <span class="s">"requires a branch argument in the second branch path segment"</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="n">m</span><span class="o">.</span><span class="n">TagPrefix</span> <span class="o">!=</span> <span class="s">""</span> <span class="p">{</span>
<span class="n">conditions</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">conditions</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"tags starting with `%s`"</span><span class="p">,</span> <span class="n">m</span><span class="o">.</span><span class="n">TagPrefix</span><span class="p">))</span>
<span class="p">}</span>
<span class="c">// etc.</span>
<span class="p">}</span>
</code></pre></div></div>
<figure>
<img src="/assets/images/posts/self-documenting/sg-ci-docs.png" />
<figcaption>
A web version of this reference page is also published to the <a href="https://docs.sourcegraph.com/dev/background-information/continuous_integration">pipeline types reference</a>. You can also check out the <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/enterprise/dev/ci/gen-pipeline.go">docs generation code</a> directly!
</figcaption>
</figure>
<p>Taking this <em>even further</em>, with run type requirements available we can also integrate run types into other tooling - for example, our developer tool <code class="language-plaintext highlighter-rouge">sg</code> can help you create builds of various run types from a command like <code class="language-plaintext highlighter-rouge">sg ci build docker-images-patch</code> to build a Docker image for a specific service:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Detect what run-type someone might be trying to build</span>
<span class="n">rt</span> <span class="o">:=</span> <span class="n">runtype</span><span class="o">.</span><span class="n">Compute</span><span class="p">(</span><span class="s">""</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"%s/%s"</span><span class="p">,</span> <span class="n">args</span><span class="p">[</span><span class="m">0</span><span class="p">],</span> <span class="n">branch</span><span class="p">),</span> <span class="no">nil</span><span class="p">)</span>
<span class="c">// From the detected matcher, we can see if an argument is required and request it</span>
<span class="n">m</span> <span class="o">:=</span> <span class="n">rt</span><span class="o">.</span><span class="n">Matcher</span><span class="p">()</span>
<span class="k">if</span> <span class="n">m</span><span class="o">.</span><span class="n">BranchArgumentRequired</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">branchArg</span> <span class="kt">string</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="o">>=</span> <span class="m">2</span> <span class="p">{</span>
<span class="n">branchArg</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="m">1</span><span class="p">]</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">branchArg</span><span class="p">,</span> <span class="n">err</span> <span class="o">=</span> <span class="n">open</span><span class="o">.</span><span class="n">Prompt</span><span class="p">(</span><span class="s">"Enter your argument input:"</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">err</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">branch</span> <span class="o">=</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"%s/%s"</span><span class="p">,</span> <span class="n">branchArg</span><span class="p">,</span> <span class="n">branch</span><span class="p">)</span>
<span class="p">}</span>
<span class="c">// Push to the branch required to trigger a build</span>
<span class="n">branch</span> <span class="o">=</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"%s%s"</span><span class="p">,</span> <span class="n">rt</span><span class="o">.</span><span class="n">Matcher</span><span class="p">()</span><span class="o">.</span><span class="n">Branch</span><span class="p">,</span> <span class="n">branch</span><span class="p">)</span>
<span class="n">gitArgs</span> <span class="o">:=</span> <span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"push"</span><span class="p">,</span> <span class="s">"origin"</span><span class="p">,</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">"%s:refs/heads/%s"</span><span class="p">,</span> <span class="n">commit</span><span class="p">,</span> <span class="n">branch</span><span class="p">)}</span>
<span class="k">if</span> <span class="o">*</span><span class="n">ciBuildForcePushFlag</span> <span class="p">{</span>
<span class="n">gitArgs</span> <span class="o">=</span> <span class="nb">append</span><span class="p">(</span><span class="n">gitArgs</span><span class="p">,</span> <span class="s">"--force"</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">run</span><span class="o">.</span><span class="n">GitCmd</span><span class="p">(</span><span class="n">gitArgs</span><span class="o">...</span><span class="p">)</span>
<span class="c">// Query Buildkite API to get the created build</span>
<span class="c">// ...</span>
</code></pre></div></div>
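<p>Stripped of prompting and Buildkite API calls, the branch-naming logic above is small enough to isolate and test. <code class="language-plaintext highlighter-rouge">targetBranch</code> and <code class="language-plaintext highlighter-rouge">pushArgs</code> are hypothetical helpers that restate that logic, assuming the matcher’s <code class="language-plaintext highlighter-rouge">Branch</code> is a prefix that already ends in a slash (for example <code class="language-plaintext highlighter-rouge">docker-images-patch/</code>).</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import "fmt"

// targetBranch composes the branch that triggers a build of a given run type:
// the prefix, then the optional argument, then the user's current branch.
func targetBranch(prefix string, argRequired bool, arg, branch string) (string, error) {
	if argRequired {
		if arg == "" {
			return "", fmt.Errorf("run type %q requires an argument", prefix)
		}
		branch = fmt.Sprintf("%s/%s", arg, branch)
	}
	return prefix + branch, nil
}

// pushArgs builds the git arguments that push commit to the target branch.
func pushArgs(commit, branch string, force bool) []string {
	args := []string{"push", "origin", fmt.Sprintf("%s:refs/heads/%s", commit, branch)}
	if force {
		args = append(args, "--force")
	}
	return args
}
</code></pre></div></div>
<p>For example, with the prefix <code class="language-plaintext highlighter-rouge">docker-images-patch/</code>, the argument <code class="language-plaintext highlighter-rouge">server</code> and the current branch <code class="language-plaintext highlighter-rouge">my-fix</code>, the commit would be pushed to <code class="language-plaintext highlighter-rouge">docker-images-patch/server/my-fix</code>.</p>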
<p>Using a similar iteration over the available run types, we can also provide tooltips that automatically list all the supported run types that can be created this way:</p>
<figure>
<img src="/assets/images/posts/self-documenting/sg-ci-build.png" />
<figcaption>
Check out the <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/dev/sg/sg_ci.go"><code>sg ci build</code> source code</a> directly, or the <a href="https://github.com/sourcegraph/sourcegraph/pull/30932#discussion_r803196181">discussion behind the inception of this feature</a>.
</figcaption>
</figure>
<p>So now we have generated pipelines, documentation about them, the capability to extend pipeline specifications with additional features like tracing, <em>and</em> tooling that is integrated and automatically kept in sync with pipeline specifications - all derived from a single source of truth!</p>
<p>Learn more about our continuous integration ecosystem in our <a href="https://docs.sourcegraph.com/dev/background-information/continuous_integration">developer documentation</a>, and check out the <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/enterprise/dev/ci">pipeline generator source code here</a>.</p>
<p><br /></p>
<h2 id="wrap-up">Wrap-up</h2>
<p>The generator approach has helped us build a low-maintenance and reliable ecosystem around parts of our infrastructure. Tailor-making such an ecosystem is a non-trivial investment at first, but as an organization grows and business needs become more specific, the investment pays off by making systems easy to learn, use, extend, integrate, validate, and more.</p>
<p>Also, it’s a lot of fun!</p>
<p><br /></p>
<h2 id="about-sourcegraph">About Sourcegraph</h2>
<p>Sourcegraph builds universal code search for every developer and company so they can innovate faster. We help developers and companies with billions of lines of code create the software you use every day.
Learn more about Sourcegraph <a href="https://about.sourcegraph.com/">here</a>.</p>
<p>Interested in joining? <a href="https://about.sourcegraph.com/jobs/">We’re hiring</a>!</p>robertIn a rapidly moving organization, documentation drift is inevitable as the underlying tools undergo changes to suit changing needs, especially for internal tools where leaning on tribal knowledge can often be more efficient in the short term. As each component grows in complexity, however, this introduces debt that makes for a confusing onboarding process, a poor developer experience, and makes building integrations more difficult.Mirroring GitHub permissions at scale2021-10-08T00:00:00+00:002021-10-08T00:00:00+00:00https://bobheadxi.dev/mirroring-github-permissions-at-scale<p>As a tool for searching over all your code, accurately mirroring repository permissions defined in the relevant code hosts is a core part of <a href="https://about.sourcegraph.com/">Sourcegraph</a>’s functionality. Typically, the only way to do this is through the APIs of code hosts, though rate limits can mean it takes several <em>weeks</em> to work through a large number of users and repositories.</p>
<p>This article goes over some of the work I did on improving GitHub permissions mirroring at Sourcegraph, with the help of several co-workers - primarily <a href="https://github.com/unknwon">Joe Chen</a> (who wrote most of Sourcegraph’s original permissions mirroring code and helped me get up to speed - and is also the author of some big open-source projects like <a href="https://github.com/gogs/gogs">gogs/gogs</a> and <a href="https://github.com/go-ini/ini">go-ini/ini</a>) and <a href="https://github.com/benjaminwgordon">Ben Gordon</a> (who helped a ton on the customer-facing side of things).</p>
<h2 id="github-rate-limits">GitHub rate limits</h2>
<p>The GitHub API has a <a href="https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting">base rate limit of 5000 requests an hour</a>. Let’s look at what it takes to provide access lists for a user: with <a href="https://docs.github.com/en/rest/guides/traversing-with-pagination#basics-of-pagination">page size limits of 100 items per page</a>, iterating over all users can take up to the following number of requests, all of which should ideally fall under the rate limit constraints:</p>
\[\dfrac{\text{users} \times \text{repositories}}{100} < 5000\]
<p>This means that we will need $\text{users} \times \text{repositories}$ to be greater than 500000 to hit rate limiting.</p>
<p>To come up with a hopefully representative example for this post, I found a <a href="https://insights.dice.com/2019/10/14/compaies-hiring-software-developers-engineers/">random article</a> that claims some companies are hiring upwards of 3000 to 5000 developers, so let’s consider a case of 4000 developers and 5000 repositories (<a href="https://github.com/microsoft">Microsoft has about 4.5k public repos alone</a>, not including anything private or hosted in different organizations), and we get the following time to sync:</p>
\[\left(\dfrac{\text{4000} \times \text{5000}}{100} \times 2 \right) / 5000 = 80 \text{ hours}\]
<p>Just over three days is <em>okay</em>, but definitely encroaching on the territory of “cannot be done in a weekend”. In practice, implementation details mean that we will realistically consume far more requests than this, since we currently perform several types of sync<sup id="fnref:two" role="doc-noteref"><a href="#fn:two" class="footnote" rel="footnote">1</a></sup>, so the process will likely take longer than 80 hours.</p>
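<p>The back-of-the-envelope estimate above can be written as a small Go function. This is a sketch under the same assumptions as the formula: every (user, repository) pair costs 1/100th of a request, the work is repeated once per sync direction (the factor of 2), and the hourly request budget is the only constraint. The name <code>estimateSyncHours</code> is hypothetical.</p>

```go
package main

// estimateSyncHours is a hypothetical helper reproducing the estimate
// above: each (user, repository) pair costs 1/perPage of a request, the
// whole job is repeated once per sync direction, and the only limit is
// an hourly request budget.
func estimateSyncHours(users, repos, perPage, syncDirections, requestsPerHour int) float64 {
	requests := float64(users) * float64(repos) / float64(perPage) * float64(syncDirections)
	return requests / float64(requestsPerHour)
}
```

<p>With the numbers from the example, <code>estimateSyncHours(4000, 5000, 100, 2, 5000)</code> returns <code>80</code>.</p>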
<p>The time to sync increases dramatically for even larger numbers of users and repositories - such as one customer whose full sync was projected to take upwards of <em>an entire month</em>. Imagine paying thousands of dollars for a software product, only to have it unusable for the first month! Excessive rate limiting also means that permissions are far more likely to go stale, and can cause issues with other parts of Sourcegraph that also leverage GitHub APIs. The issue became a blocker for this particular customer, so we had to find a solution.</p>
<h2 id="sourcegraph-and-repository-authorization">Sourcegraph and repository authorization</h2>
<p>I got my first hands-on experience with Sourcegraph’s authorization providers when <a href="https://github.com/sourcegraph/sourcegraph/pull/23755">expanding <code class="language-plaintext highlighter-rouge">p4 protect</code> support for the Perforce integration</a>.</p>
<p>In a nutshell, Sourcegraph internally defines an interface <em>authorization providers</em> can implement to provide access lists for users (<em>user-centric</em> permissions) and repositories (<em>repo-centric</em> permissions) - <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@8685a6bef8c3e9d2556335cb25448dbc1b356a4a/-/blob/internal/authz/iface.go"><code class="language-plaintext highlighter-rouge">authz.Provider</code></a> - to populate a single source-of-truth table for permissions.
This happens continuously and passively in the background. The populated table is then queried by various code paths that use the data to decide what content can and cannot be shown to a user.</p>
<figure>
<img src="../../assets/images/posts/sourcegraph-perms-sync.png" />
<figcaption>
Sourcegraph's repository permissions sync state indicator shows when the last sync occurred.
Site administrators can also trigger a manual sync.
</figcaption>
</figure>
<hr />
<p><strong>⚠️ Update:</strong> Since the writing of this post, I’ve contributed an improved and more in-depth description of how permissions sync works in Sourcegraph, if you are interested in a better overview: <a href="https://docs.sourcegraph.com/@4.4/admin/repo/permissions#background-permissions-syncing"><em>Repository permissions - Background permissions syncing</em></a>.</p>
<hr />
<p>For something like Perforce, user-centric sync is as simple as building a list of patterns from the Perforce protections table that work with <a href="https://www.postgresql.org/docs/12/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP">PostgreSQL’s <code class="language-plaintext highlighter-rouge">SIMILAR TO</code> operator</a>, like so:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// For the following p4 protect:</span>
<span class="c">// open user alice * //Sourcegraph/Engineering/.../Frontend/...</span>
<span class="c">// open user alice * //Sourcegraph/.../Handbook/...</span>
<span class="c">// FetchUserPerms would return:</span>
<span class="n">repos</span> <span class="o">:=</span> <span class="p">[]</span><span class="n">extsvc</span><span class="o">.</span><span class="n">RepoID</span><span class="p">{</span>
<span class="s">"//Sourcegraph/Engineering/%/Frontend/%"</span><span class="p">,</span>
<span class="s">"//Sourcegraph/%/Handbook/%"</span><span class="p">,</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Repo-centric sync is left unimplemented in this case.</p>
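<p>The translation from Perforce wildcards to <code>SIMILAR TO</code> patterns might look roughly like the sketch below. It is not Sourcegraph’s code: <code>depotPathToSimilarTo</code> is a hypothetical name, and it assumes only two Perforce wildcards, <code>...</code> (anything, including slashes) mapped to <code>%</code>, and <code>*</code> (anything within one directory) mapped to the bracket expression <code>[^/]*</code>. Other characters that are special to <code>SIMILAR TO</code> are escaped with a backslash, the operator’s default escape character.</p>

```go
package main

import "strings"

// similarToMeta holds the characters that are special in PostgreSQL's
// SIMILAR TO operator. Note that "." is NOT special for SIMILAR TO.
const similarToMeta = `_%|*+?{}()[]\`

// depotPathToSimilarTo is a hypothetical helper that turns a Perforce
// depot path from a protections table entry into a SIMILAR TO pattern:
//   - "..." (matches anything, including "/") becomes "%"
//   - "*" (matches anything within one directory) becomes "[^/]*"
//   - any other SIMILAR TO metacharacter is escaped with a backslash
func depotPathToSimilarTo(depotPath string) string {
	var b strings.Builder
	for i := 0; i < len(depotPath); {
		switch {
		case strings.HasPrefix(depotPath[i:], "..."):
			b.WriteString("%")
			i += 3
		case depotPath[i] == '*':
			b.WriteString("[^/]*")
			i++
		default:
			if strings.IndexByte(similarToMeta, depotPath[i]) >= 0 {
				b.WriteByte('\\')
			}
			b.WriteByte(depotPath[i])
			i++
		}
	}
	return b.String()
}
```

<p>For the two protections lines above, this yields <code>//Sourcegraph/Engineering/%/Frontend/%</code> and <code>//Sourcegraph/%/Handbook/%</code>, matching the list <code>FetchUserPerms</code> returns.</p>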
<p>For GitHub, we <a href="https://docs.github.com/en/rest/reference/repos#list-repositories-for-the-authenticated-user">query for all private repositories a user can explicitly access</a> via their OAuth token, and return a list in a similar manner:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hasNextPage</span> <span class="o">:=</span> <span class="no">true</span>
<span class="k">for</span> <span class="n">page</span> <span class="o">:=</span> <span class="m">1</span><span class="p">;</span> <span class="n">hasNextPage</span><span class="p">;</span> <span class="n">page</span><span class="o">++</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">err</span> <span class="kt">error</span>
<span class="k">var</span> <span class="n">repos</span> <span class="p">[]</span><span class="o">*</span><span class="n">github</span><span class="o">.</span><span class="n">Repository</span>
<span class="n">repos</span><span class="p">,</span> <span class="n">hasNextPage</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">err</span> <span class="o">=</span> <span class="n">client</span><span class="o">.</span><span class="n">ListAffiliatedRepositories</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">github</span><span class="o">.</span><span class="n">VisibilityPrivate</span><span class="p">,</span> <span class="n">page</span><span class="p">,</span> <span class="n">affiliations</span><span class="o">...</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">perms</span><span class="p">,</span> <span class="n">errors</span><span class="o">.</span><span class="n">Wrap</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="s">"list repos for user"</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">r</span> <span class="o">:=</span> <span class="k">range</span> <span class="n">repos</span> <span class="p">{</span>
<span class="n">addRepoToUserPerms</span><span class="p">(</span><span class="n">extsvc</span><span class="o">.</span><span class="n">RepoID</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">ID</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Note that for public repositories, Sourcegraph simply doesn’t enforce permissions, so authorization only needs to care about explicit permissions.</p>
<p>The above is where we bump into <a href="#github-rate-limits">GitHub’s rate limits</a> easily - in an organization with 5000 repositories, that’s up to 50 API requests for each and every user to page through all their repositories. The GitHub authorization implementation does the same thing for repo-centric permissions by listing all users with access to each repository.</p>
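<p>To make the per-user cost concrete, here is a sketch of the same <code>hasNextPage</code> loop, generalized so it also counts the requests it spends. <code>fetchPage</code>, <code>listAll</code> and <code>fakeRepos</code> are hypothetical names; <code>fakeRepos</code> stands in for <code>ListAffiliatedRepositories</code> and simply serves IDs in pages of a fixed size.</p>

```go
package main

// fetchPage mirrors the shape of the paginated GitHub call above: it
// returns one page of repository IDs and whether another page exists.
type fetchPage func(page int) (ids []int64, hasNextPage bool, err error)

// listAll walks every page the same way the loop above does, and also
// reports how many requests it spent doing so.
func listAll(fetch fetchPage) (all []int64, requests int, err error) {
	hasNextPage := true
	for page := 1; hasNextPage; page++ {
		var ids []int64
		ids, hasNextPage, err = fetch(page)
		requests++
		if err != nil {
			return nil, requests, err
		}
		all = append(all, ids...)
	}
	return all, requests, nil
}

// fakeRepos is a stand-in for the GitHub API that serves `total`
// repository IDs in pages of `perPage` items.
func fakeRepos(total, perPage int) fetchPage {
	return func(page int) ([]int64, bool, error) {
		start := (page - 1) * perPage
		end := start + perPage
		if end > total {
			end = total
		}
		ids := make([]int64, 0, end-start)
		for id := start; id < end; id++ {
			ids = append(ids, int64(id+1))
		}
		return ids, end < total, nil
	}
}
```

<p>Against a fake serving 5000 repositories in pages of 100, <code>listAll</code> spends 50 requests for a single user, which is the figure above multiplied by every user on the instance.</p>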
<h2 id="introducing-a-cache">Introducing a cache</h2>
<p>Caches don’t solve all problems, but in this case there was an opportunity to save significant amounts of work through caching. GitHub repository permissions at companies are typically distributed through teams and organizations - membership to either would grant you access to relevant repositories, and teams are strict subsets of organizations. There are still instances of direct permissions - where a user is explicitly added to a repository - but it is rare to find repositories with thousands of users added explicitly.</p>
<p>This means that in the vast majority of cases, when querying for user Foo’s repositories, we are actually asking what teams and organizations Foo is in. At a high level, we could do the following instead:</p>
<ol>
<li>Get Foo’s direct repository affiliations</li>
<li>Get the organizations Foo is in
<ol>
<li>Get the teams Foo is in within each organization</li>
</ol>
</li>
<li>For each organization and team:
<ol>
<li>If the organization allows read permissions on all repositories, or Foo is an organization administrator, get all organization repositories from cache as part of Foo’s access list</li>
<li>Get all team repositories from cache as part of Foo’s access list</li>
</ol>
</li>
</ol>
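<p>The steps above can be sketched in Go as follows. This is an illustration under assumptions, not Sourcegraph’s implementation: <code>orgMembership</code>, <code>groupCache</code> and <code>userRepos</code> are hypothetical names, the cache is a plain in-memory map rather than Redis, and <code>fetchGroupRepos</code> stands in for the GitHub API calls made on a cache miss.</p>

```go
package main

import "sort"

// orgMembership is a hypothetical summary of a user's place in one
// GitHub organization (steps 2 and 2.1 above).
type orgMembership struct {
	Org            string
	Teams          []string // team slugs the user belongs to within Org
	DefaultReadAll bool     // the org grants read access to all repositories
	IsAdmin        bool     // the user is an administrator of Org
}

// groupCache maps "org" or "org/team" keys to repository names. On a
// miss it calls fetchGroupRepos (standing in for the GitHub API) and
// stores the result. entries must be a non-nil map.
type groupCache struct {
	entries         map[string][]string
	fetchGroupRepos func(key string) []string
	misses          int
}

func (c *groupCache) repos(key string) []string {
	if repos, ok := c.entries[key]; ok {
		return repos
	}
	c.misses++
	repos := c.fetchGroupRepos(key)
	c.entries[key] = repos
	return repos
}

// userRepos combines a user's direct affiliations with the cached
// repositories of their organizations and teams (step 3 above).
func userRepos(direct []string, orgs []orgMembership, cache *groupCache) []string {
	seen := make(map[string]bool)
	add := func(repos []string) {
		for _, r := range repos {
			seen[r] = true
		}
	}
	add(direct)
	for _, m := range orgs {
		if m.DefaultReadAll || m.IsAdmin {
			add(cache.repos(m.Org))
		}
		for _, team := range m.Teams {
			add(cache.repos(m.Org + "/" + team))
		}
	}
	result := make([]string, 0, len(seen))
	for r := range seen {
		result = append(result, r)
	}
	sort.Strings(result) // deterministic order for callers and tests
	return result
}
```

<p>The second user in the same organization and team costs no GitHub requests at all for those groups, which is where the savings come from.</p>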
<p>Cache misses would prompt a new query to GitHub to mirror access lists for specific teams and organizations. In the best-case scenario, where all users are part of large teams and organizations and there are very few instances of being directly granted access to a repository, cache hits should be very frequent and greatly reduce the amount of work required. Going back to the earlier example of 4000 developers and 5000 repositories, we get a best case performance of:</p>
\[\dfrac{(\text{teams} + \text{organizations}) \times \text{5000}}{100} = (\text{teams} + \text{organizations}) \times 50\]
<p>Even with 100 teams and organizations, this would come to about 5000 requests, fitting within a single hour’s rate limit - a huge improvement from the previously projected 80 hours. Even in the worst case, this would only be marginally less efficient than the existing implementation.</p>
<p>To mitigate stale caches, a flag was added to the provider interface to allow partial cache invalidation along the path of a sync (important because you don’t want every single team and organization queued for a sync all at once), and this flag was tied into the various ways of triggering a sync (notably webhook receivers and the API).</p>
<p>The approach was promising, and a feature-flagged<sup id="fnref:flagged" role="doc-noteref"><a href="#fn:flagged" class="footnote" rel="footnote">2</a></sup> user-centric sync backed by a Redis cache was implemented in <a href="https://github.com/sourcegraph/sourcegraph/pull/23978">sourcegraph#23978 authz/github: user-centric perms sync from team/org perms caches</a>.</p>
<h2 id="two-way-sync">Two-way sync</h2>
<p>As mentioned earlier, Sourcegraph’s authorization providers provide two-way sync: user-centric and repo-centric. To make the cache-backed sync complete, equivalent functionality had to be implemented for repo-centric sync.</p>
<p>Because GitHub organizations are conveniently supersets of teams (unlike <em>some</em> code hosts), the user-centric cache was implemented with either <code class="language-plaintext highlighter-rouge">organization</code> or <code class="language-plaintext highlighter-rouge">organization/team</code> as keys and a big list of repositories as its value:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>org/team: {
repos: [repo-foo, repo-bar]
}
</code></pre></div></div>
<p>To make this cache work both ways, I simply added <code class="language-plaintext highlighter-rouge">users</code> to the cache values, and implemented a similar approach to finding a repository’s relevant organizations and teams. In this case, a relevant organization would be one that has default-read access (otherwise members of an organization do not necessarily have access to said repository).</p>
<p>This makes for somewhat large cache values, but also makes it easy to perform partial cache updates. For example, if user <code class="language-plaintext highlighter-rouge">user-foo</code> is created and added to <code class="language-plaintext highlighter-rouge">org/team</code>, the user can be added to the cache for <code class="language-plaintext highlighter-rouge">org/team</code> during user-centric sync, and subsequent syncs of <code class="language-plaintext highlighter-rouge">repo-foo</code> and <code class="language-plaintext highlighter-rouge">repo-bar</code> will include the new user without having to perform a full sync, and vice versa.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>org/team: {
repos: [repo-foo, repo-bar]
users: [user-bar, user-foo]
}
</code></pre></div></div>
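<p>A minimal sketch of such a two-way cache value and the partial update described above might look like this. The names <code>cachedGroup</code>, <code>recordMember</code> and <code>usersForRepo</code> are hypothetical, and the cache is an in-memory map rather than the Redis cache Sourcegraph used.</p>

```go
package main

// cachedGroup is a hypothetical two-way cache value for an "org" or
// "org/team" key: the repositories the group can read, and its members.
type cachedGroup struct {
	Repos []string
	Users []string
}

// appendUnique adds v to list unless it is already present.
func appendUnique(list []string, v string) []string {
	for _, existing := range list {
		if existing == v {
			return list
		}
	}
	return append(list, v)
}

// recordMember is what a user-centric sync could do on learning that
// user belongs to key: patch the cached entry in place instead of
// invalidating it. Uncached keys are left for a later cache miss.
func recordMember(cache map[string]*cachedGroup, key, user string) {
	if g, ok := cache[key]; ok {
		g.Users = appendUnique(g.Users, user)
	}
}

// usersForRepo is the repo-centric side: collect the members of every
// cached group that lists repo among its repositories.
func usersForRepo(cache map[string]*cachedGroup, repo string) []string {
	var users []string
	for _, g := range cache {
		for _, r := range g.Repos {
			if r == repo {
				for _, u := range g.Users {
					users = appendUnique(users, u)
				}
				break
			}
		}
	}
	return users
}
```

<p>After <code>recordMember(cache, "org/team", "user-foo")</code>, a repo-centric lookup for <code>repo-bar</code> returns both <code>user-bar</code> and <code>user-foo</code> without any new GitHub requests.</p>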
<p>On paper, the performance improvements gained here are similar to the ones when implementing caching for user-centric sync, except scaling off the number of users in teams and organizations instead of repositories.</p>
<p>This was implemented in <a href="https://github.com/sourcegraph/sourcegraph/pull/24328">sourcegraph#24328 authz/github: repo-centric perms sync from team/org perms caches</a>.</p>
<h2 id="scaling-in-practice">Scaling in practice</h2>
<p>Throughout the implementation of the cache-backed GitHub permissions mirroring, a <a href="https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24%408685a6b+file:%5Eenterprise/internal/authz/github/github_test%5C.go+t.Run%28%22cache+enabled%22%2C+:%5Btests%5D%29+count:9999&patternType=structural">large number of unit tests were included</a>, as well as <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@8685a6b/-/blob/enterprise/cmd/repo-updater/internal/authz/integration_test.go">a few integration tests</a>, that tested the behaviour of various combinations of cache hits and misses.</p>
<p>To write integration tests, we use “golden testing”, where we <a href="https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24%408685a6b+file:%5Eenterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions+lang:yaml&patternType=literal">record network interactions to a file (called “VCRs”)</a>. Tests then use the recorded network interactions instead of reaching out to external services by default, unless explicitly asked to update the recordings. Interestingly, despite the significant improvements of this approach for larger numbers of users and repositories, this also made clear just how inefficient the cache-based approach is for smaller instances:</p>
<ul>
<li>with <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@8685a6b/-/blob/enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/repo-centric/no-groups.yaml">caching disabled</a>, the integration test recorded just 2 network requests for repo-centric sync.</li>
<li>with <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@8685a6b/-/blob/enterprise/cmd/repo-updater/internal/authz/testdata/vcr/TestIntegration_GitHubPermissions/repo-centric/groups-enabled.yaml">caching enabled</a>, the integration test recorded a whopping 22 network requests for repo-centric sync with the exact same number of repositories and users.</li>
</ul>
<p>This is why we continue to leave the <a href="https://docs.sourcegraph.com/admin/repo/permissions#teams-and-organizations-permissions-caching">cache-backed sync as an opt-in behaviour</a>.</p>
<p>However, despite reasonably robust testing of the behaviour of the code, we had no way to easily perform an end-to-end test of this at the scale of thousands of repositories and users with the appropriate teams and organizations. In hindsight, I could have invested some effort into generating VCRs to emulate such an environment and test against it, but with the agreement of the customer requesting this, we decided to ship the changes and ask them to try it out.</p>
<h3 id="debug-logging">Debug logging</h3>
<p>All was well at first in the trial run - the backlog of repositories queued for an initial permissions sync was being worked through very rapidly, with a projected 3-day time to full sync - a huge improvement from the previously projected 30 days. However, with just a few thousand repositories left to process, the sync stalled.</p>
<p>Metrics indicated jobs were timing out, and a look at the logs revealed thousands upon thousands of lines of random comma-delimited numbers. It seemed that printing all this junk was causing the service to stall, and sure enough <a href="https://docs.docker.com/config/containers/logging/configure/#configure-the-logging-driver-for-a-container">setting the log driver to <code class="language-plaintext highlighter-rouge">none</code></a> to disable all output on the relevant service allowed the sync to proceed and continue.</p>
<p>Where did the log come from? <a href="https://github.com/sourcegraph/sourcegraph/pull/24822">I left a stray <code class="language-plaintext highlighter-rouge">log.Printf("%+v\n", group)</code> in there when I was debugging cache entries</a>. At scale, these cache entries could contain many thousands of items, and printing them all degraded the system. Be careful what you log!</p>
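<p>One defensive habit is to log a summary of potentially huge values instead of the values themselves. The helper below is a hypothetical sketch, not the fix from the linked PR (which removed the stray log line): it prints the number of IDs plus a small sample.</p>

```go
package main

import "fmt"

// summarizeIDs is a hypothetical helper for debug logging: instead of
// printing every ID, print the count and at most `limit` sample IDs.
func summarizeIDs(ids []int64, limit int) string {
	if len(ids) <= limit {
		return fmt.Sprintf("%d ids %v", len(ids), ids)
	}
	return fmt.Sprintf("%d ids %v (+%d more)", len(ids), ids[:limit], len(ids)-limit)
}
```

<p>For a group with 10000 members, <code>summarizeIDs(ids, 3)</code> produces a single short line such as <code>10000 ids [1 2 3] (+9997 more)</code> rather than thousands of numbers.</p>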
<h3 id="postgres-parameter-limits">Postgres parameter limits</h3>
<p>A service we call <code class="language-plaintext highlighter-rouge">repo-updater</code> has an internal service called <code class="language-plaintext highlighter-rouge">PermsSyncer</code> that manages a queue of jobs to request updated access lists using these authorization providers for users and repositories based on a variety of heuristics such as permissions age, as well as on events like webhooks and repository visits (<a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph@8685a6bef8c3e9d2556335cb25448dbc1b356a4a/-/blob/enterprise/cmd/repo-updater/internal/authz/doc.go">diagram</a>). Access lists returned by authorization providers are upserted into a single <a href="https://github.com/sourcegraph/sourcegraph/blob/main/internal/database/schema.md#table-publicrepo_permissions"><code class="language-plaintext highlighter-rouge">repo_permissions</code> table</a> that is the source of truth for all repositories a <em>Sourcegraph</em> user can access, and vice versa.</p>
<p>Entries can also be upserted into a table called <a href="https://github.com/sourcegraph/sourcegraph/blob/main/internal/database/schema.md#table-publicrepo_pending_permissions"><code class="language-plaintext highlighter-rouge">repo_pending_permissions</code></a>, which is home to permissions that do not have a Sourcegraph user attached yet. When a user logs in to Sourcegraph via a code host’s OAuth mechanism, the user’s Sourcegraph identity is attached to the user’s identity on that code host (this allows a Sourcegraph user to be associated with multiple code hosts), and relevant entries in <code class="language-plaintext highlighter-rouge">repo_pending_permissions</code> are “granted” to the user.</p>
<p>This means that once the massive number of repositories in the trial run was fully mirrored from GitHub, a user attempting to log in could have a huge set of pending permissions granted to it all at once. Of course, this broke with a fun-looking error:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>execute upsert repo permissions batch query: extended protocol limited to 65535 parameters
</code></pre></div></div>
<p>I was able to reproduce this in an integration test of the relevant query by generating a set of 17000 entries:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
<span class="n">name</span><span class="o">:</span> <span class="n">postgresParameterLimitTest</span><span class="p">,</span>
<span class="n">updates</span><span class="o">:</span> <span class="k">func</span><span class="p">()</span> <span class="p">[]</span><span class="o">*</span><span class="n">authz</span><span class="o">.</span><span class="n">UserPermissions</span> <span class="p">{</span>
<span class="n">user</span> <span class="o">:=</span> <span class="o">&</span><span class="n">authz</span><span class="o">.</span><span class="n">UserPermissions</span><span class="p">{</span>
<span class="n">UserID</span><span class="o">:</span> <span class="m">1</span><span class="p">,</span>
<span class="n">Perm</span><span class="o">:</span> <span class="n">authz</span><span class="o">.</span><span class="n">Read</span><span class="p">,</span>
<span class="n">IDs</span><span class="o">:</span> <span class="n">toBitmap</span><span class="p">(),</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">1</span><span class="p">;</span> <span class="n">i</span> <span class="o"><=</span> <span class="m">17000</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="m">1</span> <span class="p">{</span>
<span class="n">user</span><span class="o">.</span><span class="n">IDs</span><span class="o">.</span><span class="n">Add</span><span class="p">(</span><span class="kt">uint32</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">[]</span><span class="o">*</span><span class="n">authz</span><span class="o">.</span><span class="n">UserPermissions</span><span class="p">{</span><span class="n">user</span><span class="p">}</span>
<span class="p">}(),</span>
<span class="n">expectUserPerms</span><span class="o">:</span> <span class="k">func</span><span class="p">()</span> <span class="k">map</span><span class="p">[</span><span class="kt">int32</span><span class="p">][]</span><span class="kt">uint32</span> <span class="p">{</span>
<span class="n">repos</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">([]</span><span class="kt">uint32</span><span class="p">,</span> <span class="m">17000</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">1</span><span class="p">;</span> <span class="n">i</span> <span class="o"><=</span> <span class="m">17000</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="m">1</span> <span class="p">{</span>
<span class="n">repos</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="m">1</span><span class="p">]</span> <span class="o">=</span> <span class="kt">uint32</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">return</span> <span class="k">map</span><span class="p">[</span><span class="kt">int32</span><span class="p">][]</span><span class="kt">uint32</span><span class="p">{</span><span class="m">1</span><span class="o">:</span> <span class="n">repos</span><span class="p">}</span>
<span class="p">}(),</span>
<span class="n">expectRepoPerms</span><span class="o">:</span> <span class="k">func</span><span class="p">()</span> <span class="k">map</span><span class="p">[</span><span class="kt">int32</span><span class="p">][]</span><span class="kt">uint32</span> <span class="p">{</span>
<span class="n">repos</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="k">map</span><span class="p">[</span><span class="kt">int32</span><span class="p">][]</span><span class="kt">uint32</span><span class="p">,</span> <span class="m">17000</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="o">:=</span> <span class="m">1</span><span class="p">;</span> <span class="n">i</span> <span class="o"><=</span> <span class="m">17000</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="m">1</span> <span class="p">{</span>
<span class="n">repos</span><span class="p">[</span><span class="kt">int32</span><span class="p">(</span><span class="n">i</span><span class="p">)]</span> <span class="o">=</span> <span class="p">[]</span><span class="kt">uint32</span><span class="p">{</span><span class="m">1</span><span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">repos</span>
<span class="p">}(),</span>
<span class="p">},</span>
</code></pre></div></div>
<p>This would break because we were performing an insert of 4 values per row, and at 17000 rows we reach 68000 parameters bound to a query. <a href="https://www.postgresql.org/docs/12/protocol-message-formats.html">Postgres uses an Int16 to encode the number of bind parameters in a message</a>, which caps a single statement at $2^{16} - 1 =$ 65535 parameters (hence the seemingly magic number indicated in the error).</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">repo_permissions</span>
<span class="p">(</span><span class="n">repo_id</span><span class="p">,</span> <span class="n">permission</span><span class="p">,</span> <span class="n">user_ids_ints</span><span class="p">,</span> <span class="n">updated_at</span><span class="p">)</span>
<span class="k">VALUES</span>
<span class="o">%</span><span class="n">s</span>
<span class="k">ON</span> <span class="n">CONFLICT</span> <span class="k">ON</span> <span class="k">CONSTRAINT</span>
<span class="cm">/* ... */</span>
</code></pre></div></div>
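<p>The arithmetic behind the failure is easy to check. The sketch below assumes the query binds 4 values per row, as described above; the constant and function names are hypothetical.</p>

```go
package main

// maxBindParams is the largest number of bind parameters a single
// PostgreSQL extended-protocol statement can carry (2^16 - 1).
const maxBindParams = 1<<16 - 1

// paramsFor returns how many bind parameters a multi-row
// INSERT ... VALUES needs when every row binds cols values.
func paramsFor(rows, cols int) int { return rows * cols }

// maxRowsPerStatement is the most rows that fit under the limit.
func maxRowsPerStatement(cols int) int { return maxBindParams / cols }
```

<p>With 4 columns, <code>paramsFor(17000, 4)</code> is 68000, over the 65535 limit, and at most <code>maxRowsPerStatement(4)</code> = 16383 rows fit in one statement.</p>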
<p>Funnily enough, you can get around this <a href="https://klotzandrew.com/blog/postgres-passing-65535-parameter-limit">by providing columns as arrays</a>. In this case, if you can provide each of the 4 columns here as an array, that would only count as 4 parameters no matter how many rows are inserted, so the insert would never hit the parameter limit!</p>
<p>Sadly, one of the columns here is of type <code class="language-plaintext highlighter-rouge">INT[]</code>. When I attempted to perform an <code class="language-plaintext highlighter-rouge">UNNEST</code> on an <code class="language-plaintext highlighter-rouge">INT[][]</code>, it completely unwrapped the array instead of just unwrapping it by a single dimension like one might expect:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="k">unnest</span><span class="p">(</span><span class="n">ARRAY</span><span class="p">[</span><span class="s1">'hello'</span><span class="p">,</span><span class="s1">'world'</span><span class="p">]::</span><span class="nb">TEXT</span><span class="p">[],</span> <span class="n">ARRAY</span><span class="p">[[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">],[</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]]::</span><span class="nb">INT</span><span class="p">[][])</span>
</code></pre></div></div>
<p>This frustratingly returns:</p>
<table>
<thead>
<tr>
<th>unnest</th>
<th>unnest</th>
</tr>
</thead>
<tbody>
<tr>
<td>hello</td>
<td>1</td>
</tr>
<tr>
<td>world</td>
<td>2</td>
</tr>
<tr>
<td> </td>
<td>3</td>
</tr>
<tr>
<td> </td>
<td>4</td>
</tr>
</tbody>
</table>
<p>When the desired result was just a one-dimensional unwrapping:</p>
<table>
<thead>
<tr>
<th>unnest</th>
<th>unnest</th>
</tr>
</thead>
<tbody>
<tr>
<td>hello</td>
<td>[1, 2]</td>
</tr>
<tr>
<td>world</td>
<td>[3, 4]</td>
</tr>
</tbody>
</table>
<p>I briefly toyed with the idea of hacking around this by encoding each inner array as a single comma-delimited string and splitting it back into an array on the fly:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
<span class="n">a</span><span class="p">,</span>
<span class="n">string_to_array</span><span class="p">(</span><span class="n">b</span><span class="p">,</span><span class="s1">','</span><span class="p">)::</span><span class="nb">INT</span><span class="p">[]</span>
<span class="k">FROM</span>
<span class="k">unnest</span><span class="p">(</span><span class="n">ARRAY</span><span class="p">[</span><span class="s1">'hello'</span><span class="p">,</span><span class="s1">'world'</span><span class="p">]::</span><span class="nb">TEXT</span><span class="p">[],</span> <span class="n">ARRAY</span><span class="p">[</span><span class="s1">'1,2,3'</span><span class="p">,</span><span class="s1">'4,5,6'</span><span class="p">]::</span><span class="nb">TEXT</span><span class="p">[])</span> <span class="k">AS</span> <span class="n">t</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
</code></pre></div></div>
<p>An <code class="language-plaintext highlighter-rouge">EXPLAIN ANALYZE</code> on the 5000-row sample query (which didn’t hit the parameter limit), however, indicated that this was about 5x more expensive than before (a cost of 337.51, compared to the previous cost of 62.50). It was also a bit of a dirty hack anyway, so I ended up simply paging the insert instead to stay under the parameter limit. This was implemented in <a href="https://github.com/sourcegraph/sourcegraph/pull/24852">sourcegraph#24852 database: page upsertRepoPermissionsBatchQuery</a>.</p>
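<p>The paging idea can be sketched roughly as follows. This is a hypothetical helper, not the code from the PR, and it assumes the commonly cited PostgreSQL limit of 65,535 bind parameters per statement. The page size is chosen so that rows per page multiplied by parameters per row never exceeds that limit.</p>

```javascript
// Assumption: PostgreSQL caps a single statement at 65535 bind parameters.
const MAX_BIND_PARAMS = 65535;

// Hypothetical helper: split rows into pages that each stay under the
// parameter limit when every row contributes `paramsPerRow` parameters.
function pageRows(rows, paramsPerRow, maxParams = MAX_BIND_PARAMS) {
  const pageSize = Math.floor(maxParams / paramsPerRow);
  if (pageSize < 1) {
    throw new Error('a single row already exceeds the parameter limit');
  }
  const pages = [];
  for (let i = 0; i < rows.length; i += pageSize) {
    pages.push(rows.slice(i, i + pageSize));
  }
  return pages;
}

// Hypothetical example: 15000 rows with 6 parameters each would need 90000
// parameters in one statement, so they are split into two inserts instead.
const rows = Array.from({ length: 15000 }, (_, i) => ({ id: i }));
const pages = pageRows(rows, 6);
console.log(pages.map((page) => page.length)); // → [10922, 4078]
```

<p>Each page would then be sent as its own <code>INSERT</code> statement.</p>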
<p>However, it seemed that this was not the only instance of us exceeding the parameter limits. Another query was running into a similar issue on a different customer instance. This time, there were no array types in the values being inserted, so I was able to try out the insert-as-arrays workaround:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">INSERT INTO user_pending_permissions
</span> (service_type, service_id, bind_id, permission, object_type, updated_at)
<span class="p">VALUES
</span><span class="gd">- %s
</span><span class="gi">+ (service_type, service_id, bind_id, permission, object_type, updated_at)
+ (
+ SELECT %s::TEXT, %s::TEXT, UNNEST(%s::TEXT[]), %s::TEXT, %s::TEXT, %s::TIMESTAMPTZ
+ )
</span><span class="p">ON CONFLICT ON CONSTRAINT
</span> /* ... */
</code></pre></div></div>
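<p>This sidesteps the limit because the number of bind parameters no longer grows with the number of rows: in the query above only <code>bind_id</code> is unnested, so every row shares the other five values and all the bind IDs can travel in a single <code>TEXT[]</code> parameter. Below is a hypothetical sketch of preparing those six arguments; the function and field names are made up for illustration, and the 65,535 figure is the commonly cited PostgreSQL bind-parameter limit.</p>

```javascript
// Assumed PostgreSQL limit on bind parameters in a single statement.
const MAX_BIND_PARAMS = 65535;

// Hypothetical helper: turn per-account rows into the six arguments for the
// INSERT ... SELECT ... UNNEST(%s::TEXT[]) query. Only bindID may differ
// between rows; every other column must be shared.
function toUnnestArgs(rows, updatedAt) {
  if (rows.length === 0) throw new Error('no rows to insert');
  const shared = ['serviceType', 'serviceID', 'permission', 'objectType'];
  for (const row of rows) {
    for (const key of shared) {
      if (row[key] !== rows[0][key]) {
        throw new Error(`rows disagree on ${key}`);
      }
    }
  }
  const { serviceType, serviceID, permission, objectType } = rows[0];
  const bindIDs = rows.map((row) => row.bindID);
  return [serviceType, serviceID, bindIDs, permission, objectType, updatedAt];
}

// A multi-row VALUES list needs one parameter per column per row instead.
const valuesParamCount = (rowCount, columns = 6) => rowCount * columns;

// Hypothetical sample data: 15000 accounts with identical permissions.
const rows = Array.from({ length: 15000 }, (_, i) => ({
  serviceType: 'github',
  serviceID: 'https://github.com/',
  permission: 'read',
  objectType: 'repos',
  bindID: `user-${i}`,
}));
const args = toUnnestArgs(rows, new Date());
console.log(args.length); // → 6, regardless of the number of rows
console.log(valuesParamCount(rows.length) > MAX_BIND_PARAMS); // → true (90000)
```

<p>With six columns per row, a <code>VALUES</code> list crosses the assumed limit above 10,922 rows, whereas the array-based version always sends six parameters.</p>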
<p>This implementation of the query was slower for smaller cases, but for larger datasets it was on par with or faster than the original query:</p>
<table>
<thead>
<tr>
<th>Case</th>
<th>Accounts</th>
<th>Cost (startup..total)</th>
<th>Execution time</th>
<th>Comparison</th>
</tr>
</thead>
<tbody>
<tr>
<td>Before</td>
<td>100</td>
<td><code class="language-plaintext highlighter-rouge">0.00..1.75</code></td>
<td>287.071 ms</td>
<td> </td>
</tr>
<tr>
<td>After</td>
<td>100</td>
<td><code class="language-plaintext highlighter-rouge">0.02..1.51</code></td>
<td>430.941 ms</td>
<td>~50% slower</td>
</tr>
<tr>
<td>Before</td>
<td>5000</td>
<td><code class="language-plaintext highlighter-rouge">0.00..87.50</code></td>
<td>7199.440 ms</td>
<td> </td>
</tr>
<tr>
<td>After</td>
<td>5000</td>
<td><code class="language-plaintext highlighter-rouge">0.02..75.02</code></td>
<td>7218.860 ms</td>
<td>~same</td>
</tr>
<tr>
<td>Before</td>
<td>10000</td>
<td><code class="language-plaintext highlighter-rouge">0.00..175.00</code></td>
<td>16858.613 ms</td>
<td> </td>
</tr>
<tr>
<td>After</td>
<td>10000</td>
<td><code class="language-plaintext highlighter-rouge">0.02..150.01</code></td>
<td>14566.492 ms</td>
<td>~13% faster</td>
</tr>
<tr>
<td>Before</td>
<td>15000</td>
<td>fail</td>
<td>fail</td>
<td> </td>
</tr>
<tr>
<td>After</td>
<td>15000</td>
<td><code class="language-plaintext highlighter-rouge">0.02..225.01</code></td>
<td>22938.112 ms</td>
<td>success</td>
</tr>
</tbody>
</table>
<p>I originally had the function decide which query to use based on the size of the insert, but during code review it was recommended that we just stick to one implementation for simplicity, since permissions mirroring happens asynchronously and is not particularly latency-sensitive.</p>
<p>This was implemented in <a href="https://github.com/sourcegraph/sourcegraph/pull/24972/files">sourcegraph#24972 database: provide upsertUserPendingPermissionsBatchQuery insert values as array</a>.</p>
<h2 id="results">Results</h2>
<p>After working through the issues mentioned in this article, along with a variety of other minor fixes, the customer was finally able to run a full permissions mirror to completion with everything working as expected. The final result was roughly 2.5 days to full sync, a <strong>more than 10x improvement</strong> over the previously projected 30 days. The improved performance unblocked the customer in question on this front and will hopefully open the door for Sourcegraph to function fully in even larger environments in the future!</p>
<h2 id="about-sourcegraph">About Sourcegraph</h2>
<p>Sourcegraph builds universal code search for every developer and company so they can innovate faster. We help developers and companies with billions of lines of code create the software you use every day.
Learn more about Sourcegraph <a href="https://about.sourcegraph.com/">here</a>.</p>
<p>Interested in joining? <a href="https://about.sourcegraph.com/jobs/">We’re hiring</a>!</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:two" role="doc-endnote">
<p>See <a href="#two-way-sync">Two-way sync</a>. <a href="#fnref:two" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:flagged" role="doc-endnote">
<p>Well, admittedly, it was only feature-flagged off by default <a href="https://github.com/sourcegraph/sourcegraph/pull/24318">in a follow-up PR</a>, after I realised this required additional authentication scopes (to query organizations and teams) that we do not request from the GitHub API by default. <a href="#fnref:flagged" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>robertAs a tool for searching over all your code, accurately mirroring repository permissions defined in the relevant code hosts is a core part of Sourcegraph’s functionality. Typically, the only way to do this is through the APIs of code hosts, though rate limits can mean it can take several weeks to work through a large number of users and repositories.June 2021 updates for bobheadxi.dev2021-06-20T00:00:00+00:002021-06-20T00:00:00+00:00https://bobheadxi.dev/introducing-dark-mode<p>With dark mode on every website nowadays, my website seems to have fallen a bit behind the times. I decided it was about time to give my website a bit of a facelift - and over-hype it with a blog post!</p>
<p>This round of improvements didn’t strictly happen this month, but a lot of it was spurred on by my recent reading of the <a href="https://ia.net/design/blog">iA Design Blog</a>. I think their website is absolutely gorgeous, and it made how lacklustre <code class="language-plaintext highlighter-rouge">bobheadxi.dev</code> looked all the more apparent.</p>
<p>For the unfamiliar, my site started off over 2 years ago with the <a href="https://github.com/sergiokopplin/indigo"><code class="language-plaintext highlighter-rouge">indigo</code> Jekyll theme</a>. I have since made quite a number of changes to it, mostly in random spurts of effort, and started <a href="/march-2020-site-updates">writing about these periods of changes last year</a>.</p>
<p>I quite like how things turned out for this set of changes - hope you do as well!</p>
<h2 id="refinements">Refinements</h2>
<h3 id="updated-typography">Updated typography</h3>
<p>A big part of <code class="language-plaintext highlighter-rouge">bobheadxi.dev</code> is my blog posts, even though I’m unsure how many people read them (Google Analytics indicates a lot of traffic, particularly on my <em>really</em> old <a href="/object-casting-in-javascript/">Object Casting in Javascript</a> post). Anyway, I’ve always been rather dissatisfied with the reading experience on my site, but could never quite put my finger on what exactly was wrong with it.</p>
<p>All I knew was that I didn’t like the previous font, <em>Helvetica Neue</em>, but until I started using <a href="https://ia.net/writer">iA Writer</a> recently, I didn’t have much of an inkling of what font I would like.</p>
<p>iA Writer uses these gorgeous fonts - aptly named <em>Mono</em>, <em>Duo</em>, and <em>Quattro</em> - that I think look <em>so nice</em> when typing and reading. They have a <a href="https://ia.net/writer/blog/a-typographic-christmas">neat blog post introducing these fonts</a>, and while I’m not really sure what all of it means, I decided to make the switch.</p>
<p>This site now uses <em>Quattro</em> as its serif font, and <em>Mono</em> as its monospaced font. I think the results are quite nice.</p>
<h3 id="outdented-heading-anchors">Outdented heading anchors</h3>
<p>While editing in iA Writer, headings get nicely outdented ‘#’s like so:</p>
<p><img src="../../assets/images/dark-mode/header-outdent-ia.png" alt="" /></p>
<p>Thinking about it, I’m pretty sure this is a very common style on many websites already. Either way, I quite like how it looks, so I tried to replicate it on my site. I currently generate somewhat similar-looking (but not outdented) anchor links using <a href="https://github.com/allejo/jekyll-anchor-headings"><code class="language-plaintext highlighter-rouge">allejo/jekyll-anchor-headings</code></a>, which allows a little bit of customization - I can give the anchor link elements a class, for example, and style them through that.</p>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><div</span> <span class="na">class=</span><span class="s">"post-content"</span><span class="nt">></span>
{% include anchor_headings.html html=content anchorBody='#' anchorClass='heading-anchor' beforeHeading=true %}
<span class="nt"></div></span>
</code></pre></div></div>
<p>Turns out the outdenting can be achieved using the handy <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/transform-function/translateX"><code class="language-plaintext highlighter-rouge">translateX</code> transformation</a>, and a bit of <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/@media"><code class="language-plaintext highlighter-rouge">@media</code></a> helps me scale this effect for smaller screens (where outdenting could position the anchors very close to the edge of your screen).</p>
<div class="language-sass highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">h1</span><span class="o">,</span> <span class="nt">h2</span><span class="o">,</span> <span class="nt">h3</span><span class="o">,</span> <span class="nt">h4</span>
<span class="c1">// ... some CSS</span>
<span class="o">></span> <span class="nc">.heading-anchor</span>
<span class="nl">position</span><span class="p">:</span> <span class="nb">absolute</span>
<span class="nl">transform</span><span class="p">:</span> <span class="nf">translateX</span><span class="p">(</span><span class="m">-2rem</span><span class="p">)</span>
<span class="k">@media</span> <span class="si">#{</span><span class="nv">$tablet</span><span class="si">}</span><span class="o">,</span> <span class="si">#{</span><span class="nv">$mobile</span><span class="si">}</span>
<span class="nl">position</span><span class="p">:</span> <span class="nb">inherit</span>
<span class="nl">transform</span><span class="p">:</span> <span class="nb">none</span>
</code></pre></div></div>
<p>Sadly, I wasn’t able to figure out a simple way to have the number of ‘#’s correspond to the depth of the heading, but I figured this was close enough, and it definitely improves the look of headings (in my opinion).</p>
<p><img src="../../assets/images/dark-mode/header-outdent-bob.png" alt="" /></p>
<h3 id="bold-introductions">Bold introductions</h3>
<p>Some books and blogs use big first letters for the first paragraph of a chapter or article. The effect looks nice in books, but I was never really sold on its usage in blog posts - though the look of an emphasised introduction is certainly striking. As I browsed through the <a href="https://ia.net/design/blog">iA Design Blog</a>, I noticed that their first paragraphs were <em>big</em>, and it made each essay feel much more compelling.</p>
<p><img src="../../assets/images/dark-mode/wide-intro-ia.png" alt="" /></p>
<p>However, as I went about considering different options for making <em>my</em> intros real big as well, I realised a lot of my introductory paragraphs were complete garbage. While sometimes that was the intent - leading with a tangent before diving into the article’s main topic - they definitely did not age well.</p>
<p>So perhaps a fortunate side effect is that this prompted me to go back through my posts and make the bare minimum effort to make them a bit more interesting. At least I look like I know what I’m talking about now!</p>
<p><img src="../../assets/images/dark-mode/wide-intro.png" alt="" /></p>
<h3 id="exciting-listings">Exciting listings</h3>
<p>I just learned about Jekyll’s <code class="language-plaintext highlighter-rouge">post.excerpt</code> feature that gives you the first paragraph of a blog post. Again inspired by the iA Design Blog, which uses excerpts instead of custom descriptions to great effect, I decided to use them here as well.</p>
<p><img src="../../assets/images/dark-mode/light-blog-listing.gif" alt="" /></p>
<p>I think this gives a far better preview of the content of each post, and kind of makes them look more important. Thankfully, because I had already updated each post’s first paragraph to accommodate <a href="#bold-introductions">bigger introductions</a>, the excerpts are at least somewhat meaningful.</p>
<p>I also made minor improvements such as adding an on-hover effect to the clickable tags, which previously gave no indication that they were clickable.</p>
<h3 id="the-big-picture">The big picture</h3>
<p>I like to include all sorts of media in my blog posts - images, code snippets, diagrams, quotes, and more. Unfortunately, I also like somewhat narrow widths for my content, which makes for a poor viewing experience for various forms of media.</p>
<p>On <a href="https://about.sourcegraph.com/blog/optimizing-a-code-intel-commit-graph/#Performance-improvements">articles in the Sourcegraph Blog</a> (and I recall that you can do this on Medium as well), I noticed that images were “blown up” - wider than the content - and I thought the effect looked quite nice, giving an expansive canvas for media to be enjoyed while still maintaining a nice reading experience for all the other stuff.</p>
<p>To do this myself, I turned the images I wanted blown up into <code class="language-plaintext highlighter-rouge"><figure></code> elements with expanded widths, captioned using <code class="language-plaintext highlighter-rouge"><figcaption></code>. This also served nicely to standardise the raw HTML I’d previously been using to give images captions.</p>
<figure>
<img src="../../assets/images/dark-mode/wide-image.png" />
<figcaption>Big!!!!</figcaption>
</figure>
<p>Code blocks ran into similar problems: snippets I hadn’t carefully adjusted to fit an 80-character line limit had to be scrolled to be viewed, even on very wide screens. So I made them massive.</p>
<p><img src="../../assets/images/dark-mode/wide-code.png" alt="" /></p>
<p>I’ve also always liked the big quotes used in magazine and newspaper sites to give quotes an even more authoritative and dramatic feel - so quotes joined the big club.</p>
<p><img src="../../assets/images/dark-mode/wide-quote.png" alt="" /></p>
<p><a href="https://mermaid-js.github.io/mermaid">Mermaid diagrams</a> and some other things I might have forgotten also got this treatment. Hopefully these changes make the reading experience more exciting!</p>
<h2 id="dark-mode">Dark mode</h2>
<p>And last but not least, the star of today’s show… dark mode! Because no site is complete without one.</p>
<figure>
<img src="../../assets/images/dark-mode/light-to-dark.gif" />
<figcaption>The site now switches to dark mode if you have dark mode enabled on your device!</figcaption>
</figure>
<p>Luckily for me, the theme my site was based on made decent use of SASS variables for colours (though the naming of the colours left quite a bit to be desired, as you’ll see in a moment).</p>
<p>I found to my dismay that because these variables are compiled away at build time, they cannot change in response to <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/@media/prefers-color-scheme"><code class="language-plaintext highlighter-rouge">prefers-color-scheme: dark</code></a>, which seems to be the standard way to detect which theme to show the user.</p>
<p>Instead, I found some blog posts talking about <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/Using_CSS_custom_properties">CSS variables</a>, which turns out to be the only way to have properly variable variables in stylesheets. To be honest this is the first time I’ve had to do something like this myself, and this was news to me!</p>
<p>My implementation ended up pretty straightforward, using <a href="https://www.w3.org/TR/CSS2/selector.html#universal-selector">universal selectors</a> and setting the theme in JavaScript, though I’m sure there are other ways to do this too (maybe even JavaScript-free?).</p>
<div class="language-sass highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">[</span><span class="nt">data-theme</span><span class="o">=</span><span class="s2">"theme-light"</span><span class="o">]</span>
<span class="na">--background</span><span class="p">:</span> <span class="mh">#ffffff</span>
<span class="na">--alpha</span><span class="p">:</span> <span class="mh">#333</span>
<span class="na">--beta</span><span class="p">:</span> <span class="mh">#222</span>
<span class="na">--gama</span><span class="p">:</span> <span class="mh">#aaa</span>
<span class="na">--delta</span><span class="p">:</span> <span class="mh">#5A85F3</span>
<span class="na">--epsilon</span><span class="p">:</span> <span class="mh">#ededed</span>
<span class="na">--zeta</span><span class="p">:</span> <span class="mh">#666</span>
<span class="o">[</span><span class="nt">data-theme</span><span class="o">=</span><span class="s2">"theme-dark"</span><span class="o">]</span>
<span class="na">--background</span><span class="p">:</span> <span class="mh">#141414</span>
<span class="na">--alpha</span><span class="p">:</span> <span class="mh">#aaa</span>
<span class="na">--beta</span><span class="p">:</span> <span class="mh">#eeeeee</span>
<span class="na">--gama</span><span class="p">:</span> <span class="mh">#474747</span>
<span class="na">--delta</span><span class="p">:</span> <span class="mh">#5A85F3</span>
<span class="na">--epsilon</span><span class="p">:</span> <span class="mh">#202020</span>
<span class="na">--zeta</span><span class="p">:</span> <span class="mh">#929292</span>
</code></pre></div></div>
<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">prefersDark</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="kd">function</span> <span class="nx">setDarkMode</span><span class="p">(</span><span class="nx">isDark</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">theme</span> <span class="o">=</span> <span class="s2">`theme-</span><span class="p">${</span><span class="nx">isDark</span> <span class="p">?</span> <span class="dl">'</span><span class="s1">dark</span><span class="dl">'</span> <span class="p">:</span> <span class="dl">'</span><span class="s1">light</span><span class="dl">'</span><span class="p">}</span><span class="s2">`</span><span class="p">;</span>
<span class="nb">document</span><span class="p">.</span><span class="nx">querySelector</span><span class="p">(</span><span class="dl">'</span><span class="s1">html</span><span class="dl">'</span><span class="p">).</span><span class="nx">dataset</span><span class="p">.</span><span class="nx">theme</span> <span class="o">=</span> <span class="nx">theme</span><span class="p">;</span>
<span class="nx">prefersDark</span> <span class="o">=</span> <span class="nx">isDark</span><span class="p">;</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Set </span><span class="p">${</span><span class="nx">theme</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// set the initial theme</span>
<span class="kd">const</span> <span class="nx">prefersDarkMatch</span> <span class="o">=</span> <span class="nb">window</span><span class="p">.</span><span class="nx">matchMedia</span><span class="p">(</span><span class="dl">'</span><span class="s1">(prefers-color-scheme: dark)</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">setDarkMode</span><span class="p">(</span><span class="nx">prefersDarkMatch</span><span class="p">.</span><span class="nx">matches</span><span class="p">);</span>
<span class="c1">// watch for changes to the user's dark mode configuration</span>
<span class="nx">prefersDarkMatch</span><span class="p">.</span><span class="nx">addEventListener</span><span class="p">(</span><span class="dl">'</span><span class="s1">change</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">e</span><span class="p">)</span> <span class="o">=></span> <span class="nx">setDarkMode</span><span class="p">(</span><span class="nx">e</span><span class="p">.</span><span class="nx">matches</span><span class="p">));</span>
</code></pre></div></div>
<p>Having the <code class="language-plaintext highlighter-rouge">setDarkMode</code> function available is useful for development, allowing me to switch between the modes via the browser console, and I added the <code class="language-plaintext highlighter-rouge">prefersDark</code> variable… just because, I guess. Maybe handy if I want to add a button to toggle dark mode?</p>
<p>In the end, despite picking the colours semi-randomly and not making an awful lot of adjustments, I’m pretty happy with how this (in my opinion) quick effort turned out! I’m particularly pleased with how the blog listings look:</p>
<p><img src="../../assets/images/dark-mode/dark-blog-listing.gif" alt="" /></p>
<h2 id="up-next">Up next</h2>
<p>There are still a lot of issues with dark mode - most noticeably the company logos I’m using that don’t have transparent backgrounds, but also a few contrast issues in code highlighting.</p>
<p>There also seems to be an issue with the tags page, where posts from different collections do not get included. I definitely want to fix this now that interaction with tags is more prominent.</p>
<p>I recently wrote a newsletter featuring a ludicrous number of footnotes, and at some point I want to get <a href="https://edwardtufte.github.io/tufte-css/#sidenotes">Tufte “sidenotes”</a> here so that I can abuse footnotes in my blog posts as well. Sadly, I haven’t found a particularly elegant solution to this, so I’m putting it off for the time being.</p>
<p>And, of course, I’m hoping to do more blog-writing as well.</p>
<p>That’s all for now - feel free to highlight anything on this post if you have comments or questions!</p>robertWith dark mode on every website nowadays, my website seems to have fallen a bit behind the times. I decided it was about time to give my website a bit of a facelift - and over-hype it with a blog post!Semantic line breaks2021-02-18T00:00:00+00:002021-02-18T00:00:00+00:00https://bobheadxi.dev/semantic-line-breaks<p>As an organisation grows, it becomes increasingly important to record knowledge and processes.
One popular approach is using a collection of <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a> files, tracked in <a href="https://git-scm.com/">Git</a>, where changes can easily be proposed and discussed.
Unfortunately, the readability and understandability of these changes are often quite poor, negating much of the benefit of using a version control system.</p>
<p>Consider what a change - or a “diff” - usually looks like:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">- this line was removed
</span><span class="gi">+ this line was added
</span></code></pre></div></div>
<p>How does this play with changes to documentation?
In general, Markdown files are written with line breaks at some arbitrary character column (such as 80 characters), or with entire paragraphs on a single line.
Both these approaches have significant issues:</p>
<ul>
<li>Line-breaking at some arbitrary character column looks nice when viewed in a terminal or code editor, but the consistency of line widths is easily lost when making and suggesting edits, which necessitates reflowing entire paragraphs.
This leads to incomprehensible or uninformative diffs that are difficult to review.</li>
<li>Writing entire paragraphs on a single line is reasonably readable nowadays due to most editors and viewers performing wrapping out-of-the-box, but it makes suggestions and diffs difficult to review, since every single change shows up as a change to an entire paragraph.</li>
</ul>
<p>In the example above, the diff is small and there is not too much going on, so it is easy to see what has changed.
Consider the following text, where we want to replace <code class="language-plaintext highlighter-rouge">incididunt</code> with <code class="language-plaintext highlighter-rouge">I am so hungry</code>:</p>
<blockquote>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</p>
</blockquote>
<p>If the text was broken at a character column, the resulting diff (including reflowing the text) might look like:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt
</span><span class="gi">+ Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor I am so
</span><span class="gd">- ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation
</span><span class="gi">+ hungry ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
</span><span class="gd">- ullamco laboris nisi ut aliquip ex ea commodo consequat.
</span><span class="gi">+ exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</span></code></pre></div></div>
<p>This can be rather incomprehensible. If the text was not broken at all, the diff would then look like:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</span><span class="gi">+ Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor I am so hungry ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</span></code></pre></div></div>
<p>This is marginally better, but still quite difficult to review, especially because not all Git interfaces can highlight the specific word that has changed (and even fewer can do so for very, very long lines, as is the case for paragraphs of many sentences).</p>
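<p>One way to see the difference is to count how many lines a one-word edit touches.
The sketch below is a hypothetical illustration: a naive greedy word wrap at 80 columns, plus a line-by-line comparison standing in for a real diff algorithm.
With the wrapped text, replacing a single word pushes words onto the following line, so more than one line changes, while the unwrapped version changes exactly one very long line.</p>

```javascript
// Naive greedy word wrap at a fixed column width.
function wrap(text, width) {
  const lines = [];
  let line = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (line && line.length + 1 + word.length > width) {
      lines.push(line);
      line = word;
    } else {
      line = line ? `${line} ${word}` : word;
    }
  }
  if (line) lines.push(line);
  return lines;
}

// Crude stand-in for a diff: count line positions whose content differs.
function changedLines(before, after) {
  let changed = 0;
  for (let i = 0; i < Math.max(before.length, after.length); i++) {
    if (before[i] !== after[i]) changed++;
  }
  return changed;
}

const before = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.';
const after = before.replace('incididunt', 'I am so hungry');

// Wrapped at 80 columns: the edited line and the one after it both change.
console.log(changedLines(wrap(before, 80), wrap(after, 80))); // → 2
// Unwrapped: a single, very long line changes.
console.log(changedLines([before], [after])); // → 1
```

<p>Longer edits, or edits earlier in a paragraph, can ripple through even more lines under column wrapping.</p>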
<p>To combat this, the idea of <em>semantic line breaks</em> has been floated.
The general idea is to perform line breaks along semantic boundaries, instead of just along paragraphs.
An approach suggested at <a href="https://sembr.org/"><code class="language-plaintext highlighter-rouge">sembr.org</code></a> sums this up as:</p>
<blockquote>
<p>When writing text with a compatible markup language, add a line break after each substantial unit of thought.</p>
</blockquote>
<p>This particular specification goes on to describe how this works:</p>
<blockquote>
<p>Many lightweight markup languages, including Markdown, reStructuredText, and AsciiDoc, join consecutive lines with a space.
Conventional markup languages like HTML and XML exhibit a similar behaviour in particular contexts.
This behaviour allows line breaks to be used as semantic delimiters, making prose easier to author, edit, and read in source — without affecting the rendered output.
[…]
By inserting line breaks at semantic boundaries, writers, editors, and other collaborators can make source text easier to work with, without affecting how it’s seen by readers.</p>
</blockquote>
<p>In my interpretation, a good semantic line break specification then ought to:</p>
<ul>
<li>Make use of how most Markdown specifications ignore single new lines to still provide a good <strong>rendered Markdown</strong> experience.</li>
<li>Leverage modern line-wrapping in most viewers to maintain a good <strong>raw Markdown</strong> experience.</li>
<li>Maintain understandable diffs in Markdown documentation for a good <strong>reviewing</strong> experience.</li>
</ul>
<p>I quite like this idea! Perhaps semantic line breaks could allow us to break paragraphs of text into smaller chunks, making small diffs significantly more approachable, simpler to reason about, and easier to discuss.</p>
<h2 id="solving-unreadable-changes">Solving unreadable changes</h2>
<p><a href="https://sembr.org/"><code class="language-plaintext highlighter-rouge">sembr.org</code></a> proposes a set of rules that would make content easier to manage and make changes to. Their website presents the following example:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
</code></pre></div></div>
<p>Their <em>recommendation</em> is to change this to:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>All human beings are born free and equal in dignity and rights.
They are endowed with reason and conscience
and should act towards one another in a spirit of brotherhood.
</code></pre></div></div>
<p><em>Recommendation</em> is the crux of the problem here, and is a significant barrier to adoption.
The <a href="https://sembr.org/"><code class="language-plaintext highlighter-rouge">sembr.org</code></a> specification depends entirely on the writer to maintain the appropriate formatting, and it leaves the interpretation of what a “semantic boundary” is at all up in the air.
<em>Nine</em> of the twelve requirements in this particular specification are <code class="language-plaintext highlighter-rouge">MAY</code>’s, <code class="language-plaintext highlighter-rouge">SHOULD</code>’s, and <code class="language-plaintext highlighter-rouge">RECOMMEND</code>’s!
This is surely to lead to:</p>
<ul>
<li>Inconsistent and difficult documents, thanks to so much of the specification being up for interpretation.</li>
<li>Contributors forgetting to add, or simply not wanting to go through the trouble of adding, the necessary line breaks.</li>
<li><em>Someone</em> is going to be frustrated at someone else’s very short lines, and refuse to format appropriately.
Alternatively, they might disagree with someone else’s line breaks, and cause unnecessary churn in diffs.</li>
</ul>
<p>All of these problems pose significant barriers to widespread adoption, which is necessary for any semantic line break specification to be of any use.</p>
<h2 id="a-formatter-for-semantic-line-breaks">A formatter for semantic line breaks</h2>
<p>A similar problem arises with code standards: semicolons? Spaces or tabs?
Left up to individuals, no standard will ever be truly consistent, especially in the face of the need to “just get the job done”.
In code formatting, this has primarily been solved through automated tooling.
Why bother arguing about semicolons if a program will just do it for you, and will even check if everything is consistent?</p>
<p>What if the same thing could happen for documentation source: a tool to automatically format your text?
To accommodate this, I propose a simpler specification that still offers a small amount of customization:</p>
<ul>
<li>A <em>semantic boundary</em> is defined to be the end of a sentence.</li>
<li>Allow multiple short sentences to be part of a single line, up to a character threshold.</li>
<li>After a character threshold, a semantic boundary should be followed by a line break.</li>
</ul>
<p>A simpler set of rules opens the door to automation (a program would not need to make as many complicated decisions), and still achieves part of our original goal: diffs now reflect changes to ideas within semantic boundaries, making it clearer which idea is being changed.</p>
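<p>To illustrate how few decisions a program needs to make under these rules, here is a minimal sketch in Go. It is not Readable's implementation (Readable is written in TypeScript), and <code class="language-plaintext highlighter-rouge">semanticBreaks</code> and its <code class="language-plaintext highlighter-rouge">threshold</code> parameter are hypothetical names. It also naively assumes that any word ending in <code class="language-plaintext highlighter-rouge">.</code>, <code class="language-plaintext highlighter-rouge">!</code>, or <code class="language-plaintext highlighter-rouge">?</code> ends a sentence:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"strings"
)

// semanticBreaks reflows a paragraph using the simplified rules: short
// sentences may share a line, but once a line reaches threshold bytes
// (characters, for ASCII text), the next sentence boundary ends the line.
// Abbreviations such as "e.g." are wrongly treated as sentence endings here.
func semanticBreaks(paragraph string, threshold int) string {
	var lines []string
	var current strings.Builder
	// Fields also collapses any existing line breaks, so text is fully reflowed
	words := strings.Fields(paragraph)
	for i, word := range words {
		if current.Len() > 0 {
			current.WriteByte(' ')
		}
		current.WriteString(word)
		endsSentence := strings.HasSuffix(word, ".") ||
			strings.HasSuffix(word, "!") ||
			strings.HasSuffix(word, "?")
		isLast := i == len(words)-1
		// break at a sentence boundary once past the threshold, and always
		// flush whatever remains after the final word
		if isLast || (endsSentence && current.Len() >= threshold) {
			lines = append(lines, current.String())
			current.Reset()
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	text := "Hi there. This is a test. Short sentences can share a line, up to a threshold. Then we break."
	fmt.Println(semanticBreaks(text, 20))
}
</code></pre></div></div>
<p>With a threshold of 20, the two short opening sentences share the first line, and every later sentence boundary past the threshold starts a new line. A real formatter would also need to preserve Markdown structure such as lists, headings, and code blocks, which is where most of the edge cases live.</p>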
<p>Returning to the <em>Lorem ipsum</em> example, with this version of semantic line breaks, our change might look like:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
</span><span class="gi">+ Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor I am so hungry ut labore et dolore magna aliqua.
</span><span class="p">Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</span></code></pre></div></div>
<p>In this diff, it is significantly clearer what <em>idea</em> has changed, as encapsulated by the sentence it belongs in.
This makes it easier to understand the context of the change being made, reason about it, and open discussions regarding it.</p>
<p>I’ve taken a stab at creating just such a tool, <a href="https://github.com/bobheadxi/readable">Readable</a>, which will add semantic line breaks to any document for you with a single command, for example <code class="language-plaintext highlighter-rouge">readable fmt **/*.md</code>.</p>
<p>It will also feature a command to preview changes, the option to apply changes as you edit, and checks that can be run in continuous integration.
So far it seems very promising, but there are still a lot of edge cases to sort out.</p>
<p>Readable is being built in <a href="https://www.typescriptlang.org/">TypeScript</a> with <a href="https://deno.land/">Deno</a>, a handy new TypeScript and JavaScript runtime.
Follow the project on <a href="https://github.com/bobheadxi/readable">GitHub</a>!</p>robertAs an organisation grows, it becomes increasingly important to record knowledge and processes. One popular approach is using a collection of Markdown files, tracked in Git, where changes can easily be proposed and discussed. Unfortunately, the readability and understandability of these changes is often quite poor, negating much of the benefits of using a version control system.Extending Docker images with sidecar services2020-06-21T00:00:00+00:002020-06-21T00:00:00+00:00https://bobheadxi.dev/docker-sidecar<p>Many open-source services are distributed as <a href="https://docs.docker.com/get-started/overview/#docker-objects">Docker images</a>, but sometimes you’ll want to extend the functionality slightly - whether it be adding your own endpoints, manipulating configuration of the service within the Docker image, or something along those lines.</p>
<p>In some cases, such as for manipulating configuration, most images will allow you to mount configuration within the container or use environment variables, so you can build a proper sidecar service to do whatever updates you want and restart the target container. The same goes for extending endpoints - a proper sidecar can serve you well. You can even have one service manage a large number of containers, which is what I did for <a href="/ipfs-orchestrator">a project I worked on at RTrade, <em>Nexus</em></a>.</p>
<p>There’s a significant convenience factor to keeping your service as a single container, however - it’s far easier to distribute and deploy, and if you are trying to extend an off-the-shelf service like <a href="https://grafana.com/">Grafana</a> that lives within a <a href="https://docs.sourcegraph.com/dev/architecture">large, multi-service deployment like Sourcegraph</a>, adding additional services becomes quite a pain. Heck, even exposing an additional port means propagating extra configuration across an entire fleet of services and every supported deployment method.</p>
<p>This article goes over the approach I took to achieve the following without significantly changing the public interface of our Grafana image:</p>
<ul>
<li>subscribe to core Sourcegraph configuration from another service</li>
<li>apply changes to the Grafana instance through API calls or configuration changes</li>
<li>report problems in the sidecar process</li>
</ul>
<p>While I’ll generally refer to Grafana in this writeup, you can apply it to pretty much any service image out there. I also use Go here, but you can draw from the same concepts to leverage your language of choice as well.</p>
<hr />
<p><strong>⚠️ Update:</strong> Since the writing of this post, we have pivoted on the plan (<a href="https://github.com/sourcegraph/sourcegraph/issues/11452#issuecomment-648628953">sourcegraph#11452</a>) and most of the work here no longer lives in our Grafana distribution, but is instead a part of our Prometheus distribution - see <a href="https://github.com/sourcegraph/sourcegraph/pull/11832">sourcegraph#11832</a> for the new implementation. You can explore the source code <a href="https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/docker-images/prometheus">on Sourcegraph</a>, and relevant documentation <a href="https://docs.sourcegraph.com/dev/background-information/observability/prometheus">here</a>.</p>
<p>Most of this article still applies, though with Prometheus + Alertmanager instead of Grafana.</p>
<hr />
<ul id="markdown-toc">
<li><a href="#wrapping-the-sidecar-and-the-service" id="markdown-toc-wrapping-the-sidecar-and-the-service">Wrapping the sidecar and the service</a></li>
<li><a href="#implementing-the-wrapper" id="markdown-toc-implementing-the-wrapper">Implementing the wrapper</a> <ul>
<li><a href="#adding-endpoints" id="markdown-toc-adding-endpoints">Adding endpoints</a></li>
<li><a href="#restarting-the-service" id="markdown-toc-restarting-the-service">Restarting the service</a></li>
</ul>
</li>
<li><a href="#source-code-and-pull-requests" id="markdown-toc-source-code-and-pull-requests">Source code and pull requests</a></li>
<li><a href="#about-sourcegraph" id="markdown-toc-about-sourcegraph">About Sourcegraph</a></li>
</ul>
<h2 id="wrapping-the-sidecar-and-the-service">Wrapping the sidecar and the service</h2>
<p>In a nutshell, the primary change made to the Grafana image is an adjustment to the entrypoint script:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">- exec "/run.sh" # run the Grafana image's default entrypoint
</span><span class="gi">+ exec "/bin/grafana-wrapper" # run our sidecar program, implemented as a wrapper
</span></code></pre></div></div>
<p>I’ll go over the specifics of the wrapper in the next section, since I think it’ll help to understand how we’re extending the vanilla image. You’ll want to set up a <a href="https://docs.docker.com/engine/reference/builder/">Dockerfile</a> that builds the program and copies it over to the final image, which should be based on the vanilla image:</p>
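<div class="language-Dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> golang:latest AS builder</span>
<span class="c"># assumes the wrapper's Go module is the build context</span>
<span class="k">WORKDIR</span><span class="s"> /src</span>
<span class="k">COPY</span><span class="s"> . .</span>
<span class="c"># build a static binary, since the Grafana image may not provide the same libc</span>
<span class="k">RUN</span><span class="s"> CGO_ENABLED=0 go build -o /go/bin/grafana-wrapper .</span>
<span class="k">FROM</span><span class="s"> grafana/grafana:latest AS final</span>
<span class="c"># copy your compiled program from the builder into the final image</span>
<span class="k">COPY</span><span class="s"> --from=builder /go/bin/grafana-wrapper /bin/grafana-wrapper</span>
<span class="c"># entry.sh is the adjusted entrypoint script that execs the wrapper</span>
<span class="k">COPY</span><span class="s"> entry.sh /entry.sh</span>
<span class="k">ENTRYPOINT</span><span class="s"> ["/entry.sh"]</span>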
</code></pre></div></div>
<p>The goal here is to start a wrapper program that will start up your sidecar and the actual service within the image you are trying to extend (<code class="language-plaintext highlighter-rouge">grafana/grafana</code> in this case).</p>
<h2 id="implementing-the-wrapper">Implementing the wrapper</h2>
<p>Depending on what level of functionality you want to achieve, this program can be as simple as a server that makes API calls to the main service. For example:</p>
<pre><code class="language-mermaid">sequenceDiagram
participant Sidecar
participant Service
note right of Service: the program<br />you are extending
activate Sidecar
Sidecar->>Service: cmd.Start
activate Service
loop stuff
Sidecar->>Service: Requests
Service->>Sidecar: Responses
end
Service->>Sidecar: cmd.Wait returns
deactivate Service
deactivate Sidecar
</code></pre>
<p>This can be achieved using the Go standard library’s <a href="https://golang.org/pkg/os/exec/"><code class="language-plaintext highlighter-rouge">os/exec</code> package</a> to run the main image entrypoint, start up the sidecar, and simply wait for the entrypoint to exit.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">(</span>
<span class="s">"errors"</span>
<span class="s">"os"</span>
<span class="s">"os/exec"</span>
<span class="p">)</span>
<span class="c">// newGrafanaRunCmd instantiates a new command to run grafana.</span>
<span class="k">func</span> <span class="n">newGrafanaRunCmd</span><span class="p">()</span> <span class="o">*</span><span class="n">exec</span><span class="o">.</span><span class="n">Cmd</span> <span class="p">{</span>
<span class="n">cmd</span> <span class="o">:=</span> <span class="n">exec</span><span class="o">.</span><span class="n">Command</span><span class="p">(</span><span class="s">"/run.sh"</span><span class="p">)</span>
<span class="n">cmd</span><span class="o">.</span><span class="n">Env</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">Environ</span><span class="p">()</span> <span class="c">// propagate env to grafana</span>
<span class="n">cmd</span><span class="o">.</span><span class="n">Stderr</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">Stderr</span>
<span class="n">cmd</span><span class="o">.</span><span class="n">Stdout</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">Stdout</span>
<span class="k">return</span> <span class="n">cmd</span>
<span class="p">}</span>
<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">grafanaErrs</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="k">chan</span> <span class="kt">error</span><span class="p">)</span>
<span class="k">go</span> <span class="k">func</span><span class="p">()</span> <span class="p">{</span>
<span class="n">grafanaErrs</span> <span class="o"><-</span> <span class="n">newGrafanaRunCmd</span><span class="p">()</span><span class="o">.</span><span class="n">Run</span><span class="p">()</span>
<span class="p">}()</span>
<span class="k">go</span> <span class="k">func</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// your sidecar</span>
<span class="p">}()</span>
<span class="c">// wait for grafana to exit</span>
<span class="n">err</span> <span class="o">:=</span> <span class="o"><-</span><span class="n">grafanaErrs</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="c">// propagate exit code outwards</span>
<span class="k">var</span> <span class="n">exitErr</span> <span class="o">*</span><span class="n">exec</span><span class="o">.</span><span class="n">ExitError</span>
<span class="k">if</span> <span class="n">errors</span><span class="o">.</span><span class="n">As</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="o">&</span><span class="n">exitErr</span><span class="p">)</span> <span class="p">{</span>
<span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="n">exitErr</span><span class="o">.</span><span class="n">ProcessState</span><span class="o">.</span><span class="n">ExitCode</span><span class="p">())</span>
<span class="p">}</span>
<span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
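<p>The <code class="language-plaintext highlighter-rouge">// your sidecar</code> goroutine above is left empty. Since the service needs a moment to boot after it is started, one thing a sidecar will likely need before making any API calls is to wait until the service responds. Here is a minimal sketch, where <code class="language-plaintext highlighter-rouge">waitForService</code> is a hypothetical helper, the <code class="language-plaintext highlighter-rouge">/api/health</code> path is an assumption about the service being wrapped, and a test server stands in for Grafana:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// waitForService polls healthURL until it returns 200 OK or ctx is done.
func waitForService(ctx context.Context, healthURL string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, healthURL, nil)
		if err != nil {
			return err
		}
		// connection errors are expected while the service is still booting
		if resp, err := http.DefaultClient.Do(req); err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("service not ready: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	// stand-in for the wrapped service listening on its internal port
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := waitForService(ctx, srv.URL+"/api/health", 100*time.Millisecond); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("service is ready; the sidecar can start making API calls")
}
</code></pre></div></div>
<p>The context deadline keeps the sidecar from blocking forever if the service never comes up, in which case the wrapper can report the problem instead of silently hanging.</p>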
<h3 id="adding-endpoints">Adding endpoints</h3>
<p>What if both your sidecar and the extended service expose endpoints over the network? Sure, you could simply have the sidecar listen on a separate port, but that means exposing another port on your container, which adds another point of configuration that dependents need to be aware of before they can connect to your service.</p>
<p>My solution to this is to keep the same container “interface” by having a reverse proxy listen on the exposed port, which would handle forwarding requests to either the main service or the sidecar.</p>
<pre><code class="language-mermaid">graph TB
subgraph Container
R{Router}
Sidecar
Service
ReverseProxy
end
Dependent <-- $PORT --> R
R <-- sidecarHandler --> Sidecar
R <--> ReverseProxy
ReverseProxy <-- internalServicePort --> Service
</code></pre>
<p>Again, the Go standard library comes to the rescue with the <a href="https://golang.org/pkg/net/http/httputil/"><code class="language-plaintext highlighter-rouge">net/http/httputil</code> package</a>. We also use <code class="language-plaintext highlighter-rouge">gorilla/mux</code> for routing in this example, but you can choose any routing library that serves your needs.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">(</span>
<span class="s">"errors"</span>
<span class="s">"fmt"</span>
<span class="s">"net/http"</span>
<span class="s">"net/http/httputil"</span>
<span class="s">"os"</span>
<span class="s">"github.com/gorilla/mux"</span>
<span class="p">)</span>
<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// ... as before</span>
<span class="n">router</span> <span class="o">:=</span> <span class="n">mux</span><span class="o">.</span><span class="n">NewRouter</span><span class="p">()</span>
<span class="c">// route specific paths to your sidecar's endpoints</span>
<span class="n">router</span><span class="o">.</span><span class="n">PathPrefix</span><span class="p">(</span><span class="s">"/sidecar/api"</span><span class="p">)</span><span class="o">.</span><span class="n">Handler</span><span class="p">(</span><span class="n">sidecar</span><span class="o">.</span><span class="n">Handler</span><span class="p">())</span>
<span class="c">// if a request doesn't route to the sidecar, route to your main service</span>
<span class="n">router</span><span class="o">.</span><span class="n">PathPrefix</span><span class="p">(</span><span class="s">"/"</span><span class="p">)</span><span class="o">.</span><span class="n">Handler</span><span class="p">(</span><span class="o">&</span><span class="n">httputil</span><span class="o">.</span><span class="n">ReverseProxy</span><span class="p">{</span>
<span class="c">// the Director of a ReverseProxy handles transforming requests and</span>
<span class="c">// sending them on to the correct location, in this case another port</span>
<span class="c">// in this container (our service's internal port)</span>
<span class="n">Director</span><span class="o">:</span> <span class="k">func</span><span class="p">(</span><span class="n">req</span> <span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span> <span class="p">{</span>
<span class="n">req</span><span class="o">.</span><span class="n">URL</span><span class="o">.</span><span class="n">Scheme</span> <span class="o">=</span> <span class="s">"http"</span>
<span class="n">req</span><span class="o">.</span><span class="n">URL</span><span class="o">.</span><span class="n">Host</span> <span class="o">=</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">":%s"</span><span class="p">,</span> <span class="n">serviceInternalPort</span><span class="p">)</span>
<span class="p">},</span>
<span class="p">})</span>
<span class="k">go</span> <span class="k">func</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// listen on our external port - the port that will be exposed by the</span>
<span class="c">// container - to handle routing</span>
<span class="n">err</span> <span class="o">:=</span> <span class="n">http</span><span class="o">.</span><span class="n">ListenAndServe</span><span class="p">(</span><span class="n">fmt</span><span class="o">.</span><span class="n">Sprintf</span><span class="p">(</span><span class="s">":%s"</span><span class="p">,</span> <span class="n">exportPort</span><span class="p">),</span> <span class="n">router</span><span class="p">)</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="o">&&</span> <span class="o">!</span><span class="n">errors</span><span class="o">.</span><span class="n">Is</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="n">http</span><span class="o">.</span><span class="n">ErrServerClosed</span><span class="p">)</span> <span class="p">{</span>
<span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="m">1</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="m">0</span><span class="p">)</span>
<span class="p">}()</span>
<span class="c">// ... as before</span>
<span class="p">}</span>
</code></pre></div></div>
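<p>If you would rather avoid a third-party router, the same layout can be sketched with only the standard library: <code class="language-plaintext highlighter-rouge">http.ServeMux</code> for routing, and <code class="language-plaintext highlighter-rouge">httputil.NewSingleHostReverseProxy</code>, which rewrites the scheme and host much like the <code class="language-plaintext highlighter-rouge">Director</code> above. In this self-contained version, <code class="language-plaintext highlighter-rouge">newRouter</code> is a hypothetical name, and <code class="language-plaintext highlighter-rouge">httptest</code> servers stand in for the real service and the externally exposed port:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// newRouter sends everything under /sidecar/api/ to the sidecar handler and
// proxies all other requests to the wrapped service at serviceURL.
func newRouter(sidecar http.Handler, serviceURL *url.URL) http.Handler {
	mux := http.NewServeMux()
	// a trailing slash makes this pattern match the whole subtree
	mux.Handle("/sidecar/api/", sidecar)
	mux.Handle("/", httputil.NewSingleHostReverseProxy(serviceURL))
	return mux
}

func main() {
	// stand-in for the wrapped service on its internal port
	service := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "service: "+r.URL.Path)
	}))
	defer service.Close()
	serviceURL, err := url.Parse(service.URL)
	if err != nil {
		panic(err)
	}

	sidecar := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "sidecar: "+r.URL.Path)
	})
	// stand-in for the single port exposed by the container
	front := httptest.NewServer(newRouter(sidecar, serviceURL))
	defer front.Close()

	for _, path := range []string{"/sidecar/api/status", "/api/dashboards"} {
		resp, err := http.Get(front.URL + path)
		if err != nil {
			panic(err)
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Println(string(body))
	}
}
</code></pre></div></div>
<p>Requests under <code class="language-plaintext highlighter-rouge">/sidecar/api/</code> are answered by the sidecar handler, while everything else is proxied through to the service, so dependents only ever need to know about one port.</p>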
<h3 id="restarting-the-service">Restarting the service</h3>
<p>In my case, I eventually had to add restart capabilities, since some configuration changes required the service to be restarted.</p>
<p>Simply restarting the container was not an option, since it would complicate how the configuration would persist, and would cause us to lose the advantage of having a single self-isolated container that required no external care.</p>
<p>Fortunately, <code class="language-plaintext highlighter-rouge">exec.Cmd</code>, once started, provides an <a href="https://golang.org/pkg/os/#Process"><code class="language-plaintext highlighter-rouge">*os.Process</code></a> that we can use to stop an existing process. I introduced a controller that would expose functions through which the sidecar can stop and start the main service:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">type</span> <span class="n">grafanaController</span> <span class="k">struct</span> <span class="p">{</span>
<span class="n">mux</span> <span class="n">sync</span><span class="o">.</span><span class="n">Mutex</span>
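<span class="n">proc</span> <span class="o">*</span><span class="n">os</span><span class="o">.</span><span class="n">Process</span>
<span class="n">log</span> <span class="n">log15</span><span class="o">.</span><span class="n">Logger</span> <span class="c">// assumed: github.com/inconshreveable/log15, used by Stop and RunServer</span>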
<span class="p">}</span>
</code></pre></div></div>
<p>Stopping is pretty straightforward - if the service is running, <code class="language-plaintext highlighter-rouge">proc</code> will be non-nil, and we can simply signal it to stop:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">c</span> <span class="o">*</span><span class="n">grafanaController</span><span class="p">)</span> <span class="n">Stop</span><span class="p">()</span> <span class="kt">error</span> <span class="p">{</span>
<span class="n">c</span><span class="o">.</span><span class="n">mux</span><span class="o">.</span><span class="n">Lock</span><span class="p">()</span>
<span class="k">defer</span> <span class="n">c</span><span class="o">.</span><span class="n">mux</span><span class="o">.</span><span class="n">Unlock</span><span class="p">()</span>
<span class="k">if</span> <span class="n">c</span><span class="o">.</span><span class="n">proc</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">c</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">Signal</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Interrupt</span><span class="p">);</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s">"failed to stop Grafana instance: %w"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">_</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">c</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">Wait</span><span class="p">()</span> <span class="c">// this can error for a variety of irrelevant reasons</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">c</span><span class="o">.</span><span class="n">proc</span><span class="o">.</span><span class="n">Release</span><span class="p">();</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="n">c</span><span class="o">.</span><span class="n">log</span><span class="o">.</span><span class="n">Warn</span><span class="p">(</span><span class="s">"failed to release process"</span><span class="p">,</span> <span class="s">"error"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">c</span><span class="o">.</span><span class="n">proc</span> <span class="o">=</span> <span class="no">nil</span>
<span class="p">}</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Notice how this is starting to look a bit gnarly:</p>
<ul>
<li>A failed <code class="language-plaintext highlighter-rouge">proc.Wait()</code> does not necessarily mean that the shutdown failed; it could also mean that the process shut down immediately (before <code class="language-plaintext highlighter-rouge">proc.Wait()</code> could run). However, it is still important to wait, since sending a signal does not mean the service has stopped completely.</li>
<li>A failed <code class="language-plaintext highlighter-rouge">proc.Release()</code> does not strictly indicate a fatal error, so we log and continue as if nothing has happened.</li>
</ul>
<p>Starting the service is even less appealing - we can’t just start the service on a goroutine and ignore it, since we want to be aware of and log errors. However, not every error should be fatal, and the line is blurry.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">func</span> <span class="p">(</span><span class="n">c</span> <span class="o">*</span><span class="n">grafanaController</span><span class="p">)</span> <span class="n">RunServer</span><span class="p">()</span> <span class="kt">error</span> <span class="p">{</span>
<span class="n">c</span><span class="o">.</span><span class="n">mux</span><span class="o">.</span><span class="n">Lock</span><span class="p">()</span>
<span class="k">defer</span> <span class="n">c</span><span class="o">.</span><span class="n">mux</span><span class="o">.</span><span class="n">Unlock</span><span class="p">()</span>
<span class="c">// spin up grafana and track process</span>
<span class="n">c</span><span class="o">.</span><span class="n">log</span><span class="o">.</span><span class="n">Debug</span><span class="p">(</span><span class="s">"starting Grafana server"</span><span class="p">)</span>
<span class="n">cmd</span> <span class="o">:=</span> <span class="n">newGrafanaRunCmd</span><span class="p">()</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">cmd</span><span class="o">.</span><span class="n">Start</span><span class="p">();</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s">"failed to start Grafana: %w"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">c</span><span class="o">.</span><span class="n">proc</span> <span class="o">=</span> <span class="n">cmd</span><span class="o">.</span><span class="n">Process</span>
<span class="c">// capture results from grafana process</span>
<span class="k">go</span> <span class="k">func</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// cmd.Wait output:</span>
<span class="c">// * exits with status 0 => nil</span>
<span class="c">// * non-zero exit or terminated by a signal => *ExitError</span>
<span class="c">// * other IO error => error</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">cmd</span><span class="o">.</span><span class="n">Wait</span><span class="p">();</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">exitErr</span> <span class="o">*</span><span class="n">exec</span><span class="o">.</span><span class="n">ExitError</span>
<span class="k">if</span> <span class="n">errors</span><span class="o">.</span><span class="n">As</span><span class="p">(</span><span class="n">err</span><span class="p">,</span> <span class="o">&</span><span class="n">exitErr</span><span class="p">)</span> <span class="p">{</span>
<span class="n">exitCode</span> <span class="o">:=</span> <span class="n">exitErr</span><span class="o">.</span><span class="n">ProcessState</span><span class="o">.</span><span class="n">ExitCode</span><span class="p">()</span>
<span class="c">// unfortunately grafana exits with code 1 on sigint</span>
<span class="k">if</span> <span class="n">exitCode</span> <span class="o">></span> <span class="m">1</span> <span class="p">{</span>
<span class="n">c</span><span class="o">.</span><span class="n">log</span><span class="o">.</span><span class="n">Crit</span><span class="p">(</span><span class="s">"grafana exited with unexpected code"</span><span class="p">,</span> <span class="s">"exitcode"</span><span class="p">,</span> <span class="n">exitCode</span><span class="p">)</span>
<span class="n">os</span><span class="o">.</span><span class="n">Exit</span><span class="p">(</span><span class="n">exitCode</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">c</span><span class="o">.</span><span class="n">log</span><span class="o">.</span><span class="n">Info</span><span class="p">(</span><span class="s">"grafana has stopped"</span><span class="p">,</span> <span class="s">"exitcode"</span><span class="p">,</span> <span class="n">exitCode</span><span class="p">)</span>
<span class="k">return</span>
<span class="p">}</span>
<span class="n">c</span><span class="o">.</span><span class="n">log</span><span class="o">.</span><span class="n">Warn</span><span class="p">(</span><span class="s">"error waiting for grafana to stop"</span><span class="p">,</span> <span class="s">"error"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}()</span>
<span class="k">return</span> <span class="no">nil</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Just like errors in libraries, exit codes are largely at the discretion of the developer, and in this case Grafana does not give us a useful indication of whether a process stopped because of an intentional <code class="language-plaintext highlighter-rouge">SIGINT</code> or because a fatal error caused it to exit (in which case our wrapper should exit too).</p>
<p>You can add some additional management (e.g. a thread-safe flag or channel to indicate that a shutdown has been triggered intentionally, and only exit on code 1 if this flag is not set), but the complexity of what is meant to be a simple wrapper will quickly ramp up.</p>
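<p>As a rough sketch of that flag, assuming Go 1.19 or later for <code class="language-plaintext highlighter-rouge">atomic.Bool</code>: the hypothetical <code class="language-plaintext highlighter-rouge">stopTracker</code> below would be set from <code class="language-plaintext highlighter-rouge">Stop</code> just before signalling the process, and consulted in the <code class="language-plaintext highlighter-rouge">cmd.Wait</code> goroutine in place of the bare exit code check.</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"fmt"
	"sync/atomic"
)

// stopTracker remembers whether the wrapper itself asked the service to stop,
// so that an exit code of 1 (which Grafana also uses on SIGINT) can be told
// apart from a crash. Hypothetical helper; requires Go 1.19+ for atomic.Bool.
type stopTracker struct {
	intentional atomic.Bool
}

// markStopping should be called right before signalling the service to stop.
func (s *stopTracker) markStopping() {
	s.intentional.Store(true)
}

// shouldExit reports whether the wrapper should exit after the service exited
// with exitCode. Swap resets the flag so a restarted process starts clean.
func (s *stopTracker) shouldExit(exitCode int) bool {
	wasIntentional := s.intentional.Swap(false)
	if exitCode > 1 {
		return true // same as the original check: always fatal
	}
	// exit code 1 is only fatal if nobody asked the service to stop
	return exitCode == 1 && !wasIntentional
}

func main() {
	var s stopTracker
	fmt.Println(s.shouldExit(1)) // unexpected exit with code 1
	s.markStopping()
	fmt.Println(s.shouldExit(1)) // exit following an intentional stop
	fmt.Println(s.shouldExit(2)) // unexpected exit code, always fatal
}
</code></pre></div></div>
<p>A channel closed by <code class="language-plaintext highlighter-rouge">Stop</code> would work just as well; what matters is that the flag is set before the signal is sent, so the waiting goroutine always observes it.</p>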
<p>At the time of writing I’m not sure that any additional handling is required for a reasonable experience, but I’ll be keeping an eye on how this code behaves.</p>
<p>Note that this also means we can no longer just block on the main program until the service exits, since it can exit (intentionally) at any time - instead, we must depend on an external signal to tell us when to stop. It is worth listening for <code class="language-plaintext highlighter-rouge">SIGTERM</code> as well as <code class="language-plaintext highlighter-rouge">SIGINT</code>, since <code class="language-plaintext highlighter-rouge">SIGTERM</code> is what <code class="language-plaintext highlighter-rouge">docker stop</code> sends by default:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">import</span> <span class="p">(</span>
<span class="s">"os"</span>
<span class="s">"os/signal"</span>
<span class="s">"syscall"</span>
<span class="p">)</span>
<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// ... mostly as before</span>
<span class="n">c</span> <span class="o">:=</span> <span class="nb">make</span><span class="p">(</span><span class="k">chan</span> <span class="n">os</span><span class="o">.</span><span class="n">Signal</span><span class="p">,</span> <span class="m">1</span><span class="p">)</span>
<span class="n">signal</span><span class="o">.</span><span class="n">Notify</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">os</span><span class="o">.</span><span class="n">Interrupt</span><span class="p">,</span> <span class="n">syscall</span><span class="o">.</span><span class="n">SIGTERM</span><span class="p">)</span> <span class="c">// docker stop sends SIGTERM</span>
<span class="o"><-</span><span class="n">c</span>
<span class="k">if</span> <span class="n">err</span> <span class="o">:=</span> <span class="n">grafana</span><span class="o">.</span><span class="n">Stop</span><span class="p">();</span> <span class="n">err</span> <span class="o">!=</span> <span class="no">nil</span> <span class="p">{</span>
<span class="n">log</span><span class="o">.</span><span class="n">Warn</span><span class="p">(</span><span class="s">"failed to stop Grafana server"</span><span class="p">,</span> <span class="s">"error"</span><span class="p">,</span> <span class="n">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
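<p>Putting stopping and starting together, a restart could look roughly like the sketch below. It is a hypothetical, generic variant of <code class="language-plaintext highlighter-rouge">grafanaController</code> rather than the code from the pull requests: one goroutine owns the only <code class="language-plaintext highlighter-rouge">cmd.Wait</code> call for each process and closes a channel when it returns, so <code class="language-plaintext highlighter-rouge">Stop</code> waits on that channel instead of calling <code class="language-plaintext highlighter-rouge">Process.Wait</code> a second time. It assumes a Unix-like system, with <code class="language-plaintext highlighter-rouge">sleep</code> standing in for the wrapped service:</p>
<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code>package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"sync"
)

// run tracks a single started process. err is written before done is closed,
// so it is safe to read once a receive from done has completed.
type run struct {
	cmd  *exec.Cmd
	done chan struct{}
	err  error
}

// processController is a hypothetical stand-in for grafanaController.
type processController struct {
	mu     sync.Mutex
	newCmd func() *exec.Cmd
	cur    *run
}

// Start launches a new process if none is being tracked.
func (c *processController) Start() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cur != nil {
		return errors.New("process already running")
	}
	cmd := c.newCmd()
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("failed to start process: %w", err)
	}
	r := &run{cmd: cmd, done: make(chan struct{})}
	go func() {
		r.err = cmd.Wait() // the only Wait call for this process
		close(r.done)
	}()
	c.cur = r
	return nil
}

// Stop interrupts the tracked process, if any, and blocks until it exits.
// It returns the process's exit error, which the caller may choose to ignore.
func (c *processController) Stop() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cur == nil {
		return nil
	}
	r := c.cur
	c.cur = nil
	// Signal fails if the process already exited; waiting on done covers both cases
	_ = r.cmd.Process.Signal(os.Interrupt)
	<-r.done
	return r.err
}

// Restart stops the current process and starts a fresh one. The lock is
// released between the two steps to keep this sketch simple.
func (c *processController) Restart() error {
	if err := c.Stop(); err != nil {
		fmt.Println("previous process exited:", err)
	}
	return c.Start()
}

func main() {
	c := &processController{newCmd: func() *exec.Cmd {
		return exec.Command("sleep", "30")
	}}
	if err := c.Start(); err != nil {
		panic(err)
	}
	if err := c.Restart(); err != nil {
		panic(err)
	}
	fmt.Println("final stop:", c.Stop())
}
</code></pre></div></div>
<p>If concurrent restarts are possible (for example, several configuration changes arriving at once), you would want to hold a single lock across both the stop and the start.</p>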
<h2 id="source-code-and-pull-requests">Source code and pull requests</h2>
<p>And that’s it for a rudimentary sidecar service that allows you to continue treating a service container as a completely isolated unit!</p>
<p>Some relevant pull requests implementing these features:</p>
<ul>
<li><a href="https://github.com/sourcegraph/sourcegraph/pull/11427">sourcegraph#11427</a> - I ended up reverting this due to bugs in certain environments and adding it back in <a href="https://github.com/sourcegraph/sourcegraph/pull/11483">sourcegraph#11483</a>, but both PRs include relevant discussions. These PRs implement a basic sidecar without start and restart capabilities.</li>
<li><a href="https://github.com/sourcegraph/sourcegraph/pull/11554">sourcegraph#11554</a> adds the ability for the sidecar to start and restart the main service.</li>
</ul>
<p>Note that most of the above work has been superseded by a pivot to Prometheus (see the update at the start of this post).
Following the pivot, a lot of other work was enabled by the addition of this sidecar:</p>
<ul>
<li><a href="https://github.com/sourcegraph/sourcegraph/issues/12010">sourcegraph#12010</a> (implementation: <a href="https://github.com/sourcegraph/sourcegraph/pull/12491">sourcegraph#12491</a>) proposed a mechanism for denoting who owns each part of our monitoring, and for routing alerts to those owners.</li>
<li><a href="https://github.com/sourcegraph/sourcegraph/pull/17602">sourcegraph#17602</a> demonstrated potential summary capabilities a sidecar can export.</li>
<li><a href="https://github.com/sourcegraph/sourcegraph/pull/17014">sourcegraph#17014</a> and <a href="https://github.com/sourcegraph/sourcegraph/pull/17034">sourcegraph#17034</a> add timestamped links to the relevant Grafana panels in alert messages.</li>
</ul>
<h2 id="about-sourcegraph">About Sourcegraph</h2>
<p>Learn more about Sourcegraph <a href="https://about.sourcegraph.com/">here</a>.</p>
robert
Many open-source services are distributed as Docker images, but sometimes you’ll want to extend the functionality slightly - whether it be adding your own endpoints, manipulating configuration of the service within the Docker image, or something along those lines.
Introducing the new UBC Launch Pad website
2020-04-25T00:00:00+00:00
https://bobheadxi.dev/introducing-new-launch-pad-site
<p>We’ve had a design sitting around for a while now, but this year we’ve finally decided to get to work and churn out a brand new, from-the-ground-up refresh of our 4-year-old website to showcase our new branding and this semester’s projects!</p>
<h2 id="the-new-website">The New Website</h2>
<figure>
<img src="/assets/images/posts/introducing-new-launch-pad-site/landing.gif" />
</figure>
<p>We’re launching our new website today on <a href="https://ubclaunchpad.com">ubclaunchpad.com</a>! The revamped website features:</p>
<ul>
<li><strong>completely refreshed design</strong></li>
<li>brand new sections featuring current and past projects</li>
<li>each project now has a shareable modal where teams can showcase their work</li>
<li>everything is still fully responsive!</li>
</ul>
<p><br /></p>
<figure>
<img src="../../assets/images/posts/introducing-new-launch-pad-site/responsive.png" />
</figure>
<p><br /></p>
<p>This revamp has been in the works for a long time, but development only started less than 2 weeks ago - so if you find any issues <a href="TODO">please let us know</a>!</p>
<h2 id="behind-the-scenes">Behind the Scenes</h2>
<h3 id="design">Design</h3>
<p>Our design team, led by our wonderful designer <a href="https://github.com/cowjuh">Jenny</a>, first prepared a set of refreshed branding and designs for a new website in early 2019. The design went through several iterations, we showed it off to the club, and… never got around to building it, which is unfortunate because the team did a great job with the designs and it looked <em>great</em>.</p>
<figure>
<img src="../../assets/images/posts/introducing-new-launch-pad-site/old-designs.png" />
</figure>
<figure>
<img src="../../assets/images/posts/introducing-new-launch-pad-site/final-designs.png" />
</figure>
<p>By the time April 2020 rolled around, we were in desperate need of an online platform aligned with the branding we were sending out to sponsors and partners to showcase this year’s projects.</p>
<h3 id="development">Development</h3>
<p>There were two main pain points of the existing website (which is over 4 years old at this point) that I wanted to tackle with a rewrite:</p>
<ul>
<li>The old website was written with the bare minimum of web technologies. While this has its advantages, it also meant that nobody really wanted to work on it - learning is a big motivator, and web development at larger firms tends to revolve around trendy web frameworks nowadays. Adding any sort of interactivity required gnarly JavaScript and jQuery that nobody really wanted to maintain, and code reuse could be rather difficult, even though the CSS classes were reasonably well-maintained.</li>
<li>The website was difficult for non-technical folks (and frankly anyone not familiar with the codebase) to update. With everything piled up in one massive <code class="language-plaintext highlighter-rouge">index.html</code> and a handful of random JavaScript files, the website quickly lagged behind in content, and there were times when we didn’t even add club signup links on the website until well into a recruitment season.</li>
</ul>
<p>In hopes of remedying these issues, I made two major decisions right as I started:</p>
<ul>
<li>I chose an approachable web framework, in this case <a href="https://vuejs.org/">Vue.js</a>. Its template-based approach seemed well-suited to the mostly-static website that we were going to build, while being reasonably trendy and flexible enough to accommodate new integrations in the future (for example, with our <a href="https://github.com/ubclaunchpad/rocket2">Slack bot</a>). To go with it I chose <a href="https://www.typescriptlang.org">TypeScript</a>, a typed superset of JavaScript, which would serve as a form of self-documentation for future Launch Pad students to leverage.</li>
<li>I wanted as much of the website’s information as possible to be configurable through a familiar yet easily validated format. Updating the featured projects, our sponsors, or the positions open for application should be a simple matter of editing a single file and redeploying the website. To do this, I added <a href="https://github.com/ubclaunchpad/ubclaunchpad.com/blob/master/src/data/types.ts">TypeScript types for the data we would need</a> and a <a href="https://github.com/ubclaunchpad/ubclaunchpad.com/blob/master/src/config.ts">single file, <code class="language-plaintext highlighter-rouge">config.ts</code>, where all the website data could be viewed and edited</a>. Accompanying this is an automatically generated <a href="https://ubclaunchpad.com/config">configuration guidelines website</a> that provides additional instructions and documents every single field in a hopefully digestible and not-overly-intimidating manner.</li>
</ul>
<p><img src="../../assets/images/posts/introducing-new-launch-pad-site/docs.png" alt="" /></p>
<p>We’d made a few attempts at starting development on the website before, but because it was always treated as more of a club “side project” or an off-season project, it was difficult to get the ball rolling, and development would quickly fizzle out. To get the project off the ground with some momentum, I decided I would build the initial implementation by myself. I <a href="https://github.com/ubclaunchpad/ubclaunchpad.com/commit/64e720c4bb1fd74f9aa49fd4096b10a25a5212fe">started off on my own on April 17th</a>, <a href="https://github.com/ubclaunchpad/ubclaunchpad.com/issues/16">finished implementing a first pass of the entire website just <em>two</em> days later</a>, set up what documentation I could, and opened the project up to contributions from the club!</p>
<figure>
<img src="../../assets/images/posts/introducing-new-launch-pad-site/internal-launch.png" />
</figure>
<p>Despite having a website you could scroll through that looked pretty close to the actual design, there was still a significant amount of work to be done:</p>
<ul>
<li>mobile-friendliness was pretty poor in some sections</li>
<li>content had to be created and collected for projects, both old and new, to feature on the website</li>
<li>nothing was interactive - most notably, there wasn’t even a way to see a project’s description at first</li>
<li>parts of the design were a bit rough around the edges once implemented</li>
</ul>
<p>…and the list went on and on, and grew as we continued to add features. Thankfully, a couple of members stepped up and did some <em>great</em> work helping bring the project to a state where we could retire the old website! I’d like to give special thanks to:</p>
<ul>
<li><a href="https://github.com/RachitMalik12">Rachit</a> for <a href="https://github.com/ubclaunchpad/ubclaunchpad.com/pull/45">wireframing and implementing the first iteration of the project modal</a>, amongst many other contributions</li>
<li><a href="https://github.com/srijonsaha">Srijon</a> for improvements like <a href="https://github.com/ubclaunchpad/ubclaunchpad.com/pull/42">some awesome on-hover interactions</a> and providing tons of feedback</li>
<li><a href="https://github.com/renehuang8822">Rene</a> and <a href="https://github.com/SophieMBerger">Sophie</a> for trying out the new configuration format to add content to the website</li>
<li>Everyone who gave the website trial runs and provided feedback!</li>
<li><a href="https://github.com/cowjuh">Jenny</a>, who led the design of the website (and all our recent branding efforts!), as well as everyone who participated in our previous attempts at building the website</li>
</ul>
<p>Altogether, this entire project took us <strong>just 8 days</strong> (April 17th to April 25th) to bring to launch, which I think is not too shabby of an achievement. Thank you to everyone who participated!</p>
robert