{
  "version": "https://jsonfeed.org/version/1",
  "title": "Ian's Digital Garden",
  "home_page_url": "https://ianwwagner.com/",
  "feed_url": "https://ianwwagner.com//tag-software-engineering.json",
  "description": "",
  "items": [
    {
      "id": "https://ianwwagner.com//two-weeks-of-emacs.html",
      "url": "https://ianwwagner.com//two-weeks-of-emacs.html",
      "title": "Two Weeks of Emacs",
      "content_html": "<p>I'm approximately 2 weeks into using emacs as my daily editor and, well, I haven't opened JetBrains since.\nI honestly didn't expect that, but here we are.</p>\n<h1><a href=\"#papercuts-i-said-i-would-solve-later\" aria-hidden=\"true\" class=\"anchor\" id=\"papercuts-i-said-i-would-solve-later\"></a>Papercuts I said I would solve later</h1>\n<p>Here's the list of things I noted in my last post that I said I'd come back to.\nThe list has changed a bit since the last post:</p>\n<p>Solved:</p>\n<ul>\n<li>Issues with automatic indentation</li>\n<li>Files not reloading automatically when changed externally (fixed with <code>global-auto-revert-mode</code>)</li>\n<li>Highlighting mutable variables</li>\n</ul>\n<p>Haven't bothered to try resolving (infrequently used):</p>\n<ul>\n<li>Macro expansion</li>\n<li>Code completion and jump to definition within rustdoc comments</li>\n</ul>\n<p>The highlighting one is worth a bit of explanation.\nHere's what I had to do to get it working:</p>\n<pre><code class=\"language-lisp\">;; Highlight mutable variables (like RustRover/JetBrains).\n;; NB: Requires eglot 1.20+\n(defface eglot-semantic-mutable\n  '((t :underline t))\n  &quot;Face for mutable variables via semantic tokens.&quot;)\n\n(with-eval-after-load 'eglot\n  (add-to-list 'eglot-semantic-token-modifiers &quot;mutable&quot;))\n</code></pre>\n<p>Apparently this requires a fairly recent version of eglot to work,\nand it isn't necessarily supported by every LSP,\nbut it works for me with rust-analyzer.\nI spent way too much time on this because for some reason running <code>M-x eglot-reconnect</code>\nor <code>M-x eglot</code> and accepting a restart didn't reset the buffer settings or something.\nIf this doesn't work, try killing the buffer and then find the file again.</p>\n<h1><a href=\"#other-new-papercuts\" aria-hidden=\"true\" class=\"anchor\" id=\"other-new-papercuts\"></a>Other (new) papercuts!</h1>\n<p>Here's a similarly categorized list of 
things that I found over the past week or so.</p>\n<p>Solved:</p>\n<ul>\n<li>&quot;Project&quot; views: I got even more than I bargained for with <code>(setq tab-bar-mode t)</code>! It's great.\nIt's even better than I expected TBH since every tab can contain an arbitrary configuration of buffers.\nThis is a weird way of thinking at first, but it's really nice since stuff doesn't need to follow the traditional bounds\nthat I was used to in IDEs (e.g. a tab can be entirely terminal buffers, or cross &quot;projects&quot; which is useful to me).</li>\n<li><code>xref-matches-in-files</code> was SLOW. Turned out to be an issue in my <code>fish</code> configuration (which isn't even my &quot;preferred&quot; shell,\nbut it's still my login shell due to being more supported than nushell, which I use for most things).\nRemoving pyenv fixed that.\nAlso you can set it to use ripgrep with <code>(setq xref-search-program 'ripgrep)</code></li>\n<li>Fuzzy finding files by name within a project quickly annoyed me.\nTurns out this is also not an unreasonable hotkey with the built-in project.el: <code>C-x p f</code> (mnemonic: project find).</li>\n<li>Searching the project by <em>symbol</em> (variable, struct, trait, etc.) works well with the <code>consult-eglot</code> package.\nSpecifically, it includes a <code>consult-eglot-symbols</code> command.</li>\n</ul>\n<p>Not solved yet:</p>\n<ul>\n<li>It was really nice to just fold sections of code by clicking something in the margin (&quot;fringe&quot; in emacs parlance; gutter in JetBrains).\nIt looks like there are ways to do this; I just haven't had time to mess with it.</li>\n<li>The language server can get confused if you do a big operation like a git branch switch. 
Restarting eglot fixes this.\nI'm sure this happened occasionally with JetBrains but it seems worse here.</li>\n<li>The lovely <code>diff-hl</code> package doesn't get the hint when files reload for some reason.</li>\n</ul>\n<p>I'll also add a quick note that it's (still) surprisingly easy to screw up your own config.\nEmacs as a system is super flexible but that also makes it somewhat fragile.\nEverything is programmable, in a single-threaded, garbage-collected language.</p>\n<p>One snag I hit was that after some period, the environment got super slow,\naffecting things like unit test runtimes in terminal buffers,\nand making input noticeably laggy.\nThe issue turned out to be my <code>global-auto-revert-mode</code> config.\nApparently if you do it wrong, it turns into a whole stack of polling operations for every buffer.\nThis was a consequence of Claude suggesting something dumb and me not researching it :P\nThe normal configuration will use filesystem notifications like kqueue or inotify.</p>\n<h1><a href=\"#whats-next\" aria-hidden=\"true\" class=\"anchor\" id=\"whats-next\"></a>What's next?</h1>\n<p>I'm pretty happy with the new setup overall.\nObviously some room for tweaks, but it's pretty great overall,\nand I'm really enjoying the tab bar approach for organizing things.\nI'm also frankly shocked at how little CPU I'm using relative to previous norms on my MacBook.</p>\n<p>Next up I'll probably try (in no particular order):</p>\n<ul>\n<li>Magit / Majitsu; I actually love Sublime Merge, but wouldn't mind one less context switch.\nEspecially if I can get a view of the current project easily based on context.\nSublime's search interface is terrible when you have hundreds of repos.</li>\n<li>Chezmoi for dotfile sync + see what breaks on my desktop (FreeBSD).</li>\n<li>More adventures with TRAMP. 
I used this extensively in the early '00s but have mostly been doing local dev this time around.\nBut I see emacs having a lot of potential for remote dev with TRAMP so I'll give that a shot for some stuff over the next few weeks.</li>\n</ul>\n",
      "summary": "",
      "date_published": "2026-03-28T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "software-engineering"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//returning-to-emacs.html",
      "url": "https://ianwwagner.com//returning-to-emacs.html",
      "title": "Returning to Emacs",
      "content_html": "<h1><a href=\"#jetbrains-woes\" aria-hidden=\"true\" class=\"anchor\" id=\"jetbrains-woes\"></a>JetBrains woes</h1>\n<p>I have been a fan of JetBrains products for over a decade by now,\nand an unapologetic lover of IDEs generally.\nI've used PyCharm since shortly after it launched,\nand over the years I've used IntelliJ IDEA,\nWebStorm, DataGrip, RustRover, and more.\nI literally have the all products pack (and have for many years).</p>\n<p>I truly believe that a good IDE can be a productivity multiplier.\nYou get refactoring, jump-to-definition, symbol-aware search,\nsaved build/run configurations, a nice and consistent interface\nto otherwise terrible tooling (looking at you CMake and the half dozen Python package managers\nof the last decade and change).</p>\n<p>But something has changed over the past few years.\nThe quality of the product has generally deteriorated in several ways.\nWith the advent of LSP, the massive lead JetBrains had in &quot;code intelligence&quot;\nhas eroded, and in many cases no longer exists.\nThe resource requirements of the IDE have also ballooned massively,\neven occasionally causing memory pressure on my amply equipped MacBook Pro with 32GB of RAM.</p>\n<p>(Side note: I regularly have 3 JetBrains IDEs open at once because I need to work in many languages,\nand for some reason they refuse to ship a single product that does that.\nI would have paid for such a product.)</p>\n<p>And as if that weren't enough, it seems like I have to restart to install some urgent nagging update\nseveral times/week, usually related to one of their confusing mess of AI plugins\n(is AI Chat what we're supposed to use? Or Junie? Or... 
what?).\nTo top it all off, stability has gone out the window.\nAt least once/week, I will open my laptop from sleep,\nonly to find out that one or more of my JetBrains IDEs has crashed.\nUsually RustRover.\nWhich also eats up like 30GB of extra disk space for things like macro expansions\nand other code analysis.\nThe taxes are high and increasing on every front.</p>\n<h1><a href=\"#my-philosophy-of-editors\" aria-hidden=\"true\" class=\"anchor\" id=\"my-philosophy-of-editors\"></a>My philosophy of editors</h1>\n<p>So, I decided the time was right to give Emacs another shot.</p>\n<p>If you know me personally, you may recall that I made some strong statements in the past\nto the effect that spending weeks writing thousands of lines of Lua to get the ultimate Neovim config was silly.\nAnd my strongly worded statements of the past were partially based on my own experiences with such editors,\nincluding Emacs.\nBasically, I appreciate that you <em>can</em> &quot;build your own lightsaber&quot;,\nbut I did not consider that to be a good use of my time.\nOne of the reasons I like(d) JetBrains is that I <em>didn't</em> ever need to think about tweaking configs!</p>\n<p>But things have gotten so bad that I figured I'd give it a shot with a few stipulations.</p>\n<ol>\n<li>I would try it for a week, but if it seriously hampered my productivity after a few days, I'd switch back.</li>\n<li>I was only going to spend a few hours configuring it.</li>\n</ol>\n<p>With these constraints, I set off to see if I needed to revise my philosophy of editors.</p>\n<h1><a href=\"#why-emacs\" aria-hidden=\"true\" class=\"anchor\" id=\"why-emacs\"></a>Why Emacs?</h1>\n<p>Aside: why not (Helix|Neovim|Zed|something else)?\nA few reasons, in no particular order:</p>\n<ul>\n<li>I sorta know Emacs. I used it as one of my primary editors for a year or two in the early 2010s.</li>\n<li>I tried Helix for a week last year. 
It didn't stick; something about &quot;modal editing&quot; just does not fit with my brain.</li>\n<li>I don't mind a terminal per se, but we invented windowing systems decades before I was born and I don't understand the fascination\nwith running <em>everything</em> in a terminal (or a web browser, for that matter :P).</li>\n<li>If I'm going to go through the pain of switching, I want to be confident it'll be around and thriving in another 10 years.\nAnd it should work everywhere, including lesser known platforms like FreeBSD.</li>\n<li>If your movement keys require a QWERTY layout, I will be very annoyed.</li>\n</ul>\n<h1><a href=\"#first-impressions-3-days-in\" aria-hidden=\"true\" class=\"anchor\" id=\"first-impressions-3-days-in\"></a>First impressions (3 days in)</h1>\n<p>So, how's it going so far?\nHere are a few of the highlights.</p>\n<h2><a href=\"#lsps-have-improved-a-lot\" aria-hidden=\"true\" class=\"anchor\" id=\"lsps-have-improved-a-lot\"></a>LSPs have improved a lot!</h2>\n<p>It used to be the case that JetBrains had a dominant position in code analysis.\nThis isn't the case anymore, and most of the languages I use that would benefit from an LSP\nhave a great one available.\nThings have improved a lot, particularly in terms of Emacs integrations,\nover the past decade!\n<a href=\"https://www.gnu.org/software/emacs/manual/html_node/eglot/Eglot-Features.html\"><code>eglot</code></a> is now bundled with Emacs,\nso you don't even need to go out of your way to get some funky packages hooked up\n(like I had to with some flycheck plugin for Haskell back in the day).</p>\n<h3><a href=\"#refactoring-tools-have-also-improved\" aria-hidden=\"true\" class=\"anchor\" id=\"refactoring-tools-have-also-improved\"></a>Refactoring tools have also improved</h3>\n<p>The LSP-guided tools for refactoring have also improved a lot.\nIt used to be that only a &quot;real IDE&quot; had much better than grep and replace.\nI was happy to find that <code>eglot-rename</code> 
&quot;just worked&quot;.</p>\n<h3><a href=\"#docs\" aria-hidden=\"true\" class=\"anchor\" id=\"docs\"></a>Docs</h3>\n<p>I'm used to hovering my mouse over any bit of code, waiting a few seconds,\nand being greeted by a docs popover.\nThis is now possible in Emacs too with <code>eldoc</code> + your LSP.\nI added the <a href=\"https://github.com/casouri/eldoc-box\"><code>eldoc-box</code></a> plugin and configured it to my liking.</p>\n<h3><a href=\"#quick-fix-actions-work-too\" aria-hidden=\"true\" class=\"anchor\" id=\"quick-fix-actions-work-too\"></a>Quick fix actions work too!</h3>\n<p>So far, every single quick-fix action that I'm used to in RustRover\nseems to be there in the eglot integration with rust-analyzer.\nIt took me a few minutes to realize that this was called <code>eglot-code-actions</code>,\nbut once I figured that out, I was rolling.</p>\n<h2><a href=\"#jump-to-definition-works-great-but-navigation-has-caveats\" aria-hidden=\"true\" class=\"anchor\" id=\"jump-to-definition-works-great-but-navigation-has-caveats\"></a>Jump to definition works great, but navigation has caveats</h2>\n<p>I frequently use the jump-to-definition feature in IDEs.\nUsually by command+clicking.\nYou can do the same in Emacs with <code>M-.</code>, which is a bit weird, but okay.\nI picked up the muscle memory after less than an hour.\nThe weird thing though is what happens next.\nI'm used to JetBrains and most other well-designed software (<em>glares in the general direction of Apple</em>)\n&quot;just working&quot; with the forward+back buttons that many input devices have.\nEmacs did not out of the box.</p>\n<p>One thing JetBrains did fairly well was bookmarking where you were in a file, and even letting you jump back after\nnavigating to the definition or to another file.\nThis had some annoying side effects with multiple tabs, which I won't get into but it worked overall.\nIn Emacs, you can return from a definition jump with <code>M-,</code>, but there is no general 
navigate forward/backward concept.\nThis is where the build-your-own-lightsaber philosophy comes in I guess.\nI knew I'd hit it eventually.</p>\n<p>I tried out a package called <code>better-jumper</code> but it didn't <em>immediately</em> do what I wanted,\nso I abandoned it.\nI opted instead for simple backward and forward navigation.\nIt works alright.</p>\n<pre><code class=\"language-lisp\">(global-set-key (kbd &quot;&lt;mouse-3&gt;&quot;) #'previous-buffer)\n(global-set-key (kbd &quot;&lt;mouse-4&gt;&quot;) #'next-buffer)\n</code></pre>\n<p>Aside: I had to use <code>C-h k</code> (<code>describe-key</code>) to figure out what the mouse buttons were.\nAdvice I saw online apparently isn't universally applicable,\nand Xorg, macOS, etc. may number the buttons differently!</p>\n<h2><a href=\"#terminal-emulation-within-emacs\" aria-hidden=\"true\" class=\"anchor\" id=\"terminal-emulation-within-emacs\"></a>Terminal emulation within Emacs</h2>\n<p>The emacs <code>shell</code> mode is terrible.\nIt's particularly unusable if you're running any sort of TUI application.\nA friend recommended <a href=\"https://codeberg.org/akib/emacs-eat\"><code>eat</code></a> as an alternative.\nThis worked pretty well out of the box with most things,\nbut when I ran <code>cargo nextest</code> for the first time,\nI was shocked at how slow it was.\nMy test suite which normally runs in under a second took over 30!\nYikes.\nI believe the slowness is because it's implemented in elisp,\nwhich is still pretty slow even when native compilation is enabled.</p>\n<p>Another Emacs user recommended I try out <a href=\"https://github.com/akermu/emacs-libvterm\"><code>vterm</code></a>, so I did.\nHallelujah!\nIt's no iTerm 2, and it does have a few quirks,\nbut it's quite usable and MUCH faster.\nIt also works better with full-screen TUI apps like Claude Code.</p>\n<h2><a href=\"#claude-code-cli-is-actually-great\" aria-hidden=\"true\" class=\"anchor\" 
id=\"claude-code-cli-is-actually-great\"></a>Claude Code CLI is actually great</h2>\n<p>I'm not going to get into the pros and cons of LLMs in this post.\nBut if you use these tools in your work,\nI think you'll be surprised by how good the experience is with <code>vterm</code> and the <code>claude</code> CLI.\nI have been evaluating JetBrains' disjoint attempts at integrations with Junie,\nand more recently Claude Code and Codex.</p>\n<p>Junie is alright for some things.\nThe only really good thing I have to say about the product is that at least it let me select a GPT model.\nAnthropic models have been severely hampered in their ability to do anything useful in most codebases I work in,\ndue to tiny context windows.\nThat recently changed when Anthropic rolled out a 1 million token context window to certain users.</p>\n<p>JetBrains confusingly refers to Claude Code as &quot;Claude Agent&quot; and team subscriptions automatically include some monthly credits.\nEvery single JetBrains IDE will install its own separate copy of Claude Code (yay).\nBut it <em>is</em> really just shelling out to Claude Code it seems\n(it asks for your permission to download the binary.\nCodex is the same.)</p>\n<p>Given this, I assumed the experience and overall quality would be similar.\nWell, I was VERY wrong there.\nClaude Code in the terminal is far superior for a number of reasons.\nNot just access to the new model though that helps.\nYou can also configure &quot;effort&quot; (lol), and the &quot;plan&quot; mode seems to be far more sophisticated than what you get in the JetBrains IDEs.</p>\n<p>So yeah, if you're going to use these tools, just use the official app.\nIt makes sense; they have an incentive to push people to buy direct.\nAnd it so happens that Claude Code fits comfortably in my Emacs environment.</p>\n<p>More directly relevant to this post,\nLLMs (any of them really) are excellent at recommending Emacs packages and config tweaks.\nSo it's never been easier to give it 
a try.\nI've spent something like 2-3x longer writing this post than I did configuring Emacs.\n(And yes, before you ask, this post is 100% hand-written.)\nMy basic flow was to work, get annoyed (that's pretty easy for me),\nand describe my problem to ChatGPT or Claude.\nI am nowhere near the hours I budgeted for config fiddling.\nThat surprised me!</p>\n<h2><a href=\"#vcs-integration\" aria-hidden=\"true\" class=\"anchor\" id=\"vcs-integration\"></a>VCS integration</h2>\n<p>While I'm no stranger to hacking around with nothing more than a console,\nI really don't like the git CLI.\nI've heard jj is better, but honestly I think GUIs are pretty great most of the time.\nI will probably try magit at some point,\nbut for now I'm very happy with Sublime Merge.</p>\n<p>But one thing I MUST have in my editor is a &quot;gutter&quot; view of lines that are new/changed,\nand a way to get a quick inline diff.\nJetBrains had a great UX for this which I used daily.\nAnd for Emacs, I found something just as great: <a href=\"https://github.com/dgutov/diff-hl\"><code>diff-hl</code></a>.</p>\n<p>My config for this is very simple:</p>\n<pre><code class=\"language-lisp\">(unless (package-installed-p 'diff-hl)\n  (package-install 'diff-hl))\n(use-package diff-hl\n  :config\n  (global-diff-hl-mode))\n</code></pre>\n<p>To get a quick diff of a section that's changed,\nI use <code>diff-hl-show-chunk</code>.\nI might even like the hunk review experience here better than in JetBrains!</p>\n<h2><a href=\"#project-wide-search\" aria-hidden=\"true\" class=\"anchor\" id=\"project-wide-search\"></a>Project-wide search</h2>\n<p>I think JetBrains has the best search around with their double-shift, cmd+shift+o, and cmd-shift-f views.\nI have not yet gotten my Emacs configured to be as good.\nBut <code>C-x p g</code> (<code>project-find-regexp</code>) is pretty close.\nI'll look into other plugins later for fuzzy filename/symbol search.\nI <em>do</em> miss that.</p>\n<h2><a 
href=\"#run-configurations\" aria-hidden=\"true\" class=\"anchor\" id=\"run-configurations\"></a>Run configurations</h2>\n<p>The final pleasant surprise is that I don't miss JetBrains run configurations as much as I expected.\nI instead switched to putting a <a href=\"https://just.systems/man/en/introduction.html\"><code>justfile</code></a> in my repo and populating that with my run configurations\n(much of the software I work on has half a dozen switches which vary by environment).\nThis also has the side effect of cleaning up some of my CI configuration (<code>just</code> run the same thing!)\nand also serves as useful documentation to LLMs.</p>\n<h2><a href=\"#spell-checking\" aria-hidden=\"true\" class=\"anchor\" id=\"spell-checking\"></a>Spell checking</h2>\n<p>I have <a href=\"https://github.com/crate-ci/typos\"><code>typos</code></a> configured for most of my projects in CI,\nbut it drives me nuts when an editor doesn't flag typos for me.\nJetBrains did this well.\nEmacs has nothing out of the box (Zed also annoyingly doesn't ship with anything, which is really confusing to me).\nBut it's easy to add.</p>\n<p>I went with Jinx.\nThere are other options, but this one seemed pretty modern and worked without any fuss, so I stuck with it.</p>\n<h1><a href=\"#papercuts-to-solve-later\" aria-hidden=\"true\" class=\"anchor\" id=\"papercuts-to-solve-later\"></a>Papercuts to solve later</h1>\n<p>This is all a lot more positive than I was expecting to be honest!\nI am not going to cancel my JetBrains subscription tomorrow;\nthey still <em>do</em> make the best database tool I know of.\nBut I've moved all my daily editing to Emacs.</p>\n<p>That said, there are still some papercuts I need to address:</p>\n<ul>\n<li>Macro expansion. I liked that in RustRover. There's apparently a way to get this with <code>eglot-x</code> which I'll look into later.</li>\n<li>Automatic indentation doesn't work out of the box for all modes to my liking. 
I think I've fixed most of these but found the process confusing.</li>\n<li>Files don't reload in buffers automatically with disk changes (e.g. <code>cargo fmt</code>)!</li>\n<li>Code completion and jump to definition don't work inside rustdoc comments.</li>\n<li>RustRover used to highlight all of my <code>mut</code> variables. I would love to get that back in Emacs.</li>\n</ul>\n",
      "summary": "",
      "date_published": "2026-03-18T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "software-engineering",
        "shell"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//databases-as-an-alternative-to-application-logging.html",
      "url": "https://ianwwagner.com//databases-as-an-alternative-to-application-logging.html",
      "title": "Databases as an Alternative to Application Logging",
      "content_html": "<p>In my <a href=\"https://stadiamaps.com/\">work</a>, I've been doing a lot of ETL pipeline design recently for our geocoding system.\nThe system processes on the order of a billion records per job,\nand failures are part of the process.\nWe want to log these.</p>\n<p>Most applications start by dumping logs to <code>stderr</code>.\nUntil they overflow their terminal scrollback buffer.\nThe next step is usually text files.\nBut getting insights from 10k+ of lines of text with <code>grep</code> is a chore.\nIt may even be impossible unless you've taken extra care with how your logs are formatted.</p>\n<p>In this post we'll explore some approcahes to do application logging better.</p>\n<h1><a href=\"#structured-logging\" aria-hidden=\"true\" class=\"anchor\" id=\"structured-logging\"></a>Structured logging</h1>\n<p>My first introduction to logs with a structural element was probably Logcat for Android.\nLogcat lets you filter the fire hose of Android logs down to a specific application,\nand can even refine the scope further if you learn how to use it.\nLogcat is a useful tool, but fundamentally all it can do is <em>filter</em> logs from a stream\nand it has most of the same drawbacks as grepping plain text files.</p>\n<p>Larger systems often benefit from something like the <code>tracing</code> crate,\nwhich integrates with services like <code>journald</code> and Grafana Loki.\nThis is a great fit for a long-running <em>service</em>,\nbut is total overkill for an application that does some important stuff ™\nand exits.\nLike our ETL pipeline example.</p>\n<p>(Aside: I have a love/hate relationship with <code>journalctl</code>.\nI mostly interact with it through Ctrl+R in my shell history,\nwhich is problemating when connecting to a new server.\nBut it does have the benefit of being a nearly ubiquitous local structured logging system!)</p>\n<h1><a href=\"#databases-for-application-logs\" aria-hidden=\"true\" class=\"anchor\" 
id=\"databases-for-application-logs\"></a>Databases for application logs</h1>\n<p>Using a database as an application log can be a brilliant level up for many applications\nbecause you can actually <em>query</em> your logs with ease.\nI'll give a few examples, and then show some crazy cool stuff you can do with that.</p>\n<p>One type of failure we frequently encounter is metadata that looks like a URL where it shouldn't be.\nFor example, the name of a shop being <code>http://spam.example.com/</code>,\nor having a URL in an address or phone number field.\nIn this case, we usually drop the record, but we also want to log it so we can clean up the source data.\nSome other common failures are missing required fields, data in the wrong format, and the like.</p>\n<h2><a href=\"#a-good-schema-enables-analytics\" aria-hidden=\"true\" class=\"anchor\" id=\"a-good-schema-enables-analytics\"></a>A good schema enables analytics</h2>\n<p>Rather than logging these to <code>stderr</code> or some plain text files, we write to a DuckDB database.\nThis has a few benefits beyond the obvious.\nFirst, using a database forces you to come up with a schema.\nAnd just like using a language with types, this forces you to clarify your thinking a bit upfront.\nIn our case, we log things like the original data source, an ID, a log level (warn, error, info, etc.),\na failure code, and additional details.</p>\n<p>From here, we can do meaningful <em>analytical</em> queries like\n&quot;how many records were dropped due to invalid geographic coordinates&quot;\nor &quot;how many records were rejected due to metadata mismatches&quot;\n(ex: claiming to be a US address but appearing in North Korea).</p>\n<h2><a href=\"#cross-dataset-joins-anyone\" aria-hidden=\"true\" class=\"anchor\" id=\"cross-dataset-joins-anyone\"></a>Cross-dataset joins, anyone?</h2>\n<p>If this query uncovers a lot of rejected records from one data source,\nwouldn't it be nice if we could look at a sample?\nWe have the IDs right 
there in the log, and the data source identifier, after all.\nBut since we're in DuckDB rather than a plain text file,\nwe can pretty much effortlessly join on the data files!\n(This assumes that your data is in some halfway sane format like JSON, CSV, Parquet, or even another database).</p>\n<p>We can even take this one step further and compare logs across imports!\nWhat's up with that spike in errors compared to last month's release from that data source?</p>\n<p>These are the sort of insights which are almost trivial to uncover when your log is a database.</p>\n<h1><a href=\"#practical-bits\" aria-hidden=\"true\" class=\"anchor\" id=\"practical-bits\"></a>Practical bits</h1>\n<p>Now that I've described all the awesome things you can do,\nlet's get down to the practical questions like how you'd do this in your app.\nMy goals for the code were to make it easy to use and impossible to get wrong at the use site.\nFortunately that's pretty easy in Rust!</p>\n<pre><code class=\"language-rust\">#[derive(Clone)]\npub struct ImportLogger {\n    pool: Pool&lt;DuckdbConnectionManager&gt;,\n    // Implementation detail for our case: we have multiple ETL importers that share code AND logs.\n    // If you have any such attributes that will remain fixed over the life of a logger instance,\n    // consider storing them as struct fields so each event is easier to log.\n    importer_name: String,\n}\n</code></pre>\n<p>Pretty standard struct setup using DuckDB and <a href=\"https://github.com/sfackler/r2d2\"><code>r2d2</code></a> for connection pooling.\nWe put this in a shared logging crate in a workspace containing multiple importers.\nThe <code>importer_name</code> is a field that will get emitted with every log,\nand doesn't change for a logger instance.\nIf your logging has any such attributes (ex: a component name),\nstoring them as struct fields makes each log invocation easier!</p>\n<div class=\"markdown-alert markdown-alert-note\">\n<p 
class=\"markdown-alert-title\">Note</p>\n<p>At the time of this writing, I couldn't find any async connection pool integrations for DuckDB.\nIf anyone knows of one (or wants to add it to <a href=\"https://github.com/djc/bb8\"><code>bb8</code></a>), let me know!</p>\n</div>\n<pre><code class=\"language-rust\">pub fn new(config: ImportLogConfig, importer_name: String) -&gt; anyhow::Result&lt;ImportLogger&gt; {\n    let manager = DuckdbConnectionManager::file(config.import_log_path)?;\n    let pool = Pool::new(manager)?;\n\n    pool.get()?.execute_batch(include_str!(&quot;schema.sql&quot;))?;\n\n    Ok(Self {\n        pool,\n        importer_name,\n    })\n}\n</code></pre>\n<p>The constructor isn't anything special; it sets up a DuckDB connection to a file-backed database\nbased on our configuration.\nIt also initializes the schema from a file.\nThe schema file lives in the source tree, but the lovely <a href=\"https://doc.rust-lang.org/std/macro.include_str.html\"><code>include_str!</code></a>\nmacro bakes it into a static string at compile time (so we can still distribute a single binary).</p>\n<pre><code class=\"language-rust\">pub fn log(&amp;self, level: Level, source: &amp;str, id: Option&lt;&amp;str&gt;, code: &amp;str, reason: &amp;str) {\n    log::log!(level, &quot;{code}\\t{source}\\t{id:?}\\t{reason}&quot;);\n    let conn = match self.pool.get() {\n        Ok(conn) =&gt; conn,\n        Err(e) =&gt; {\n            log::error!(&quot;failed to get connection: {}&quot;, e);\n            return;\n        }\n    };\n    match conn.execute(\n        &quot;INSERT INTO logs VALUES (current_timestamp, ?, ?, ?, ?, ?, ?)&quot;,\n        params![level.as_str(), self.importer_name, source, id, code, reason],\n    ) {\n        Ok(_) =&gt; (),\n        Err(e) =&gt; log::error!(&quot;Failed to insert log entry: {}&quot;, e),\n    }\n}\n</code></pre>\n<p>And now the meat of the logging!\nThe <code>log</code> method does what you'd expect.\nThe signature is a reflection of 
the schema:\nwhat you need to log, what you may optionally log, and what type of data you're logging.</p>\n<p>For our use case, we decided to additionally log via the <code>log</code> crate.\nThis way, we can see critical errors on the console as the job is running.</p>\n<p>And that's pretty much it!\nIt took significantly more time to write this post than to actually write the code.\nSomeone could probably write a macro-based crate to generate these sorts of loggers if they had some spare time ;)</p>\n<h2><a href=\"#bonus-filter_log\" aria-hidden=\"true\" class=\"anchor\" id=\"bonus-filter_log\"></a>Bonus: <code>filter_log</code></h2>\n<p>We have a pretty common pattern in our codebase,\nwhere most operations / pipeline stages yield results,\nand we want to chain these together.\nWhen it succeeds, we pass the result on to the next stage.\nOtherwise, we want to log what went wrong.</p>\n<p>We called this <code>filter_log</code> because it usually shows up in <code>filter_map</code> over streams\nand as such yields an <code>Option&lt;T&gt;</code>.</p>\n<p>This was extremely easy to add to our logging struct,\nand saves loads of boilerplate!</p>\n<pre><code class=\"language-rust\">/// Converts a result to an option, logging the failure if the result is an `Err` variant.\npub fn filter_log&lt;T, E: Debug&gt;(\n    &amp;self,\n    level: Level,\n    source: &amp;str,\n    id: Option&lt;&amp;str&gt;,\n    code: &amp;str,\n    result: Result&lt;T, E&gt;,\n) -&gt; Option&lt;T&gt; {\n    match result {\n        Ok(result) =&gt; Some(result),\n        Err(err) =&gt; {\n            self.log(level, source, id, code, &amp;format!(&quot;{:?}&quot;, err));\n            None\n        }\n    }\n}\n</code></pre>\n<h1><a href=\"#conclusion\" aria-hidden=\"true\" class=\"anchor\" id=\"conclusion\"></a>Conclusion</h1>\n<p>The concept of logging to a database is not at all original with me.\nMany enterprise services log extensively to special database tables.\nBut I think the 
technique is rarely applied to applications.</p>\n<p>Hopefully this post convinced you to give it a try in the next situation where it makes sense.</p>\n",
      "summary": "",
      "date_published": "2025-01-13T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "software-engineering",
        "duckdb",
        "databases",
        "rust"
      ],
      "language": "en"
    }
  ]
}