Category: Projects

  • I Built an AI Medical Research Partner — Here’s Why It’s Not a Diagnosis Bot


    You are the person responsible for your healthcare. Not your doctor, not your insurance company, not the hospital system — you. They all play roles, but when the results come back and the decisions have to be made, nobody else lives with the consequences.

    This is uncomfortable to say plainly, because the immediate follow-up is also true: you are not the expert. You didn’t go to medical school. You don’t read imaging. You can’t interpret your own bloodwork with the confidence of someone who’s seen ten thousand panels. The gap between “this is my responsibility” and “I am not qualified to make these decisions alone” is where most people get stuck — and where most of the bad information on the internet lives, waiting to fill the vacuum.

    I built something to help with that gap. Not to close it — nobody’s closing it with a config file — but to make it easier to stand in.

    The skill is open source: Medical Research Thinking Partner on GitHub


    The Problem With “Just Google It”

    The information exists. The literacy doesn’t.

    Search for a symptom online and you’ll get a results page that casually spans the range from “this is nothing” to “you might die.” That’s not because the sources are wrong. It’s because serious conditions are disproportionately represented in medical literature relative to how often they actually occur. The internet doesn’t know your pre-test probability. It just shows you everything. It’s gotten worse, not better — a Guardian investigation in January 2026 found Google’s AI Overviews returning inaccurate and potentially harmful health information at the top of search results.

    Health headlines are worse. “New drug reduces heart attack risk by 50%” is technically accurate and almost completely misleading. That 50% is a relative risk reduction. If your baseline risk was 2%, the drug brought it to 1% — an absolute benefit of one percentage point. One person in a hundred benefits. The other ninety-nine took a drug with side effects for nothing. But “Drug provides 1% absolute risk reduction” doesn’t generate clicks.

    The tools that already exist in this space — symptom checkers, chatbot triage systems, “ask an AI doctor” products — are mostly solving the wrong problem. A February 2026 study from Mount Sinai found that LLMs can amplify medical misinformation when used without safeguards — they’re agreeable by design, and agreeableness is dangerous when the user’s premise is wrong. These tools are trying to be the expert. They want to take your symptoms and hand you a diagnosis or a triage recommendation. That’s probably not what most people need. What most people need is the ability to evaluate the information they’re already drowning in and translate it into better conversations with the actual experts. There may be a future where technology can navigate all this nuance, but evidence suggests that we aren’t there yet.

    What I Actually Built

    I built a Claude Code skill — a plain-text instruction file that loads structured frameworks into an AI conversation. It’s not an app. It’s not even code, exactly. It’s roughly 600 lines of English across five files that tell Claude how to think about medical research when I bring it a question.

    For readers unfamiliar with Claude Code: it’s Anthropic’s CLI tool for working with Claude. “Skills” are instruction files you drop into a directory that activate when relevant — they load context and behavioral rules into the conversation. Think of them as a role description with reference materials attached.

    The skill has one orchestrator file (SKILL.md) and four reference documents: an evidence hierarchy guide, a medical statistics primer, a source routing table, and an appointment prep scaffold. When I invoke the skill, it identifies what mode I’m operating in — new diagnosis, chronic condition, treatment research, appointment prep, interpreting results — and applies the relevant frameworks.
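    For the curious, the on-disk shape is just a folder of text files. Claude Code skills are a directory containing a SKILL.md whose frontmatter tells Claude when to activate it. A sketch of the layout — the reference filenames here are illustrative, not necessarily the repo’s actual names:

```
medical-research-partner/       # folder name illustrative
├── SKILL.md                    # orchestrator: frontmatter + mode detection
├── evidence-hierarchy.md       # how to weigh study types against each other
├── statistics-primer.md        # NNT, relative vs. absolute risk, benchmarks
├── source-routing.md           # which sources fit which question types
└── appointment-prep.md         # question scaffold for limited visit time
```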

    The entire thing is a set of instructions — all of it is on GitHub. There’s no supporting code, no API calls, no database. It works because the frameworks themselves are the value — PICO question framing, evidence hierarchy evaluation, statistics translation, cognitive bias flags. Technically, you could simply read the files and apply them yourself. But simply is doing a lot of heavy lifting there. Most of us don’t have the cognitive stamina to load and apply a framework like this when the material we’re reviewing is itself incredibly taxing. The AI is just the delivery mechanism for structured thinking that already exists in evidence-based medicine but is rarely accessible to non-clinicians.

    The Design Philosophy: Thinking Partner, Not Diagnostician

    The most important design decision was the first one: the skill never attempts diagnosis or treatment recommendation. Not because of liability concerns — though those are real — but because that’s genuinely the wrong use of the tool. An AI doesn’t have your labs, your imaging, your clinical history, or the ability to perform a physical exam. It has no longitudinal¹ relationship with you. Pretending otherwise isn’t just irresponsible; it produces worse outcomes than doing nothing, because it generates false confidence.

    What the skill does instead is teach you how to evaluate evidence. When you bring it a question — “my doctor wants to start me on a statin, is that right for me?” — it doesn’t answer the question. It reframes it using PICO:

    • P (Patient/Problem): What’s your specific cardiovascular risk profile? Age, sex, existing conditions, family history.
    • I (Intervention): Which statin, at what dose?
    • C (Comparison): Compared to what — lifestyle changes alone? A different medication? Watchful waiting?
    • O (Outcome): What matters to you? Reducing heart attack risk? Avoiding side effects? Both, weighted how?

    A vague question produces vague answers. PICO produces a question you can actually research — and, more importantly, a question you can bring to your doctor that invites a specific, useful response.

    This is where the tension between self-advocacy and inexpertise matters most. You are responsible for understanding your own care well enough to participate in decisions about it. You are not responsible for knowing more than your cardiologist about lipid management. The skill is designed to live in that gap — to help you become a better-informed participant without pretending you’re a peer reviewer.

    A Session, Annotated

    Here’s what using the skill actually looks like. The scenario is fabricated but realistic — the kind of situation that sends most people straight to Google at 2 AM.

    The setup: I’ve just gotten blood work back. My A1C came back at 6.8% — just over the Type 2 diabetes threshold. My doctor mentioned starting metformin and scheduled a follow-up for next week. I don’t know what any of this means.

    I invoke the skill and tell it what happened.

    Me: Just got blood work back. A1C is 6.8%. My doctor says that’s Type 2 diabetes and wants to start me on metformin. I’m seeing him next week to discuss. I don’t really understand what any of this means or whether metformin is the right call.

    Claude: This is a thinking-partner conversation to help you understand information and ask better questions — not medical advice. Your doctor has access to your full clinical picture.

    Before I dig into this, I want to make sure I’m helping with the right things. A few questions:

    1. Is this your first time hearing “diabetes” in connection with your own health, or has prediabetes come up before?
    2. Do you have any other conditions your doctor manages — high blood pressure, high cholesterol, anything else?
    3. What matters most to you right now — understanding the diagnosis itself, evaluating whether metformin is right, prepping for next week’s appointment, or all of the above?

    This is the first thing that surprised me. I expected an info dump about diabetes. Instead, it asked three targeted questions. Each one matters: question 1 determines how much baseline explanation I need, question 2 affects which evidence is relevant (comorbidities change the risk calculus), and question 3 lets me steer. It’s triaging before it teaches.

    I tell it this is brand new, no other conditions, and I want all three. It starts with what the numbers mean, then flags something I wouldn’t have thought to ask about:

    Claude: A1C measures your average blood sugar over the past 2–3 months. The thresholds: below 5.7% is normal, 5.7–6.4% is prediabetes, 6.5% and above is Type 2 diabetes. You’re at 6.8% — past the threshold, but early-stage.

    One thing worth knowing: the ADA recommends confirming a diabetes diagnosis with a repeat test in asymptomatic patients. Did your doctor mention a confirmation test, or was this treated as definitive?

    A question, not a conclusion. It caught something that’s in the clinical guidelines but doesn’t typically make it into a fifteen-minute appointment. I didn’t know to ask about confirmation testing. Now I do.

    The conversation continues. Eventually I ask the big question:

    Me: So is metformin the right move?

    Claude: Let’s frame that more precisely. Using the PICO framework:

    • P (Patient): Newly diagnosed T2D, A1C 6.8%, no comorbidities
    • I (Intervention): Metformin, likely 500mg starting dose
    • C (Comparison): Lifestyle intervention alone — diet, exercise, weight management
    • O (Outcome): What matters to you? Reducing A1C? Avoiding long-term complications — kidney, eye, nerve damage? Minimizing side effects? Some combination?

    “Should I take metformin?” is actually several different questions depending on which outcome you prioritize. Which of those matters most to you?

    There’s the PICO reframe in practice. My vague question — “is metformin right?” — became a structured, answerable question. And it forced me to articulate what I actually care about, which is something I hadn’t done.

    I chose avoiding long-term complications. The skill walked through the evidence — the UKPDS trial, the landmark metformin study — and then translated the statistics:

    Claude: The UKPDS showed metformin reduced the risk of any diabetes-related endpoint by about 32% in overweight patients. But that’s the relative number. In absolute terms, over 10 years, the Number Needed to Treat was roughly 14 — meaning for every 14 people treated, one avoids a complication they otherwise would have had. An NNT of 14 is strong. That’s better than many widely prescribed preventive medications.

    But here’s what’s worth noting for your specific situation: there’s evidence that intensive lifestyle intervention — from the Diabetes Prevention Program trial — can reduce progression by 58% in prediabetes and early diabetes. Your doctor may be recommending metformin and lifestyle changes, not instead of them. That’s a question worth asking next week.

    Two things happened here that wouldn’t happen in a Google search. First, it translated a relative risk reduction into an NNT and told me whether that number is good or bad — context the raw study doesn’t provide for a lay reader. Second, it caught a false binary in my thinking (metformin OR lifestyle) and reframed it (probably both). That’s the cognitive trap flagging working in real time.

    By the end of the session, it generated an appointment prep list:

    Claude: Your prioritized question list for next week. These three must get answered:

    1. “Should we confirm the diagnosis with a second A1C, given that I’m asymptomatic?” — The ADA recommends confirmation. This is a legitimate clinical question, not a challenge.
    2. “Are you recommending metformin alongside lifestyle changes, or instead of them? What’s the target A1C, and when do we reassess?” — Establishes measurable goals and a timeline.
    3. “What should I be monitoring at home, and what would prompt me to call before the next visit?” — Your safety net.

    If time allows: What are metformin’s common side effects and how long do they last? Should I see an endocrinologist, or is this manageable in primary care at this stage?

    Bring your full medication list including supplements. If you can, bring someone with you to take notes.

    I went from “I don’t understand any of this” to a ranked question list, a framework for evaluating the evidence my doctor cites, and a clearer picture of what my numbers actually mean. The entire conversation took about twenty minutes. None of it replaced the appointment. All of it made the appointment more useful.


    The Statistics Problem

    [Illustration: a skeptical person reading a newspaper at a cafe, with a thought bubble showing a pie chart]

    This section is worth reading even if you never touch the skill. It might be the most practically useful thing in this post.

    Medical statistics are routinely presented in ways that mislead non-experts. This isn’t always intentional — relative risk is a legitimate statistical measure — but the effect is the same: people make decisions based on numbers that sound more dramatic than they are. And, unfortunately, this isn’t always unintentional either.

    The headline test: When you see a health claim with a percentage, run it through this filter.

    A drug “reduces your risk of heart attack by 50%.” Step one: is that relative or absolute? Almost always relative. Step two: what’s the baseline rate? Say your ten-year risk of a heart attack is 2%. Step three: compute the absolute reduction. 2% times 50% equals 1 percentage point — your risk goes from 2% to 1%. Step four: compute the Number Needed to Treat (NNT). One divided by 0.01 equals 100. One hundred people take this drug for ten years. One of them avoids a heart attack. Ninety-nine took a drug with potential side effects — muscle pain, liver enzyme elevation, diabetes risk — for no personal benefit.

    That doesn’t mean the drug is bad. At a population level, an NNT of 100 for a serious outcome is meaningful. But it does mean the decision is more nuanced than “50% reduction” suggests, and it means the side effect profile matters a lot more than the headline implies.
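    The four-step filter is simple enough to write down. A minimal sketch, using the numbers from the example above:

```python
def absolute_effect(baseline_risk, relative_risk_reduction):
    """Translate a headline relative risk reduction into absolute terms.

    baseline_risk: risk without treatment over the study period (0.02 = 2%)
    relative_risk_reduction: the headline number (0.50 = "50% reduction")
    Returns (absolute risk reduction, number needed to treat).
    """
    arr = baseline_risk * relative_risk_reduction
    return arr, 1 / arr

# "Reduces heart attack risk by 50%" on a 2% ten-year baseline:
arr, nnt = absolute_effect(baseline_risk=0.02, relative_risk_reduction=0.50)
print(f"absolute reduction: {arr:.1%}, NNT: {nnt:.0f}")
# absolute reduction: 1.0%, NNT: 100
```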

    The skill’s statistics primer includes a rough benchmark table:

    NNT       Interpretation
    2-5       Excellent — strong individual benefit
    10-20     Good for serious conditions
    50-100    Modest — worth scrutiny for the individual
    200+      Low — carefully weigh against side effects

    There’s a companion concept: Number Needed to Harm (NNH). If the NNT is 50 and the NNH is 30, the drug harms more people than it helps. Comparing both numbers before forming an opinion is essential, and almost nobody does it — because almost nobody is taught to. (If you want to look up pre-computed NNTs for common treatments, TheNNT.com is a free resource.)
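    The same arithmetic extends to the NNT-versus-NNH comparison. A sketch of the example above, scaled to a thousand treated patients:

```python
def helped_vs_harmed(nnt, nnh, cohort=1000):
    """Per `cohort` patients treated: how many benefit vs. how many are harmed."""
    return cohort / nnt, cohort / nnh

# The example from the text: NNT of 50, NNH of 30.
helped, harmed = helped_vs_harmed(nnt=50, nnh=30)
print(f"per 1000 treated: {helped:.0f} helped, {harmed:.0f} harmed")
# per 1000 treated: 20 helped, 33 harmed
```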

    The skill also flags cognitive traps: surrogate endpoints (a drug that improves a lab marker but doesn’t improve actual outcomes), publication bias (positive studies are published more than negative ones, so the literature systematically overstates benefits — Cochrane systematic reviews are one of the few sources that actively seek unpublished data to counter this), and the correlation-causation conflation that plagues observational studies. These aren’t exotic epistemological concerns. They’re the basic mechanics of how medical evidence works, and most patients — most people — have never encountered them.

    The Humans in the System

    Here’s another thing that’s uncomfortable to say plainly: every person in the healthcare system — your doctor, the specialist, the nurse, the insurance reviewer, the hospital administrator — is a human being with biases and incentive structures that may not align perfectly with your best outcome.

    This is not a conspiracy theory. It’s not even a criticism. It’s just how systems made of humans work. A surgeon’s training and livelihood are oriented around surgery; they may be more likely to recommend it than a non-surgical specialist would. An insurance company’s incentive is to manage costs; they may deny a treatment that’s clinically appropriate. A busy primary care physician with a fifteen-minute appointment slot may default to the most common recommendation rather than the most tailored one. A pharmaceutical rep’s job is to present their product favorably. None of these people are villains. They’re professionals operating within systems that create predictable biases.

    Treating this as true is essential. Treating it dispassionately is equally essential.

    The productive response isn’t suspicion — it’s structured skepticism. Ask your doctor why they’re recommending Treatment A over Treatment B. Ask what the evidence base is. Ask whether industry guidelines align with the latest meta-analyses. Do it with genuine curiosity, not accusation. “I want to make sure I understand the reasoning” is a collaboration. “I think you’re biased” is a fight.

    The skill is designed to help you arrive at that collaboration. The evidence hierarchy exists so you can evaluate the quality of what’s being cited. The statistics primer exists so you can understand the magnitude of what’s being claimed. The appointment prep scaffold exists so you can structure the conversation to get the most out of limited time. None of it is adversarial. All of it is about being a more effective participant in a system that, for all its flaws, is still staffed by people who overwhelmingly went into medicine to help.

    Bringing Research to Your Doctor

    The appointment prep framework is where the theoretical becomes practical. You’ve done the research. You’ve framed questions with PICO. You’ve computed some NNTs. Now you’re sitting in the exam room with ten minutes. How do you use this without coming across as the patient who “did their own research”?

    The skill’s framing advice is simple: present findings as an invitation to collaborate, not a challenge to authority.

    Instead of “I read that Drug X is better than what you prescribed,” try: “I came across a comparison between Drug X and what you’ve recommended. I’d love your perspective on whether that’s relevant to my situation.”

    The first version puts the doctor on the defensive. The second positions you as engaged and informed — exactly what you are — while acknowledging that they have clinical context you don’t. Most physicians respond well to patients who bring questions rather than conclusions.

    The skill also teaches the teach-back method: after your doctor explains something important, restate it in your own words and ask if you’ve got it right. “So if I understand correctly, what you’re saying is [your version] — is that right?” This catches misunderstandings in the room instead of at home three hours later when you’re trying to remember what they said.

    Other practical pieces: bring a ranked question list (you have three to five questions that must get answered, everything else is a bonus), bring a medication list (all of them, including supplements), bring someone else if you can (they’ll remember things you won’t and ask questions you’re too anxious to raise), and take notes or ask to record.

    What It Can’t Do

    The skill can’t examine you. It can’t order labs. It can’t see your imaging. It doesn’t know your full medical history, your family history in context, or the clinical gestalt that an experienced physician develops over years of practice. It has no longitudinal² relationship with you — it doesn’t remember last year’s labs or notice that your weight has been trending in a direction that matters.

    It also can’t navigate the emotional weight of medical decisions. It can help you frame a question about whether to pursue chemotherapy, but it can’t sit with you while you decide. That’s what your care team, your family, and your own judgment are for.

    These aren’t limitations to apologize for. They’re the boundaries that make the tool honest. The skill is designed to make you better at one specific thing: evaluating medical information and translating it into productive action. It does that by loading frameworks that already exist in evidence-based medicine — PICO, evidence hierarchies, NNT analysis, structured appointment prep — and making them accessible in a conversation.

    The name of this blog is “I Don’t Know Anything.” That applies here more than anywhere else I’ve written about. I’m not a clinician. I built a tool that helps me think more clearly about medical information because I needed one, and because the frameworks it uses are well-established and freely available — just not widely taught to the people who need them most.

    The goal isn’t to become your own doctor. It’s to minimize being a passive recipient of information you can’t evaluate. It’s to minimize being exclusively guided by systems and processes that have as much influence from profit motive as they do from interest in your well-being. The evidence hierarchy exists. NNT exists. PICO exists. Most people just never encounter them until they’re sitting in an exam room, overwhelmed, nodding along to a recommendation they don’t fully understand.

    If a config file and a structured conversation can close that gap even slightly — can turn “I don’t understand any of this” into “I have three specific questions about this” — that feels like it was worth building.

    • 1
      Longitudinal means tracking the same thing over time instead of looking at a single snapshot. In research, this means following a group of people across years or decades to see how patterns emerge — what causes disease, what prevents it, what changes. At the individual level, it means the same principle applied to you: your doctor tracking your blood pressure, A1C, or cholesterol across visits to spot trends that a single reading would miss. Both scales depend on the same insight — one measurement tells you where you are, but a series of measurements tells you where you’re heading. For more: longitudinal research studies | longitudinal patient data in clinical care
    • 2
      See note 1.
  • The Router Backup API Synology Didn’t Document


    Update (2026-03-04): After this post was shared on Reddit, a commenter pointed out a simpler CLI-based alternative that also works on SRM. See the update below.

    My router has a backup feature. It lives in the control panel under Backup & Restore — click Export, get a .dss file containing your network settings, firewall rules, DHCP reservations, DNS zones, WiFi configuration, and mesh topology. Everything you’d need to rebuild from scratch.

    If you’re here for the working code, skip to the Implementation Reference at the end.

    The problem: it’s manual-only. No scheduler. No “run this at 3am every morning.” The device managing my entire home network had no automated config backup, and it had been on my to-do list long enough that I’d stopped noticing it.

    What followed was a two-session investigation into an API that exists but isn’t documented, an authentication mechanism that the documentation actively misrepresents, and a use case where AI collaboration was the thing that made the whole effort practical.


    The API Exists

    Synology’s SRM — the operating system running on the RT6600ax — shares its DNA with DSM, their NAS OS. DSM exposes a REST API that powers its own web UI, and SRM inherits a version of it. I knew this API existed. The question was whether the backup functionality was reachable through it.

    Querying SYNO.API.Info — an unauthenticated endpoint that returns every available API on the device — confirmed it was:

    SYNO.Backup.Config.Backup  →  entry.cgi  (versions 1–1)

    The endpoint existed. So far so good.
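    For anyone reproducing the probe: SYNO.API.Info is served by query.cgi (the request shape below follows Synology’s DSM documentation). A sketch of the discovery step — the response sample is trimmed and illustrative; only the SYNO.Backup.Config.Backup → entry.cgi entry is taken from the device output described above:

```python
import json

# What a (heavily trimmed) SYNO.API.Info response looks like. The real call is:
#   GET http://<router>:8000/webapi/query.cgi?api=SYNO.API.Info&version=1&method=query&query=all
# Paths other than entry.cgi below are illustrative placeholders.
sample = json.loads("""
{
  "success": true,
  "data": {
    "SYNO.API.Auth":             {"path": "auth.cgi",       "minVersion": 1, "maxVersion": 7},
    "SYNO.API.Encryption":       {"path": "encryption.cgi", "minVersion": 1, "maxVersion": 1},
    "SYNO.Backup.Config.Backup": {"path": "entry.cgi",      "minVersion": 1, "maxVersion": 1}
  }
}
""")

def find_apis(info, needle):
    """Return {api_name: path} for every API whose name contains `needle`."""
    return {name: meta["path"]
            for name, meta in info["data"].items()
            if needle.lower() in name.lower()}

print(find_apis(sample, "backup"))
# {'SYNO.Backup.Config.Backup': 'entry.cgi'}
```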


    The Documentation Stops There

    Here’s where things get interesting. Synology publishes a developer guide for DSM — 223 pages (version 7, 2022) covering authentication, session management, API versioning, and dozens of specific namespaces. For SRM, there’s no equivalent. The router OS running in millions of homes has no public developer documentation; Synology’s public knowledge base contains no developer-facing API reference for SRM 1.x.

    The official authentication guide describes a login flow: submit your username and password to SYNO.API.Auth, get a session ID back, use it for subsequent calls. Straightforward. That is the documentation: the DSM Login Web API Guide describes submitting account and passwd over HTTP or HTTPS and receiving a session ID, with HTTPS as the only security mechanism mentioned. Client-side password encryption appears nowhere.

    It is also wrong — or at least, incomplete to the point of being useless on SRM.


    The Auth Wall

    We started testing with curl. The backup service account had admin group membership in SRM. The request was well-formed. The response was error 402: permission denied.

    Tried the built-in admin account. Error 400: bad credentials — except the web UI accepted the same credentials without complaint.

    Over the next stretch of the session, Claude and I systematically eliminated every reasonable variable: session name (Core, SRM, RouterManagement, SurveillanceStation), HTTP vs HTTPS, port 8000 vs 8001, API version 1 vs 3, GET vs POST with URL encoding. Every combination returned 400 for the admin account while the browser accepted the same credentials without issue.

    The SRM security log offered one useful clue: every successful UI login appeared as admin:. Every API attempt appeared as SYSTEM:. The router was rejecting API auth at a system level, before credential validation even happened.

    We checked AutoBlock (added source IPs to the allow list — no effect). We looked at the “Allow external access to SRM” setting (WAN-only, completely irrelevant to LAN API calls). Both were dead ends.


    Finding the Real Problem

    With every obvious explanation exhausted, Claude queried the API info endpoint for anything encryption or token related. One name in the results stood out:

    SYNO.API.Encryption

    That single API name explained everything.

    SRM’s web UI encrypts passwords client-side using RSA before submitting them. The server only accepts encrypted credentials. The official documentation describing plaintext authentication is DSM documentation — and while DSM supports plaintext credentials over HTTPS, SRM does not. There is no published note about this divergence anywhere: SYNO.API.Encryption is discoverable only by querying SYNO.API.Info on the device itself, and it appears in no Synology developer documentation, knowledge base article, or official community resource.

    Fetching the encryption endpoint returned a 4096-bit RSA public key, a cipher field name, and a server timestamp. The browser fetches this on every login, encrypts your password with PKCS1v15, and sends ciphertext instead of plaintext. We’d been sending plaintext the whole time.
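    The encryption step itself is small once you know it exists. A minimal sketch using Python’s cryptography library — the base64 output encoding here is an assumption for illustration; the cipher field name and exact envelope come from the device’s SYNO.API.Encryption response:

```python
import base64
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import padding

def encrypt_password(public_key_pem: bytes, password: str) -> str:
    """Encrypt a password the way SRM's login page does: RSA PKCS#1 v1.5
    with the public key fetched from SYNO.API.Encryption.

    The base64 transport encoding is an assumption; check the cipher
    field returned by the device for the exact expected format.
    """
    key = serialization.load_pem_public_key(public_key_pem)
    ciphertext = key.encrypt(password.encode(), padding.PKCS1v15())
    return base64.b64encode(ciphertext).decode()
```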


    The RSA Rabbit Hole

    We implemented the RSA flow manually using Python’s cryptography library: fetch the public key, encrypt the password, submit the ciphertext in the documented field. Still error 400.

    Something was still missing. Claude pivoted to the synology-api Python library — a community project wrapping Synology’s APIs — and traced through its authentication module. What it revealed were two parameters appearing nowhere in any documentation: dsm_version=3 and session=webui. (The library’s README and supported-API list mention neither SRM support nor these parameters, and SYNO.Backup.Config.Backup is not among its ~300 listed APIs.)

    DSM uses dsm_version=7. SRM uses dsm_version=3. The session name webui — not Core, not SRM, not any of the names we’d tried — is what SRM expects. Neither is documented for SRM anywhere. The values were found in library source code, not in any published reference.

    With those two parameters, authentication succeeded.


    The Backup Flow

    Once past the auth wall, SYNO.Backup.Config.Backup worked as advertised — a clean three-step async flow:

    1. Start — call with method=start. The router begins generating the archive and returns a task_id.
    2. Poll — call with method=status and the task_id until completion is confirmed.
    3. Download — call with method=download. The router returns the .dss archive.

    The output is a file named SynologyRouter_YYYYMMDD.dss, roughly 91KB, containing the complete router configuration — network settings, firewall rules, DHCP reservations, DNS zones, WiFi credentials, and mesh AP config.
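    The three-step flow is easy to drive generically. A sketch with the transport injected as a callable — field names like task_id and finished follow the description above and are assumptions about the exact response keys on a real device:

```python
import time

def run_config_backup(call, poll_interval=1.0, timeout=60.0):
    """Drive the start -> poll -> download flow for SYNO.Backup.Config.Backup.

    `call(method, **params)` performs one API request and returns the decoded
    response data (raw bytes for the download step). Response field names
    here are assumptions based on the flow described in the text.
    """
    task_id = call("start")["task_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if call("status", task_id=task_id).get("finished"):
            return call("download", task_id=task_id)  # the .dss archive bytes
        time.sleep(poll_interval)
    raise TimeoutError("config backup task did not finish in time")
```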


    What This Would Have Looked Like Without Claude

    This is the honest part of the post.

    The technical problem here isn’t exotic. API investigation is a standard skill. The individual steps — probe the info endpoint, test auth variations, find the encryption layer, locate a community library — are each things a developer could work through independently.

    But the scope would have made it impractical as a spare-evening project. Systematically testing a dozen authentication variations, recognizing SYNO.API.Encryption in a list of 100+ API names as the diagnostic key, implementing RSA in Python on the fly, debugging why the manual RSA implementation still failed, and tracing through library source code to identify two undocumented parameters — each step depends on full context from all the previous steps. Losing that thread between evenings, or spending an hour reconstructing it each time, is exactly what turns “automatable” into “perpetually on the list.”

    Claude held the full investigation context across the session, made the lateral connection to SYNO.API.Encryption, knew to reach for the synology-api library when the manual implementation stalled, and identified the undocumented dsm_version and session parameters by reading the library source. No single step was magic. The value was continuity — a collaborator that didn’t lose the thread.

    The router config backup now runs at 3:00 AM every night. Each .dss file lands in /mnt/user/backups/router/, where it’s swept into the existing nightly rsync to a secondary NAS. Thirty days of retention at ~91KB per file. If the router ever needs to be rebuilt, the config is there.

    The gap that existed only because the documentation didn’t is now closed.


    Appendix: Implementation Reference

    Enough detail to replicate this setup without AI assistance.

    Dependencies

    pip install synology-api cryptography requests

    On Unraid, install to a persistent path (the root filesystem is tmpfs and doesn’t survive reboots):

    pip3 install --target=/mnt/user/scripts/router-backup/lib \
      synology-api cryptography requests

    Authentication

    import sys, os
    sys.path.insert(0, '/mnt/user/scripts/router-backup/lib')
    
    from synology_api import base_api
    
    session = base_api.BaseApi(
        ip_address='192.168.1.1',
        port=8000,
        username='backup',           # must be in admin group in SRM
        password=os.environ['ROUTER_BACKUP'],
        secure=False,
        cert_verify=False,
        dsm_version=3,               # SRM requires 3, not 6 or 7
        debug=False,
        otp_code=None
    )
    session.login('webui')           # 'webui' is required, other names return 402

    Key parameters undocumented for SRM:

    Parameter       SRM value    DSM value    Effect of wrong value
    dsm_version     3            6 or 7       Error 400
    session name    'webui'      'Core'       Error 402
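    These errors come back in Synology’s standard Web API JSON envelope ({"success": ..., "error": {"code": ...}}). As a small illustrative sketch — the helper name and hint strings are my own, only the two codes observed in this post are mapped:

```python
# Hedged sketch: surface the SRM auth failures from the table above as
# readable errors. Only codes 400 and 402, as observed in this post, are
# mapped; everything else falls through to a generic message.
KNOWN_ERRORS = {
    400: "auth failed -- check credentials and dsm_version (SRM expects 3)",
    402: "unknown session name -- SRM expects session='webui'",
}

def check_envelope(payload):
    """Return payload['data'] on success; raise with a hint on failure."""
    if payload.get("success"):
        return payload.get("data", {})
    code = payload.get("error", {}).get("code")
    hint = KNOWN_ERRORS.get(code, "unrecognized error code")
    raise RuntimeError(f"SRM API error {code}: {hint}")
```

    Wrapping every resp.json() in a check like this turns a silent {"success": false} into an immediate, descriptive failure.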

    The service account must be a member of the admin group in SRM Control Panel → User.

    Backup API Flow

    import time, datetime, requests
    
    WEBAPI = 'http://192.168.1.1:8000/webapi/entry.cgi'
    sid = session.session_id
    
    # Step 1: Start
    resp = requests.post(WEBAPI, data={
        'api': 'SYNO.Backup.Config.Backup',
        'version': '1',
        'method': 'start',
        '_sid': sid,
    })
    task_id = resp.json()['data']['task_id']
    
    # Step 2: Poll
    while True:
        status = requests.post(WEBAPI, data={
            'api': 'SYNO.Backup.Config.Backup',
            'version': '1',
            'method': 'status',
            'task_id': task_id,
            '_sid': sid,
        }).json()
        if status['data']['state'] == 'finish':
            break
        time.sleep(2)
    
    # Step 3: Download
    response = requests.post(WEBAPI, data={
        'api': 'SYNO.Backup.Config.Backup',
        'version': '1',
        'method': 'download',
        'task_id': task_id,
        '_sid': sid,
    }, stream=True)
    
    filename = f"SynologyRouter_{datetime.date.today().strftime('%Y%m%d')}.dss"
    with open(f'/mnt/user/backups/router/{filename}', 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
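    The poll loop above runs unbounded; if the router ever hangs mid-export, the script hangs with it. One way to bound it — a hypothetical wrapper, where poll_once stands in for the status request in step 2:

```python
import time

def wait_for_finish(poll_once, timeout=120, interval=2):
    """Poll until the task reports 'finish' or the timeout elapses.

    poll_once is a zero-argument callable returning the task state string,
    e.g. lambda: <status request>.json()['data']['state'].
    """
    deadline = time.monotonic() + timeout
    while True:
        if poll_once() == 'finish':
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("backup task did not finish in time")
        time.sleep(interval)
```

    A failed 3 AM run then exits noisily instead of leaving a stuck process for the next night’s cron to collide with.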

    Output and Restoration

    The .dss file is a binary archive (~91KB for a moderately configured RT6600ax). Restore via SRM → Control Panel → Backup & Restore → Restore. Contents: network interfaces, DHCP reservations, firewall rules, DNS zones, WiFi SSIDs and credentials, mesh AP configuration.

    Deployment

    Item               Value
    Script location    /mnt/user/scripts/router-backup/
    Dependencies       /mnt/user/scripts/router-backup/lib/
    Credentials        .env in script dir, chmod 600
    Schedule           0 3 * * * via Unraid User Scripts
    Output path        /mnt/user/backups/router/
    Retention          30 most recent files (~2.7MB total)
    Backup chain       Swept into Tier 4 rsync to NAS2 at 5:30 AM
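    The 30-file retention can be enforced with a few lines at the end of the backup script. A minimal sketch (function name is mine; it relies on the YYYYMMDD filename suffix sorting lexically, so a plain reverse sort is newest-first):

```python
from pathlib import Path

def prune_backups(directory, keep=30, pattern="SynologyRouter_*.dss"):
    """Delete all but the newest `keep` backup files in `directory`.

    Returns the names of the files that were kept, newest first.
    """
    files = sorted(Path(directory).glob(pattern), reverse=True)
    for old in files[keep:]:   # everything past the newest `keep`
        old.unlink()
    return [f.name for f in files[:keep]]
```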

    Update — 2026-03-04

    After sharing this post on r/synology, DaveR007 pointed out that Synology ships a CLI tool called synoconfbkp that can export the same .dss configuration file — and that it exists on SRM, not just DSM. I SSH’d into my RT6600ax to verify, and he’s right:

    backup@SynologyRouter:/$ ls -la /usr/syno/bin/synoconfbkp
    -rwxr-xr-x    1 root     root        554881 Nov  4  2022 /usr/syno/bin/synoconfbkp

    The binary works on SRM and produces a standard .dss backup:

    sudo /usr/syno/bin/synoconfbkp export --filepath=/tmp/backup.dss

    It supports the same config categories you’d expect, plus SRM-specific ones like wifi, mesh, and router_cp. DaveR007 maintains a wrapper script that adds timestamped filenames, hostname inclusion, and optional SCP to remote shares: Synology_Config_Backup on GitHub.

    How the two approaches compare:

    The CLI method is simpler. If you have SSH access to the device, synoconfbkp does the job in a single command with no authentication gymnastics. You can schedule it with Synology’s built-in Task Scheduler and the backup runs entirely on the device itself.

    The API method described in this post runs remotely — it doesn’t require SSH to be enabled on the router, and the backup is initiated and downloaded from another machine on the network. That’s a meaningful distinction: SSH access on SRM grants sudo to any user in the admin group (there’s no granular privilege separation), so enabling SSH for automated backups means any compromised admin credential has full root-equivalent access to the device. The API uses the same admin account, but the channel is narrower — an API session can only reach what Synology built HTTP handlers for, with no path to arbitrary command execution.

    Both produce the same .dss file. Both are valid. Which one fits depends on what you’re comfortable leaving enabled and where you want the automation to live.

    Thanks to DaveR007 for the tip and for sharing his work.

    • 1
      Synology’s DSM Developer Guide (version 7, 2022) runs to 223 pages covering authentication, session management, API versioning, and namespace documentation. No equivalent document exists for SRM. Synology’s public knowledge base contains no developer-facing API reference for SRM 1.x.
    • 2
      Synology’s DSM Login Web API Guide describes authentication as: submit account and passwd to SYNO.API.Auth via HTTP/HTTPS, receive a session ID. Client-side password encryption is not mentioned. HTTPS is referenced as the security mechanism; there is no mention of application-layer encryption of credentials.
    • 3
      The existence of SYNO.API.Encryption is discoverable only by querying the SYNO.API.Info endpoint on the device itself — it does not appear in any Synology developer documentation, knowledge base article, or official community resource. The divergence between DSM (plaintext over HTTPS) and SRM (RSA-encrypted credentials) is nowhere documented.
    • 4
      The synology-api Python library implements ~300 Synology APIs. Its README and supported API list do not mention SRM support, the dsm_version=3 parameter required for SRM, or session=webui as the correct session name. SYNO.Backup.Config.Backup is not among the library’s listed APIs. The working parameter combination was determined through library source code, not documentation.
  • The AI Architecture Tournament — Three Rounds and Eight Contested Decisions


    Part 2 of 2. Part 1 covers the motivations, hardware, and requirements.


    Round One: Six Prompts, Six Blueprints

    I fed the requirements document to five AI systems: Claude, ChatGPT, Grok, Lumo, and Meta AI. Each received the same document. The prompt was simple:

    Prepare Architecture recommendations in a markdown document.

    The output was six architecture proposals, ranging from terse (Meta AI, at 536 words) to torrential (Claude, at 4,255 words). Reading them back to back was instructive. There was significant consensus on the fundamentals – Proxmox VE was recommended by almost every system, ZFS for storage, Docker Compose as the service layer – but the disagreements were where it got interesting.

    One system suggested running k3s across three machines spanning Sandy Bridge, Kaby Lake, and Raptor Lake architecture. Three CPU generations, three GPU generations, heterogeneous storage, aging SATA. (No.) One suggested I flip which machine is primary in a way that would have dedicated my i9-14900K/RTX 4080 Super to light transcription duties while the i7-7700K ran primary AI inference. (Also no.) One documented a recovery point objective of zero, which felt more like aspiration than engineering.

    The disagreements were the interesting part. Agreement is cheap. Disagreement forces you to actually think.


    Round Two: Three Comparative Analyses

    I took the six responses and fed them – all of them, together – to three different systems (Meta, Lumo and Claude) and asked each to compare, rank, and critique. What did they agree on? Where were the meaningful differences? Which recommendations were well-reasoned versus well-marketed?

    This round was useful in a specific way: it surfaced the shape of the disagreements. Not just “System A said X and System B said Y,” but why those differences existed and what they revealed about each system’s underlying assumptions. Systems that had been confidently wrong in Round One looked different in aggregate than systems that had been thoughtfully uncertain.


    Round Three: The Synthesis

    For the final round, I sat down with Claude to write a synthesis – a document that takes everything useful from all six responses, resolves every contested decision with explicit reasoning, and throws out anything over-engineered for this scale. Whether that last point was effectively navigated I’ll leave as an exercise for the reader.

    At this stage I also shifted the human-versus-AI emphasis and asked Claude to imagine implementing and operating the entire stack with little to no interaction from me. That framing shaped the output significantly.

    The resulting document – Homelab Architecture Synthesis: Claude-Implementable Design – is sitting at version 1.0 in my research folder. It runs to about sixty kilobytes of markdown, which means it’s either comprehensive or I have a problem. Plausibly both.


    Where the Contested Decisions Landed

    Eight decisions diverged meaningfully across the six responses. Here’s where the synthesis came out:

    Proxmox VE. Almost unanimous, and correct. FOSS, first-class ZFS, LXC containers with GPU passthrough, a purpose-built backup server. Unraid has been fine – but, to my sorrow and somewhat unforgivably, it expects to operate as root. That’s a hard thing to build an agentic management model on top of.

    Machine roles. My desktop – the i9-14900K/RTX 4080 Super – becomes the primary production host. My existing server becomes secondary production and storage. The Sandy Bridge box gets a quiet semi-retirement running AdGuard Home, Uptime Kuma, and dev instances of services. Things where “14-year-old hardware” genuinely doesn’t matter. RIP ultimate resolution on new-release video games.

    Terraform + Ansible + Git-controlled Compose files as the IaC stack. This is the decision I’m most excited about. Right now, if my Unraid server died, I’d be reconstructing container configurations from memory and half-remembered XML templates. With this stack, recovery is a `terraform apply`.

    SOPS + Age for secrets management. Encrypted in git. No plaintext credentials in compose files. (Yes, I currently have a Forgejo database running with the password `changeme`. That’s on the list. It’s been on the list.)
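    For a sense of what that decision looks like on disk: sops reads a .sops.yaml at the repo root that says which files get encrypted and to which Age recipient. A minimal sketch — the recipient string and path patterns below are placeholders, not my real configuration:

```yaml
# .sops.yaml at the repo root. Replace the recipient with the public key
# printed by `age-keygen`; these path patterns are illustrative.
creation_rules:
  - path_regex: .*\.env$
    age: age1examplepublickeyplaceholder
  - path_regex: .*secrets.*\.ya?ml$
    age: age1examplepublickeyplaceholder
```

    Files matching those rules are committed encrypted; the private Age key stays outside the repo.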

    Caddy as the reverse proxy. No more bare port numbers on every service URL. Finally becoming an adult.

    RPO honesty. Most systems told me my recovery point objective would be zero. One said it would be fifteen minutes and here’s why. The honest answer was more useful, even though it was less impressive. ZFS snapshots every fifteen minutes get you to RPO ≤ 15 minutes. Critical databases get WAL archiving to approach true zero. Document what you can actually deliver.
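    For illustration, the fifteen-minute cadence can be as simple as a cron entry (the dataset name is an assumption; tools like sanoid or zfs-auto-snapshot layer retention policies on top of the same idea):

```
# crontab fragment: snapshot every 15 minutes. Percent signs must be
# backslash-escaped inside crontab entries.
*/15 * * * * /usr/sbin/zfs snapshot tank/appdata@auto-$(date +\%Y\%m\%d-\%H\%M)
```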


    Claude’s Implementation Model

    The section I’m most excited about, and most scared of.

    None of the Round One responses had an opportunity to address agentic administration – it wasn’t an explicit requirement, and no system volunteered it. The synthesis adds a layer no other document addressed: a defined model for how Claude operates on the infrastructure. What network access it has (WireGuard peer). What credentials (SSH keys, Proxmox API, Forgejo API). What it can do autonomously versus what requires my approval. What it is explicitly never allowed to do without human confirmation. How it responds to Alertmanager incidents – fetching a runbook, executing the procedure, reporting back.

    The whole thing is designed to be end-to-end manageable by either a human or an AI agent. That discipline turns out to improve the infrastructure design regardless of whether the AI ever actually runs it.


    Phase -1

    The synthesis has a four-phase implementation roadmap, starting with installing Proxmox on physical hardware, which requires hands and scheduling. But I’m not in Phase 0 yet.

    A lot is riding on runbooks that aren’t written. We need a disaster recovery plan for when the AI is unavailable or gets fired. A consistent infrastructure idiom for nomenclature and design choices across disparate surfaces. Baked-in patterns for knowledge reinvestment – making sure the system gets smarter from operating, not dumber.

    I have in mind four core skill documents – shared DNA, different mindsets appropriate to each phase: implementation, management, refinement, dev/testing. That’s where the principles, guardrails, and operating posture get encoded. That work comes before Phase 0.

    So: a Phase -1, perhaps?


    What Running the Tournament Taught Me

    The majority is right more often than any individual. On every major decision where four or five systems agreed, they were right. The outliers were usually chasing novelty or solving problems I don’t have.

    But sometimes… honesty about tradeoffs is rare and valuable. The response that told me my RPO was fifteen minutes was more useful than the five that told me it was zero.

    No Round One response had an opportunity to address exclusive agentic administration, and few mentioned shared duties, though I’d contemplated that in my requirements. My outcomes might have been substantively different if exclusivity had been an initial core requirement, though the early specification of IaC probably helped. All six gave me plausible results, but the contrasting process produced something that feels more considered.


    The plan is written. The decisions are made. The document exists. I can still see gaps, which is either a sign of maturity or a sign I need to stop looking.

    Send a rescue team if you don’t hear from me in a week.


    This is part of an ongoing series about running an obsessively documented homelab and learning something new every time I break it.

  • The AI Architecture Tournament — Motivation, Resources, and Requirements


    Part 1 of 2. Part 2 covers the three rounds of AI input and the contested decisions.


    There’s a certain kind of homelab project that starts as a reasonable question and ends with you staring at a 60-page architecture document wondering how you got here.

    That’s where I am.


    A Working Mess

    My homelab has been running on Unraid. It works. I have fourteen Docker containers running, a Synology Diskstation for backups, a Home Assistant Green for automation, my gaming PC doing AI inference, and enough Shelly devices to control the lighting in every room of my house. It’s not elegant, but it’s mine, and it mostly does what I want.

    The problem isn’t that it’s broken. The problem is that the more I’ve learned, the more I can see the places where it’s more accreted than designed. More generative than purposeful. No reverse proxy. No infrastructure-as-code. Secrets half-managed. Backups partially verified. A git-watcher script that was never actually version-controlled. The kind of debt that doesn’t break you today but makes every future change a little harder than it needs to be.

    I’d been thinking about a proper rearchitecture for a while. But thinking and doing are different things, and I needed a question sharp enough to actually move on.


    The Question

    What if I put everything on the table? No sacred cows. What would the setup look like if every hardware decision were optimized for running what I have at its most powerful and efficient, and every software decision were driven by use cases and outcomes rather than favorites or familiarity?

    I’d be deeply interested in the answer to this question, but the legwork to get there is daunting to my ADHD. Thankfully we’ve got AI to offload that kind of cognitive gruntwork to.

    I also wanted to avoid problem-solving with my credit card. A hard constraint of ‘no new purchases’ was applied. Work with what you have.

    So, here’s the full list:

    • Selected configurations are always best practices aligned
    • Infrastructure as code
    • No new hardware. All architecture, method, process, software and OS are fair game.
    • 3-2-1 backups
    • Free and Open Source software
    • Development vs Production environments, where possible
    • End to end administrable by human or by agent
    • Rigorous changelog
    • Securely accessible inside and outside the home
    • Tier 2 Availability (redundancy primarily in storage)
    • RTO: 24hrs max
    • RPO: No data loss is acceptable
    • MTTR: 4 hrs
    • Failover: Manual
    • Migration downtime should be minimized, but is the least concern. Home Assistant is the largest concern for migration downtime.

    What I Have to Work With

    Three machines, a Synology, and a Home Assistant Green.

    The primary server is a repurposed 2017-era HP workstation — i7-7700K, 32GB DDR4, a GTX 1070 I use for audio transcription, and a mix of spinning rust and SSDs. It’s been getting the job done.

    The beast machine is my desktop — where I’m typing this. An i9-14900K with an RTX 4080 Super, 48GB of DDR5, and two 4TB NVMe drives. It currently runs Ollama for local LLM inference and games on the weekends. The “games” part of that may be in jeopardy.

    The third machine is a Sandy Bridge i7-2600K from 2011. It runs Ubuntu and various things I’ve tried over the years. She’s old and no longer as mighty as when I specced her back in the day, but she has sentimental value and still shows up to work on time if the task is sized right.

    The Synology is a DS418 with 4×8TB drives. It handles backups. The Home Assistant Green runs home automation and will be staying exactly where it is for as long as I can manage it.


    Writing the Requirements Document

    Before I could ask anyone — AI or otherwise — for a design, I had to know what I was asking for. So I wrote a requirements document.

    It covered the hardware above in detail, including exact CPU/RAM/storage/GPU specifications per machine. It listed every service I run and why I run it. It articulated the constraints.

    It also specified something I hadn’t entirely thought through until I wrote it down: I wanted the final architecture to be end-to-end administrable by an AI agent. Not because I’m looking to fully hand over the keys, but because if Claude can autonomously execute operations, that means the operations are well-documented, idempotent, and testable. The discipline of designing for AI operation will hopefully produce better infrastructure for human operation too.

    Some of you may be screaming inside about the sustainability (along many axes) of letting the AI run the show. And rightly so. But at this point we were at the thought exercise stage of the game and there was plenty of time to navigate risk as we kicked off the opening ceremony.


    What Came Next

    With a requirements document in hand, I did something I hadn’t done much of since diving into LLM tools: a structured response comparison. Not just asking one system and running with the answer, but treating it as an input process.

    Part 2 is where the tournament happens.


    This is part of an ongoing series about running an obsessively documented homelab and learning something new every time I break it.