Tag: self-hosted

  • The Router Backup API Synology Didn’t Document


    Update (2026-03-04): After this post was shared on Reddit, a commenter pointed out a simpler CLI-based alternative that also works on SRM. See the update below.

    My router has a backup feature. It lives in the control panel under Backup & Restore — click Export, get a .dss file containing your network settings, firewall rules, DHCP reservations, DNS zones, WiFi configuration, and mesh topology. Everything you’d need to rebuild from scratch.

    If you’re here for the working code, skip to the Implementation Reference at the end.

    The problem: it’s manual-only. No scheduler. No “run this at 3am every morning.” The device managing my entire home network had no automated config backup, and it had been on my to-do list long enough that I’d stopped noticing it.

    What followed was a two-session investigation into an API that exists but isn’t documented, an authentication mechanism that the documentation actively misrepresents, and a use case where AI collaboration was the thing that made the whole effort practical.


    The API Exists

    Synology’s SRM — the operating system running on the RT6600ax — shares its DNA with DSM, their NAS OS. DSM exposes a REST API that powers its own web UI, and SRM inherits a version of it. I knew this API existed. The question was whether the backup functionality was reachable through it.

    Querying SYNO.API.Info — an unauthenticated endpoint that returns every available API on the device — confirmed it was:

    SYNO.Backup.Config.Backup  →  entry.cgi  (versions 1–1)

    The endpoint existed. So far so good.
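For reference, the probe is a single unauthenticated GET against query.cgi. Here's a sketch using only the Python standard library; the host and port match my setup and are worth verifying against your own device, and `find_apis` is just a hypothetical helper for filtering the result:

```python
import json
import urllib.request

def fetch_api_info(host: str, port: int = 8000) -> dict:
    """Query the unauthenticated SYNO.API.Info endpoint for the full API map."""
    url = (f'http://{host}:{port}/webapi/query.cgi'
           '?api=SYNO.API.Info&version=1&method=query&query=all')
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)['data']

def find_apis(api_map: dict, keyword: str) -> list[str]:
    """Filter the returned API map for names containing a keyword."""
    return sorted(name for name in api_map if keyword.lower() in name.lower())
```

Filtering the map for "backup" is what surfaced the endpoint above.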


    The Documentation Stops There

    Here’s where things get interesting. Synology publishes a developer guide for DSM — 223 pages covering authentication, session management, API structure, and dozens of specific namespaces. For SRM, there’s no equivalent. The router OS running in millions of homes has no public developer documentation.¹

    The official authentication guide describes a login flow: submit your username and password to SYNO.API.Auth, get a session ID back, use it for subsequent calls. Straightforward. That is the documentation.²

    It is also wrong — or at least, incomplete to the point of being useless on SRM.


    The Auth Wall

    We started testing with curl. The backup service account had admin group membership in SRM. The request was well-formed. The response was error 402: permission denied.

    Tried the built-in admin account. Error 400: bad credentials — except the web UI accepted the same credentials without complaint.

    Over the next stretch of the session, Claude and I systematically eliminated every reasonable variable: session name (Core, SRM, RouterManagement, SurveillanceStation), HTTP vs HTTPS, port 8000 vs 8001, API version 1 vs 3, GET vs POST with URL encoding. Every combination returned 400 for the admin account while the browser accepted the same credentials without issue.
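In sketch form, the sweep looked roughly like this (`build_attempts` is a hypothetical helper; the real loop fed each combination into a login request and recorded the error code):

```python
import itertools

SESSIONS = ['Core', 'SRM', 'RouterManagement', 'SurveillanceStation']
ENDPOINTS = [('http', 8000), ('https', 8001)]
VERSIONS = [1, 3]

def build_attempts() -> list[dict]:
    """Enumerate every combination of session name, scheme/port, and API version."""
    return [
        {'session': s, 'scheme': scheme, 'port': port, 'version': v}
        for s, (scheme, port), v in itertools.product(SESSIONS, ENDPOINTS, VERSIONS)
    ]
```

Sixteen combinations before GET/POST variants: small enough to brute-force, large enough to lose track of by hand.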

    The SRM security log offered one useful clue: every successful UI login appeared as admin:. Every API attempt appeared as SYSTEM:. The router was rejecting API auth at a system level, before credential validation even happened.

    We checked AutoBlock (added source IPs to the allow list — no effect). We looked at the “Allow external access to SRM” setting (WAN-only, completely irrelevant to LAN API calls). Both were dead ends.


    Finding the Real Problem

    With every obvious explanation exhausted, Claude queried the API info endpoint for anything encryption or token related. One name in the results stood out:

    SYNO.API.Encryption

    That single API name explained everything.

    SRM’s web UI encrypts passwords client-side using RSA before submitting them. The server only accepts encrypted credentials. The official documentation describing plaintext authentication is DSM documentation — and while DSM supports plaintext credentials over HTTPS, SRM does not. There is no published note about this divergence anywhere.³

    Fetching the encryption endpoint returned a 4096-bit RSA public key, a cipher field name, and a server timestamp. The browser fetches this on every login, encrypts your password with PKCS1v15, and sends ciphertext instead of plaintext. We’d been sending plaintext the whole time.
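For reference, the client-side step looks roughly like this with Python's cryptography library, assuming the encryption endpoint hands you a PEM-encoded public key. If yours returns a raw modulus and exponent instead, you'd construct the key object from those:

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import padding

def encrypt_password(public_key_pem: bytes, password: str) -> bytes:
    """Encrypt a password with the router's RSA public key using PKCS1v15,
    mirroring what the SRM web UI does before submitting credentials."""
    key = serialization.load_pem_public_key(public_key_pem)
    return key.encrypt(password.encode(), padding.PKCS1v15())
```

The ciphertext, not the password, then goes into the credential field of the auth request.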


    The RSA Rabbit Hole

    We implemented the RSA flow manually using Python’s cryptography library: fetch the public key, encrypt the password, submit the ciphertext in the documented field. Still error 400.

    Something was still missing. Claude pivoted to the synology-api Python library — a community project wrapping Synology’s APIs — and traced through its authentication module. What it revealed were two parameters appearing nowhere in any documentation: dsm_version=3 and session=webui.⁴

    DSM uses dsm_version=7. SRM uses dsm_version=3. The session name webui — not Core, not SRM, not any of the names we’d tried — is what SRM expects. Neither is documented for SRM anywhere. The values were found in library source code, not in any published reference.

    With those two parameters, authentication succeeded.


    The Backup Flow

    Once past the auth wall, SYNO.Backup.Config.Backup worked as advertised — a clean three-step async flow:

    1. Start — call with method=start. The router begins generating the archive and returns a task_id.
    2. Poll — call with method=status and the task_id until completion is confirmed.
    3. Download — call with method=download. The router returns the .dss archive.

    The output is a file named SynologyRouter_YYYYMMDD.dss, roughly 91KB, containing the complete router configuration — network settings, firewall rules, DHCP reservations, DNS zones, WiFi credentials, and mesh AP config.


    What This Would Have Looked Like Without Claude

    This is the honest part of the post.

    The technical problem here isn’t exotic. API investigation is a standard skill. The individual steps — probe the info endpoint, test auth variations, find the encryption layer, locate a community library — are each things a developer could work through independently.

    But the scope would have made it impractical as a spare-evening project. Systematically testing a dozen authentication variations, recognizing SYNO.API.Encryption in a list of 100+ API names as the diagnostic key, implementing RSA in Python on the fly, debugging why the manual RSA implementation still failed, and tracing through library source code to identify two undocumented parameters — each step depends on full context from all the previous steps. Losing that thread between evenings, or spending an hour reconstructing it each time, is exactly what turns “automatable” into “perpetually on the list.”

    Claude held the full investigation context across the session, made the lateral connection to SYNO.API.Encryption, knew to reach for the synology-api library when the manual implementation stalled, and identified the undocumented dsm_version and session parameters by reading the library source. No single step was magic. The value was continuity — a collaborator that didn’t lose the thread.

    The router config backup now runs at 3:00 AM every night. Each .dss file lands in /mnt/user/backups/router/, where it’s swept into the existing nightly rsync to a secondary NAS. Thirty days of retention at ~91KB per file. If the router ever needs to be rebuilt, the config is there.

    The gap that existed because the documentation didn’t exist is closed.


    Appendix: Implementation Reference

    Enough detail to replicate this setup without AI assistance.

    Dependencies

    pip install synology-api cryptography requests

    On Unraid, install to a persistent path (the root filesystem is tmpfs and doesn’t survive reboots):

    pip3 install --target=/mnt/user/scripts/router-backup/lib \
      synology-api cryptography requests

    Authentication

    import sys, os
    sys.path.insert(0, '/mnt/user/scripts/router-backup/lib')
    
    from synology_api import base_api
    
    session = base_api.BaseApi(
        ip_address='192.168.1.1',
        port=8000,
        username='backup',           # must be in admin group in SRM
        password=os.environ['ROUTER_BACKUP'],
        secure=False,
        cert_verify=False,
        dsm_version=3,               # SRM requires 3, not 6 or 7
        debug=False,
        otp_code=None
    )
    session.login('webui')           # 'webui' is required, other names return 402

    Key parameters undocumented for SRM:

    Parameter       SRM value   DSM value   Effect of wrong value
    dsm_version     3           6 or 7      Error 400
    session name    'webui'     'Core'      Error 402

    The service account must be a member of the admin group in SRM Control Panel → User.

    Backup API Flow

    import time, datetime, requests
    
    WEBAPI = 'http://192.168.1.1:8000/webapi/entry.cgi'
    sid = session.session_id
    
    # Step 1: Start
    resp = requests.post(WEBAPI, data={
        'api': 'SYNO.Backup.Config.Backup',
        'version': '1',
        'method': 'start',
        '_sid': sid,
    })
    task_id = resp.json()['data']['task_id']
    
    # Step 2: Poll (bounded, so a stuck task can't hang the nightly job)
    deadline = time.time() + 300         # give up after five minutes
    while True:
        status = requests.post(WEBAPI, data={
            'api': 'SYNO.Backup.Config.Backup',
            'version': '1',
            'method': 'status',
            'task_id': task_id,
            '_sid': sid,
        }).json()
        if status['data']['state'] == 'finish':
            break
        if time.time() > deadline:
            raise TimeoutError('backup task did not finish within five minutes')
        time.sleep(2)
    
    # Step 3: Download
    response = requests.post(WEBAPI, data={
        'api': 'SYNO.Backup.Config.Backup',
        'version': '1',
        'method': 'download',
        'task_id': task_id,
        '_sid': sid,
    }, stream=True)
    
    filename = f"SynologyRouter_{datetime.date.today().strftime('%Y%m%d')}.dss"
    with open(f'/mnt/user/backups/router/{filename}', 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

    Output and Restoration

    The .dss file is a binary archive (~91KB for a moderately configured RT6600ax). Restore via SRM → Control Panel → Backup & Restore → Restore. Contents: network interfaces, DHCP reservations, firewall rules, DNS zones, WiFi SSIDs and credentials, mesh AP configuration.

    Deployment

    Item             Value
    Script location  /mnt/user/scripts/router-backup/
    Dependencies     /mnt/user/scripts/router-backup/lib/
    Credentials      .env in script dir, chmod 600
    Schedule         0 3 * * * via Unraid User Scripts
    Output path      /mnt/user/backups/router/
    Retention        30 most recent files (~2.7MB total)
    Backup chain     Swept into Tier 4 rsync to NAS2 at 5:30 AM
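Retention pruning isn't part of the API flow above; this is a sketch of how I'd keep the newest thirty files, assuming the backups are the only .dss files in the directory and `prune_backups` is a hypothetical helper (the YYYYMMDD names sort lexicographically, so a plain sort gives chronological order):

```python
from pathlib import Path

def prune_backups(directory: str, keep: int = 30) -> list[str]:
    """Delete all but the newest `keep` .dss files and return the names removed."""
    backups = sorted(Path(directory).glob('*.dss'))   # oldest first by date-in-name
    stale = backups[:-keep] if len(backups) > keep else []
    for path in stale:
        path.unlink()
    return [p.name for p in stale]
```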

    Update — 2026-03-04

    After sharing this post on r/synology, DaveR007 pointed out that Synology ships a CLI tool called synoconfbkp that can export the same .dss configuration file — and that it exists on SRM, not just DSM. I SSH’d into my RT6600ax to verify, and he’s right:

    backup@SynologyRouter:/$ ls -la /usr/syno/bin/synoconfbkp
    -rwxr-xr-x    1 root     root        554881 Nov  4  2022 /usr/syno/bin/synoconfbkp

    The binary works on SRM and produces a standard .dss backup:

    sudo /usr/syno/bin/synoconfbkp export --filepath=/tmp/backup.dss

    It supports the same config categories you’d expect, plus SRM-specific ones like wifi, mesh, and router_cp. DaveR007 maintains a wrapper script that adds timestamped filenames, hostname inclusion, and optional SCP to remote shares: Synology_Config_Backup on GitHub.

    How the two approaches compare:

    The CLI method is simpler. If you have SSH access to the device, synoconfbkp does the job in a single command with no authentication gymnastics. You can schedule it with Synology’s built-in Task Scheduler and the backup runs entirely on the device itself.

    The API method described in this post runs remotely — it doesn’t require SSH to be enabled on the router, and the backup is initiated and downloaded from another machine on the network. That’s a meaningful distinction: SSH access on SRM grants sudo to any user in the admin group (there’s no granular privilege separation), so enabling SSH for automated backups means any compromised admin credential has full root-equivalent access to the device. The API uses the same admin account, but the channel is narrower — an API session can only reach what Synology built HTTP handlers for, with no path to arbitrary command execution.

    Both produce the same .dss file. Both are valid. Which one fits depends on what you’re comfortable leaving enabled and where you want the automation to live.

    Thanks to DaveR007 for the tip and for sharing his work.

    • 1
      Synology’s DSM Developer Guide (version 7, 2022) runs to 223 pages covering authentication, session management, API versioning, and namespace documentation. No equivalent document exists for SRM. Synology’s public knowledge base contains no developer-facing API reference for SRM 1.x.
    • 2
      Synology’s DSM Login Web API Guide describes authentication as: submit account and passwd to SYNO.API.Auth via HTTP/HTTPS, receive a session ID. Client-side password encryption is not mentioned. HTTPS is referenced as the security mechanism; there is no mention of application-layer encryption of credentials.
    • 3
      The existence of SYNO.API.Encryption is discoverable only by querying the SYNO.API.Info endpoint on the device itself — it does not appear in any Synology developer documentation, knowledge base article, or official community resource. The divergence between DSM (plaintext over HTTPS) and SRM (RSA-encrypted credentials) is nowhere documented.
    • 4
      The synology-api Python library implements ~300 Synology APIs. Its README and supported API list do not mention SRM support, the dsm_version=3 parameter required for SRM, or session=webui as the correct session name. SYNO.Backup.Config.Backup is not among the library’s listed APIs. The working parameter combination was determined through library source code, not documentation.
  • The AI Architecture Tournament — Three Rounds and Eight Contested Decisions


    Part 2 of 2. Part 1 covers the motivations, hardware, and requirements.


    Round One: Six Prompts, Six Blueprints

    I fed the requirements document to five AI systems: Claude, ChatGPT, Grok, Lumo, and Meta AI. Each received the same document. The prompt was simple:

    Prepare Architecture recommendations in a markdown document.

    The output was six architecture proposals, ranging from terse (Meta AI with 536 words) to torrential (Claude with 4,255 words). Reading them back to back was instructive. There was significant consensus on the fundamentals – Proxmox VE was recommended by almost every system, ZFS for storage, Docker Compose as the service layer – but the disagreements were where it got interesting.

    One system suggested running k3s across three machines spanning Sandy Bridge, Kaby Lake, and Raptor Lake architectures. Three CPU generations, three GPU generations, heterogeneous storage, aging SATA. (No.) One suggested I flip which machine is primary in a way that would have dedicated my i9-14900K/RTX 4080 Super to light transcription duties while the i7-7700K ran primary AI inference. (Also no.) One documented a recovery point objective of zero, which felt more like aspiration than engineering.

    The disagreements were the interesting part. Agreement is cheap. Disagreement forces you to actually think.


    Round Two: Three Comparative Analyses

    I took the six responses and fed them – all of them, together – to three different systems (Meta, Lumo and Claude) and asked each to compare, rank, and critique. What did they agree on? Where were the meaningful differences? Which recommendations were well-reasoned versus well-marketed?

    This round was useful in a specific way: it surfaced the shape of the disagreements. Not just “System A said X and System B said Y,” but why those differences existed and what they revealed about each system’s underlying assumptions. Systems that had been confidently wrong in Round One looked different in aggregate than systems that had been thoughtfully uncertain.


    Round Three: The Synthesis

    For the final round, I sat down with Claude to write a synthesis – a document that takes everything useful from all six responses, resolves every contested decision with explicit reasoning, and throws out anything over-engineered for this scale. Whether that last point was effectively navigated I’ll leave as an exercise for the reader.

    At this stage I also shifted the emphasis toward AI-led operation and asked Claude to imagine implementing and operating the entire stack with little to no interaction from me. That framing shaped the output significantly.

    The resulting document – Homelab Architecture Synthesis: Claude-Implementable Design – is sitting at version 1.0 in my research folder. It runs to about sixty kilobytes of markdown, which means it’s either comprehensive or I have a problem. Plausibly both.


    Where the Contested Decisions Landed

    Eight decisions diverged meaningfully across the six responses. Here’s where the synthesis came out:

    Proxmox VE. Almost unanimous, and correct. FOSS, first-class ZFS, LXC containers with GPU passthrough, a purpose-built backup server. Unraid has been fine – but, to my sorrow and somewhat unforgivably, it expects to operate as root. That’s a hard thing to build an agentic management model on top of.

    Machine roles. My desktop – the i9-14900K/RTX 4080 Super – becomes the primary production host. My existing server becomes secondary production and storage. The Sandy Bridge box gets a quiet semi-retirement running AdGuard Home, Uptime Kuma, and dev instances of services. Things where “14-year-old hardware” genuinely doesn’t matter. RIP ultimate resolution on new-release video games.

    Terraform + Ansible + Git-controlled Compose files as the IaC stack. This is the decision I’m most excited about. Right now, if my Unraid server died, I’d be reconstructing container configurations from memory and half-remembered XML templates. With this stack, recovery is a `terraform apply`.

    SOPS + Age for secrets management. Encrypted in git. No plaintext credentials in compose files. (Yes, I currently have a Forgejo database running with the password `changeme`. That’s on the list. It’s been on the list.)

    Caddy as the reverse proxy. No more bare port numbers on every service URL. Finally becoming an adult.

    RPO honesty. Most systems told me my recovery point objective would be zero. One said it would be fifteen minutes and here’s why. The honest answer was more useful, even though it was less impressive. ZFS snapshots every fifteen minutes get you to RPO ≤ 15 minutes. Critical databases get WAL archiving to approach true zero. Document what you can actually deliver.
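For concreteness, the fifteen-minute cadence can be a single cron entry. The pool and dataset names here are hypothetical, and percent signs must be escaped inside crontabs:

```cron
# /etc/cron.d/zfs-snapshots (hypothetical pool/dataset name)
*/15 * * * * root zfs snapshot tank/appdata@auto-$(date +\%Y\%m\%d-\%H\%M)
```

In practice a snapshot manager like sanoid or zfs-auto-snapshot also handles pruning, which a bare cron line does not.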


    Claude’s Implementation Model

    The section I’m most excited and scared of.

    None of the Round One responses had an opportunity to address agentic administration – it wasn’t an explicit requirement, and no system volunteered it. The synthesis adds a layer no other document addressed: a defined model for how Claude operates on the infrastructure. What network access it has (WireGuard peer). What credentials (SSH keys, Proxmox API, Forgejo API). What it can do autonomously versus what requires my approval. What it is explicitly never allowed to do without human confirmation. How it responds to Alertmanager incidents – fetching a runbook, executing the procedure, reporting back.

    The whole thing is designed to be end-to-end manageable by either a human or an AI agent. That discipline turns out to improve the infrastructure design regardless of whether the AI ever actually runs it.


    Phase -1

    The synthesis has a four-phase implementation roadmap, starting with installing Proxmox on physical hardware, which requires hands and scheduling. But I’m not in Phase 0 yet.

    A lot is riding on runbooks that aren’t written. We need a disaster recovery plan for when the AI is unavailable or gets fired. A consistent infrastructure idiom for nomenclature and design choices across disparate surfaces. Baked-in patterns for knowledge reinvestment – making sure the system gets smarter from operating, not dumber.

    I have in mind four core skill documents – shared DNA, different mindsets appropriate to each phase: implementation, management, refinement, dev/testing. That’s where the principles, guardrails, and operating posture get encoded. That work comes before Phase 0.

    So: a Phase -1, perhaps?


    What Running the Tournament Taught Me

    The majority is right more often than any individual. On every major decision where four or five systems agreed, they were right. The outliers were usually chasing novelty or solving problems I don’t have.

    But sometimes… honesty about tradeoffs is rare and valuable. The response that told me my RPO was fifteen minutes was more useful than the five that told me it was zero.

    No Round One response had an opportunity to address exclusive agentic administration. Few of them mentioned shared duties, though I’d contemplated that in my requirements. My outcomes might have been substantively different if exclusivity had been an initial core requirement, though the early specification of IaC probably helped. All six gave me plausible results, but the contrasting process gave me something that feels more considered.


    The plan is written. The decisions are made. The document exists. I can still see gaps, which is either a sign of maturity or a sign I need to stop looking.

    Send a rescue team if you don’t hear from me in a week.


    This is part of an ongoing series about running an obsessively documented homelab and learning something new every time I break it.

  • The AI Architecture Tournament — Motivation, Resources, and Requirements


    Part 1 of 2. Part 2 covers the three rounds of AI input and the contested decisions.


    There’s a certain kind of homelab project that starts as a reasonable question and ends with you staring at a 60-page architecture document wondering how you got here.

    That’s where I am.


    A Working Mess

    My homelab has been running on Unraid. It works. I have fourteen Docker containers running, a Synology Diskstation for backups, a Home Assistant Green for automation, my gaming PC doing AI inference, and enough Shelly devices to control the lighting in every room of my house. It’s not elegant, but it’s mine, and it mostly does what I want.

    The problem isn’t that it’s broken. The problem is that the more I’ve learned, the more I can see the places where it’s more accreted than designed. More generative than purposeful. No reverse proxy. No infrastructure-as-code. Secrets half-managed. Backups partially verified. A git-watcher script that was never actually version-controlled. The kind of debt that doesn’t break you today but makes every future change a little harder than it needs to be.

    I’d been thinking about a proper rearchitecture for a while. But thinking and doing are different things, and I needed a question sharp enough to actually move on.


    The Question

    What if I put everything on the table? No sacred cows. What would the setup look like if every hardware decision were optimized for running the machines at their most powerful and efficient, and every software decision were driven by use cases and outcomes rather than favorites or familiarity?

    I’d be deeply interested in the answer to this question, but the legwork to get there is daunting to my ADHD. Thankfully we’ve got AI to offload that kind of cognitive gruntwork to.

    I also wanted to avoid problem-solving with my credit card. A hard constraint of ‘no new purchases’ was applied. Work with what you have.

    So, here’s the full list:

    • Selected configurations are always best practices aligned
    • Infrastructure as code
    • No new hardware. All architecture, method, process, software and OS are fair game.
    • 3-2-1 backups
    • Free and Open Source software
    • Development vs Production environments, where possible
    • End to end administrable by human or by agent
    • Rigorous changelog
    • Securely accessible inside and outside the home
    • Tier 2 Availability (redundancy primarily in storage)
    • RTO: 24hrs max
    • RPO: No data loss is acceptable
    • MTTR: 4 hrs
    • Failover: Manual
    • Migration downtime should be minimized, but is the least concern. Home Assistant is the largest concern for migration downtime.

    What I Have to Work With

    Three machines, a Synology, and a Home Assistant Green.

    The primary server is a 2017-era repurposed HP workstation — i7-7700K, 32GB DDR4, a GTX 1070 I use for audio transcription, and a mix of spinning rust and SSDs. It’s been getting the job done.

    The beast machine is my desktop — where I’m typing this. An i9-14900K with an RTX 4080 Super, 48GB of DDR5, and two 4TB NVMe drives. It currently runs Ollama for local LLM inference and games on the weekends. The “games” part of that may be in jeopardy.

    The third machine is a Sandy Bridge i7-2600K from 2011. It runs Ubuntu and various things I’ve tried over the years. She’s old and no longer as mighty as when I specced her back in the day, but she has sentimental value and still shows up to work on time if the task is sized right.

    The Synology is a DS418 with 4×8TB drives. It handles backups. The Home Assistant Green runs home automation and will be staying exactly where it is for as long as I can manage it.


    Writing the Requirements Document

    Before I could ask anyone — AI or otherwise — for a design, I had to know what I was asking for. So I wrote a requirements document.

    It covered the hardware above in detail, including exact CPU/RAM/storage/GPU specifications per machine. It listed every service I run and why I run it. It articulated the constraints.

    It also specified something I hadn’t entirely thought through until I wrote it down: I wanted the final architecture to be end-to-end administrable by an AI agent. Not because I’m looking to fully hand over the keys, but because if Claude can autonomously execute operations, that means the operations are well-documented, idempotent, and testable. The discipline of designing for AI operation will hopefully produce better infrastructure for human operation too.

    Some of you may be screaming inside about the sustainability (along many axes) of letting the AI run the show. And rightly so. But at this point we were at the thought exercise stage of the game and there was plenty of time to navigate risk as we kicked off the opening ceremony.


    What Came Next

    With a requirements document in hand, I did something I hadn’t done much of since diving into LLM tools: a structured response comparison. Not just asking one system and running with the answer, but treating each answer as one input to a larger process.

    Part 2 is where the tournament happens.


    This is part of an ongoing series about running an obsessively documented homelab and learning something new every time I break it.