refactor(config): Move all state files to .madblog #22

Merged
blacklight merged 5 commits from refactor/unique-state-directory into main 2026-03-11 21:59:59 +01:00

State Directory Refactoring Plan

This document describes the plan to consolidate all Madblog-specific state under
a single configurable state directory (defaulting to .madblog).

Motivation

Currently, Madblog stores state in multiple locations within the content
directory:

  • .madblog/ — partial state (ActivityPub sync, cache, webmentions sync)
  • activitypub/ — pubby's FileActivityPubStorage data (private key,
    followers, objects, interactions)
  • mentions/ — webmentions markdown files (incoming/outgoing)

This layout clutters the content directory with implementation details. The goal
is to:

  1. Consolidate all non-content state under a single .madblog directory.
  2. Make it configurable so users can point state storage to a different
    location (e.g., outside the content directory for backup/sync separation).
  3. Preserve existing data through automatic migration with no reprocessing.

Current Layout

<content_dir>/
├── markdown/                          # User content (unchanged)
│   └── *.md
├── .madblog/                          # Partial state (current)
│   ├── activitypub/
│   │   ├── deleted_urls.json          # Tracks deleted article URLs
│   │   ├── file_urls.json             # Maps files to AP object URLs
│   │   └── published_objects.json     # mtime cache for startup sync
│   ├── cache/
│   │   └── tags-index.json            # Tag index cache
│   └── webmentions_sync.json          # mtime cache for startup sync
├── activitypub/                       # pubby FileActivityPubStorage
│   ├── private_key.pem                # RSA key for HTTP signatures
│   ├── followers/                     # Follower JSON files
│   │   └── *.json
│   ├── objects/                       # Published AP objects
│   │   └── *.json
│   └── interactions/                  # Received interactions
│       └── *.json
└── mentions/                          # Webmentions storage
    ├── incoming/
    │   └── <post-slug>/
    │       └── webmention-*.md
    └── outgoing/
        └── <post-slug>/
            └── webmention-*.md

Source Code References

Directory/File                  Defined In                          Code Location
.madblog/activitypub/           ActivityPubIntegration.__init__     activitypub/_integration.py:65-72
.madblog/cache/tags-index.json  TagIndex.__init__                   tags/_index.py:163-164
.madblog/webmentions_sync.json  FileWebmentionsStorage.__init__     webmentions/_storage.py:66-67
activitypub/                    ActivityPubMixin._init_activitypub  activitypub/_mixin.py:115-125
mentions/                       WebmentionsMixin._init_webmentions  webmentions/_mixin.py:34-36

Proposed Layout

<content_dir>/
├── markdown/                          # User content (unchanged)
│   └── *.md
└── .madblog/                          # All state consolidated here
    ├── activitypub/
    │   ├── private_key.pem            # ActivityPub private key
    │   ├── state/                     # pubby FileActivityPubStorage
    │   │   ├── followers/
    │   │   ├── objects/
    │   │   └── interactions/
    │   ├── deleted_urls.json
    │   ├── file_urls.json
    │   └── published_objects.json
    ├── cache/
    │   └── tags-index.json
    ├── mentions/                      # Webmentions storage (moved)
    │   ├── incoming/
    │   │   └── <post-slug>/
    │   │       └── webmention-*.md
    │   └── outgoing/
    │       └── <post-slug>/
    │           └── webmention-*.md
    └── webmentions_sync.json

Key Changes

  1. activitypub/ → .madblog/activitypub/state/

    • pubby's FileActivityPubStorage data moves under .madblog.
    • The state/ subdirectory keeps pubby's storage separate from Madblog's
      ActivityPub sync files.
  2. mentions/ → .madblog/mentions/

    • Webmentions markdown files move under .madblog.
    • Directory structure (incoming/, outgoing/, <post-slug>/) preserved.
  3. Configurable state directory

    • New config option: state_dir (default: <content_dir>/.madblog)
    • Environment variable: MADBLOG_STATE_DIR
    • All state paths derive from this base.

Configuration

New Config Option

# config.yaml
state_dir: /path/to/custom/state  # Optional, defaults to <content_dir>/.madblog

Environment Variable

MADBLOG_STATE_DIR=/path/to/custom/state

Config Class Changes

Add to madblog/config.py:

@dataclass
class Config:
    # ... existing fields ...
    state_dir: str | None = None  # None means use default: <content_dir>/.madblog

    @property
    def resolved_state_dir(self) -> Path:
        """Return the resolved state directory path."""
        if self.state_dir:
            return Path(self.state_dir).expanduser().resolve()
        return Path(self.content_dir).expanduser().resolve() / ".madblog"
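
The plan lists both a config-file option and an environment variable but does not say which one wins when both are set. The following is a minimal sketch, assuming the environment variable takes precedence over the config value (an assumption, not something the plan confirms). `pick_state_dir` is a hypothetical helper name used only for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def pick_state_dir(
    content_dir: str,
    configured: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Hypothetical helper: choose the state directory.

    Assumed precedence: MADBLOG_STATE_DIR, then the config value,
    then the default <content_dir>/.madblog.
    """
    raw = env.get("MADBLOG_STATE_DIR") or configured
    if raw:
        return Path(raw).expanduser().resolve()
    return Path(content_dir).expanduser().resolve() / ".madblog"
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.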

Migration Plan

Strategy: Automatic Detection + Migration

On startup, Madblog will:

  1. Check if the old layout exists (legacy directories at content root).
  2. If detected and new layout doesn't exist, automatically migrate.
  3. Log migration actions clearly.
  4. Preserve mtime on migrated files to avoid reprocessing.
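
Step 2 depends on knowing whether the new layout already exists. One possible way to express that check is sketched here, assuming "new layout exists" means the target directory is already present. `needs_migration` is a hypothetical name, not part of the plan.

```python
from pathlib import Path


def needs_migration(content_dir: Path, state_dir: Path) -> dict[str, bool]:
    """Hypothetical helper: decide, per legacy directory, whether to migrate.

    Assumption: a legacy directory is migrated only if it exists and its
    target under state_dir does not exist yet.
    """
    targets = {
        "activitypub": state_dir / "activitypub" / "state",
        "mentions": state_dir / "mentions",
    }
    return {
        name: (content_dir / name).is_dir() and not target.exists()
        for name, target in targets.items()
    }
```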

Detection Logic

def _detect_legacy_layout(content_dir: Path, state_dir: Path) -> dict:
    """
    Detect legacy directory structure.
    
    Returns dict with keys for each legacy path that exists:
    - 'activitypub': Path to <content_dir>/activitypub
    - 'mentions': Path to <content_dir>/mentions
    """
    legacy = {}
    
    # Check for legacy activitypub/ at content root
    legacy_ap = content_dir / "activitypub"
    if legacy_ap.is_dir():
        # Verify it's pubby storage (has followers/, objects/, or private_key.pem)
        if any((legacy_ap / sub).exists() for sub in ["followers", "objects", "private_key.pem"]):
            legacy["activitypub"] = legacy_ap
    
    # Check for legacy mentions/ at content root
    legacy_mentions = content_dir / "mentions"
    if legacy_mentions.is_dir():
        # Verify it's webmentions storage (has incoming/ or outgoing/)
        if any((legacy_mentions / sub).exists() for sub in ["incoming", "outgoing"]):
            legacy["mentions"] = legacy_mentions
    
    return legacy

Migration Implementation

import logging
import os
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)

def _migrate_legacy_state(content_dir: Path, state_dir: Path) -> None:
    """
    Migrate legacy state directories to new layout.
    
    Preserves file mtimes to avoid reprocessing.
    """
    legacy = _detect_legacy_layout(content_dir, state_dir)
    
    if not legacy:
        return
    
    logger.info("Detected legacy state layout, migrating to %s", state_dir)
    
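    # Migrate activitypub/ -> state_dir/activitypub/state/
    if "activitypub" in legacy:
        src = legacy["activitypub"]
        dst = state_dir / "activitypub" / "state"
        _move_directory_preserve_mtime(src, dst)
        logger.info("Migrated %s -> %s", src, dst)

        # The proposed layout keeps private_key.pem next to state/, not inside it
        moved_key = dst / "private_key.pem"
        key_dst = state_dir / "activitypub" / "private_key.pem"
        if moved_key.is_file() and not key_dst.exists():
            shutil.move(str(moved_key), str(key_dst))
            logger.info("Moved private key to %s", key_dst)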
    
    # Migrate mentions/ -> state_dir/mentions/
    if "mentions" in legacy:
        src = legacy["mentions"]
        dst = state_dir / "mentions"
        _move_directory_preserve_mtime(src, dst)
        logger.info("Migrated %s -> %s", src, dst)


def _move_directory_preserve_mtime(src: Path, dst: Path) -> None:
    """
    Move directory tree preserving file modification times.
    
    Uses shutil.move for atomic moves when possible, falls back to
    copy+delete for cross-filesystem moves.
    """
    dst.parent.mkdir(parents=True, exist_ok=True)
    
    # Collect mtimes before move
    mtimes = {}
    for root, dirs, files in os.walk(src):
        for name in files:
            fpath = Path(root) / name
            try:
                mtimes[fpath.relative_to(src)] = os.stat(fpath).st_mtime
            except OSError:
                pass
    
    # Move the directory
    shutil.move(str(src), str(dst))
    
    # Restore mtimes (shutil.move may update them)
    for rel_path, mtime in mtimes.items():
        fpath = dst / rel_path
        if fpath.exists():
            try:
                os.utime(fpath, (mtime, mtime))
            except OSError:
                pass
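
One caveat with this approach: when the destination is an existing directory, `shutil.move` places the source inside it instead of replacing it, so a repeated or partial migration could produce a nested tree such as `mentions/mentions/`. A minimal sketch of a guard follows. `move_tree_strict` is a hypothetical name, and refusing to proceed when the destination exists is an assumed policy.

```python
import shutil
from pathlib import Path


def move_tree_strict(src: Path, dst: Path) -> None:
    """Hypothetical wrapper: move src to dst, refusing to touch an existing dst."""
    if dst.exists():
        # shutil.move would otherwise place src *inside* dst.
        raise FileExistsError(f"Refusing to migrate: {dst} already exists")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
```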

Migration Location

The migration check should run early in application initialization, before any
subsystem that accesses state directories:

  • CLI entry point (madblog/cli.py): After init_config(), before app.start().
  • uWSGI entry point (madblog/uwsgi.py): After config initialization.

Suggested implementation:

# In madblog/state/_state.py (new module)

def ensure_state_directory() -> Path:
    """
    Ensure the state directory exists and migrate legacy layout if needed.
    
    Returns the resolved state directory path.
    """
    state_dir = config.resolved_state_dir
    content_dir = Path(config.content_dir).expanduser().resolve()
    
    # Run migration if needed
    _migrate_legacy_state(content_dir, state_dir)
    
    # Ensure state directory exists
    state_dir.mkdir(parents=True, exist_ok=True)
    
    return state_dir

New state module structure:

-> madblog
  -> state
    -> __init__.py     # Public imports
    -> _state.py       # State-related logic
    -> _migrations.py  # Code used for migrations

Avoiding Reprocessing of Migrated Files

Problem

Both ActivityPubIntegration and FileWebmentionsStorage use
StartupSyncMixin to detect new/changed content on startup. They track file
mtimes in JSON cache files:

  • .madblog/activitypub/published_objects.json
  • .madblog/webmentions_sync.json

If files are moved without preserving mtimes, or if cache files reference old
paths, the system will treat migrated files as new and reprocess them.

Solution

  1. Preserve file mtimes during migration (shown above).

  2. Cache files remain valid — they track content file mtimes (markdown
    files in markdown/), not state file paths. Since content files are not
    moved, the caches remain valid.

  3. Webmentions markdown files — these are output files, not input. Moving
    them doesn't trigger reprocessing because:

    • Incoming webmentions: stored by hash, no mtime tracking.
    • Outgoing webmentions: processed based on content file mtimes in
      webmentions_sync.json, which tracks the markdown source files.
  4. ActivityPub storage — pubby's FileActivityPubStorage uses internal
    JSON files. Moving the entire directory preserves all internal state
    (followers, objects, interactions).
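
To illustrate point 2, here is a simplified model of an mtime-based sync cache. It is not the actual `StartupSyncMixin` code, whose internals are not shown in this plan. `find_changed` and the cache shape (relative path mapped to mtime) are assumptions made for the example.

```python
from pathlib import Path


def find_changed(markdown_dir: Path, cache: dict[str, float]) -> list[str]:
    """Return content files whose mtime differs from the cached value.

    Assumed cache shape: {relative markdown path: last-seen mtime}.
    """
    changed = []
    for md in sorted(markdown_dir.rglob("*.md")):
        key = md.relative_to(markdown_dir).as_posix()
        if cache.get(key) != md.stat().st_mtime:
            changed.append(key)
    return changed
```

In this model the cache keys only ever reference files under `markdown/`, so relocating the state directory does not change the result.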

Verification Steps

After migration:

  1. Check that published_objects.json and webmentions_sync.json exist and
    contain valid JSON.
  2. Verify that starting Madblog does not trigger mass republishing:
    • No flood of "Publishing article to ActivityPub" log messages.
    • No flood of outgoing webmention requests.
  3. Verify that followers are preserved (check state_dir/activitypub/state/followers/).
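
Steps 1 and 3 can be partly automated. The sketch below assumes the proposed layout and that followers are stored as `*.json` files, as in the current layout. `verify_state` is a hypothetical helper, not an existing Madblog command.

```python
import json
from pathlib import Path


def verify_state(state_dir: Path) -> dict[str, object]:
    """Hypothetical post-migration check, following the steps above."""
    report: dict[str, object] = {}
    for rel in ("activitypub/published_objects.json", "webmentions_sync.json"):
        path = state_dir / rel
        try:
            json.loads(path.read_text(encoding="utf-8"))
            report[rel] = "ok"
        except FileNotFoundError:
            report[rel] = "missing"
        except json.JSONDecodeError:
            report[rel] = "invalid JSON"
    followers = state_dir / "activitypub" / "state" / "followers"
    # Count follower files; 0 if the directory is absent.
    report["followers"] = len(list(followers.glob("*.json"))) if followers.is_dir() else 0
    return report
```

Comparing the follower count with the pre-migration count gives a quick sanity check. Step 2 still requires watching the startup logs.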

Code Changes Required

Files to Modify

File                                 Changes
madblog/config.py                    Add state_dir field and resolved_state_dir property
madblog/state/__init__.py            Public imports for the state module
madblog/state/_state.py              New module for state directory management
madblog/state/_migrations.py         Code for one-off migrations
madblog/cli.py                       Call ensure_state_directory() after config init
madblog/uwsgi.py                     Call ensure_state_directory() after config init
madblog/activitypub/_mixin.py        Use config.resolved_state_dir for ap_dir
madblog/activitypub/_integration.py  Use config.resolved_state_dir for workdir
madblog/webmentions/_mixin.py        Use config.resolved_state_dir for mentions_dir
madblog/webmentions/_storage.py      Use config.resolved_state_dir for sync cache
madblog/tags/_index.py               Use config.resolved_state_dir for cache dir

Path Resolution Changes

Before:

# activitypub/_mixin.py
ap_dir = os.path.join(config.content_dir, "activitypub")

# activitypub/_integration.py
self.workdir = Path(config.content_dir) / ".madblog" / "activitypub"

# webmentions/_mixin.py
self.mentions_dir = Path(config.content_dir) / "mentions"

# webmentions/_storage.py
self._sync_cache_file = self.root_dir / ".madblog" / "webmentions_sync.json"

# tags/_index.py
self._cache_dir = self._content_dir / ".madblog" / "cache"

After:

# activitypub/_mixin.py
ap_dir = config.resolved_state_dir / "activitypub" / "state"

# activitypub/_integration.py
self.workdir = config.resolved_state_dir / "activitypub"

# webmentions/_mixin.py
self.mentions_dir = config.resolved_state_dir / "mentions"

# webmentions/_storage.py
self._sync_cache_file = config.resolved_state_dir / "webmentions_sync.json"

# tags/_index.py
self._cache_dir = config.resolved_state_dir / "cache"
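
Since every path now derives from one base, the derivations could also be collected in a single place instead of being repeated per module. This is only a sketch of that idea. `StatePaths` and its property names are hypothetical and not part of the plan.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StatePaths:
    """Hypothetical helper collecting every state path in one place."""

    root: Path  # e.g. config.resolved_state_dir

    @property
    def activitypub_workdir(self) -> Path:
        return self.root / "activitypub"

    @property
    def activitypub_storage(self) -> Path:
        # pubby's FileActivityPubStorage data
        return self.activitypub_workdir / "state"

    @property
    def mentions(self) -> Path:
        return self.root / "mentions"

    @property
    def webmentions_sync_cache(self) -> Path:
        return self.root / "webmentions_sync.json"

    @property
    def cache(self) -> Path:
        return self.root / "cache"
```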

Manual Migration Script

For users who prefer manual control, provide a CLI command:

madblog migrate-state [--dry-run] [--state-dir PATH]

Implementation:

# In madblog/cli.py

def cmd_migrate_state(dry_run: bool = False, state_dir: str | None = None):
    """Migrate legacy state directories to new layout."""
    content_dir = Path(config.content_dir).expanduser().resolve()
    target = Path(state_dir).expanduser().resolve() if state_dir else config.resolved_state_dir
    
    legacy = _detect_legacy_layout(content_dir, target)
    
    if not legacy:
        print("No legacy state directories found.")
        return
    
    print("Detected legacy state directories:")
    for name, path in legacy.items():
        print(f"  {name}: {path}")
    
    if dry_run:
        print("\n[Dry run] Would migrate:")
        if "activitypub" in legacy:
            print(f"  {legacy['activitypub']} -> {target / 'activitypub' / 'state'}")
        if "mentions" in legacy:
            print(f"  {legacy['mentions']} -> {target / 'mentions'}")
        return
    
    _migrate_legacy_state(content_dir, target)
    print("Migration complete.")
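
How the command is wired into Madblog's existing CLI is not shown here. As an illustration only, the flags from the usage line could be declared with `argparse` as sketched below. `build_parser` is a hypothetical function, and the use of `argparse` subcommands is an assumption about the CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the migrate-state subcommand."""
    parser = argparse.ArgumentParser(prog="madblog")
    sub = parser.add_subparsers(dest="command")
    migrate = sub.add_parser("migrate-state", help="Migrate legacy state directories")
    migrate.add_argument("--dry-run", action="store_true", help="Only print planned moves")
    migrate.add_argument("--state-dir", metavar="PATH", default=None, help="Target state directory")
    return parser
```

The parsed `args.dry_run` and `args.state_dir` values map directly onto the two parameters of `cmd_migrate_state`.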

Testing

Unit Tests

  1. Migration detection: Test _detect_legacy_layout() with various directory
    structures.
  2. Migration execution: Test _migrate_legacy_state() preserves:
    • All files
    • Directory structure
    • File mtimes
  3. No-op on new layout: Verify migration doesn't run when new layout exists.
  4. Path resolution: Test all state paths resolve correctly with both default
    and custom state_dir.
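
A test for item 2 might look like the following pytest-style sketch. It uses a plain `shutil.move` as a stand-in for the migration helper, and `snapshot_mtimes` is a hypothetical utility introduced for the example.

```python
import os
import shutil
from pathlib import Path


def snapshot_mtimes(root: Path) -> dict[str, float]:
    """Map each file's path (relative to root) to its modification time."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_mtime
        for p in root.rglob("*")
        if p.is_file()
    }


def test_move_preserves_mtimes(tmp_path: Path) -> None:
    src = tmp_path / "mentions"
    note = src / "incoming" / "my-post" / "webmention-1.md"
    note.parent.mkdir(parents=True)
    note.write_text("example")
    os.utime(note, (1_000_000_000, 1_000_000_000))  # fixed, clearly old mtime

    before = snapshot_mtimes(src)
    dst = tmp_path / ".madblog" / "mentions"
    dst.parent.mkdir(parents=True)
    shutil.move(str(src), str(dst))  # stand-in for the migration helper

    # Same relative paths and same mtimes after the move
    assert snapshot_mtimes(dst) == before
```

Setting an explicit old mtime makes the test meaningful: a freshly written file would have roughly the same timestamp before and after a re-creation, hiding a regression.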

Integration Tests

  1. Fresh install: Verify new layout is created from scratch.
  2. Upgrade from legacy: Verify migration runs and state is preserved.
  3. No reprocessing: Verify startup after migration doesn't trigger mass
    republishing.

Rollback Plan

If migration causes issues:

  1. Move directories back manually:

    mv .madblog/activitypub/state activitypub
    mv .madblog/activitypub/private_key.pem activitypub/
    mv .madblog/mentions mentions
    
  2. Or restore from backup if available.

Because the migration moves directories rather than copying them, no data is
duplicated or deleted: the original files are preserved intact at the new
location and can be moved back with the commands above.


Timeline

  • [x] Phase 1: Add state_dir config option and resolved_state_dir property.
  • [x] Phase 2: Update all path references to use resolved_state_dir.
  • [x] Phase 3: Implement migration detection and execution.
  • [x] Phase 4: Add tests.
  • [x] Phase 5: Update documentation (README.md, ARCHITECTURE.md).
feat(config): Add configurable state directory
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
35a4d479c5
- Add state_dir option with resolved_state_dir helper
- Load state_dir from config file and MADBLOG_STATE_DIR env var
refactor: Move integration state to resolved_state_dir
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
551216e974
- ActivityPub, tags cache, and webmentions now store state under
  resolved_state_dir
- Update tests to set config.content_dir so resolved_state_dir points to
  temp dirs
refactor(state): Ensure state dir early and migrate legacy layouts
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
27f5ff199b
- Add state module with legacy migration helpers preserving mtimes
- Call ensure_state_directory from CLI and uWSGI entrypoints
test(state): Add coverage for legacy state migration and state dir resolution
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
55223ef03e
- Cover legacy layout detection for activitypub and mentions
- Validate migration behavior, key placement, and mtime preservation
- Exercise ensure_state_directory and config.resolved_state_dir behavior
docs: Document state_dir usage for ActivityPub and webmentions
All checks were successful
continuous-integration/drone/push Build is passing
continuous-integration/drone/pr Build is passing
a64197f8e6
- Update Docker volume examples to mount /data/.madblog as state
- Add state_dir option and clarify storage paths/default key path
- Update architecture doc to reference <state_dir> and state module
blacklight deleted branch refactor/unique-state-directory 2026-03-11 21:59:59 +01:00