A filesystem-to-S3 sync daemon that doesn't poll

RedRush does one thing: it watches a local directory and uploads changed files to S3-compatible object storage. No polling, no cron, no scheduled sync. Kernel filesystem events (inotify on Linux, kqueue on macOS), so uploads start within milliseconds of a file being written. One binary, zero config files.

Why it exists

The specific pain point: Nextflow, the workflow engine we run for researchers, writes output files to a local scratch directory during pipeline execution. Those files need to end up in object storage (MinIO in our case) once the pipeline finishes, and sometimes before it finishes, so that downstream steps on other nodes can pick them up. The existing approach was a post-run sync job via s5cmd sync. That's fine until a pipeline runs for eight hours and crashes at hour seven, and you've lost all intermediate outputs because the sync never ran.

RedRush runs as a sidecar. It starts before the pipeline, watches the output directory, and uploads files as they appear. If the pipeline crashes, everything written up to that point is already in the bucket.

Filesystem event handling

The Go implementation uses fsnotify, which wraps inotify (Linux) and kqueue (macOS) into a channel-based API:

watcher, _ := fsnotify.NewWatcher()
watcher.Add("/data/output")

for {
    select {
    case event := <-watcher.Events:
        if event.Op&(fsnotify.Create|fsnotify.Write) != 0 {
            debouncer.Submit(event.Name)
        }
    case err := <-watcher.Errors:
        log.Error("watcher error", "err", err)
    }
}

The Events channel delivers one event per filesystem operation. A single file write from a bioinformatics tool can generate multiple events: CREATE when the file descriptor is opened, WRITE as data is flushed (possibly multiple times), and sometimes CHMOD from permission changes. Uploading on every event would mean uploading a half-written 10 GB FASTQ file multiple times.

Debouncing

The debouncer solves this. It's a per-file timer map: when an event arrives for file X, start a timer for X (default 500ms). If another event arrives for X before the timer fires, reset it. When the timer fires, the file is considered stable and gets queued for upload.

type Debouncer struct {
    mu     sync.Mutex
    timers map[string]*time.Timer
    out    chan string
    delay  time.Duration
}

func (d *Debouncer) Submit(path string) {
    d.mu.Lock()
    defer d.mu.Unlock()
    if t, ok := d.timers[path]; ok {
        t.Reset(d.delay)
        return
    }
    d.timers[path] = time.AfterFunc(d.delay, func() {
        d.mu.Lock()
        delete(d.timers, path)
        d.mu.Unlock()
        d.out <- path
    })
}

The 500ms default is tuned for bioinformatics tools that write large files in buffered chunks. For tools that do an fsync after every write (some database engines), you might want a shorter delay. It's configurable via --debounce flag.

Recursive watch registration

inotify does not recursively watch subdirectories. If you add a watch on /data/output, you get events for files in that directory, but not for files in /data/output/sample_A/. You have to explicitly add a watch for every subdirectory.

The naive approach is to walk the directory tree at startup and add a watch for each directory. The problem is the race condition: between the CREATE event for a new directory and the moment you add a watch for it, files can appear inside it. Those files generate no events because the watch didn't exist yet.

The fix is a two-step process on directory creation:

case event := <-watcher.Events:
    if event.Op&fsnotify.Create != 0 {
        info, err := os.Stat(event.Name)
        if err == nil && info.IsDir() {
            watcher.Add(event.Name)
            // Scan for files that arrived in the race window
            filepath.WalkDir(event.Name, func(path string, d fs.DirEntry, err error) error {
                if !d.IsDir() {
                    debouncer.Submit(path)
                }
                return nil
            })
        }
    }

Add the watch first, then scan. Any files that appeared during the race get caught by the scan. Any files that appear after the scan get caught by the watch. The window is technically still there (between os.Stat and watcher.Add), but in practice the two calls are microseconds apart and I haven't been able to trigger a miss.

Upload pool

The upload path uses a worker pool pattern. A configurable number of goroutines (default 4) read from the debouncer's output channel and upload to S3:

for i := 0; i < workers; i++ {
    go func() {
        for path := range uploadCh {
            key := strings.TrimPrefix(path, watchDir)
            key = strings.TrimPrefix(key, "/")
            err := uploadFile(ctx, s3Client, bucket, prefix+key, path)
            if err != nil {
                log.Error("upload failed", "path", path, "err", err)
                retryQueue <- path // exponential backoff retry
            }
        }
    }()
}

The S3 key mirrors the local filesystem path: a file at /data/output/sample_A/aligned.bam becomes s3://{bucket}/{prefix}/sample_A/aligned.bam. The upload uses s3manager.Uploader with configurable part size (default 64MB for large genomics files) and concurrency (default 3 parts in parallel per file). For a 30 GB BAM file, that's about 480 parts uploaded 3 at a time.

The retry queue uses exponential backoff with jitter (initial 1s, max 60s, factor 2, jitter 0.3). Retries are bounded (default 5 attempts). After exhausting retries, the failed path is logged and the daemon continues. I considered a dead-letter mechanism but decided that logging plus the ability to re-run a manual sync covers the failure case well enough.

Kubernetes sidecar deployment

The primary use case is as a Kubernetes sidecar container. A pod has two containers: one runs the pipeline, one runs RedRush watching a shared emptyDir volume.

spec:
  containers:
    - name: pipeline
      image: nextflow/nextflow:24.04
      volumeMounts:
        - name: scratch
          mountPath: /data/output
    - name: redrush
      image: anurag1201/redrush:latest
      args: ["--watch", "/data/output", "--bucket", "results", "--prefix", "runs/$(RUN_ID)"]
      volumeMounts:
        - name: scratch
          mountPath: /data/output
  volumes:
    - name: scratch
      emptyDir: {}

The lifecycle hook is important: when the pipeline container exits, RedRush needs a few seconds to flush any remaining uploads from the debounce queue. The preStop hook sends SIGTERM, which the daemon handles by draining the upload queue (waiting for in-flight uploads to complete) before exiting. The terminationGracePeriodSeconds on the pod should be set high enough to allow the largest pending upload to finish.