Managing S3 Lifecycle Rules with s3cmd on DigitalOcean Spaces

Apr 08, 2026 · 3 min read

Object storage tends to grow quietly. Build artifacts, logs, exports, backups, temporary files - a few months later your bucket holds millions of objects and the bill starts reflecting it. I've seen this happen on nearly every project that uses object storage for any length of time. Someone uploads CI artifacts or analytics exports, nobody thinks about cleanup, and six months later you're paying real money for data nobody will ever look at again.

Lifecycle rules solve this at the storage layer. Instead of running cleanup scripts or cron jobs, you define a retention policy once and let the storage system enforce it. The bucket handles deletion based on object age - no application changes, no scheduled jobs, no custom tooling to maintain.

If your policy boils down to “delete objects older than X days,” lifecycle rules are the right tool. This applies to CI artifacts, temporary exports, processed data snapshots, old backups, staging uploads - basically anything with a natural expiration.

The examples here use DigitalOcean Spaces, which is S3-compatible.

Setting up s3cmd

Install it:

pip install s3cmd

Then configure it for DigitalOcean Spaces:

s3cmd --configure

The key configuration values you'll need:

host_base = sfo2.digitaloceanspaces.com
host_bucket = %(bucket)s.sfo2.digitaloceanspaces.com
signature_v2 = False

Replace sfo2 with your region.

Creating a Lifecycle Rule

One important detail that tripped me up the first time: lifecycle rules apply to the bucket, not to a path. You pass only the bucket name. Prefix filtering is handled with a separate parameter.

To expire all objects in a bucket after 30 days:

s3cmd expire \
  --expiry-days=30 \
  s3://my-bucket

This deletes objects 30 days after their LastModified date.

If you want to target a specific prefix - say, only cleaning up artifacts/ - you pass the prefix separately:

s3cmd expire \
  --expiry-days=30 \
  --expiry-prefix=artifacts/ \
  s3://my-bucket

Note that the prefix doesn't go in the S3 URI. The bucket is s3://my-bucket and the prefix is its own flag. If you stick the folder in the URI, it won't work the way you expect.

Verifying that it worked

Always verify after setting a rule, especially in production. To confirm the rule exists:

s3cmd getlifecycle s3://my-bucket
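What getlifecycle returns is a standard S3 lifecycle configuration document - s3cmd expire writes one of these to the bucket on your behalf. As a rough sketch (element names follow the S3 lifecycle schema; the real response also carries an XML namespace and may differ in detail, and the rule ID here is made up), the 30-day artifacts/ rule corresponds to a document like the one this stdlib snippet builds:

```python
import xml.etree.ElementTree as ET

# Build an approximation of the lifecycle document that a 30-day
# expiration rule on the artifacts/ prefix corresponds to.
root = ET.Element("LifecycleConfiguration")
rule = ET.SubElement(root, "Rule")
ET.SubElement(rule, "ID").text = "expire-artifacts"   # illustrative ID
ET.SubElement(rule, "Prefix").text = "artifacts/"
ET.SubElement(rule, "Status").text = "Enabled"
expiration = ET.SubElement(rule, "Expiration")
ET.SubElement(expiration, "Days").text = "30"

xml_doc = ET.tostring(root, encoding="unicode")
print(xml_doc)
```

Comparing the getlifecycle output against what you expect to see - the right prefix, the right day count, Status set to Enabled - is the quickest sanity check.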

You can also inspect bucket info more broadly:

s3cmd info s3://my-bucket

And if you need to remove a lifecycle configuration:

s3cmd dellifecycle s3://my-bucket

What happens to objects that already exist

Lifecycle expiration is based on LastModified, so existing objects are included. If you create a rule that expires objects after 30 days and your bucket has objects uploaded 45 days ago, those objects qualify for deletion immediately.
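The eligibility check is plain date arithmetic. A minimal illustration of the rule's logic (the real evaluation happens inside the storage system, not in your code):

```python
from datetime import datetime, timedelta, timezone

EXPIRY_DAYS = 30

def eligible_for_deletion(last_modified: datetime, now: datetime) -> bool:
    """An object qualifies once EXPIRY_DAYS have passed since LastModified."""
    return now - last_modified >= timedelta(days=EXPIRY_DAYS)

now = datetime.now(timezone.utc)
print(eligible_for_deletion(now - timedelta(days=45), now))  # uploaded 45 days ago -> True
print(eligible_for_deletion(now - timedelta(days=10), now))  # uploaded 10 days ago -> False
```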

They won't vanish the moment you apply the rule, though. Lifecycle processing runs asynchronously inside the storage system - typically once a day. Expect 24 to 48 hours before deletion actually occurs, and potentially longer for very large buckets.

If nothing seems to happen after a few days, check the prefix, double-check the lifecycle configuration, and verify whether versioning is enabled on the bucket. Versioning complicates things: expiration creates a delete marker rather than actually removing the object, and older versions may stick around. If you're using versioning, you may need a separate rule for noncurrent versions.

Why this beats a Python cleanup script

It's tempting to write a script that lists objects, compares LastModified, and deletes anything older than a threshold. I've written that script more than once. It works, but it brings a whole set of operational baggage with it. You need a scheduler to run it, monitoring to know when it fails, pagination logic for large buckets, and throttling to avoid hitting API rate limits. Credentials expire, the script fails silently at 3 AM, and nobody notices until the bucket is bloated again.
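For illustration, here's roughly what the listing-and-filtering half of such a script looks like. This sketch parses the four-column output format of s3cmd ls (date, time, size, URI) and only identifies stale objects - the actual deletion, pagination, retries, and error handling that make up the real operational baggage are all left out:

```python
from datetime import datetime, timedelta

EXPIRY_DAYS = 30

def stale_keys(listing: str, now: datetime) -> list[str]:
    """Parse `s3cmd ls`-style output and return URIs of objects
    whose LastModified date is more than EXPIRY_DAYS in the past."""
    stale = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip DIR entries and blank lines
        date_str, time_str, _size, uri = parts
        modified = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M")
        if now - modified > timedelta(days=EXPIRY_DAYS):
            stale.append(uri)
    return stale

# Sample listing in the s3cmd ls column layout (bucket and keys are made up).
sample = """\
2026-01-01 09:15      1048576  s3://my-bucket/artifacts/build-1.tar.gz
2026-04-07 18:30         2048  s3://my-bucket/artifacts/build-2.tar.gz"""

print(stale_keys(sample, datetime(2026, 4, 8)))
```

Even this stripped-down version has assumptions baked in (output format, timezone handling), and it still needs somewhere to run and someone to notice when it breaks.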

Storage-native lifecycle rules avoid all of that. They scale automatically, run inside the provider's infrastructure, don't require compute resources on your end, and have no moving parts you need to maintain. For straightforward age-based retention, they're the obvious choice.

When you still need a script

Lifecycle rules handle time-based deletion well, but that's about the extent of their flexibility. If you need to delete based on object size, filter by metadata or tags, generate a report before deleting, do a dry run, apply per-tenant retention logic, or make deletion conditional on some state in your application database - that's script territory.

A concrete example: delete artifacts older than 30 days, but only if the associated CI job completed successfully. That requires your application to make the call, not a storage-layer rule. Same goes for one-off migration cleanup where the rules are temporary and don't fit neatly into a permanent lifecycle policy.
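A sketch of that conditional-deletion logic, with every name hypothetical - in a real system the two lookups would be your object listing and a query against your application database or CI API:

```python
from datetime import datetime, timedelta

EXPIRY_DAYS = 30

def deletable_artifacts(artifacts, job_status, now):
    """Return artifact keys older than EXPIRY_DAYS whose CI job succeeded.

    `artifacts` maps object key -> LastModified; `job_status` maps object
    key -> job outcome. Both are illustrative stand-ins for real lookups.
    """
    cutoff = now - timedelta(days=EXPIRY_DAYS)
    return [key for key, modified in artifacts.items()
            if modified < cutoff and job_status.get(key) == "success"]

now = datetime(2026, 4, 8)
artifacts = {
    "artifacts/job-101.tar.gz": now - timedelta(days=45),  # old, job succeeded
    "artifacts/job-102.tar.gz": now - timedelta(days=45),  # old, job failed
    "artifacts/job-103.tar.gz": now - timedelta(days=5),   # too recent
}
job_status = {
    "artifacts/job-101.tar.gz": "success",
    "artifacts/job-102.tar.gz": "failed",
    "artifacts/job-103.tar.gz": "success",
}
print(deletable_artifacts(artifacts, job_status, now))
# only the old artifact from a successful job qualifies
```

No lifecycle rule can express the job_status condition, because the storage layer only knows about object age and prefixes.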

A few things to watch in production

Keep lifecycle rules simple and verify them after applying. I'd recommend monitoring object count periodically - not just to confirm lifecycle is working, but to catch cases where new objects are accumulating faster than expected. Be especially careful with versioned buckets, since lifecycle behavior is less intuitive there. For CI artifacts, logs, temporary exports, and analytics scratch data, lifecycle rules are usually all you need. Define the policy once and move on to something more interesting.
