Journal Entry, June 23rd to June 30th 2023

June 30, 2023 · 2 min read

Digital Ocean Spaces

Due to an issue with Digital Ocean Spaces in the NYC3 region, we migrated a 16TB bucket to their newest SFO2 region. The primary bottleneck was that ListObjectsV2 became exceptionally slow once the bucket filled and user load began hammering it. We tried numerous troubleshooting steps in the same region and would see some improvement before things crashed again. It also wasn't consistent: things would be fine some days but not others, and I suspect load was a contributing factor.

Why mention the backstory? Because I think it's worth highlighting that sometimes the quickest solution in any cloud offering is to move regions. Hindsight was 20/20 on the whole ordeal.

k6

Where does k6 fit into this? My task was to load-test the new data center for peace of mind, knowing we couldn't simulate the type of load our users brought. I could test our droplet that leveraged Spaces, but the rate limits made that more troublesome than necessary.

This issue describes the problem I had, as well as my naivete and a few rookie mistakes. Understanding how to alter the host was crucial, as Spaces does not conform to AWS's s3.region.amazonaws.com format but uses region.digitaloceanspaces.com instead.
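
For anyone hitting the same wall, here's a minimal sketch of that host change, assuming the SignatureV4 signer from k6's jslib-aws (which is what the signedRequest object later in this post suggests). The region and credential environment variable names are placeholders, and the jslib version in the import may differ from whatever you have pinned:

import { SignatureV4 } from 'https://jslib.k6.io/aws/0.7.1/signature.js'

// Placeholder region and credentials, read from environment variables.
const REGION = __ENV.SPACES_REGION || 'sfo2'

const signer = new SignatureV4({
    service: 's3',
    region: REGION,
    credentials: {
        accessKeyId: __ENV.SPACES_ACCESS_KEY,
        secretAccessKey: __ENV.SPACES_SECRET_KEY,
    },
    uriEscapePath: false,
    applyChecksum: true,
})

// Spaces lives at region.digitaloceanspaces.com, so the hostname is set
// explicitly rather than the usual s3.region.amazonaws.com.
const signedRequest = signer.sign(
    {
        method: 'GET',
        protocol: 'https',
        hostname: `${REGION}.digitaloceanspaces.com`, // ListBuckets is a GET on the region root
        path: '/',
        headers: {},
    },
    {
        signingDate: new Date(),
        signingService: 's3',
        signingRegion: REGION,
    }
)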

With that out of the way, I tinkered with reimplementing listBuckets to see if I could isolate the problem. In a stroke of dumb luck, after getting Insomnia working with the insomnia-plugin-aws-iam-v4 plugin for authentication, I stumbled on the fix.

It turns out that instead of signedRequest.body || '' the snippet should be:

const res = http.request(method, signedRequest.url, signedRequest.body || null, {
    headers: signedRequest.headers,
})

Seemingly, Spaces cannot process a blank body, but it will accept a null one.
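
To round things out, here's a sketch of how the request side might be wired up, continuing from the signing sketch above in the same script (the check is only an example assertion, not our production test):

import http from 'k6/http'
import { check } from 'k6'

export default function () {
    // signedRequest comes from the SignatureV4 sketch earlier. Passing null
    // rather than '' means k6 sends no body at all, which is what Spaces
    // appears to want.
    const res = http.request('GET', signedRequest.url, signedRequest.body || null, {
        headers: signedRequest.headers,
    })

    check(res, { 'ListBuckets returned 200': (r) => r.status === 200 })
}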

Conclusion

The problem was a fluke we'd never seen in six years of leveraging Digital Ocean Spaces. It was intermittent enough that we had never seen it happen until recently, when it started affecting everyone. Switching regions wasn't easy either, as several processes are now slower because our droplets are still in NYC3. Most problems, even some of our most annoying ones to correct, have never required a response like this.

I had given up numerous times on finding a solution with k6 because it's an implementation of JavaScript on a Go runtime, so we can't always throw in random JS packages and hope for the best. I had also concluded we didn't need to load test services we couldn't stress very hard in the first place.

I'm not a fan of giving up on anything, though I know I can't win every battle. I don't technically have a solution here either, as my attempts to parse the XML responses using the same methods aren't producing the results I expected. I believe this change could be a net positive for S3-compatible stores like Minio, where I would want to stress-test an instance on Fly.io or other droplets.