Amazon’s Simple Storage Service (S3) is one of the most powerful, easy-to-use data storage options around. This isn’t too surprising; love it or hate it, Amazon is an internet superpower these days.
Intentionally designed for simplicity, S3 lets a user readily tap into Amazon’s titanic resources for their own storage needs. It’s no wonder it is one of the most popular data storage options around.
On paper, Amazon claims quite a bit about S3 performance: up to 55,000 read requests per second, 100–200 millisecond latencies for small objects, and more. Even Amazon admits you need to work to hit these sorts of numbers, though.
So how can one optimize S3 for their needs? That’s what we’re going to discuss today.
Amazon has produced a pretty detailed guide to using S3; use it! Otherwise, you might miss a simple tip or feature that could make things worlds easier.
Besides covering the basics, the guide also goes into optimization. There is a ton of useful, actionable info in it.
You don’t have to read the entire guide all at once but you should skim it for the S3 best practices and optimization tips. It is written to help customers, even if it will clearly be biased toward portraying the service positively.
It won’t tell you everything you may want to know and can be a little dry, but it’s a good starting point.
S3 requires you to rework the way you’ve been trained to think about files. Amazon has tossed out the traditional directory structure.
Folders and subfolders don’t really exist in the way we usually imagine them. At the same time, you certainly can still organize your files! Organizing files into folders is just a different process.
S3 basically just makes object keys do the work a directory normally would; a key might begin with “Pictures/Charts/Q3/” to indicate the object lives in the “Q3” subfolder of the “Charts” subfolder in your “Pictures” folder.
It’s a simple change of thinking once you get used to it. One of the more confusing aspects is that it makes moving files into new folders more involved: since a key can’t be renamed in place, a “move” is really a copy to the new key followed by a delete of the old one (sketched in the snippet below). This is why we recommend reading Amazon’s guide on the basics of S3 prefixes.
For simplicity’s sake, give folders and subfolders clear names. This will make recalling where particular objects are much easier.
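To make the folder-as-prefix idea concrete, here is a minimal boto3 sketch. The bucket name, key names, and object contents are placeholder assumptions, not anything specific to your setup; it just shows that a “folder” is nothing more than a shared key prefix, and a “move” is a copy plus a delete.

```python
# Minimal boto3 sketch of prefix-style "folders".
# Assumptions: a bucket named "example-bucket" exists and AWS credentials are configured.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # placeholder bucket name

# "Putting a file in a folder" is just choosing a key with a prefix.
s3.put_object(
    Bucket=BUCKET,
    Key="Pictures/Charts/Q3/revenue.png",
    Body=b"example file contents",  # placeholder content
)

# "Listing a folder" is listing keys that share a prefix; Delimiter="/"
# groups deeper levels into CommonPrefixes, much like subfolders.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="Pictures/Charts/Q3/", Delimiter="/")
for obj in resp.get("Contents", []):
    print(obj["Key"])

# "Moving to a new folder" has no rename operation: copy to the new key,
# then delete the old one.
s3.copy_object(
    Bucket=BUCKET,
    CopySource={"Bucket": BUCKET, "Key": "Pictures/Charts/Q3/revenue.png"},
    Key="Pictures/Archive/Q3/revenue.png",
)
s3.delete_object(Bucket=BUCKET, Key="Pictures/Charts/Q3/revenue.png")
```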
While an AWS update means S3 now supports at least 3,500 PUT/COPY/POST/DELETE requests per second and 5,500 GET/HEAD requests per second, sometimes you need more.
What’s important to note is that this baseline is per prefix. You can easily handle tens of thousands of requests per second per bucket with enough prefixes, because there is no limit to how many prefixes you can have in one bucket.
One common mistake new S3 users make is not using any sort of partitioning at all. S3 performance is better with organized files! It will help with request rates and just make finding what you need easier.
This is such an easy fix that there is really no reason not to do it if you’re having problems with your request volume. It will allow you to handle far more requests, all while keeping things better organized.
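As a rough illustration, here is one way to partition keys so that traffic spreads across prefixes. The bucket name and the date-based key scheme are assumptions for the sketch, not a prescription; partitioning by tenant, region, or a short hash shard works just as well if a single logical prefix still gets too hot.

```python
# Sketch of spreading traffic across prefixes: each distinct prefix gets its
# own baseline of 3,500 writes / 5,500 reads per second, so partitioning keys
# by date (or tenant, region, etc.) multiplies your headroom.
# Assumptions: bucket "example-bucket" exists and credentials are configured.
import datetime

import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"  # placeholder bucket name


def partitioned_key(log_name: str, when: datetime.datetime) -> str:
    """Build a key like 'logs/2024/06/01/app.log' so each day's traffic
    lands on its own prefix instead of piling onto a single one."""
    return f"logs/{when:%Y/%m/%d}/{log_name}"


key = partitioned_key("app.log", datetime.datetime.now(datetime.timezone.utc))
s3.put_object(Bucket=BUCKET, Key=key, Body=b"example log line\n")
print(f"Wrote s3://{BUCKET}/{key}")
```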
Something can always be better optimized. But if you’re using S3 for relatively small amounts of data storage, why design as if users and applications will be making huge volumes of requests? Why design for latency if an application won’t be sensitive to it?
Internet luminary and author Hank Green once spoke about how you almost never need something to be perfect. He emphasizes that trying to get a project to perfection can be quite wasteful.
“Perfect” is often a huge amount of time investment away from “good enough.” It’s okay to admit something is suiting your needs just fine and to spend your time elsewhere.
Consider your time a valuable resource. Spend it identifying where the issues with your S3 prefix setup actually are and focus on those.
Prioritize fixing issues that will affect customers. Then prioritize issues that affect you and any relevant internal users like staff.
“Issues” that don’t actually affect either usually aren’t important. The work you put into optimization should matter.
This all ignores the fact that overdesigning, done poorly, can also be downright expensive. For example, assuming you need more request capacity than you actually do could wind up costing you a fortune. S3 best practices dictate designing only for what you actually need!
If you notice a big spike in HTTP 503 Slow Down responses, that can sometimes indicate a problem with versioning.
Buckets with versioning enabled can sometimes suddenly have versions numbering in the millions. S3 will then automatically throttle to prevent even worse problems, but this can cause slowdowns of its own.
Amazon’s S3 Inventory tool can surface issues like this, and Amazon also encourages anyone having such problems to contact AWS Support. They will help you make sure everything is working as intended.
One of the more likely issues to cause this is an application that is broken in some way and is just rewriting the same object over and over. This can dramatically stack up how many versions are being created in a very short amount of time. It’s also a detail that’s easy to miss if you’re not looking.
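If you suspect this is happening, one rough way to check from your own side is to count versions per key and flag anything suspicious. Below is a minimal boto3 sketch; the bucket name and the 10,000-version threshold are placeholder assumptions, and for very large buckets an S3 Inventory report is the more practical route than listing everything.

```python
# Sketch for spotting runaway versioning: count object versions per key and
# flag keys that look like a rewrite loop.
# Assumptions: bucket "example-bucket" exists, versioning is enabled,
# and the 10,000 threshold is just an illustrative cutoff.
from collections import Counter

import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"   # placeholder bucket name
THRESHOLD = 10_000          # placeholder for "too many versions"

version_counts = Counter()
paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket=BUCKET):
    for version in page.get("Versions", []):
        version_counts[version["Key"]] += 1

# Print the 20 most-versioned keys, flagging likely rewrite loops.
for key, count in version_counts.most_common(20):
    flag = "  <-- possible rewrite loop" if count >= THRESHOLD else ""
    print(f"{count:>8}  {key}{flag}")
```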
Issues can also arise if you suddenly start using S3 in ways you weren’t before. This obviously isn’t unique to S3, but it needs to be stated: push something hard in a new direction and it might break!
A few years down the line, your company will likely have changed a lot. When you first integrate S3 into your model, you have one set of needs. Later, these needs can change pretty drastically.
Large influx of customers? Have you set up a social media messaging application? Review how S3 is integrating with the rest of your setup!
This advice might be obvious but your business needs to revisit old choices when big changes occur. This can keep things efficient and help prevent bigger stumbles later.
Just remember that needing more out of S3 doesn’t mean you need to go wild. Like we talked about with overdesigning, keep things efficient. This is especially true if you’re considering solutions that are going to cost you more money.
S3 best practices have changed significantly since its inception. One of the biggest changes is that you used to need to randomize object prefixes if you wanted optimal performance. This was a relatively big hassle and made key names harder for a human observer to parse.
The same update that established those minimum PUT/POST/DELETE and GET request baselines we mentioned also changed this. Naming patterns no longer affect performance, so you can afford to keep names simple and logical.
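For a concrete picture of what that old advice looked like in practice, here is an illustrative comparison. The key names and the short-hash scheme are just examples of the randomization pattern people used to apply, not something Amazon mandated in exactly this form.

```python
# Illustrative only: what "randomizing prefixes" used to mean versus the
# simple, readable keys you can use today. Key names are hypothetical.
import hashlib

key = "Pictures/Charts/Q3/revenue.png"

# Old-style workaround: prepend a short hash so keys scatter across
# partitions, e.g. "3f2a9c/Pictures/Charts/Q3/revenue.png".
hashed_key = hashlib.md5(key.encode()).hexdigest()[:6] + "/" + key

# Today the plain, human-readable key performs just as well.
print("old-style:", hashed_key)
print("modern:   ", key)
```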
This sort of stuff is worth mentioning because companies change how their programs work in what are sometimes fairly major ways. Sometimes for the better and sometimes just in ways that are different enough to render old advice obsolete.
So make sure the advice you’re taking is good! Imagine thinking you need to randomly name objects when it doesn’t even do anything. Old advice can cause chaos and might even cause bigger problems if the service has changed drastically.
Many users of S3 default to integrating it with Amazon Web Services (AWS) but this isn’t necessarily the best option. For example, NETdepot can offer as much as 80 percent savings over AWS if you go with us instead.
AWS can be a comparatively quite expensive service, especially if you’re working with large amounts of data. We at NETdepot are data and infrastructure experts in our own right too; saving money doesn’t always mean losing out.
The writers of Amazon’s marketing materials and guides are obviously incentivized to imply AWS is the way to go with S3. After all, it’s the same company behind both and companies exist to make money.
However, you would probably like to save money too. Our service is generally cheaper while still offering everything you need. So don’t let Amazon’s pushing of its own services blind you into making hasty decisions.
Optimization is rarely fun; it can be frustrating for something to do what you want…just slowly. We hope our advice has helped you boost your S3 performance without the usual hassle!
Amazon’s S3 is a powerful data storage tool when used correctly. None of the tips above take too long to implement either. There’s really no reason S3 has to be a bottleneck in your operations.
S3 gives you the scope to move data faster, assess data upfront, understand access, encryption, and compliance, save money, and use local environments for testing or as production alternatives. If another option offers a more holistic solution, or you plan to store your data outside of AWS, there are alternatives available.
Have an interest in a fantastic AWS alternative? Or maybe you’re worried about data backups and disaster recovery? We’ve got all that and more at NETdepot.