I Blocked an Airline as DevOps: Critical Lessons from Production Mistakes

Jacek Marmuszewski

Listen to the full episode above or read the article below:

In this conversation with Cloud para Todos, Jacek discusses the real-world applications of DevOps, including balancing security with developer agility, the importance of platform engineering, and how one company reduced audit infrastructure costs from $23,000 to $500/month while decreasing query time from 24 hours to 11 seconds.

No fluff. Just lessons from 10+ years in production.

The moment everything stopped

Early in his career, Jacek executed a script on production. He ran it prematurely, without having read all the parameters inside.

He accidentally deleted one of the airlines from the world.

"It wasn't deleted - it was a soft delete, so more of a lock," Jacek explains. "But the first thing I noticed was: hey, I just accidentally blocked one of the airlines. They cannot book, they cannot fly."

Thankfully, he had a great team. They didn't yell as much as he expected. They recovered quickly.

The lesson stuck: "Before you execute any script on production, you need to understand what you're doing and what the script is doing. That's something I remember every time I log into any production console."
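One way to build that habit into the tooling is to make scripts dry-run by default and to ask for an explicit confirmation on production. The sketch below is only an illustration: the script, its flags (`--customer-id`, `--environment`, `--execute`) and the `run` helper are hypothetical and are not the script from the story.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for a hypothetical production maintenance script."""
    parser = argparse.ArgumentParser(description="Soft-delete (lock) a customer record.")
    parser.add_argument("--customer-id", required=True)
    parser.add_argument("--environment", choices=["staging", "production"], default="staging")
    # Nothing is changed unless --execute is passed explicitly.
    parser.add_argument("--execute", action="store_true")
    return parser


def run(argv, confirm=input) -> str:
    """Return a message describing what was (or would be) done."""
    args = build_parser().parse_args(argv)
    summary = f"lock customer {args.customer_id!r} in {args.environment}"
    if not args.execute:
        return f"DRY RUN: would {summary}"
    if args.environment == "production":
        # Retyping the target forces the operator to actually read the parameters.
        typed = confirm(f"About to {summary}. Type the customer id to continue: ")
        if typed.strip() != args.customer_id:
            return "ABORTED: confirmation did not match"
    # The real change would happen here.
    return f"EXECUTED: {summary}"


print(run(["--customer-id", "ACME"]))  # DRY RUN: would lock customer 'ACME' in staging
```

The default path changes nothing, so a premature run prints the plan instead of locking a customer.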

Security versus agility: the false choice

Companies constantly struggle with this: do we prioritize security or development speed?

Jacek's answer: "I truly believe that security and agility can actually coexist."

The problem is how developers typically think about it. "What developers usually mean is: give me all the admin credentials and I will do it my way. This usually ends up in them spending two weeks building infrastructure that's pretty common in the company. It doesn't play well with teams like security, privacy, compliance."

The solution? Platform engineering.

"Instead of spending two weeks in the happy path, you spend 15 minutes getting exactly the same result."

What platform engineering actually means

"As a developer, what you want to do is run your features on production. You don't care about servers, serverless, databases. What's important is delivering the feature."

Platform engineering builds all the groundwork: VPC networks, runtime environments, file storage, web assets - all the tedious work developers shouldn't have to worry about.

"From your perspective as a developer, the only thing that matters is: your application should be deployable, there should be health checks so you know it's up and healthy, and it should scale up and down when needed."
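To make the health-check part concrete, here is a minimal sketch using only the Python standard library. The `/healthz` path, the `CHECKS` dictionary and the port are assumptions chosen for illustration; an internal platform would define its own conventions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical dependency checks; a real service would ping its database, queue, etc.
CHECKS = {
    "database": lambda: True,
    "cache": lambda: True,
}


def health_response(checks):
    """Run every check and return an (HTTP status, JSON body) pair."""
    results = {name: bool(check()) for name, check in checks.items()}
    healthy = all(results.values())
    body = json.dumps({"status": "ok" if healthy else "degraded", "checks": results})
    return (200 if healthy else 503), body.encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/healthz":
            self.send_error(404)
            return
        status, body = health_response(CHECKS)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Blocks forever; call this from the service entry point."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

A load balancer or orchestrator can poll such an endpoint and treat a 503 as a signal to stop routing traffic to the instance.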

Platforms like Heroku offer this, but they suit early-stage startups with only a handful of microservices. Once you're operating hundreds of microservices, you need an internal platform optimized for your specific needs.

Architecture decisions that matter

Good architecture empowers your system to be secure and performant. Bad architecture creates bottlenecks.

Jacek sees this most often in application structure that doesn't scale well - manual processes, difficult resharding, complicated instance additions.

"Having separation of internal APIs and external APIs takes away probably 80% of the pain security teams have. A lot of communication in microservices is inter-services. You don't need to expose it to users. As much as you can put into your VPC, make it in that secure bubble - it's a lot easier for security and compliance."
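The separation is best enforced at the network level, but an application-side guard can illustrate the idea. The sketch below assumes a VPC range of `10.0.0.0/16` and an `/internal/` route prefix; both are hypothetical values, not details from the conversation.

```python
import ipaddress

# Assumed VPC address range; substitute the CIDR block(s) of your own network.
VPC_NETWORKS = [ipaddress.ip_network("10.0.0.0/16")]

# Hypothetical convention: routes under /internal/ are service-to-service only.
INTERNAL_PREFIX = "/internal/"


def is_inside_vpc(client_ip: str) -> bool:
    """True when the caller's address belongs to one of the VPC networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in VPC_NETWORKS)


def is_allowed(path: str, client_ip: str) -> bool:
    """Internal routes require a VPC caller; external routes are open to anyone."""
    if path.startswith(INTERNAL_PREFIX):
        return is_inside_vpc(client_ip)
    return True


print(is_allowed("/internal/billing", "10.0.3.7"))     # True
print(is_allowed("/internal/billing", "203.0.113.9"))  # False
print(is_allowed("/api/v1/bookings", "203.0.113.9"))   # True
```

This is only a second line of defense. Behind a proxy the client address has to come from a trusted source, which is one more reason to keep internal APIs unreachable from outside the VPC in the first place.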

The cloud cost myth

There's a joke: give someone a million dollars and tell them to spend it in one day, but they can't spend it on AWS.

"If you do it wrong, AWS can cost you a lot of money."

Jacek shares an example: a data engineer working with Redshift wanted row counts from 50-60 databases. Instead of running simple select count queries, he compiled all the data in Redshift and ran a distinct select count over it.

"It counted for three days and cost $6,000 to compute. The result? Three record counts from three tables."
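The cheaper alternative mentioned above, one plain count per table, can be sketched as a small query generator. The original query is not given in the conversation, so the table names and the `row_count_query` helper below are hypothetical.

```python
import re

# Accepts plain or schema-qualified identifiers such as "sales.orders".
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def row_count_query(tables):
    """Build one statement that returns a row count per table."""
    if not tables:
        raise ValueError("at least one table is required")
    selects = []
    for table in tables:
        # Table names cannot be bound as parameters, so validate them instead.
        if not _IDENTIFIER.match(table):
            raise ValueError(f"unexpected table name: {table!r}")
        selects.append(
            f"SELECT '{table}' AS table_name, COUNT(*) AS row_count FROM {table}"
        )
    return "\nUNION ALL\n".join(selects) + ";"


print(row_count_query(["sales.orders", "sales.invoices"]))
```

Each `COUNT(*)` scans a single table independently, so nothing has to be merged or de-duplicated before counting.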

But cloud also lets you spend less - if you use it properly.

"When companies think about cloud migration, they think about it like it's static. In my server room I had thousand servers, now I'll have thousand EC2s. What they don't realize is: in a data center you're planning for your top moment, your peak traffic. In cloud, you can ditch most of the compute whenever you don't need it."

One company runs an invoicing service. For most of the month, they run on two servers. At month end when everyone processes invoices? They scale to 60 servers, then back down to two.

"We're not paying for 60 servers upfront. We're only paying for two servers, and the remaining 60 come in whenever we need them."
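A rough back-of-the-envelope comparison shows why this matters. The server counts come from the example above; the 30-day month and the 3-day month-end peak are assumptions made purely for illustration.

```python
HOURS_PER_DAY = 24


def server_hours(baseline, peak, days_in_month, peak_days):
    """Return (always-at-peak, elastic) server-hours for one month."""
    always_peak = peak * days_in_month * HOURS_PER_DAY
    quiet_days = days_in_month - peak_days
    elastic = (baseline * quiet_days + peak * peak_days) * HOURS_PER_DAY
    return always_peak, elastic


# 2 and 60 servers are from the article; 30 days and a 3-day peak are assumed.
always_peak, elastic = server_hours(baseline=2, peak=60, days_in_month=30, peak_days=3)

print(always_peak, elastic)                # 43200 5616
print(f"{1 - elastic / always_peak:.0%}")  # 87%
```

Under these assumptions, scaling down outside the peak uses about 13% of the server-hours that a fleet sized for peak traffic would consume.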

The half-petabyte audit challenge

A gaming/gambling client came with half a petabyte of audit data spread across Europe for compliance reasons. The client was spending $23,000/month on infrastructure.

Let's Go DevOps went through all the data, classified it, changed how it's stored, moved it to cloud.

New cost? $500 for storage plus $500 for compute to crunch data and prepare audit reports.

Query time dropped from 24 hours to 11 seconds.

"Previously they needed around 24 hours to get a response for auditors. Now they need only 11 seconds for exactly the same data."
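To put those figures in proportion, the short calculation below uses only the numbers quoted in this section. It assumes the storage and compute amounts are both monthly and are added together.

```python
old_cost = 23_000      # USD per month, as quoted above
new_cost = 500 + 500   # USD: storage plus compute, assumed to be per month

old_query_seconds = 24 * 60 * 60  # roughly 24 hours
new_query_seconds = 11

cost_reduction = 1 - new_cost / old_cost
speedup = old_query_seconds / new_query_seconds

print(f"cost reduction: {cost_reduction:.1%}")  # cost reduction: 95.7%
print(f"speedup: about {speedup:,.0f}x")        # speedup: about 7,855x
```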

The data access opened new paths. Other teams noticed they could now crunch data and answer questions that were previously impossible.

The hardest lesson about people

Five years ago, around the beginning of the pandemic, Jacek learned something critical about building teams.

"Previously I was working in product companies. The team was built and managed by the company hiring us. During the pandemic, a lot of stuff crashed. We had to let go some people, even though it was only a budget issue."

That's when it clicked: "It doesn't have to be this way."

Starting Let's Go DevOps was about putting back together people he really enjoyed working with. People with the same drive, focused on technology, who love the same stuff.

"For many years I was thinking I'm in some environment and I need to adapt. The truth is: I'm in this environment but I need to find people I really enjoy working with. This makes a huge difference in how you look at projects."

Don't think about what you'd change

When asked what he'd do differently in his career, Jacek's answer is surprising.

"I'm not sure I would do anything different. Every lesson, everything I do should be a lesson. Even the failures."

He worked with hardcore mainframes in his first job - knowledge he doesn't use now. He worked at Oracle and got certified in half their technologies - products he no longer uses either.

"But I'm using the concepts."

Even his greatest failure - removing that airline customer from production - gave him essential knowledge and experience.

"Don't think about what you would change. Think about the lessons you learned in the past and make sure you won't make the same mistakes again."

Key takeaways:

  • Read every line of production scripts before executing

  • Security and agility can coexist through platform engineering

  • Cloud cost optimization is about scaling down, not just up

  • Architecture decisions have a massive real-world impact

  • Find people you actually enjoy working with

  • Every failure is a lesson if you pay attention

Want to expand the topic?

Address:

Let's Go DevOps Sp z o.o.
Zamknięta Str. 10/1.5
30-554 Cracow, Poland

View our profile
desingrush.com

Let’s arrange a free consultation

Just fill out the form below and we will contact you via email to arrange a free call to discuss your project scope and share our insights from similar projects.

© 2024 Let’s Go DevOps. All rights reserved.
