Goal setting and the year in review

So it’s officially 2020, and a new year brings with it all kinds of things: retrospection, hope, dreams, and a variety of other feelings. I’m not a big partier and have never been a massive fan of New Year’s Eve, but in recent years I have really come to appreciate two elements of the new year. The first is retrospection. It’s a chance to look back at the year and be honest with ourselves about how things have gone: a chance to look at what worked and what didn’t, and to have an honest conversation with yourself.

The second part I’ve come to enjoy is planning for the new year, sitting down and looking at my life and finding new ways to grow as a person, and improve things for the better. There’s something very empowering about sitting down and seeing a wealth of possibilities and excitement about the future prospects and opportunities that are ahead.

Now for most people, this is where the most dreaded word comes up, and it’s RESOLUTIONS. We’ve all heard it, and probably had it happen to us: the grand self-lie that is a resolution. Believe me, over the years I’ve left a path of broken resolutions behind me, and as those who read this blog regularly know, I tend to read a lot on the subject of success, goals, and similar topics. I don’t claim to have an answer here, and over the past few years I have come to the conclusion that everyone’s mileage on any option for trying to grow will vary.

Now I want to be clear about one thing here. I’m going to use the ever-present weight loss example. I consider myself overweight, and it is something I have struggled with. I do not have the answer and am not claiming to, nor am I “throwing shade” on people who use these systems and find success. This is my experience only.

But what I can do, is call out some of the things I’ve tried, and how they worked out, and tell you what I’ve been finding lately:

Setting SMART Goals:

We’ve all heard this one, right? Make sure that your goals are “SMART”. They practically drill this into us in grade school: the only “good goals” are SMART goals. And what does SMART mean?

  • Simple
  • Measurable
  • Attainable
  • Reasonable
  • Time Bound

Now, the idea behind this approach is a good one: make sure you’re setting goals that can be reached, and that you can verify you have hit milestones along the path. Believe me, I do love the mantra “What gets measured, matters”, and this is based around it. It’s also built around the satisfaction of achieving your goals. If you set something that’s measurable and attainable, then you feel pretty great when you hit that goal.

Let’s talk about an example. A “bad” goal in this model would be “I’m going to lose weight”, to steal the oldest resolution in the book. Why is this a bad goal? Because it’s not defined; it’s not something that I can measure in a meaningful way. A better goal would be “I’m going to lose 10 lbs by June.” I can measure it, it has a deadline, and it’s not outlandish by any means. Should be great, right?

For a lot of people, this is a great system, and it helps them, but for me it did more harm than good. The reason is that a human being can tolerate almost anything for a fixed amount of time. Look at people who have survived unimaginable conditions and then were able to return to their lives. The problem for me is that by doing this with the new year, you aren’t doing anything to make a permanent change in your life.

Let’s go back to our weight loss example, because I’ve got to be honest, this isn’t hypothetical; it’s what really happened to me (more than once). You set this goal and in January you go after it… I had a coworker once who used to say “Let’s seize the day with vigor and determination never before seen by mankind.” And we’ve all been there, right? We all hit the gym, get up early, and go after it.

And then a couple of outcomes happen:

You start doing great, and by the end of January you are down 5 lbs. You’re feeling amazing and saying “I got this”, at which point you end up convincing yourself “I can slow down, I don’t need to work as hard”, and it all falls apart. Before you know it time flies and it’s June; you look at the number and say “I’m a failure”.

You start doing great, and by the middle of February you hit your goal of 10 lbs down. You’re proud of yourself, and SMART goals worked. You move on to other things, and before you know it you fall into bad habits. June hits and the scale looks pretty familiar; you look at the number and say “I’m a failure”.

You stay on track, do what you set out to do, get to June, and are down 10 lbs. You feel great; SMART goals worked. You have a fun summer and end up back where you started, or God forbid worse off, look at yourself, and say “I’m a failure”.

And now you’re probably saying, “For loving the positive elements of New Year’s, this is pretty damn depressing.” I’m not trying to be a Debbie Downer. But as I said above, part of this process is honesty and retrospection, and this has been my honest experience.

This is my problem here: SMART goals are built to be very short-term focused to get a “job” done, but when it comes to personal growth, the job is never “done”, so the approach is fundamentally flawed. And at the end of the process those words and feelings of “I’m a failure” have a damaging and demoralizing effect that is completely counterproductive.

At the end of the day, growth is a journey. And if you continue down this road and you miss your goal you are left with nothing, and feeling like you failed with nothing to show for the effort. I believe there is an old adage about eggs in a single basket for this.

10x Goals:

This is one that got a lot of attention. I’ve read the book The 10X Rule, and I have to say I found it insightful and very interesting. For those not familiar, the idea is this: take the idea of SMART goals and turn it around a bit. Keep the same ideas of goals being measurable and time boxed, but instead of making them attainable, you make them 10x what the attainable goal is.

So take our weight loss example. Instead of saying “I’m going to lose 10 lbs by June”, I would say “I’m going to lose 50 lbs by June”. Now before anyone jumps on me, I can do math. The idea is what could you do if you put in 10x the effort. So if I put in the work and try to lose 50 lbs by June, one of two outcomes occurs:

  • I lose 50 lbs and cheer my success.
  • I lose 30 lbs and I’m still better off than the 10 lb goal.

In my experience, though, the problem is still the same. I haven’t changed behaviors or grown at all. I’ve hit a very finite, fixed-in-time goal, but the success won’t last, and at the end you still feel like a failure. And now you feel like a bigger one, because not only did you miss the 10x goal, but likely the 1x goal too.

Finite Systems / Infinite Problem:

The crux of the problem I have with the above systems is that they are built around finite objectives but are being applied to an infinite problem. I don’t want to lose weight, I want to be healthier. I don’t want to learn one thing, I want to build a foundation for learning. At the end of the day, we are trying to fit a square peg into a round hole. Personal growth isn’t something that can be timeboxed like that.

Simon Sinek covers this in his book “The Infinite Game”, which I admit I am still reading, but here’s a video that highlights some of the principles.

The other problem I have is that, in my experience, this creates a lot of stress and pressure, and those words “I’m a failure”, whether you say them aloud or not, are devastating. If you become too fixated on goals, they can start to feel like a drug high. I’m speaking from experience here: your life becomes about setting goals, pushing too hard, hitting them, feeling that euphoria, and then it’s on to the next one.

I was 100% in that boat, for better or worse, and don’t get me wrong I’m proud of any accomplishments I’ve made, but it really does take a toll on you mentally. While it can be satisfying to reach those goals, it isn’t always fulfilling. And if you find yourself questioning where to go next, that can be crippling in a lot of ways.

And now I’ve done it again, we are at the “Kevin, still depressing. Goals are meaningless, growth is meaningless, life is pain…”

Not quite, I’ve been doing a lot of reading and researching and had lots of discussions with people a lot wiser than me, and I’ve found something that in my opinion seems to be working better.

The final problem I have with these systems is that they make one basic assumption: that pursuit of these goals exists in a vacuum. What I mean by that is, take our weight loss example. We say “I’m going to lose 10 lbs by March”, but then I get hurt, need surgery, and spend 6 weeks in a cast, followed by physical therapy. I know that the goal became unattainable, but I still feel like I failed.

Now again, not just weight loss, but let’s say I said “I’m going to put my phone away after dinner to spend more time with my family”. And then I get a smart watch which lets me check email without my phone, or I work with customers all over the world that have to call at off hours, then I feel like a failure due to circumstances outside of my control.

Goals vs Values:

Now I can’t take credit for this. There is a psychological principle called value-based living. Here’s a video that does a way better job than I ever could at summarizing it.

So looking at the above, what if we get away from these ideas of goals and look more at what we as people value? That is what drives us, and that is what matters. As long as the actions we take align with those values, the journey is part of the reward. If you watched the video above with Simon Sinek, this probably sounds familiar, and that should be no surprise. There is a direct through line between his concept of actions being driven by values and value-based living.

So the next question is, how does this work any differently? How do I grow and push myself without goals? Is this just semantics at the end of the day? I don’t think so, but let me talk about what this journey has been like for me, and you can judge.

Step 1 : Change your definition:

One thing that my wife and I are really trying to embrace is a family mission statement, and we are in the process of writing that now. When we are done I will probably do a blog post on that too. But along with that, we as a family have focused our energy and decisions about what we do around this motto for lack of a better term.

There are only two outcomes to any action, success or you learn something.

That’s it. It’s not groundbreaking, and truth be told we stole it from the movie Meet the Robinsons, which has a similar sentiment: “From failure you learn, success not so much”. But if you stop and think about that statement, it’s rather profound to take away failure as an outcome. Some would say you take away accountability, but I would say you take away blockers. If you can’t fail, then what is stopping you from trying?

Thomas Edison had a similar statement, when asked about the 1000 failed attempts to make a light bulb, he said “I didn’t fail, I just found 1000 ways not to do it.”

At its core this is very freeing, and we need to say we can grow and push the limits because there is no outcome that we shouldn’t feel positive about, because the journey will yield learnings, and those learnings will help us to improve for the future.

Step 2 : Define your values:

This one took a lot of soul searching for me. You need to take a step back and identify what matters to you above all else. What ideals and values do you aspire to? That’s not an easy question, and it should not be taken lightly. I found that forcing each value to be expressed in a single word helped a lot.

My values are the following:

  • Family
  • Learning
  • Impact
  • Innovation
  • Creativity

And what I mean by these, is that my guiding principles in my life, at this time are these items. When I am long gone, I want my kids to know that above all else family mattered. I want them to see that I had a love of learning. That I focused on having an impact around me whether it be my career or community. I want them to see me as someone who was innovative and creative.

These values together really sum up, at this stage of my life, the legacy I want to leave behind.

Step 3 : Values Define Action:

One common thread you will see in anything and everything is the idea that we as people have limited resources, whether those be willpower, physical, financial, energy, attention, or the almighty time. We can only put our resources into so much, and we can’t do it all. Greg McKeown has a great book on this called “Essentialism”, which is all about where to apply your resources.

To that end, if we have values that are important to us, and we have limited resources, it isn’t a big logical leap to say that we should focus on putting our energy behind the actions that align with our values.

Not really rocket science, although it took me a while to get here if I’m being honest.

Now what I’ve found from doing this in practice in recent months is that my stress level has gone down and my commitment to the actions I’ve taken has gone up. The results have been greater too. At the end of the day, I believe it’s easier to be committed to an action if it aligns with something you care deeply about.

Let me go back to our example. As mentioned above, I want to get healthier, and I’d tried SMART goals, 10x goals, etc. I tried a keto diet and joining a gym, and nothing seemed to stick. Even when something did, I could never cross the 15 lbs mark, and it was devastating to me. I have had to actively sit on the sidelines at both work and family functions because of my body weight.

This all came to a head for me when I took my son to Hershey. All he wanted to do was ride a roller coaster, and he’s much too small to ride the big coasters, but we saw a little kids’ roller coaster called the “Cocoa Cruiser”, and he wanted to ride it. We got in line, and when we got to the front, he was too short to ride by himself, and I couldn’t fit into the coaster. He and I stood on the platform while his friends rode, and then he rode with one of their moms. Having to explain to your son that he can’t have what he wants because of your body weight was one of the lowest points in my life. I wanted to curl up and die.

I still could never get past that 15 lbs mark, and life would get in the way. So I took a step back and said… forget the numbers. I want to get healthy because it will let me be more to my family. I found a CrossFit gym that I really like, with great people and a great coach. I try to go as much as I can, although being sick recently sidelined me. But just out of curiosity I got on the scale today, and I’m down 25 lbs from that horrible day. I feel better and have better energy, and even though I fell off the wagon and am going back when travel slows down, I feel like a success and look back on all the victories and fulfillment with a positive attitude.

The attention here is on the action, not the outcome. It’s having a lasting impact because it leads to behavioral change.

But let’s not make this all about weight, even if that is an easy example. Take my professional life: I decided to focus more on impact, and I now measure all my actions by the impact they have. This has led to greater results in my office, with me feeling better about the work I’ve done and, if you look at the metrics, much greater returns. My stress level has gone down, and I’ve stopped measuring myself against the impact and activities of my colleagues.

Final Thoughts:

I know this has been a much longer blog post than normal, but thanks for sticking with me through this. The end result is this: I’m not going to be setting any resolutions this year. My new plan is to reaffirm and re-evaluate my values, and then make sure that I devote my energy and resources to actions that align with them. This will allow me the flexibility to enjoy life while still finding new ways to grow.

This is something my wife and I both feel strongly about and are working with our kids to internalize, and I hope it at least sparks some thought for you about where you are and where you want to go.

Weekly Links – 1/13/2020

Well, it’s officially 2020, and this year wasted absolutely no time getting moving. My kids’ activities started ramping up, and life moved at a sprint out of the gate.


But down to the business..

Cloud:

Fun Stuff:

So for this week, it’s nothing in nerd pop culture, but a fascinating video I saw. This was recommended by a friend, and it sparked a lot of research for me around personal growth (blog post coming soon). The video is “What does game theory teach us about war?” and there are a lot of parallels that you can draw to how your own life functions in terms of finite vs infinite games.

Return of weekly links – 12/30

Hello all, so I know that I totally fell off the wagon when it comes to weekly links. December has been a crazy month: I had three business trips while getting ready for Christmas. Due to family coming to town we effectively had three Christmases, and on top of that I got sick.

I’m not complaining just explaining what happened. The holidays are a rough time for a lot of people, and we’ve all had experiences we have to carry with us. So remember in all the craziness that some might be suffering in silence. If you find yourself in this position, please reach out. You are not alone and there are people to help you.

Down to the business…

Cool Stuff:

So as always I have a post here about something fun. This week I wanted to post about how my wife blew me away with her gift this Christmas. We spent our 11th anniversary at Dave & Buster’s; we decided to double down on stupid fun. When we walked in, they were having a silent auction for Make-A-Wish, and while I was in the bathroom she bid on this item and won.

This is a picture from The Dark Knight, my favorite movie of all time, signed by Christian Bale and the late Heath Ledger.

Something different… Things I learned from Hallmark Christmas Movies.

The holidays are upon us, and as something fun this week, I thought I would share a running joke I have with my wife.

My wife is obsessed with Hallmark Channel Christmas movies, so I’ve seen a lot of them, and below is what I’ve learned.

  1. All corporations are evil
  2. All mom and pop businesses are in danger of being shut down by “the bank” and the bank is open and acts on those foreclosures on Christmas day.
  3. If you don’t like each other when you meet, you will be madly in love in one week’s time.
  4. If you don’t know the meaning of Christmas it will be shown to you by an attractive member of the opposite sex.
  5. Automation is the end of all good things.
  6. Every event coordinator looks at a group and says “Everyone got the plan?” and they immediately execute it perfectly. Nothing will ever go wrong and no one has questions.
  7. If you are trying to decorate for Christmas, the whole town will show up, unprompted, to help.
  8. If you are visiting your home town after a long absence you will immediately leave your whole life behind and never leave.
  9. The only people who can find true love are Lacey Chabert and Candace Cameron Bure.
  10. If you have a life without a family, you will immediately realize your career is not what you want and throw it away at Christmas.
  11. Small towns all hold a winter festival that is the center of everyone’s world in December, and they have unlimited budgets.
  12. Not owning a Balsam Hill tree is a federal crime. Punishable by death.
  13. Every kid is wise beyond their years and knows the true meaning of love and Christmas.
  14. The most desirable positions in the world are: Designer, event planner, lodge owner and writer.
  15. If you don’t like Christmas you will be the poster child for Christmas spirit within 1 week, guaranteed.
  16. There is always a long lost document, or legal loophole, that will solve all problems and it can only be found the night before foreclosure.
  17. If you are a prince / princess you will be told that you can only marry royalty. This is false, royalty only marry American commoners.
  18. No one from a foreign kingdom has an accent or speaks any language other than English.
  19. Every small European monarchy has a royal family whose marriage is the media focus at Christmas.
  20. DESPITE the above two rules, no American has heard of any monarchy or royal family, or can identify the royal family on sight.
  21. Every castle was designed by a single architect and built to spec and all look identical.
  22. If you try to leave your small town, guaranteed all airports will shut down due to snow.
  23. The only drinks anyone drinks are coffee, hot cocoa, and wine.
  24. Everyone loves the Christmas spirit, and if you don’t you will convert or never be seen again.
  25. Everyone is originally from a small town, no one of any importance is born in a city.

RT? – Making Sense of High Availability

Hello all, in keeping with the last post on the blog, I started doing some posts around High Availability. Ultimately the focus here is: how do I architect my solution to ensure that it meets the availability demands of my customers?


So odds are if you’ve started down this direction, you’ve heard 3 acronyms:

  • SLA – Service Level Agreement
  • RTO – Recovery Time Objective
  • RPO – Recovery Point Objective

So what does each of these items mean, and how do they relate to your solution? For SLA, I covered this pretty extensively in my previous post, so I would direct you there for a definition and for recommendations on how to approach that topic.

So the next question is really what are RTO and RPO? And how do they relate to High availability?

What is RTO?

RTO stands for Recovery Time Objective. In software terms, it refers to how fast you recover when something happens.

So let’s take an example, because I work best with examples. Say I have a solution that is deployed in multiple regions, uses Traffic Manager, and is replicated into another region. If Traffic Manager is checking the endpoint every 5 seconds, and 3 failures cause a failover… that means my RTO is 15 seconds.
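The arithmetic in that example can be sketched in a couple of lines of Python. This is only a rough sketch: the function and parameter names are made up for illustration, and it models nothing beyond the time it takes to see enough failed probes.

```python
def detection_seconds(probe_interval_s: float, failures_before_failover: int) -> float:
    """Time until enough consecutive failed probes have been seen to trigger a failover.

    Hypothetical helper: it only multiplies the probe interval by the failure
    threshold and ignores everything that happens after the failover is triggered.
    """
    return probe_interval_s * failures_before_failover


# Probe every 5 seconds, fail over after 3 failed probes.
print(detection_seconds(5, 3))  # 15
```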

By using a dual-region deployment, I’m able to keep my RTO relatively low. Now, the above example is pretty simplistic. Really we should do this analysis per service in our architecture to determine how long each failover takes, and the longest of those is your solution’s RTO.
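As a small sketch of that per-service analysis, assuming you have already measured a failover time for each service: the service names and numbers below are hypothetical placeholders, not measurements from a real system.

```python
# Hypothetical failover times per service, in seconds (illustrative values only).
failover_seconds = {
    "traffic_manager": 15,
    "app_service": 60,
    "sql_database": 120,
}


def solution_rto(per_service_seconds: dict[str, float]) -> float:
    """The solution's RTO is the slowest individual failover."""
    return max(per_service_seconds.values())


def slowest_service(per_service_seconds: dict[str, float]) -> str:
    """Name of the service that sets the RTO, i.e. the first place to look for improvements."""
    return max(per_service_seconds, key=per_service_seconds.get)


print(solution_rto(failover_seconds))     # 120
print(slowest_service(failover_seconds))  # sql_database
```

Knowing which service is the slowest tells you where the next hour of automation work is likely to pay off.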

How do we improve RTO?

Now, remember that this is really a measure of continuity of business, so really looking at High Availability and Disaster Recovery. So ultimately we are talking about service uptime more than anything else.

So the best way to improve RTO is to enable replication and take steps to increase the speed of recovery. In the last discussion of SLA, we took steps to minimize downtime by increasing SLA. This conversation is about how we minimize the downtime caused by those failovers.

The most important things involved in this are the following:

  • Monitoring
  • Response time
  • Data Replication
  • Failover

So the key metric to pay attention to is how long it takes to get up and running.

Monitoring is the cornerstone of your RTO target. Many blogs and articles will focus on the next 3 parts, but let’s be honest, if you don’t know there’s a problem, you can’t respond. If your logs operate on a 5-minute delay, then you need to factor those 5 minutes into your RTO.

From there the next piece is response time. And I mean this in the true sense of how quickly can you trigger a failover to your DR state. How quickly can you triage the problem and respond to the situation? The best RTO targets leverage as much automation as possible here.

Next, by looking at data replication, we can ensure that we are able to bring back up any data stores quickly and maintain continuity of business. This is important because every time we have to restore a data store, that takes time and pushes out our RTO. If you can fail over in 2 minutes, it doesn’t do you much good if it takes 20 minutes to get the database up.

Finally, failover. If you are in a state where you need to fail over, how long does that take, and what automation and steps can you take to shorten that time significantly?
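To make the four pieces concrete, here is a minimal sketch that adds them up. It assumes the stages happen one after another, which is a pessimistic simplification, and the class and field names are hypothetical. The 5-minute log delay, 2-minute failover, and 20-minute database restore come from the examples above; the 3-minute response time is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStages:
    """Minutes spent in each stage of a recovery (hypothetical model)."""

    monitoring_delay_min: float  # how long before you even know there is a problem
    response_min: float          # triage and the decision to fail over
    data_restore_min: float      # bringing data stores back up
    failover_min: float          # switching traffic to the DR environment

    def rto_minutes(self) -> float:
        # Assumption: the stages run sequentially, so the total is a simple sum.
        return (
            self.monitoring_delay_min
            + self.response_min
            + self.data_restore_min
            + self.failover_min
        )


stages = RecoveryStages(
    monitoring_delay_min=5,
    response_min=3,  # placeholder value, not from a real system
    data_restore_min=20,
    failover_min=2,
)
print(stages.rto_minutes())  # 30
```

Laid out like this, it is easy to see that shaving the failover from 2 minutes to 1 barely matters while the database restore still takes 20.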

Let’s give an example. Say I have a solution that consists of the following in one region:

  • Azure App Service
  • Azure SQL

Say I’m deployed in a single environment, and my DR plan is to stand up another region in the event of a disaster. That solution has a pretty high RTO: if it takes 15 minutes to stand up that environment and deploy it, then the RTO is 15 minutes. If I wanted to lower that, there are a couple of things I can do:

  • I can increase the automation I use to reduce that time.
  • I can spin up another region, or leverage options to do replication.
  • I can set up automation around detection and response.

What is RPO?

RPO stands for Recovery Point Objective, which really focuses on the idea of improving the ability to recover from a data perspective. So if you have a disaster, how much data would be lost? What would the impact be?

When looking at RPO, the key comes to data and potential data loss. So how do we minimize the window for data loss and lower the chances of lost transactions in your application?

There are a few key elements that can assist with this, starting with how your application handles eventual consistency. It is possible to get to an RPO of 0, as long as you have constant data replication in your solution.

Now the most important part of the replication is that the replication needs to be executed in a synchronous fashion, meaning that it must write and replicate the data before sending an acknowledgment. This means that eventual consistency will keep your RPO higher than zero because it means that the replication will “eventually” get there.
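Here is a toy Python model of the difference between the two modes. It is a sketch for illustration only: the `ReplicatedStore` class and its methods are invented for this example and do not correspond to any real database API.

```python
class ReplicatedStore:
    """Toy model of a primary data store with a single replica."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: list[str] = []
        self.replica: list[str] = []
        self.pending: list[str] = []  # acknowledged, but not yet on the replica

    def write(self, record: str) -> None:
        """Return only once the write counts as acknowledged."""
        self.primary.append(record)
        if self.synchronous:
            # Synchronous: replicate before acknowledging.
            self.replica.append(record)
        else:
            # Asynchronous: acknowledge now, replicate "eventually".
            self.pending.append(record)

    def replicate_pending(self) -> None:
        """Simulate background replication catching up."""
        self.replica.extend(self.pending)
        self.pending.clear()

    def lost_if_primary_fails(self) -> list[str]:
        """Acknowledged writes that would be gone if the primary died right now."""
        return list(self.pending)


sync_store = ReplicatedStore(synchronous=True)
sync_store.write("order-1")
sync_store.write("order-2")
print(sync_store.lost_if_primary_fails())  # []

async_store = ReplicatedStore(synchronous=False)
async_store.write("order-1")
async_store.replicate_pending()
async_store.write("order-2")
print(async_store.lost_if_primary_fails())  # ['order-2']
```

The synchronous store never has an acknowledged write that is missing from the replica, which is what an RPO of zero means. The asynchronous store always has a window, however small, where it does.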

How do we improve RPO?

The most important factor here is replication and data consistency. We really need to make sure that the strength of transactions is maintained and that consistency rules are enforced. This is why data stores like Cosmos gain popularity when the requirements call for zero RPO and low RTO, because they support models that can enforce this type of logic.


Needless to say, this all comes down to operations and math and ultimately the requirements of your solution and balancing that against cost and impact. You really want to make sure you only take this to the level you need to as it can add a lot of cost and substantially raise the complexity of your solution.

Weekly Links – 11/25

Hello All, So this week marks 2 cool things in my mind. First is that we are 30 days to Christmas Day, which is awesome. I know this is the holiday season for a lot of different religions.

So I wish you and yours a great holiday season. I am a big fan of this time of year because… it’s crazy, it’s busy, and it’s a great time filled with great moments. So please make sure to take a moment and enjoy it. This is a big time for my family, as my wife and I were married 11 years ago this month.

The other big thing this week is a moment I’ve been waiting for since my kids were born. Last night I introduced my kids to Star Wars, watching the first movie together, and my kids loved it. The best part was their reactions.

My daughter had the following revelations:

  • Princess Leia is awesome, she doesn’t need anyone to save her and she fights back.
  • The Millennium Falcon is an ugly ship.

My son had the following revelations:

  • Darth Vader is the best thing that ever happened.
  • Stormtroopers should have lightsabers.
  • Chewbacca is the best pilot ever.

So enough of all that, and down to business:

Development:

Cloud:

Audio / Video:

Fun Stuff:

Ok, I sort of did this at the top, but what the hell, let’s do another one. I’m a comic fan, we all know this. I’ve enjoyed the CW Arrowverse, and the big event, Crisis on Infinite Earths, premieres Dec 8. I’m super excited about this. There are going to be a lot of interesting things, including the return of fan favorites.

Weekly Links – 11/18

So things have been a little crazy this month, but overall they are finally getting to a place where I can manage. I didn’t say they were slowing down, just that things were manageable.

Down to business…

Development:

Cloud:

Audio / Video:

Fun Stuff:

So Disney+ launched, and I think it’s a toss-up as to who was more excited, me or my kids. Along with it came “The Mandalorian”, which mixes the western / crime genre with Star Wars. What’s not to love? Going to watch The Mandalorian tonight. So excited!

Keeping the lights on! – Architecting for availability?

Hello all, it’s been a while since I did a blog post outside of the weekly updates. But I wanted to do one on a conversation that I’ve been having a lot lately and that seems to be largely universal: High Availability. More and more, software is becoming a critical part of every aspect of our lives. To that end, we as developers / engineers see that the following scenarios have become a constant reality:

  • For end customer software, not having access to an app or service for an extended timeframe can be the final nail in the coffin for a lot of users. Their tolerance for downtime continues to drop. If you don’t believe me, research the metrics around how long someone will wait for a video to load before leaving according to YouTube.
  • For enterprises, organizations are becoming more and more reliant on software to function at the most basic level, meaning that outages or downtime windows have an even greater impact on their business, causing more parts of the organization to have to function at a diminished capacity or not at all during an outage.

The end result of these perceptions / realities is that the demands put on software solutions for maintaining availability are going higher and higher. And it becomes important to architect and plan for high availability to start with, as if you don’t it can be very expensive and difficult to retro-fit your applications to meet these demands.

This is a huge topic, and one that I’m not going to be able to cover in one blog post, but I’m hoping that we can identify ways to help if you are being tasked with meeting these demands.

Defining SLA


So the first part of this conversation, in my experience, always starts the same way: “What’s our SLA?” So let’s talk through what an SLA is. SLA stands for Service Level Agreement, and this is a legal agreement of what level of service you are required to provide.

Now the key part of that is “legal agreement”. This is not strictly a software function or engineering concept, but a business agreement, in the sense that if an SLA is not met, there is a financial obligation from the organization to compensate the customer (in an enterprise setting).

Be Reasonable…

Let’s not get crazy!

So the most common mistake I hear when someone starts down this road is “we need 100% SLA”, which is a bad place to start this process. Realistically this is almost impossible; the idea that you will never have an outage is extreme. To get this level of resiliency you can expect to pay for it, and it’s easy to get upside down on your costs by starting out here. We really need to be realistic about the ask here.

Let’s walkthrough an example, let’s say you have a software the provides grant processing for a municipality, and that grant reviews are done monday to friday during business hours (8-6pm). If your customer says “We need a 100% SLA”, I would make the counter argument of “Do you really?” If the system is down from 1-2am on a saturday, does that really affect you and the nature of the business? Or is this just a matter of needing the solution to be up during those core business operating hours?

Conversely let’s go the other way, and say that you are providing a solution that provides emergency service communication in terms of a natural disaster? Would your customer be ok with a 5-minute downtime at 2am in the middle of a hurricane? Probably not. So tolerance should be measured in terms of actual impact to the end user and ability to function.

High Availability is like insurance, I can get add-ons to my policy for everything that could ever happen, but that means that I will likely be paying for things I don’t need. I can get volcano insurance in Pennsylvania, but the odds of needing it are so low to make it ridiculous.

So what we should be doing is finding a happy balance between what we can realistically do, and do by following recommended processes, and way the business calculation, and cost.

Let me give you a high-level example. Let’s say I deploy my production environment to one region, and I’ve calculated the composite SLA (more on this later) to be 99.9% for that one region. That means that right now I am telling my customers to expect about 43.2 minutes of downtime a month.

But if I stood up a secondary region, and built out a lot of automation around failover and monitoring (let’s say 80 hours of work), I could raise that SLA from 99.9% to 99.99%, which would mean a downtime of 4.32 minutes a month.

Now what I need to weigh is the following:

  • 80 hours’ worth of labor costs
  • Opportunity cost of not using that labor resource on new features
  • Doubling my environment costs (2 active regions)
  • Potential advantage of supporting a higher SLA

And I look at that and say, I’m saving 38.88 minutes of downtime a month in the process. So the question is, does that help my business and make sense from a financial position? Or am I “ok” taking a financial hit, having only 1 environment up, paying out if we are down for more than the 99.99% allows, and rolling the dice on that?
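The downtime figures above are easy to reproduce. Here is a minimal sketch in Python, assuming a 30-day month (which is what the 43.2-minute figure implies). The function name `downtime_minutes` is made up for illustration.

```python
def downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime (in minutes) that an SLA allows over `days` days."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


single_region = downtime_minutes(99.9)    # roughly 43.2 minutes
two_regions = downtime_minutes(99.99)     # roughly 4.32 minutes
saved = single_region - two_regions       # roughly 38.88 minutes

print(f"{single_region:.2f} {two_regions:.2f} {saved:.2f}")
```

Swapping in your own SLA targets makes it quick to see how many minutes each extra "nine" actually buys before you price out the work.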

I can’t say in the above discussion what the right answer is, because ultimately it depends on the type of business and resiliency of the application. You might be comfortable with that, you might not.

My point is that at the end of the day this is both an engineering problem and a business problem, and likely the right answer is somewhere in the middle.

Now to be clear, other times, especially in enterprise software, the customer may require a certain SLA, and at that point you might have to show that you meet that SLA by having specific redundancies in place. I’ll talk about this more in our next section.

Calculating a composite SLA


Another common question is “How do I calculate the SLA of my service?” This is more straightforward than people realize. Let’s take the following example:

Note: You can find all of Azure’s SLAs here.

Service | SLA
App Service | 99.95%
Azure SQL | 99.99%

So based on the table above, the composite SLA would be:

.9995 * .9999 = .9994 = 99.94%

So that would imply that your cloud provider is standing behind these services to have a monthly downtime of no more than:

730 (hours per month) * (1 - .9994) = 0.438 hours = 26.28 minutes

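Now the above is an estimate, but that is roughly the monthly downtime we could expect. The method doesn’t change as you add more services: each additional service your application depends on is multiplied in the same way, which lowers the composite number a little further.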
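The two calculations above can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the 730 hours per month is the same figure used above.

```python
from math import prod


def composite_sla(slas_percent):
    """Multiply the SLAs of every service the application depends on.

    Takes percentages (e.g. 99.95) and returns a percentage.
    """
    return prod(s / 100 for s in slas_percent) * 100


def monthly_downtime_minutes(sla_percent, hours_per_month=730):
    """Convert an SLA percentage into allowed minutes of downtime per month."""
    return hours_per_month * 60 * (1 - sla_percent / 100)


sla = composite_sla([99.95, 99.99])  # App Service, Azure SQL
print(f"Composite SLA: {sla:.2f}%")
print(f"Monthly downtime: {monthly_downtime_minutes(sla):.2f} minutes")
```

Adding a third or fourth service is just another entry in the list, and the composite can only go down from there.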

Now it’s important to note that this is the platform SLA, not your SLA. I say that because this assumes that your application itself doesn’t have issues that cause downtime, so that should be considered as well.

How do we improve our SLA, start with “what is down?”


Now for many cloud services, Microsoft and every other cloud provider give recommendations to enhance resiliency and improve your SLA. One way to do that is to leverage items like Availability Zones and multi-region deployments. This allows you to spread your application across multiple regions, which makes the probability of a complete outage drop substantially.

Really the first step here is to do a failure mode analysis and determine what functionality is critical. What I mean by that is we need to define what constitutes the system being “Down”. So let’s say, for instance, that you have an eCommerce platform, something like NopCommerce, and you have the following use-cases:

  1. Browse the catalog
  2. Add items to shopping cart
  3. Purchase items
  4. Publish blogs
  5. Send out notifications of deals / sales
  6. Process Orders

Now based on the above, we could identify 1, 2, 3, and 6 as mission critical: if we can’t allow our customers to shop, buy, and receive their products, that means that we are out of business. If we can’t publish a blog when we want to, or if a sale notice goes out a little late, it’s not ideal, but it’s not the end of the world. And let’s say that we have Azure Functions sending the notifications, and the blogs and promotions are managed by Cosmos DB.

So now based on that, we need to examine our architecture and identify what components are required to maintain the four key use cases we identified. Notice I left off the elements that are not part of our key functionality for our SLA.

Let’s say we have the proposed architecture:

Now based on the above, I can calculate our primary region SLA to be:

Service | SLA
Application Gateway | 99.95%
App Service | 99.95%
Azure SQL | 99.99%
Total SLA | 99.89%

So as a result of the above, we need to examine which elements of our solution are critical to meeting our uptime SLA, and then do a failure analysis on them. Based on the above use-cases, we can assume that the Traffic Manager, Application Gateway, App Service, and Azure SQL are essential to meeting our SLA. For the sake of this example, let’s say that the caching layer follows industry recommendations and is used only for speed of access; if it is not available, the application will just reach out to the database.

So how do we calculate the compound SLA for the two regions? We do that with the following math:

We basically have to figure out the probability of both regions being offline at the same time. The region SLA of 99.89% gives a region “unavailability” of 0.11%, so we multiply that by itself:

0.11% * 0.11% = 0.0011 * 0.0011 = 0.000121%

Convert it back to availability:

100% - 0.000121% = 99.999879%

Now we take that multiplied by the Traffic Manager SLA (99.99%):

.99999879 * .9999 = .9999 = 99.99%
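As a rough sketch of that multi-region math in Python: the function below starts from the 99.89% regional figure in the table and the 99.99% Traffic Manager SLA used above. The function and parameter names are hypothetical, and it assumes the regions fail independently of each other, which real outages don't always respect.

```python
def multi_region_sla(region_sla_percent, regions=2, router_sla_percent=99.99):
    """Estimate the SLA of identical regions sitting behind a traffic router.

    Assumes region failures are independent (a simplifying assumption).
    """
    region_down = 1 - region_sla_percent / 100   # chance one region is down
    all_down = region_down ** regions            # chance every region is down
    combined = 1 - all_down                      # at least one region is up
    return combined * (router_sla_percent / 100) * 100


print(f"{multi_region_sla(99.89):.2f}%")
```

Notice that once the regions are combined, the router in front of them becomes the limiting factor, so adding a third region would barely move the result.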

Failure Mode Analysis:


A failure analysis means that we pick apart each element of the infrastructure and identify the following:

  • What potential failures could occur?
  • What are the different “modes” or “states” this component can be in?
  • How likely is a failure of this component?
  • What is the impact of each failure “mode” or “state” on the application?

After examining the above, you need to look at each of the “modes” or “states” and identify the following:

  • How will you respond and recover?
  • How will you monitor for this situation before, during, and after?

So let’s take an example, because to me that always helps. Let’s examine the above solution and pick Azure SQL Database. If I were to do a failure mode analysis, I would find the following:

  • The database is offline in the following situations:
    • The database is offline due to a platform issue
    • The database is shut down
    • The database is deleted
  • The database is in a degraded state in the following situations:
    • The database is performing slowly due to high website demand
    • The database is running slowly due to bad query optimization
    • The database is experiencing deadlocks

Now this is by no means an exhaustive list, but it hits the high points for our ecommerce site. For each of those states, I need to identify what to do. So the question is how do we respond and recover? In the case of the database, the most common recommendations are to use a standard tier and to use active geo-replication.

So for “How do we respond and recover?” I would say we set up active geo-replication of our production database to a secondary region. In the event the database is “offline”, we fail over to the secondary region and leverage Traffic Manager to route to the backup site. We would see some data loss during the failover, but for this exercise, let’s say that is manageable.

The next question is the most important, how do we monitor for this? The answer is we could do this a couple of ways:

  • Set up alerts via Azure Monitor around specific metrics.
  • Set up alerts in Application Insights for dependency failures on database calls.
  • Build a page within our application that Traffic Manager can probe to identify when the database is unreachable and trigger failover.
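To make the last option a bit more concrete, here is a bare-bones sketch of such a probe page in Python using only the standard library. Everything in it is illustrative: the `/health` path, the port, and `database_is_reachable` (a placeholder you would replace with a cheap query against your real database) are all assumptions on my part. The idea is simply to return 200 when the database answers and 503 when it doesn't, since by default Traffic Manager treats a 200 response as a healthy endpoint.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def database_is_reachable() -> bool:
    """Hypothetical placeholder: swap in a cheap query such as SELECT 1."""
    return True


def health_response(db_ok: bool):
    """Map the dependency check to an HTTP status code and body."""
    if db_ok:
        return 200, b"OK"
    return 503, b"database unreachable"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only answer on the probe path; everything else is a 404.
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(database_is_reachable())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8080) -> None:
    """Serve the probe page until the process is stopped."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Calling `run()` starts the listener. In a real application this would be a route in your existing web framework rather than a separate server, and the check should stay fast so the probe itself doesn't add load to a struggling database.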

The next mode was “degraded”, and if we examine that, the response is to increase the performance tier of the database to handle the increased demand, or to do more in-depth analysis of the database’s performance. Again the monitoring would be similar: setting up alerts around these conditions to make the appropriate staff aware.

So all kidding aside, this is a huge topic, and one I want to break down further in terms of how best to implement these solutions. This post didn’t begin to discuss the differences between RTO / RPO, or how you ensure resiliency through transient fault tolerance or distributed architectures, and that’s just scratching the surface, so more to come.

Weekly Links – 11/4

Hello All, so I goofed and we missed last week’s post. I was actually at the International Association of Chiefs of Police conference, which is always a whirlwind and crazy experience.


So that being said, down to business.

Development:

  • Build Great Xamarin Apps with App Center : As you probably know by now, I’m a big DevOps fan, and firm believer in its value. So this is a great description on how to use App Center to implement DevOps on your Xamarin Mobile Apps.

Cloud:

Video / Audio:

Fun Stuff:

So I’m a big fan of the Witcher books. Honestly, for as much as I enjoy games like Dungeons and Dragons, I’m usually not a fan of fantasy fiction. In general I find most of the genre to be overly slow in its storytelling, and I get bored. But one of the exceptions to that is the Witcher, which tells engaging and dynamic stories really quickly. Geralt is a great character who lives in a really fascinating world. So when I heard Netflix was adapting it, I was hopeful but concerned. Well the trailer dropped, and I’m really excited, this looks great.

Weekly links – 10/14

So this past week, I spent every free moment working on a shed in my backyard, and like any construction project it’s had a slew of delays. But we are powering through:


Down to business…

Development:

Cloud:

Audio / Video:

Fun Stuff:

So as always I’m a big comic fan, and I’ve said before I’m a fan of the CW Arrowverse. For as much as DC movies are terrible, their TV shows are quite excellent. And the standout last year was Supergirl, it really tapped into what makes for the best Superman / Supergirl stories. The best stories are all based around problems that they can’t “super power their way out of”. Last season tackled real topics like trust of the media, xenophobia, racism, and others. This season is already moving towards tackling technology and its ability to change the way that we view reality and connect with each other.