Good nuf’ development

I’ve been called a Copy/Paste Coder and a Duct Tape Developer more than a few times over my career. For some those are derogatory terms; for me they are just part of who I am. But knowing that those are part of me doesn’t mean I’m not a Software Craftsman.

Most of us work in an agile environment, but we tend not to really live agile. You can view a good overview of the agile principles here. One of the principles is to “Apply the 80/20 Rule”, where 80% of what your customer immediately needs will come from 20% of your work.

How many times have you said “We’re going to need this in the future” or “let’s do this now to set ourselves up”, only to not need what you did, or to have to make drastic changes to it when it came time to actually use it?

We are awful at predicting the future; there is no way around it. When we’re doing work for an unknown future we have to guess and estimate. More often than not our guesses and estimations are wrong, if not horribly inaccurate.

In every phase and section of development, from architecture to UI design, without hard and fast realities we tend to overestimate what’s required and what’s truly needed. I’ve personally stood up extremely complex architectures to handle disconnected operations for Resgrid, only to find that multiple people would never interact with the dataset the way I built the architecture to handle. I basically threw away weeks’ worth of work because I made the problem more complex than it really was, and all it took was interacting with a customer to figure that out.

It’s not hard to practice “Good Nuff” development. If you’re starting to hypothesize about needs, requirements or use cases, or to imagine scenarios that haven’t been given to you, don’t code anything until you’ve run those by a customer.

It gets fuzzy when you’re talking about architecture or the foundation of your product or app. But when you’re designing or implementing the architecture, ask yourself: is this to address an immediate or concrete need that’s in support of a vetted customer requirement? Notice I added the word ‘vetted’. When creating your architecture don’t assume or guess; talk to a customer and pitch the merits of your architecture. If it’s an 80% requirement for a customer, put it in, but if it’s in the “nice to have” area, don’t add it.

Complicating an architecture from the onset will slow development, increase the mental load of working in it and reduce the ability to onboard new people onto your team. Think of your architecture as a garden maze: the more complicated it is, the harder it will be for people to navigate. Don’t take the hit from the onset unless you absolutely have to.

Finding the right balance is important, but if you’re guessing or hypothesizing about future needs, there needs to be input from your customer. Keep it as small and simple as you can for as long as you can; simple is good.

You shouldn’t “kick the can down the road”; address your problems and tech debt as soon as you can. But don’t justify building things just because you ‘may’ need them in the future.

If you’re a First Responder or know one check out Resgrid which is a SaaS product utilizing Microsoft Azure, providing logistics, management and communication tools to first responder organizations like volunteer fire departments, career fire departments, EMS, search and rescue, CERT, public safety, disaster relief organizations.

Kicking the can down the road

Being a developer is an interesting profession. On one hand there is an engineering feel to it, but on the other hand it’s far more like art. For me, calling myself a “Software Engineer” is more for marketing than anything else. But let’s face it, little of what developers/programmers do is actual engineering.

Engineers are certified, go through an apprenticeship process, then design and create things in the real world that, for the most part, have to stand the test of time. To me Electrical, Mechanical, Structural, etc. are the real engineers, and we developers haven’t yet earned the right to use that title in the way we do.

How often in a development meeting do we sit there, talk about a design/architecture/code flaw that could cause issues down the road, but decide to not even begin to address it? It happens so often we have an industry term for it: “Tech Debt”. Can you imagine some structural engineers having the same discussion?

Tom: “So I cobbled together this bridge design from a bunch of other actual bridge designs and sample designs and we look ready to go.”

Mike: “Nice, but looking at these designs, if we have the amount of traffic we estimate, and we fully expect the bridge to be popular in the near future, it will start leaning to one side.”

Tom: “Yeah, but we have a deadline, we can always go back and add more supports later.”

Structural Engineers can’t go back and refactor the bridge after it’s been built as easily as developers can with code. There are also lots of other differences, but what it really boils down to is that engineers can’t kick the can down the road as easily as developers can, and so we do.

We talk about “performance as a feature”, but what about “ease of maintenance as a feature”, “scalable architecture as a feature”, “security as a feature” or “testing as a feature”? Every time we kick something down the road, label it as “Tech Debt” and put it in the backlog, if we’re being truthful with ourselves, we know full well that unless it catches fire we will almost never be back to address it.

Eventually all that Tech Debt will catch up to you, and when it does, it can cost you business, alienate customers, hurt your image or even cause your company to fail. I’ve talked about this problem before, and at Resgrid I try to follow a 60/40 approach: 60% new features/customer facing bugs, 40% tech debt, testing, automation and tooling.

Here are some guidelines I feel will help stop us from kicking cans down the road and start turning developers into engineers.

  1. Design/Code for the developers around you.
    One thing I’ve always admired about the military is the “Do it for the person next to you” attitude most service members have. The same can be said for the fire service. Sure, you start off wanting a thrill, but after the first few times you’re going in because it’s your duty and because you’re doing it for your crew. Developers need to have the same mentality: don’t code for yourself or for your company, craft code for the developers on your team. Do it for the person in the cube next to you.
  2. Balance Architecture & Implementation
    I’ve seen my fair share of architecture astronauts in my day and they can do more harm than good. The architecture needs to match the problem domain and how it’s going to be implemented. The architecture you use for an internal LoB app will be different than one that will be deployed on the cloud. No architecture is perfect and it never will be. Ideally you need to design your architecture for the worst case scenario 5 years out. How many users do you expect in 5 years? Double that, then ask how you are going to handle it. How many web servers and databases will you need? Is that a scale at which you should be using eventing, CQRS, etc.?
  3. Code for now and for the future
    If your response to something new is “well that’s the way we’ve always done it” or some variation of it, you’re coding in the past. Yes, it’s painful to constantly keep up to date with the latest trends and best practices, but you’re hurting yourself and the entity you’re working for by not incorporating the current best practices. You will find it increasingly hard to find developers that want to work, or even know how to work, in your ‘brand new’ code base. I’ve been interviewing a lot of developers lately and almost none of them with less than 7 years of experience have ever worked on a WebForms app. Starting a new project? Using WebForms? You will find it very difficult to find developers that know that technology in the near future. This same principle goes for patterns & practices.
  4. Don’t Invest in Tooling/Automation Too Early
    When you’re first starting out on a project, set up only the minimum amount of tooling/automation you need. This is a train of thought used by startups. Time spent on setting up elaborate tooling and complex automation is time not spent properly architecting your project, implementing features or fixing bugs. Because tooling/automation lives out-of-band of your core project you can cycle on it more quickly, and once you’ve done things by hand for a while you know exactly where the pain points are. When you first start off, you’ll just be guessing. A lot of tooling/automation is coupled to your environment as well, so you may start off deploying locally but then move to Azure, and your tooling will have to change at that point. Start working on automation/tooling once your project is established, is maintaining good velocity and your target environments are well known.
  5. 60/40 Every Sprint, In Every SDLC Phase
    Whether you’re just starting your app or are in maintenance mode, balance the work between features/bugs (the 60%) and tech debt, testing, automation and tooling (the 40%). Break down complex technical debt items into smaller pieces and work on them every sprint. I call the 40% bucket “Preventative Code Maintenance” and it should be the #3 priority on your backlog at all times.
  6. Document your culture and live it
    Have coding contracts and guidelines before you start any project. Your codebase, especially early on, should be unified and feel cohesive. If I go into Mike’s code it should feel like Sally’s. Utilize tools like StyleCop, FxCop and ReSharper to nudge people in the right direction. Utilize Pull Requests, Peer Reviews or Pair Programming to keep your codebase like a well-run HOA. No one developer should ‘own’ code; go in, clean up code and fix broken windows. Practice Scout Coding at all times. Not everyone on the team needs to be in complete agreement on something, but once the team commits to it everyone needs to be on board. You can either succeed as a team or fail individually.
  7. “Whatever” as a feature
    When you’re documenting what your application or system should do, right there should be “It should be performant, it should scale, it should be maintainable and it should be secure”. Even for internal-only applications, if your app goes down or is slow, it’s costing you money in wasted employee time. You shouldn’t, on a whim, sacrifice performance or security to push out a new feature without the business knowing the full extent of the tradeoff.
  8. Automate Deployments with Smoke Tests
    At first glance this may seem contradictory to #4, but that item is based on timing. When you first start working on the project you won’t be deploying to a production or pre-prod environment. Much later down the road (hence the timing aspect) you should completely automate your deployments. In the day of Containers, Slot Deployments and CI servers you should never be manually modifying your production environment. One day you will mess up and it could cost you. Yes, the “rm -rf” guy was a hoax, but it’s also a cautionary tale.
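
To make #8 a little more concrete, here is a rough Python sketch of a post-deployment smoke test, assuming your app exposes a few URLs that should always answer. The paths in `SMOKE_CHECKS` and the function names are made up for illustration; swap in whatever matters for your app.

```python
import urllib.error
import urllib.request

# Hypothetical endpoints; replace with the pages and APIs that matter for your app.
SMOKE_CHECKS = [("/", 200), ("/health", 200), ("/login", 200)]


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, and so on


def run_smoke_tests(base_url, checks=SMOKE_CHECKS, fetch=http_status):
    """Return a list of readable failures; an empty list means the deploy looks alive."""
    failures = []
    for path, expected in checks:
        actual = fetch(base_url.rstrip("/") + path)
        if actual != expected:
            failures.append(f"{path}: expected {expected}, got {actual}")
    return failures
```

A deployment script could call `run_smoke_tests` against the freshly deployed slot or container and only swap it into production when the returned list is empty. The `fetch` parameter is there so the check itself can be tested without touching the network.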

We have to balance getting features out or getting the product out with all of the ‘back of the house’ concerns. But we have to remember that we spend our days in the ‘back of the house’. If the architecture isn’t good, the code isn’t well formed or doesn’t meet standards, or patterns and practices aren’t being followed, it’s only going to slow development down and cause bugs and arguments.

You will never be bug free

Recently I was on a call with a miffed client dealing with issues in a software product. Understandably they were upset that what they were paying for had an issue that impacted them. All of that is pretty SOP but then an IT guy pipes up:

“Do you guys have any testing? How did this make it into production? You should be catching all bugs during development!”

Coming from a business person this is somewhat of an expected statement, but from someone in the technology field this is borderline ignorant. Here is the cold hard fact: you will never, ever be bug free. If you think you are 100% bug free, you’re not; the bugs just haven’t been exposed or reported yet.

This has nothing to do with your methodology (Agile/Waterfall/Kanban), your delivery (SaaS, Mobile App, Desktop App) or your audience (Consumer, Business, Gov). This is just the reality of software development, but it’s not limited to just software.

Mariner 1

On July 22, 1962 Mariner 1 was launched on its way to Venus. A few minutes after launch Mariner 1 began to fly off course and the guidance system failed to correct it. As the spacecraft started to veer toward North Atlantic shipping lanes it was destroyed by the range safety officer.

So what caused the issue? It was a typo.


Intel Pentium

The year was 1994, and Intel’s brand new Pentium chip was reaching the masses. After many years in development, this was the first new chip to usher in a new era after the 486. How could this go wrong? Well, it did: the chips made mistakes during floating point division.

What’s to blame? A faulty division table.

Mars Climate Orbiter

1962 too old? They didn’t have QA or automated testing back then, you say. Plus Pentium is a hardware issue! Well, on December 11th, 1998 the Mars Climate Orbiter launched and was on its way to Mars. On September 23rd, 1999 NASA lost contact with the orbiter as the craft started to enter orbit.

What caused this issue? One team used English units and another team used Metric units.
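
One cheap defense against this kind of units mix-up is to stop passing bare numbers around and make the unit part of the type. Below is a minimal Python sketch of that idea. The `Impulse` class and its method names are made up for illustration, and using impulse as the quantity is an assumption for the example rather than something stated above.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse that is always stored internally in newton seconds (SI)."""

    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value):
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value):
        # Convert at the boundary so the rest of the code only ever sees SI units.
        return cls(value * LBF_S_TO_N_S)

    def __add__(self, other):
        # Refuse to add a bare number: the caller must say which unit it is in.
        if not isinstance(other, Impulse):
            return NotImplemented
        return Impulse(self.newton_seconds + other.newton_seconds)
```

With this in place, `Impulse.from_pound_force_seconds(10.0) + Impulse.from_newton_seconds(5.0)` works and comes out in newton seconds, while adding a plain `2.0` raises a `TypeError` instead of silently producing a wrong number.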

F-22 Raptor

I have first-hand knowledge of how much dealing with dates, times and time zones sucks. But thankfully I’ve never encountered an issue quite like this one. In 2007, during a 15-hour flight from Hawaii to Okinawa, Japan, multiple on-board computers crashed when the planes crossed the International Date Line. The F-22 Raptor, with a simulated KD Ratio of 241-2, was grounded due to a software issue dealing with the International Date Line.

How many lines of code does it take to cripple an advanced fighter? Just a couple.

Flash Crash 2010

It was a calm and sunny day in NYC. The date: May 6th, 2010. The time: 2:32 PM. The next 36 minutes would go down in history for the NYSE. A trading algorithm ran amok by some accounts, or worked too well by others, causing a massive and almost instantaneous drop of the DJIA by 9.2%, an intra-day swing of 1,010.14 points.

How much damage can a guy in his basement do? A lot apparently.

Knight Capital

On August 1st, 2012, between 9:30 AM and 10:00 AM, Knight Capital’s automated market making trading platform started generating erratic trades. We all know to buy low and sell high, but apparently in this case the system didn’t get that memo and started buying high and selling low.

The cost of some bad logic? $440 Million and a company.

The above stories are just a super small sampling of software bugs that made it past, in some cases, some very rigorous testing and QA. You think you have it tough running your software changes through QA? Talk to anyone who’s worked in the aircraft or defense industry.

We should do everything possible to catch, fix or remediate bugs or issues before they make it into production. But no matter how rigorous your testing, QA or automation process is, there will always be bugs in production. Once you accept this as a fact of life you can look into minimizing the impact those bugs have.

First, monitor your production systems from every angle you can: hardware, software, traffic, errors, analytics, etc. When you develop a baseline of your system running normally you can use that profile to determine when it isn’t running properly.
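
As a rough illustration of what a baseline can look like, here is a small Python sketch that summarizes a metric from a normal period and flags readings that stray too far from it. The function names, the sample numbers and the three-standard-deviation threshold are all assumptions for the example; real monitoring tools do this in far more sophisticated ways.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize a metric recorded while the system was healthy (needs 2+ samples)."""
    return mean(samples), stdev(samples)


def is_abnormal(value, baseline, threshold=3.0):
    """Flag a reading that sits more than `threshold` standard deviations from normal."""
    average, spread = baseline
    if spread == 0:
        # A perfectly flat baseline: any different value counts as abnormal.
        return value != average
    return abs(value - average) / spread > threshold


# Hypothetical response times in milliseconds from a normal week.
normal_response_ms = [120, 118, 125, 122, 119, 121, 123, 120]
baseline = build_baseline(normal_response_ms)
```

Calling `is_abnormal(124, baseline)` returns `False` because that reading is within the usual spread, while `is_abnormal(400, baseline)` returns `True` and is worth a look.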

Second, pay attention to production logs and automated notifications of actual errors. Don’t put a bunch of exception logging code all over the place and have a system notify you every time it’s tripped; that will just become noise and you’ll ignore it.
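
One simple way to keep notifications from turning into noise is to alert only on the first occurrence of a given error within a time window. The Python sketch below shows the idea. The `AlertThrottle` class, the five-minute window and the error keys are hypothetical, and this is just one possible approach.

```python
import time


class AlertThrottle:
    """Allow one notification per error key within a time window."""

    def __init__(self, window_seconds=300.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behavior can be tested without waiting
        self._last_sent = {}

    def should_notify(self, error_key):
        """Return True if this error has not been alerted on within the window."""
        now = self.clock()
        last = self._last_sent.get(error_key)
        if last is not None and now - last < self.window_seconds:
            return False  # still inside the window: log it, but do not page anyone
        self._last_sent[error_key] = now
        return True
```

The error still goes to the log every time; only the notification is held back. A key such as the exception type plus the route it happened on is usually enough to group repeats of the same problem.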

Third, make it stupid simple for users to report errors and issues. Your users, for the most part, will not bother to inform you of most errors they encounter. They just don’t care about your software or service that much. It’s when it truly impacts their life or business that you will hear about it and by then it’s too late.

Finally, jump on customer-impacting production bugs right away. You ideally want to fix these issues within hours, not days. I’ve pulled many all-nighters fixing production issues and it’s not fun, but your users appreciate it. Keep them informed and over-communicate with a status/service page, on social media and via email.

Bugs in production happen; you are just fooling yourself if you think otherwise. It’s how quickly you address those bugs and how you handle the affected customers that really matters. Don’t put all your eggs in the “catch all bugs during development” basket; save some for the production and customer service side.
