000

Index Labels

Cleaning bad code

.

Guess what! You've just inherited a stinking, steaming pile of messy old code. Congratulations! It's all yours.

Bad code can code can come from all kinds of places. Middleware, the internet, perhaps even your own company.

You know that nice guy in the corner that nobody had time to check up on? Guess what he was doing all that time. Churning out bad code.

Or remember that module someone wrote years ago, just before she left the company. That module that twenty different people have then added hacks, patches and bug fixes to, without really understanding what they were doing. Yup, that one.

Or what about that open source thing you downloaded that you knew was horrible, but it solved a very specific and quite hairy problem that would have taken you ages to do by yourself.

Bad code doesn't have to be a problem, as long as it's not misbehaving, and nobody pokes their bloody nose in it. Unfortunately, that state of ignorant bliss rarely lasts. A bug will be discovered. A feature requested. A new platform released. Now you have to dig into that horrible mess and try to clean it up. This article offers some humble advice for that unfortunate situation.

0. Is it worth doing?

The first thing you need to ask yourself is whether the code is worth cleaning. I'm of the opinion that when it comes to code cleaning you should either karate do "yes", or karate do "no". Either you assume full responsibility for the code and rework it until you end up with something that you are actually happy to maintain and proud to have in your codebase.

Or you decide that even though the code looks horrible, it isn't cost-effective to take time out of your busy schedule to fix it. So instead you just do the smallest change possible that solves your current problem.

In other words, you either regard the code as yours or theirs.

There are merits to both alternatives. Good programmers get an itch when they see bad code. They bring out their torches and pitchforks and chant: "Unclean! Unclean!" And that is a good instinct.

But cleaning code is also a lot of work. It is easy to underestimate the time it takes. It can be nearly as time consuming as rewriting the whole thing from scratch. And it doesn't bring any short term benefits. Two weeks cleaning code won't add any new features to the game, but it might give you some new bugs.

On the other hand, the long term effects of never cleaning your code can be devastating. Entropy is the code-killer.

So, never an easy choice. Some things to consider are:

  • How many changes do you expect to make to the code?

    Is it just this one small bug that you need to fix, or is this code that you expect to return to many times to tweak and tune and add new features. If it's just this one bug, then perhaps it is best to let sleeping dogs lie. However, if this is a module that you will need to mess around with a lot, then spending some time to clean it up now, will save a lot of headache later.

  • Will you need/want to import upstream changes?

    Is this an open source project that is under active development? If so, and you want to pull the changes made upstream you can't make any big changes to the code or you will be in merge hell every time you pull. So just be a nice team player, accept its idiosyncrasies and send patches with your bug fixes to the maintainer.

  • How much work is it?

    How many lines of code can you realistically clean in a day? An order of magnitude estimate says more than 100 and less than 10 000, so let's say 1 000. So if the module has 30 000 lines, you might be looking at a month of work. Can you spend that? Is it worth it?

  • Is it a part of your core functionality?

    If what the module does is something peripheral, like say font rendering or image loading, you might not care that it is messy. You might swap out the whole thing for something else in the future, who knows. But you should own the code that relates to your core competence.

  • How bad is it?

    If the code is just slightly bad, then perhaps you can live with it. If it is mind-numbingly, frustratingly incomprehensibly bad, then perhaps something needs to be done.

1. Get a test case

Seriously cleaning a piece of code means messing around with it a lot. You will break things.

If you have a decent test case with good coverage you will immediately know what has broken and you can usually quite quickly figure out what stupid mistake you just made. The time and anxiety this saves over the course of the cleaning process is just ridiculous. Get a test case. It's the first thing you should do.

Unit tests are best, but all code is not amenable to to unit testing. (Test fanatics, send your hate mail now!) If unit tests are too cumbersome, use an integration test instead. For example, fire up a game level and run the character through a specific set of actions related to the code you are cleaning.

Since such tests ate more time consuming, it might not make sense to run it after every change you make, which would be ideal. But as you put every single change you make into source control, it's not so bad. Run the test every once in a while (e.g., every five changes). When it discovers a problem you can do a binary search of those last few commits to find out which one caused the problem.

If you discover an issue that wasn't detected by your test, make sure that you add that to the test, so that you capture it in the future.

2. Use source control

Do people still have to be told to use source control? I sure hope not.

For cleaning work it is absolutely crucial. You will be making lots and lots of small changes to the code. If something breaks you want to be able to look back in the revision history and find out where it broke.

Also, if you are anything like me, you will sometimes start down a refactoring path (like removing a stupid class) and realize after a while that it wasn't such a good idea, or, that it was a good idea, but that everything would be a lot simpler if you did something else first. So you want to be able to quickly revert everything you just did and begin anew.

Your company should have a source control system in-place that allows you to do these changes in a separate branch and commit as much as you like without disturbing anybody else.

But even if it doesn't, you should still use source control. In that case, download mercurial (or git), create a new repository and put the code that you checked out of your company's stupid system there. Do your changes in that repository, committing as you go. When you are done you can merge everything back into the stupid system.

Cloning the repository into a sensible source control system only takes a few minutes. It is absolutely worth it. If you don't know mercurial, spend an hour to learn it. You will be happy you did. Or if you prefer, spend 30 hours to learn git instead. (I kid! Not really. Nerd fight now!)

3. Make one (small) change at a time

There are two ways of improving bad code: revolution and reform. The revolution method is to burn everything with fire and rewrite it from scratch. The reform method is to refactor the code with one small change at a time without ever breaking it.

This article is about the reform method. I'm not saying that revolutions never are necessary. Sometimes things are so bad that they just need to go. But people who get frustrated with the slow pace of reform and advocate revolution often fail to realize the full complexity of the problem and thus don't give the existing system enough credit for the things it does.

Joel Spolsky has written a classic article about this without falling into the trap of making strained political metaphors.

The best way of reforming code is to make one minimal change at a time, test it and commit it. When the change is small it is easier to understand its consequences and make sure that it doesn't affect the existing functionality. If something goes wrong, you only have a small amount of code that you need to check. If you start doing a change and realize that it is bad, you won't loose much work by reverting to the last commit. If you notice after a while that something has gone subtly wrong, a binary search in the revision history will let you find the small change that introduced the problem.

A common mistake is to do more than one thing at the same time. For example, while getting rid of an unnecessary level of inheritance you might notice that the API methods are not as orthogonal as you would like them to be and start to rearrange them. Don't! Get rid of the inheritance first, commit that and then fix the API.

Smart programmers organize the way they work so that they don't have to be that smart.

Try to find a path that takes you from what the code is now to what you want it to be in a sequence of small steps. For example, in one step you might rename the methods to give them more sane names. In the next, you might change some member variables to function parameters. Then you reorder some algorithms so that they are clearer. And so on.

If you start doing a change and realize that it was a bigger change than you originally thought, don't be afraid to revert and find a way of doing the same thing in smaller, simpler steps.

4. Don't clean and fix at the same time

This is a corollary to (3), but important enough to get its own point.

It is a common problem. You start to look at a module because you want to add some new functionality. Then you notice that the code is really badly organized, so you start reorganizing it at the same time as you are adding the new functionality.

The problem with this is that cleaning and fixing has diametrically opposite goals. When you clean, you want to make the code look better without changing its functionality. When you fix, you want to change its functionality to something better. If you clean and fix at the same time it becomes very hard to make sure that your cleaning didn't indadvertedly change something.

Do the cleaning first. Then, when you have a nice clean base to work with, add the new functionality.

5. Remove any functionality that you are not using

The time it takes to clean is proportional to the amount of code, its complexity and its messiness.

If there is any functionality in the code that you are currently not using and don't plan to be using in the foreseeable future -- get rid of it. That will both reduce the amount of code you will have to go through and its complexity (by getting rid of unnecessary concepts and dependencies). You will be able to clean faster and the end result will be simpler.

Don't save code because "who knows, you might need it some day". Code is costly -- it needs to be ported, bug checked, read and understood. The less code you have, the better. In the unlikely event that you do need the old code, you can always find it in the source repository.

6. Delete most of the comments

Bad code rarely has good comments. Instead, they are often:

// Pointless:

// Set x to 3
x = 3;

// Incomprehensible:

// Fix for CB (aug)
pos += vector3(0, -0.007, 0);

// Sowing fear and doubt:

// Really we shouldn't be doing this
t = get_latest_time();

// Downright lying:

// p cannot be NULL here
p->set_speed(0.7);

Read through the code. If a comment doesn't make sense to you and doesn't further your understanding of the code -- get rid of it. Otherwise you will just waste mental energy on trying to understand that comment on each future reading of the code.

The same goes for dead code that has been commented or #ifdef'ed out. Get rid of it. It's there in the source repository if you need it.

Even when comments are correct and useful, remember that you will be doing a lot of refactoring of the code. The comments may no longer be correct when you are done. And there is no unit test in world that can tell you if you have "broken the comments".

Good code needs few comments because the code itself is clearly written and easy to understand. Variables with good names do not need comments explaining their purpose. Functions with clear inputs and outputs and no special cases or gotchas require little explanation. Simple, well written algorithms can be understood without comments. Asserts document expectations and preconditions.

In many cases the best thing to do is just to get rid of all old comments, focus on making the code clear and readable, and then add back whatever comments are needed -- now reflecting the new API and your own understanding of the code.

7. Get rid of shared mutable state

Shared mutable state is the single biggest problem when it comes to understanding code, because it allows for spooky "action at a distance", where one piece of code changes how a completely different piece of code behaves. People often say that multithreading is difficult. But really, it is the fact that the threads share mutable state that is the problem. If you get rid of that, multithreading is not so complex.

Since your goal is to write high-performant software, you won't be able to get rid of all mutable state, but your code can still benefit enormously from reducing it as much as possible. Strive for programs that are "almost functional" and make sure you know exactly what state you are mutating where and why.

Shared mutable state can come from several different places:

  • Global variables. The classic example. By now everybody surely knows that global variables are bad. But note (and this is a distinction that people sometimes fail to make), that it is only shared mutable state that is problematic. Global constants are not bad. Pi is not bad. Sprintf is not bad.

  • Objects -- big bags of fun. Objects are a way for a large number of functions (the methods) to implicitly share a big bag of mutable state (the members). If a lazy programmer needs to pass some information around between methods, she can just make a new member that they can read and write as they please. It's almost like a global variable. How fun! The more members and the more methods an object has, the bigger this problem is.

  • Megafunctions. You have heard about them. These mythic creatures that dwell in the deepest recesses of the darkest codebases. Broken programmers talk about them in dusky bars, their sanity shattered by their encounters: "I just kept scrolling and scrolling. I couldn't believe my eyes. It was 12 000 lines long."

    When functions are big enough, their local variables are almost as bad as global variables. It becomes impossible to tell what effect a change to a local variable might have 2 000 lines further down in the code.

  • Reference and pointer parameters. Reference and pointer parameters that are passed without const can be used to subtly share mutable state between the caller, the callee and anyone else who might be passed the same pointer.

Here are some practical ideas for getting rid of shared mutable state:

  • Split big functions into smaller ones.

  • Split big objects into smaller ones by grouping members that belong together.

  • Make members private.

  • Change methods to be const and return the result instead of mutating state.

  • Change methods to be static and take their arguments as parameters instead of reading them from shared state.

  • Get rid of objects entirely and implement the functionality as pure functions without side effects.

  • Make local variables const.

  • Change pointer and reference arguments to const.

8. Get rid of unnecessary complexity

Unnecessary complexity is often a result of over-engineering -- where the support structures (for serialization, reference counting, virtualized interfaces, abstract factories, visitors, etc) dwarf the code that performs the actual functionality.

Sometimes over-engineering occurs because software projects start out with a lot more ambitious goals than what actually gets implemented. More often, I think, it reflects the ambitions/esthetics of a programmer who has read books on design patterns and the waterfall model and believes that over-engineering makes a product "solid" and "high-quality".

Often, the heavy, rigid, overly complex model that results is unable to adapt to feature requests that were not anticipated by the designer. Those features are then implemented as hacks, bolt-ons and backdoors on top of the ivory tower resulting in a schizophrenic mix of absolute order and utter chaos.

The cure against over-engineering is YAGNI -- you are not gonna need it! Only build the things that you know you need. Add more complicated stuff when you need it, not before.

Some practical ideas for cleaning out of unnecessary complexity:

  • Remove the functionality you are not using (as suggested above).

  • Simplify necessary concepts, and get rid of unneeded ones.

  • Remove unnecessary abstractions, replace with concrete implementations.

  • Remove unnecessary virtualization and simplify object hierarchies.

  • If only one setting is ever used, get rid of the possibility of running the module in other configurations.

9. That is all

Now go clean your room!

Blog Archive

Labels

.NET Programming 2D Drafting 3D Animation 3D Art 3D Artist 3D design 3D effects 3D Engineering 3D Materials 3D Modeling 3D models 3D presentation 3D Printing 3D rendering 3D scanning 3D scene 3D simulation 3D Sketch Inventor 3D Texturing 3D visualization 3D Web App 3ds Max 4D Simulation ACC Adaptive Clearing adaptive components Add-in Development Additive Manufacturing Advanced CAD features Advanced Modeling AEC Technology AEC Tools affordable Autodesk tools AI AI animation AI Assistance AI collaboration AI Design AI Design Tools AI Experts AI for Revit AI Guide AI in CAD AI in CNC AI in design AI in Manufacturing AI in Revit AI insights AI lighting AI rigging AI Tips AI Tools AI troubleshooting AI workflow AI-assisted AI-assisted rendering AI-enhanced Animation animation pipeline animation tips Animation workflow annotation AR architectural design architectural modeling architectural preservation architectural visualization Architecture architecture design Architecture Engineering Architecture Firm Architecture Productivity architecture software architecture technology Architecture Workflow Arnold Renderer Arnold Shader Artificial Intelligence As-Built Model Asset Management augmented reality AutoCAD AutoCAD advice AutoCAD API AutoCAD Basics AutoCAD Beginner AutoCAD Beginners AutoCAD Civil 3D AutoCAD Civil3D AutoCAD commands AutoCAD efficiency AutoCAD Expert Advice AutoCAD features AutoCAD File Management AutoCAD Layer AutoCAD Layers AutoCAD learning AutoCAD print settings AutoCAD productivity AutoCAD Teaching AutoCAD Techniques AutoCAD tips AutoCAD tools AutoCAD training. AutoCAD tricks AutoCAD Tutorial AutoCAD workflow AutoCAD Xref Autodesk Autodesk 2025 Autodesk 2026 Autodesk 3ds Max Autodesk AI Autodesk AI Tools Autodesk Alias Autodesk AutoCAD Autodesk BIM Autodesk BIM 360 Autodesk Certification Autodesk Civil 3D Autodesk Cloud Autodesk community forums Autodesk Construction Cloud Autodesk Docs Autodesk Dynamo Autodesk features Autodesk for Education Autodesk Forge Autodesk FormIt Autodesk Fusion Autodesk Fusion 360 Autodesk help Autodesk InfraWorks Autodesk Inventor Autodesk Inventor Frame Generator Autodesk Inventor iLogic Autodesk Knowledge Network Autodesk License Autodesk Maya Autodesk mistakes Autodesk Navisworks Autodesk news Autodesk plugins Autodesk productivity Autodesk Recap Autodesk resources Autodesk Revit Autodesk Software Autodesk support ecosystem Autodesk Takeoff Autodesk Tips Autodesk training Autodesk tutorials Autodesk update Autodesk Upgrade Autodesk Vault Autodesk Video Autodesk Viewer Automated Design Automation Automation Tutorial automotive design automotive visualization Backup Basic Commands Basics Batch Plot Beginner Beginner Tips beginner tutorial beginners guide Big Data BIM BIM 360 BIM Challenges BIM collaboration BIM Compliance BIM Coordination BIM Data BIM Design BIM Efficiency BIM for Infrastructure BIM Implementation BIM Library BIM Management BIM modeling BIM software BIM Standards BIM technology BIM tools BIM Trends BIM workflow Block Editor Block Management Block Organization Building Design Software Building Maintenance building modeling Building Systems Building Technology ByLayer CAD CAD API CAD assembly CAD Automation CAD Blocks CAD CAM CAD commands CAD comparison CAD Customization CAD Data Management CAD Design CAD errors CAD Evolution CAD File Size Reduction CAD Integration CAD Learning CAD line thickness CAD management CAD Migration CAD mistakes CAD modeling CAD Optimization CAD plugins CAD Productivity CAD Rendering CAD Security CAD Skills CAD software CAD software 2026 CAD software training CAD standards CAD technology CAD Tips CAD Tools CAD tricks CAD Tutorial CAD workflow CAM car design software Case Study CEO Guide CGI design Character Rig cinematic lighting Civil 3D Civil 3D hidden gems Civil 3D productivity Civil 3D tips civil design software civil engineering Civil engineering software Clash Detection Class-A surfacing clean CAD file cleaning command client engagement Cloud CAD Cloud Collaboration Cloud design platform Cloud Engineering Cloud Management Cloud Storage Cloud-First CNC CNC machining collaboration command abbreviations Complex Renovation concept car conceptual workflow Connected Design construction Construction Analytics Construction Automation Construction BIM Construction Cloud Construction Planning Construction Scheduling Construction Technology contractor tools Contractor Workflow Contraints corridor design Cost Effective Design cost estimation Create resizable blocks Creative Teams CTB STB Custom visual styles Cutting Parameters Cybersecurity Data Backup data management Data Protection Data Reference Data Security Data Shortcut Design Automation Design Career Design Collaboration Design Comparison Design Coordination design efficiency Design Engineering Design Hacks Design Innovation design optimization Design Options design productivity design review Design Rules design software design software tips Design Technology design tips Design Tools Design Workflow design-to-construction Designer Designer Tools Digital Art Digital Assets Digital Construction Digital Construction Technology Digital Content Digital Design Digital engineering digital fabrication Digital Manufacturing digital marketing digital takeoff Digital Thread Digital Tools Digital Transformation Digital Twin Digital Twins digital workflow dimension dimensioning Disaster Recovery drafting Drafting Efficiency Drafting Shortcuts Drafting Standards Drafting Tips Drawing Drawing Automation drawing tips Dref Dynamic Block Dynamic Block AutoCAD Dynamic Blocks Dynamic doors Dynamic windows Dynamo Dynamo automation early stage design eco design editing commands Electrical Systems Emerging Features Energy Analysis energy efficiency Engineering Engineering Automation engineering data Engineering Design Engineering Innovation Engineering Productivity Engineering Skills engineering software Engineering Technology engineering tools Engineering Tools 2025 Engineering Workflow Excel Export Workflow Express Tools External Reference facial animation Facial Rigging Facility Management Families Fast Structural Design Field Documentation File Optimization File Recovery Flame flange tips flat pattern Forge Development Forge Viewer FreeCAD Fusion 360 Fusion 360 API Fusion 360 tutorial Future of Design Future Skills Game Development Gamification Generative Design Geospatial Data GIS Global design teams global illumination grading optimization green building Green Technology Grips Handoff HDRI health check Healthcare Facilities heavy CAD file Heavy CAD Files heritage building conservation hidden commands Hospital Design HVAC HVAC Design Tools HVAC Engineering Hydraulic Modeling IK/FK iLogic Import Workflow Industry 4.0 Infrastructure infrastructure design Infrastructure Monitoring Infrastructure Planning Infrastructure Technology InfraWorks innovation Insight intelligent modeling Interactive Design interactive presentation Interior Design Inventor Inventor API Inventor Drawing Template Inventor Frame Generator Inventor Graphics Issues Inventor IDW Inventor Tips Inventor Tutorial IoT ISO 19650 joints Keyboard Shortcuts keyframe animation Keyframe generation Landscape Design Large Projects Laser Scan Layer Management Layer Organization Learn AutoCAD Legacy CAD Licensing light techniques Lighting and shading Lighting Techniques Linked Models Machine Learning Machine Learning in CAD Machine Optimization Machining Efficiency maintenance command Management manufacturing Manufacturing Innovation Manufacturing Technology Mapping Technology marketing visuals Material Creation Maya Maya character animation Maya lighting Maya Shader Maya Tips Maya tutorial measurement Mechanical Design Mechanical Engineering Media & Entertainment MEP Modeling Mesh-to-BIM Metal Structure modal analysis Model Management Model Optimization Modeling Secrets Modular Housing Motion capture motion graphics motion simulation MotionBuilder Multi Office Workflow Multi-User Environment multileader Navisworks Navisworks Best Practices Net Zero Design ObjectARX .NET API Open Source CAD Organization OVERKILL OVERKILL AutoCAD Page Setup Palette Parametric Components parametric design parametric family Parametric Modeling particle effects particle systems PDF PDM system Personal Brand Phasing PlanGrid Plot Settings Plot Style Plot Style AutoCAD Plotting Plugin Tutorial Plumbing Design point cloud Portfolio Post Construction Post-Processing Practice Drawing preconstruction workflow predictive analysis predictive animation Predictive Maintenance Predictive rigging Prefabrication Presentation-ready visuals Printing Printing Quality Procedural animation procedural motion Procedural Rig Procedural Textures Product Design Product Development product lifecycle product rendering Productivity productivity tools Professional 3D design Professional CAD Professional Drawings professional printing Professional Tips Project Documentation project efficiency project management Project Management Tools Project Visualization PTC Creo PURGE PURGE AutoCAD Rail Transit Rapid Prototyping realistic rendering ReCap Redshift Shader reduce CAD file size Render Render Passes Render Quality Render Settings Rendering rendering engine Rendering Engines Rendering Optimization rendering software Rendering Tips Rendering Workflow RenderMan Renewable Energy Renovation Project Renovation Workflow Reports Resizable Block restoration workflow Revit Revit add-ins Revit API Revit automation Revit Best Practices Revit Collaboration Revit Documentation Revit Family Revit integration Revit MEP Revit Performance Revit Phasing Revit Plugins Revit Scripting Revit skills Revit Standards Revit Template Revit Tips Revit tutorial Revit Workflow Ribbon Rigging robotics ROI Scale Autodesk Schedules screen Sculpting Secure Collaboration Sensor Data Shader Networks Sheet Metal Design Sheet Metal Tricks Sheet Set Manager shortcut keys Shortcuts Siemens NX Simulation simulation tools Sketch Sketching Tricks Small Firms Smart Architecture Smart Block Smart Building Design Smart City Smart Design Smart Engineering Smart Factory Smart Infrastructur Software Compliance software ecosystem Software Management Software Trends software troubleshooting Software Update Solar Energy Solar Panels SolidWorks Startup Design static stress Steel Structure Design Structural Optimization subscription model Subscription Value Surface Modeling sustainability sustainable design Sustainable Manufacturing system performance T-Spline team training guide Technical Drawing technical support Template Setup text style Texture Mapping Texturing thermal analysis Time Management time saving tools Title Blocks toolbar Toolpath Optimization Toolpaths Topography Troubleshooting Tutorial Tutorials urban planning User Interface (UI) UV Mapping UV Unwrap V-Ray Vault Best Practices Vault Lifecycle Vault Mistakes Vector Plotting vehicle modeling VFX Viewport configuration Virtual Environments virtual reality visual effects visualization workflow VR VR Tools VRED Water Infrastructure Water Management Weight Painting What’s New in Autodesk Wind Energy Wind Turbines Workbook workflow Workflow Automation workflow efficiency Workflow Optimization Workflow Tips Worksets Worksharing Workspace XLS Xref Xrefs เขียนแบบ