000

Index Labels

Strings Redux

.
Simpler programs are better programs. Today's target: strings. In this post I will show you three ways of improving your code by simplifying your strings.

1. Use UTF-8 everywhere


When I issue programming tests I always have some question about different string encodings. It is a good way of testing if a candidate can distinguish what data represents from how it is represented. But when I write code I just use UTF-8 everywhere, both in memory and on disk. Why? UTF-8 has many advantages and no serious disadvantages.

Advantages:

  • Using the same encoding everywhere means there is never any confusion about what encoding a certain string or file should be in. If it is not in UTF-8, then it is wrong. Period.
  • UTF-8 uses the standard C data types for strings: char * and char [].
  • ASCII strings look like ASCII strings and all functions, parsers, etc that operate on ASCII strings work on UTF-8 strings without modification.

The most common disadvantages claimed for UTF-8 are:

  • UTF-8 can waste memory.
  • Finding the i’th glyph in a UTF-8 string is expensive (O(n) rather than O(1)).

There is some truth to the first point. Yes, if your text is in Japanese, UTF-8 probably uses more memory than Shift-JIS. But I don’t think that is a major issue. First, while UTF-8 is worse than other encodings for some languages, it does pretty well on average. Second, strings aren’t a big part of a game’s memory usage anyway (if they are, you are most likely doing something wrong). And third, if you care that much about string memory usage you should probably compress your string data.

Compression will pretty much nullify any differences in memory usage caused by using different encodings, since the entropy of the underlying data is the same regardless of how it is encoded. (At least in theory, it would be interesting to see someone test it in practice.)

The second point is true but also moot, since accessing glyphs at random indices in a string is a much rarer operation than you might think. For most string operations: concatenation, parsing, etc you never have to access individual glyphs. You can just use the same implementation as you would use for an ASCII-string and it will work without modification.

In the few cases where you do need to convert to glyphs (for example for rendering) you typically do that sequentially, from the start to the end. This is still a fast operation, it is only random access of glyphs that is significantly slower with UTF-8 than with UTF-32. Another interesting thing to note is that since all continuation bytes in UTF-8 follow the pattern 10xxxxxx you can quickly find the start and end of the next or previous glyph given a char * to anywhere within a UTF-8 string.

In fact I can't think of any string operation that requires fast random access to glyphs other than completely contrived examples (given 10000 long strings, find the 1000th glyph in each). I urge my readers to try to come up with something.

2. You do not need a string class


String classes are highly overrated.

Generally speaking, code that deals with strings can be divided into two categories: code that looks at static strings (parsers, data compilers, script callbacks, etc) and code that builds dynamic strings (template formatters, debug logging, etc). In a typical game project there is a lot more of the first than the latter. Ironically, string classes don’t do a very good job with either!

For code that deals with static strings you should always use const char * rather than const string &. The former is more flexible. It allows the caller to store her strings however she likes rather than adhering to some memory model imposed by the string class. It also means that if you call the function with a static string it doesn’t get pointlessly converted to a string object.

But string classes aren’t very good for dynamic strings either, as anyone who has written something like this can attest to:

string a;
for (i = 0; i<10000; ++i)
a += "xxx";

Depending on how your string class is implemented this can be horribly inefficient, reallocating and copying the string memory for every iteration of the loop. There are various ways of addressing this: reserving memory for the string up front or using some kind of "rope" or "stringstream" class.

The simpler approach is to just use:

vector<char> a;
for (i=0; i<10000; ++i)
string::append(a, "xxx");

We represent the string as a vector of chars and provide a library of functions for performing "common string operations" on that representation.

The advantage of this over using a regular string class is that it provides a clear distinction between strings that can grow (vector<char>) and strings that can't (char *) and emphasizes what the cost of growing is (amortized linear time). Do you know the cost of growing in your string class?

3. You should almost never use strings in your runtime


The variable length nature of strings make them slow, memory consuming and unwieldy (memory for them must be allocated and freed). If you use fixed length strings you will either use even more memory or annoy the content creators because they can't make their resource names as descriptive as they would like too.

For these reasons I think that strings in the runtime should be reserved for two purposes:

  • User interface text
  • Debugging


In particular, you shouldn't use strings for object/resource/parameter names in the runtime. Instead use string hashes. This lets you use user friendly names (strings) in your tools and fast ints in your runtime. It is also a lot easier to use than enums. Enums require global cooperation to avoid collisions. String hashes just require that you hash into a large enough key space.

We hash names during our data compile stage into either 32-bit or 64-bit ints depending on the risk of collision. If it is a global object name (such as the name of a texture) we use 64-bit ints. If it is a local name (such as the name of a bone in a character) we use 32-bit ints. Hash collision is considered a compile error. (It hasn't happened yet.)

Since user interface text should always be localized, all user interface strings are managed by the localizer. The localized text is fetched from the localizer with a string lookup key, such as "menu_file_open" (hashed to a 64-bit int of course).

This only leaves debugging. We use formatted strings for informative assert messages when something goes wrong. Our profiler and monitoring tools use interned strings to identify data. Our game programmers use debug-prints to root out problems. Of course, non of this affects the end user, since the debugging strings are only used in debug builds.

Hashes can be problematic when debugging. If there is an error in the resource 0x3e728af10245bc71 it is not immediately obvious that it is the object vegetation/trees/larch_3.mesh that is at fault.

We handle this with a lookup table. When we compile our data we also create a reverse lookup table that converts from a hash value back to the original string that generated it. This table is not loaded by the runtime, but it can be accessed by our tools. So our game console, for instance, uses this table to automatically translate any hash IDs that are printed by the game.

However, recently I've started to also add small fixed-size debug strings to the resources themselves. Something like this:

HashMap<IdString64, MeshResource *> _meshes;

struct MeshResource
{
char debug_name[32];

};

As you can see, all the lookup tables etc, still use the 64-bit hash to identify the resource. But inside the resource is a 32-byte human friendly name (typically, the last 32 characters of the resource name), which is only used for debugging. This doesn't add much to the resource size (most resources are a lot bigger than 32 bytes) but it allows us to quickly identify a resource in the debugger or in a raw memory dump without having to open up a tool to convert hashes back to strings. I think the time saved by this is worth those extra bytes.

Blog Archive

Labels

.NET Programming 2D Drafting 3D 3D Animation 3D Art 3D Artist 3D CAD 3D Character 3D design 3D design tutorial 3D Drafting 3D effects 3D Engineering 3D Lighting 3D Materials 3D Modeling 3D models 3D Navigation 3D presentation 3D Printing 3D rendering 3D scanning 3D scene 3D simulation 3D Sketch Inventor 3D Texturing 3D visualization 3D Web App 3ds Max 4D Simulation ACC Adaptive Clearing adaptive components Add-in Development Additive Layers Additive Manufacturing Advanced CAD features Advanced Modeling advanced plot styles Advanced Sketch AEC Technology AEC Tools AEC Workflow affordable Autodesk tools AI AI animation AI Assistance AI collaboration AI Design AI Design Tools AI Experts AI for Revit AI Guide AI in 3D AI in Architecture AI in CAD AI in CNC AI in design AI in engineering AI in Manufacturing AI in Revit AI insights AI lighting AI rigging AI Strategies AI Tips AI Tools AI Tricks AI troubleshooting AI workflow AI-assisted AI-assisted rendering AI-Assisted Workflow AI-enhanced AI-powered templates Animation Animation Curves Animation Layers animation pipeline animation tips Animation Tutorial Animation workflow annotation Annotation Scaling annotation standards Annotations AR Architectural AI Architectural CAD architectural design Architectural Drawing architectural drawings architectural modeling architectural preservation Architectural Productivity architectural visualization Architecture architecture CAD architecture design Architecture Engineering Architecture Firm Architecture Productivity architecture projects architecture software architecture technology architecture tools Architecture Visualization Architecture Workflow Arnold Renderer Arnold Shader Artificial Intelligence As-Built Model assembly techniques Asset Management augmented reality Auto Rig Maya AutoCAD AutoCAD advice AutoCAD AI tools AutoCAD API AutoCAD automation AutoCAD Basics AutoCAD Beginner AutoCAD Beginners AutoCAD Blocks AutoCAD Civil 3D AutoCAD Civil3D AutoCAD commands AutoCAD efficiency AutoCAD Expert Advice AutoCAD features AutoCAD File Management AutoCAD Guide AutoCAD Hub AutoCAD Layer AutoCAD Layers AutoCAD learning AutoCAD print settings AutoCAD productivity AutoCAD scripting AutoCAD Scripts AutoCAD Sheet Set tips AutoCAD Teaching AutoCAD Techniques AutoCAD Templates AutoCAD tips AutoCAD tools AutoCAD training. AutoCAD tricks AutoCAD Tutorial AutoCAD workflow AutoCAD Xref Autodesk Autodesk 2025 Autodesk 2026 Autodesk 3ds Max Autodesk AI Autodesk AI Tools Autodesk Alias Autodesk AutoCAD Autodesk BIM Autodesk BIM 360 Autodesk Certification Autodesk Civil 3D Autodesk Cloud Autodesk community forums Autodesk Construction Cloud Autodesk Docs Autodesk Dynamo Autodesk features Autodesk for Education Autodesk Forge Autodesk FormIt Autodesk Fusion Autodesk Fusion 360 Autodesk help Autodesk InfraWorks Autodesk Inventor Autodesk Inventor Frame Generator Autodesk Inventor iLogic Autodesk Knowledge Network Autodesk License Autodesk Maya Autodesk mistakes Autodesk Navisworks Autodesk news Autodesk plugins Autodesk productivity Autodesk Recap Autodesk resources Autodesk Revit Autodesk Software Autodesk support ecosystem Autodesk Takeoff Autodesk Tips Autodesk training Autodesk tutorials Autodesk update Autodesk Upgrade Autodesk Vault Autodesk Video Autodesk Viewer Automate automate drawing updates Automate Printing automate publishing automate repetitive tasks Automated Design automated publishing Automated Sheets Automation Automation in AutoCAD Automation Tools Automation Tutorial automotive design automotive visualization Backup Basic Commands Basics batch drawing validation Batch Plot Batch Plotting Beginner beginner CAM Beginner Tips beginner tutorial beginners guide Bend Tools Best Practices Big Data BIM BIM 360 BIM Challenges BIM collaboration BIM Compliance BIM Coordination BIM Data BIM Design BIM Efficiency BIM for Infrastructure BIM Implementation BIM Library BIM Management BIM modeling BIM software BIM Standards BIM technology BIM Tips BIM tools BIM Trends BIM workflow Block Editor Block Management Block Organization Boolean Operations Building design Building Design Software Building Efficiency Building Maintenance building modeling Building Systems Building Technology business tools ByLayer CAD CAD API CAD assembly CAD Automation CAD best practices CAD Blocks CAD CAM CAD collaboration CAD commands CAD comparison CAD consistency CAD Customization CAD Data Management CAD Design CAD drawing checks CAD efficiency CAD errors CAD Evolution CAD file management CAD File Size Reduction CAD Integration CAD Learning CAD libraries CAD line thickness CAD management CAD Migration CAD mistakes CAD modeling CAD Optimization CAD organization CAD Oversight CAD plugins CAD Productivity CAD project management CAD Projects CAD Rendering CAD Scripting CAD Security CAD Sheet Management CAD sheet sets CAD Shortcuts CAD Skills CAD software CAD software 2026 CAD software training CAD standardization CAD standards CAD Tables CAD team CAD teams CAD technology CAD templates CAD Tips CAD Tools CAD Tracking CAD tricks CAD Tutorial CAD version control CAD workflow CAD workflow optimization CAD workflows CAM CAM Best Practices CAM for beginners CAM Optimization CAM simulation CAM strategies CAM Tips CAM tutorial CAM Workflow car design software Case Study central hub Central Hub Solutions centralized commands centralized documentation centralized management Centralized Sheet Set centralizing CAD CEO Guide CG Workflow CGI CGI design Character Animation Character Rig Character Rigging cinematic lighting Civil 3D Civil 3D hidden gems Civil 3D productivity Civil 3D tips civil design software civil engineering Civil engineering software Clash Detection Class-A surfacing clean CAD file cleaning command client engagement Cloth Simulation Cloud CAD cloud CAD storage Cloud Collaboration Cloud design platform Cloud Engineering Cloud Management Cloud Storage Cloud-Based CAD Cloud-First CNC CNC machining collaboration collaboration in CAD Collaboration Tools Collaborative CAD collaborative design Collaborative Drafting color management command abbreviations Complex Projects Complex Renovation concept car conceptual workflow Connected Design construction Construction Analytics Construction Automation Construction BIM Construction Cloud construction documentation construction drawings construction management Construction Phases Construction Planning Construction Project Construction Projects Construction Scheduling Construction Technology construction tools construction tracking Contractor contractor tools Contractor Workflow Contraints corridor design Cost Effective Design cost estimation Create resizable blocks Creative Teams creative tools CTB CTB STB Custom Hatch custom scripts custom tool palettes Custom visual styles Cutting Parameters Cybersecurity Data Backup Data Extraction data management Data Protection Data Reference Data Security Data Shortcut deadline tracking Demolition Design Design Automation Design Career Design Collaboration Design Comparison Design consistency Design Coordination Design Documentation design efficiency Design Engineering design errors Design Hacks Design Innovation design management design optimization Design Options Design Oversight design productivity design review Design Reviews design revisions Design Rules design software design software tips design standardization design standards Design Teams Design Technology design templates design tips Design Tools design tracking Design Workflow design-to-construction Designer designer hacks Designer Tools Designer Workflow Digital Art Digital Assets Digital Construction Digital Construction Technology Digital Content Digital Design Digital Drafting digital drawing Digital engineering digital fabrication Digital Library Digital Manufacturing digital marketing digital takeoff Digital Thread Digital Tools Digital Transformation Digital Twin Digital Twins digital workflow dimension dimension styles dimensioning Disaster Recovery document management Document Organization Documentation drafting drafting automation Drafting Efficiency Drafting productivity Drafting Shortcuts Drafting Standards Drafting Tips drafting tools Drafting Workflow Drawing Drawing Accuracy Drawing Automation drawing consistency drawing management Drawing Organization drawing revisions Drawing standards drawing templates drawing tips Dref DWG files DXF Export Dynamic Block Dynamic Block AutoCAD Dynamic Blocks dynamic data management Dynamic doors Dynamic windows Dynamics Dynamics Simulation Dynamo Dynamo automation early stage design eco design editing commands Efficiency efficient CAD efficient project management Electrical Systems Emerging Features Energy Analysis energy efficiency Energy Simulation Engineering Engineering Automation engineering CAD engineering data Engineering Design Engineering Documentation Engineering Drawing engineering drawings engineering efficiency Engineering Innovation Engineering Productivity engineering projects Engineering Skills engineering software Engineering Technology engineering tips engineering tools Engineering Tools 2025 Engineering Workflow Error Reduction Excel Export Workflow Express Tools External Reference Fabric Simulation facial animation Facial Rigging Facility Management Families Fast Structural Design faster delivery Field Documentation file auditing File Management file naming File Optimization File Recovery Fire Flame flange tips flat pattern Fluid Effects Fluid Simulation Forge Development Forge Viewer FreeCAD Fusion 360 Fusion 360 API Fusion 360 guide Fusion 360 Tips Fusion 360 tutorial Future of Design Future Skills Game Design Game Development Game Effects Gamification Generative Design Geospatial Data GIS Global design teams global illumination GPU Acceleration grading optimization Graph Editor Green Architecture green building Green Technology Grips Handoff Hatch Patterns HDRI health check Healthcare Facilities heavy CAD file Heavy CAD Files heritage building conservation hidden commands Hospital Design Hub Workflows HVAC HVAC Design Tools HVAC Engineering HVAC Optimization Hydraulic Modeling IK/FK iLogic Import Workflow Industrial Design Industry 4.0 Infrastructure infrastructure design Infrastructure Monitoring Infrastructure Planning Infrastructure Technology InfraWorks innovation Insight Intelligent AutoCAD Hub Intelligent automation Intelligent Design intelligent modeling Intelligent Repetition Control Intelligent Sheet Management Intelligent Sheet Sets intelligent tools Intelligent Workflow Interactive Design interactive presentation Interior Design Inventor Inventor API Inventor Drawing Template Inventor Frame Generator Inventor Graphics Issues Inventor IDW Inventor Tips Inventor Tutorial IoT ISO 19650 joints Keyboard Shortcuts keyframe animation Keyframe generation Landscape Design Large Projects Laser Scan layer conventions Layer Management Layer Organization layer standards layouts Learn AutoCAD Legacy CAD Library components Licensing light techniques Lighting Lighting and shading Lighting Techniques lineweight Linked Models Liquid Machine Learning Machine Learning in CAD Machine Optimization Machining Efficiency machining productivity Macros maintenance command Manage multiple projects from a single hub with a centralized project management system that improves collaboration Management manual plotting manufacturing Manufacturing Innovation Manufacturing Technology Mapping Technology marketing visuals master sheet index Material Creation Material Libraries Maya Maya Animation Maya character animation Maya lighting Maya Python Maya Rigging Maya Shader Maya Tips Maya tutorial Maya Workflow measurement Mechanical Design Mechanical Engineering Media & Entertainment MEP MEP Modeling Mesh-to-BIM Metal Fabrication Metal Structure milestone tracking modal analysis Model Clarity Model Management Model Optimization model space Modeling Secrets Modular Housing Monitoring Progress Motion capture Motion Design motion graphics motion simulation MotionBuilder Multi Office Workflow multi-axis machining Multi-Body Modeling Multi-Project Multi-Project Management Multi-User Environment multileader multiple sheet sets naming convention Navisworks Navisworks Best Practices nCloth Net Zero Design New Construction ObjectARX .NET API Open Source CAD Optimization Organization OVERKILL OVERKILL AutoCAD Override Layers Page Setup Palette paper space parametric assembly Parametric Components Parametric Constraints parametric design parametric family Parametric Modeling particle effects particle systems PDF PDF Export PDM system Personal Brand Phase Filters Phasing photorealism Photorealistic photorealistic render PlanGrid plot automation Plot Settings Plot Style Plot Style AutoCAD plot styles Plotting Plotting automation Plugin Tutorial Plumbing Design PM Tools point cloud Portfolio Post Construction Post-Processing Practice Drawing precision machining preconstruction workflow predictive analysis predictive animation Predictive Maintenance Predictive rigging Prefabrication Preloaded families Presentation-ready visuals Printing Printing Quality Problem Solving Procedural animation procedural motion Procedural Rig Procedural Textures Product Design Product Development product lifecycle product rendering Product Visualization Productivity productivity and workflow efficiency. productivity tips productivity tools Professional 3D design Professional CAD Professional Drawings professional printing Professional Tips Professional Workflow progress management Project Accuracy project automation Project Collaboration project consistency Project Coordination project dashboard Project Documentation project efficiency Project Goals project management Project Management Tools project milestones Project Monitoring project organization Project Oversight project planning Project Progress project quality project timeline project tracking Project Visualization project workflow PTC Creo Publish Drawings PURGE PURGE AutoCAD Rail Transit Rapid Prototyping Realism realistic rendering realistic scenes ReCap Redshift Shader reduce CAD errors reduce CAD file size Reduce Errors reduce manual updates Reducing redundancy Redundant Work Render Render Optimization Render Passes Render Quality Render Settings render tips Rendering rendering engine Rendering Engines Rendering Optimization rendering settings rendering software Rendering Techniques Rendering Tips Rendering Workflow RenderMan Renewable Energy Renovation Project Renovation Workflow repetition-free workflow repetitive drawing Repetitive Elements repetitive-free Reports Resizable Block restoration workflow Reusable Components Revision Control Revision Tracking Revit Revit add-ins Revit API Revit automation Revit Best Practices Revit Collaboration Revit Documentation Revit Family Revit integration Revit MEP Revit Performance Revit Phasing Revit plugin Revit Plugins Revit Scripting Revit skills Revit Standards Revit Strategies Revit Structure Revit Tags Revit Template Revit templates Revit Tips Revit tutorial Revit Workflow Ribbon Rigging Rigid Body robotics ROI Room planning save hours of work Save Time save time CAD Scale Autodesk Schedules screen Scripts Sculpting Secure Collaboration Sensor Data Shader Networks sheet management Sheet Metal Sheet Metal Design Sheet Metal Tricks Sheet organization sheet set Sheet Set Automation Sheet Set Efficiency Sheet Set fields Sheet Set Management Sheet Set Manager Sheet Set Optimization Sheet Set Organization Sheet Set Software Sheet Set Standards Sheet Set Tips Sheet Set Tools Sheet Sets sheet sets workflow Sheets shortcut keys Shortcuts Siemens NX Simulation simulation tools Sketch Sketching Tricks Small Firms Smart Architecture Smart Block Smart Building Design Smart CAD smart CAD tools Smart City Smart Design smart dimensioning Smart Engineering Smart Factory Smart Infrastructur Smart Project Smart Sheet Management Smart Sheet Set Tools Smart Sheet Sets Smart Workflows Smoke Soft Body Software Compliance software ecosystem Software Management Software Trends software troubleshooting Software Update Solar Energy Solar Panels SolidWorks Space planning standard part libraries Standardization Standardize standardized templates Startup Design static stress STB Steel Structure Design Stress-Free Structural Design Structural Modeling Structural Optimization subscription model Subscription Value surface finish Surface Modeling sustainability sustainable design Sustainable Manufacturing system performance T-Spline task management team collaboration Team Efficiency Team Productivity Team Projects team training guide technical documentation Technical Drawing technical support Template management Template Setup Template usage templates text settings text style Texture Mapping Texturing thermal analysis time efficiency Time Management time saving tools time savings time-saving time-saving tools Title Block title block automation Title Blocks Tool Libraries Tool Management Tool Palette Guide toolbar toolpath Toolpath Optimization Toolpaths Topography Track Track changes Troubleshooting Tutorial Tutorials Unfolding Techniques urban planning User Interface (UI) UV Mapping UV Unwrap V-Ray Vault Best Practices Vault Lifecycle Vault Mistakes Vector Plotting vehicle modeling version control VFX View Filters Viewport configuration viewports Virtual Environments virtual reality visual effects visualization workflow VR VR Tools VRED Water Infrastructure Water Management Weight Painting What’s New in Autodesk Wind Energy Wind Turbines Workbook workflow Workflow Automation workflow efficiency Workflow Optimization Workflow Tips Worksets Worksharing Workspace XLS Xref Xrefs เขียนแบบ