000

Index Labels

Allocation Adventures 2: Arrays of Arrays

.

Last week's post ended with a puzzle: How can we allocate an array of dynamically growing and shrinking things in an efficient and data-oriented way? I.e. using contiguous memory buffers and as few allocations as possible.

The example in that post was kind of complicated, and I don't want to get lost in the details, so let's look at a simpler version of the same fundamental problem.

Suppose we want to create a TagComponent that allows us to store a number of unsigned tags for an entity.

These tags will be hashes of strings such as "player", "enemy", "consumable", "container", etc and the TagComponent will have some sort of efficient lookup structure that allows us to quickly find all entities with a particular tag.

But to keep things simple, let's ignore that for now. For now we will just consider how to store these lists of tags for all our entities. I.e. we want to find an alternative to:

std::vector< std::vector < unsigned> > data;

that doesn't store every list in a separate memory allocation.

Fixed size

If we can get away with it, we can get rid of the "array of arrays" by setting a hard limit on the number of items we can store per entity. In that case, the data structure becomes simply:

enum {MAX_TAGS = 8};
struct Tags
{
unsigned n;
unsigned tags[MAX_TAGS];
};
Array<Tags> data;

Now all the data is contained in a single buffer, the data buffer for Array<Tags>.

Sometimes the hard limit is inherent in the problem itself. For example, in a 2D grid a cell can have at most four neighbors.

Sometimes the limit is a widely accepted compromise between cost and quality. For example, when skinning meshes it is usually consider ok to limit the number of bone influences per vertex to four.

Sometimes there is no sensible limit inherent to the problem itself, but for the particular project that we are working on we can agree to a limit and then design the game with that limit in mind. For example we may know that there will never be more than two players, never more than three lights affecting an object, never more than four tags needed for an entity, etc.

This of course requires that we are writing, or at least configuring, the engine for a particular project. If we are writing a general engine to be used for a lot of games it is hard to set such limits without artificially constraining what those games will be able to do.

Also, since the fixed size must be set to the maximum array size, every entity that uses fewer entries than the maximum will waste some space. If we need a high maximum this can be a significant problem and it might make sense to go with a dynamic solution even though there is an upper limit.

So while the fixed size approach can be good in some circumstances, it doesn't work in every situation.

Linked list

Instead of using arrays, we can put the tags for a particular entity in a linked list:

struct Tag
{
unsigned tag;
Tag *next;
};
Array<Tag *> data;

Using a linked list may seem like a very bad choice at first. A linked list can give us a cache miss for every next pointer we follow. This would give us even worse performance than we would get with vector < vector < unsigned > >.

But the nodes in the linked list do not necessarily have to be allocated individually on the heap. We can do something similar to what we did in the last post: allocate the nodes in a buffer and refer to them using offsets rather than pointers:

struct Node
{
unsigned tag;
unsigned next;
};
Array<Node> nodes;

With this approach we only have a single allocation -- the buffer for the array that contains all the tag nodes -- and we can follow the indexes in the next field to walk the list.

Side note: Previously I have always used UINT_MAX to mark an nil value for an unsigned. So in the struct above, I would have used UINT_MAX for the next value to indicate the end of the list. But recently, I've switched to using 0 instead. I think it is nice to be able to memset() a buffer to 0 to reset all values. I think it is nice that I can just use if (next) to check if the value is valid. It is also nice that the invalid value will continue to be 0 even if I later decide to change the type to int or uint_16t. It does mean that I can't use the nodes[0] entry, since that is reserved for the nil value, but I think the increased simplicity is worth it.

Using a single buffer rather than separate allocations gives us much better cache locality, but the next references can still jump around randomly in that buffer. So we can still get cache misses. If the buffer is large, this can be as bad as using freely allocated nodes.

Another thing to note is that we are wasting a significant amount of memory. Only half of the memory is used for storing tags, the rest of it is wasted on the next pointers.

We can try to address both these problems by making the nodes a little bigger:

enum {MAX_TAGS_PER_NODE = 8};
struct Node
{
unsigned n;
unsigned tags[MAX_TAGS_PER_NODE];
unsigned next;
};
Array<Node> nodes;

This is just as before, except we have more than one tag per node. This gives better cache performance because we can now process eight tags at a time before we have to follow a next pointer and jump to a different memory location. Memory use can also be better. If the nodes are full, we are using 80 % of the memory for actual tags, rather than 50 % as we had before.

However, if the nodes are not full we could be wasting even more memory than before. If entities have three tags on average, then we are only using 30 % of the memory to store tags.

We can balance cache performance and memory use by changing MAX_TAGS_PER_NODE. Increasing it gives better cache coherence, because we can process more tags before we need to jump to a different memory location. However, increasing it also means more wasted memory. It is probably good to set the size so that "most entities" fit into a single node, but a few special ones (players and enemies maybe) need more.

One interesting thing to note about the cache misses is that we can get rid of them by sorting the nodes. If we sort them so that the nodes in the same next chain always appear directly after one another in the array, then walking the list will access the data linearly in memory, just as if we were accessing an array:

--------------------------------------------------
| A1 --|--> A2 --|--> A3 | B | C1 --|--> C2 |
--------------------------------------------------

Note that a complete ordering is not required, it is enough if the linked nodes end up together. Single nodes, such as the B node above could go anywhere.

Since these are dynamic lists where items will be added and removed all the time, we can't really do a full O(n log n) sort every time something changes. That would be too expensive. But we could sort the list "incrementally". Every time the list is accessed, we do a little bit of sorting work. As long as the rate of mutation is low compared to the rate of access, which you would expect in most circumstances, our sorting should be able to keep up with the mutations and keep the list "mostly sorted".

You would need a sorting algorithm that can be run incrementally and that works well with already sorted data. Two-way bubble sort perhaps? I haven't thought too deeply about this, because I haven't implemented this method in practice.

Custom memory allocator

Another option is to write a custom memory allocator to divide the bigger buffer up into smaller parts for memory allocations.

You might think that this is a much too complex solution, but a custom memory allocator doesn't necessarily need to be a complex thing. In fact, both the fixed size and linked list approaches described above could be said to be using a very simple kind of custom memory allocator: one that just allocates fixed blocks from an array. Such an allocator does not need many lines of code.

Another criticism against this approach is that if we are writing our own custom memory allocator, aren't we just duplicating the work that malloc() or new already does? What's the point of first complaining a lot about how problematic the use of malloc() can be and then go on to write our very own (and probably worse) implementation of malloc()?

The answer is that malloc() is a generic allocator that has to do well in a lot of different situations. If we have more detailed knowledge of how the allocator is used, we can write an allocator that is both simpler and performs better. For example, as seen above, when we know the allocations are fixed size we can make a very fast and simple allocator. System software typically uses such allocators (check out the slab allocator for instance) rather than relying on malloc().

In addition, we also get the benefit that I talked about in the previous post. Having all of a system's allocations in a single place (rather than mixed up with all other malloc() allocations) makes it much easier to reason about them and optimize them.

As I said above, the key to making something better than malloc() is to make use of the specific knowledge we have about the allocation patterns of our system. So what is special about our vector < vector < unsigned > > case?

1. There are no external pointers to the data.

All the pointers are managed by the TagComponent itself and never visible outside that component.

This means that we can "move around" memory blocks as we like, as long as the TagComponent keeps track of and updates its data structures with the new locations. So we don't have to worry (that much) about fragmentation, because when we need to, we can always move things around in order to defrag the memory.

I'm sure you can build something interesting based on that, but I actually want to explore another property:

2. Memory use always grows by a factor of two.

If you look at the implementation of std::vector or a similar class (since STL code tends to be pretty unreadable) you will see that the memory allocated always grows by a factor of two. (Some implementations may use 1.5 or something else, but usually it is 2. The exact figure doesn't matter that much.)

The vector class keeps track of two counters:

  • size which stores the number of items in the vector and
  • capacity which stores how many items the vector has room for, i.e. how much memory has been allocated.

If you try to push an item when size == capacity, more memory is needed. So what typically happens is that the vector allocates twice as much memory as was previously used (capacity *= 2) and then you can continue to push items.

This post is already getting pretty long, but if you haven't thought about it before you may wonder why the vector grows like this. Why doesn't it grow by one item at a time, or perhaps 16 items at a time.

The reason is that we want push_back() to be a cheap operation -- O(1) using computational complexity notation. When we reallocate the vector buffer, we have to move all the existing elements from the old place to the new place. This will take O(n) time. Here, n is the number of elements in the vector.

If we allocate one item at a time, then we need to allocate every time we push and since re-allocate takes O(n) that means push will also take O(n). Not good.

If we allocate 16 items at a time, then we need to allocate every 16th time we push, which means that push on average takes O(n)/16, which by the great laws of O(n) notation is still O(n). Oops!

But if we allocate 2*n items when we allocate, then we only need to reallocate after we have pushed n more items, which means that push on average takes O(n)/n. And O(n)/n is O(1), which is exactly what we wanted.

Note that it is just on average that push is O(1). Every n pushes, you will encounter a push that takes O(n) time. For this reason, push is said to run in amortized constant time. If you have really big vectors, that can cause an unacceptable hitch and in that case you may want to use something other than a vector to store the data.

Anyways, back to our regular programming.

The fact that our data (and indeed, any kind of dynamic data that uses the vector storage model) grows by powers of two is actually really interesting. Because it just so happens that there is an allocator that is very good at allocating blocks at sizes that are powers of two. It is called the buddy allocator and we will take a deeper look at it in the next post.

Blog Archive

Labels

.NET Programming 2D Drafting 3D Animation 3D Art 3D Artist 3D design 3D effects 3D Engineering 3D Materials 3D Modeling 3D models 3D presentation 3D Printing 3D rendering 3D scanning 3D scene 3D simulation 3D Sketch Inventor 3D Texturing 3D visualization 3D Web App 3ds Max 4D Simulation ACC Adaptive Clearing adaptive components Add-in Development Additive Manufacturing Advanced CAD features Advanced Modeling AEC Technology AEC Tools affordable Autodesk tools AI AI animation AI Assistance AI collaboration AI Design AI Design Tools AI Experts AI for Revit AI Guide AI in CAD AI in CNC AI in design AI in Manufacturing AI in Revit AI insights AI lighting AI rigging AI Tips AI Tools AI troubleshooting AI workflow AI-assisted AI-assisted rendering AI-enhanced Animation animation pipeline animation tips Animation workflow annotation AR architectural design architectural modeling architectural preservation architectural visualization Architecture architecture design Architecture Engineering Architecture Firm Architecture Productivity architecture software architecture technology Architecture Workflow Arnold Renderer Arnold Shader Artificial Intelligence As-Built Model Asset Management augmented reality AutoCAD AutoCAD advice AutoCAD API AutoCAD Basics AutoCAD Beginner AutoCAD Beginners AutoCAD Civil 3D AutoCAD Civil3D AutoCAD commands AutoCAD efficiency AutoCAD Expert Advice AutoCAD features AutoCAD File Management AutoCAD Layer AutoCAD Layers AutoCAD learning AutoCAD print settings AutoCAD productivity AutoCAD Teaching AutoCAD Techniques AutoCAD tips AutoCAD tools AutoCAD training. AutoCAD tricks AutoCAD Tutorial AutoCAD workflow AutoCAD Xref Autodesk Autodesk 2025 Autodesk 2026 Autodesk 3ds Max Autodesk AI Autodesk AI Tools Autodesk Alias Autodesk AutoCAD Autodesk BIM Autodesk BIM 360 Autodesk Certification Autodesk Civil 3D Autodesk Cloud Autodesk community forums Autodesk Construction Cloud Autodesk Docs Autodesk Dynamo Autodesk features Autodesk for Education Autodesk Forge Autodesk FormIt Autodesk Fusion Autodesk Fusion 360 Autodesk help Autodesk InfraWorks Autodesk Inventor Autodesk Inventor Frame Generator Autodesk Inventor iLogic Autodesk Knowledge Network Autodesk License Autodesk Maya Autodesk mistakes Autodesk Navisworks Autodesk news Autodesk plugins Autodesk productivity Autodesk Recap Autodesk resources Autodesk Revit Autodesk Software Autodesk support ecosystem Autodesk Takeoff Autodesk Tips Autodesk training Autodesk tutorials Autodesk update Autodesk Upgrade Autodesk Vault Autodesk Video Autodesk Viewer Automated Design Automation Automation Tutorial automotive design automotive visualization Backup Basic Commands Basics Batch Plot Beginner Beginner Tips beginner tutorial beginners guide Big Data BIM BIM 360 BIM Challenges BIM collaboration BIM Compliance BIM Coordination BIM Data BIM Design BIM Efficiency BIM for Infrastructure BIM Implementation BIM Library BIM Management BIM modeling BIM software BIM Standards BIM technology BIM tools BIM Trends BIM workflow Block Editor Block Management Block Organization Building Design Software Building Maintenance building modeling Building Systems Building Technology ByLayer CAD CAD API CAD assembly CAD Automation CAD Blocks CAD CAM CAD commands CAD comparison CAD Customization CAD Data Management CAD Design CAD errors CAD Evolution CAD File Size Reduction CAD Integration CAD Learning CAD line thickness CAD management CAD Migration CAD mistakes CAD modeling CAD Optimization CAD plugins CAD Productivity CAD Rendering CAD Security CAD Skills CAD software CAD software 2026 CAD software training CAD standards CAD technology CAD Tips CAD Tools CAD tricks CAD Tutorial CAD workflow CAM car design software Case Study CEO Guide CGI design Character Rig cinematic lighting Civil 3D Civil 3D hidden gems Civil 3D productivity Civil 3D tips civil design software civil engineering Civil engineering software Clash Detection Class-A surfacing clean CAD file cleaning command client engagement Cloud CAD Cloud Collaboration Cloud design platform Cloud Engineering Cloud Management Cloud Storage Cloud-First CNC CNC machining collaboration command abbreviations Complex Renovation concept car conceptual workflow Connected Design construction Construction Analytics Construction Automation Construction BIM Construction Cloud Construction Planning Construction Scheduling Construction Technology contractor tools Contractor Workflow Contraints corridor design Cost Effective Design cost estimation Create resizable blocks Creative Teams CTB STB Custom visual styles Cutting Parameters Cybersecurity Data Backup data management Data Protection Data Reference Data Security Data Shortcut Design Automation Design Career Design Collaboration Design Comparison Design Coordination design efficiency Design Engineering Design Hacks Design Innovation design optimization Design Options design productivity design review Design Rules design software design software tips Design Technology design tips Design Tools Design Workflow design-to-construction Designer Designer Tools Digital Art Digital Assets Digital Construction Digital Construction Technology Digital Content Digital Design Digital engineering digital fabrication Digital Manufacturing digital marketing digital takeoff Digital Thread Digital Tools Digital Transformation Digital Twin Digital Twins digital workflow dimension dimensioning Disaster Recovery drafting Drafting Efficiency Drafting Shortcuts Drafting Standards Drafting Tips Drawing Drawing Automation drawing tips Dref Dynamic Block Dynamic Block AutoCAD Dynamic Blocks Dynamic doors Dynamic windows Dynamo Dynamo automation early stage design eco design editing commands Electrical Systems Emerging Features Energy Analysis energy efficiency Engineering Engineering Automation engineering data Engineering Design Engineering Innovation Engineering Productivity Engineering Skills engineering software Engineering Technology engineering tools Engineering Tools 2025 Engineering Workflow Excel Export Workflow Express Tools External Reference facial animation Facial Rigging Facility Management Families Fast Structural Design Field Documentation File Optimization File Recovery Flame flange tips flat pattern Forge Development Forge Viewer FreeCAD Fusion 360 Fusion 360 API Fusion 360 tutorial Future of Design Future Skills Game Development Gamification Generative Design Geospatial Data GIS Global design teams global illumination grading optimization green building Green Technology Grips Handoff HDRI health check Healthcare Facilities heavy CAD file Heavy CAD Files heritage building conservation hidden commands Hospital Design HVAC HVAC Design Tools HVAC Engineering Hydraulic Modeling IK/FK iLogic Import Workflow Industry 4.0 Infrastructure infrastructure design Infrastructure Monitoring Infrastructure Planning Infrastructure Technology InfraWorks innovation Insight intelligent modeling Interactive Design interactive presentation Interior Design Inventor Inventor API Inventor Drawing Template Inventor Frame Generator Inventor Graphics Issues Inventor IDW Inventor Tips Inventor Tutorial IoT ISO 19650 joints Keyboard Shortcuts keyframe animation Keyframe generation Landscape Design Large Projects Laser Scan Layer Management Layer Organization Learn AutoCAD Legacy CAD Licensing light techniques Lighting and shading Lighting Techniques Linked Models Machine Learning Machine Learning in CAD Machine Optimization Machining Efficiency maintenance command Management manufacturing Manufacturing Innovation Manufacturing Technology Mapping Technology marketing visuals Material Creation Maya Maya character animation Maya lighting Maya Shader Maya Tips Maya tutorial measurement Mechanical Design Mechanical Engineering Media & Entertainment MEP Modeling Mesh-to-BIM Metal Structure modal analysis Model Management Model Optimization Modeling Secrets Modular Housing Motion capture motion graphics motion simulation MotionBuilder Multi Office Workflow Multi-User Environment multileader Navisworks Navisworks Best Practices Net Zero Design ObjectARX .NET API Open Source CAD Organization OVERKILL OVERKILL AutoCAD Page Setup Palette Parametric Components parametric design parametric family Parametric Modeling particle effects particle systems PDF PDM system Personal Brand Phasing PlanGrid Plot Settings Plot Style Plot Style AutoCAD Plotting Plugin Tutorial Plumbing Design point cloud Portfolio Post Construction Post-Processing Practice Drawing preconstruction workflow predictive analysis predictive animation Predictive Maintenance Predictive rigging Prefabrication Presentation-ready visuals Printing Printing Quality Procedural animation procedural motion Procedural Rig Procedural Textures Product Design Product Development product lifecycle product rendering Productivity productivity tools Professional 3D design Professional CAD Professional Drawings professional printing Professional Tips Project Documentation project efficiency project management Project Management Tools Project Visualization PTC Creo PURGE PURGE AutoCAD Rail Transit Rapid Prototyping realistic rendering ReCap Redshift Shader reduce CAD file size Render Render Passes Render Quality Render Settings Rendering rendering engine Rendering Engines Rendering Optimization rendering software Rendering Tips Rendering Workflow RenderMan Renewable Energy Renovation Project Renovation Workflow Reports Resizable Block restoration workflow Revit Revit add-ins Revit API Revit automation Revit Best Practices Revit Collaboration Revit Documentation Revit Family Revit integration Revit MEP Revit Performance Revit Phasing Revit Plugins Revit Scripting Revit skills Revit Standards Revit Template Revit Tips Revit tutorial Revit Workflow Ribbon Rigging robotics ROI Scale Autodesk Schedules screen Sculpting Secure Collaboration Sensor Data Shader Networks Sheet Metal Design Sheet Metal Tricks Sheet Set Manager shortcut keys Shortcuts Siemens NX Simulation simulation tools Sketch Sketching Tricks Small Firms Smart Architecture Smart Block Smart Building Design Smart City Smart Design Smart Engineering Smart Factory Smart Infrastructur Software Compliance software ecosystem Software Management Software Trends software troubleshooting Software Update Solar Energy Solar Panels SolidWorks Startup Design static stress Steel Structure Design Structural Optimization subscription model Subscription Value Surface Modeling sustainability sustainable design Sustainable Manufacturing system performance T-Spline team training guide Technical Drawing technical support Template Setup text style Texture Mapping Texturing thermal analysis Time Management time saving tools Title Blocks toolbar Toolpath Optimization Toolpaths Topography Troubleshooting Tutorial Tutorials urban planning User Interface (UI) UV Mapping UV Unwrap V-Ray Vault Best Practices Vault Lifecycle Vault Mistakes Vector Plotting vehicle modeling VFX Viewport configuration Virtual Environments virtual reality visual effects visualization workflow VR VR Tools VRED Water Infrastructure Water Management Weight Painting What’s New in Autodesk Wind Energy Wind Turbines Workbook workflow Workflow Automation workflow efficiency Workflow Optimization Workflow Tips Worksets Worksharing Workspace XLS Xref Xrefs เขียนแบบ