Making My Own Smart Pointers

You can find all the source code in this article on my repo.

I have a plan to implement my own Memory Manager, replacing C/C++’s standard dynamic memory allocation in malloc and new which invoke the operating system’s memory manager. However, in order to make my implementation work, I’ll first need to re-implement C++’s Smart Pointers. I’ll get more into the why and how of the memory manager in a future post.

Types of Smart Pointers

The standard library has three types of smart pointers.

shared_ptr is a reference-counted smart pointer. Every copy increments the reference count and every destructor call decrements it. When the last reference is deleted the object is finally destroyed. What I really like about shared_ptr is that it implements garbage collection only RAII. There’s no overhead of some garbage collector or dead memory waiting around to be garbage collected.

weak_ptr is a non-owning reference to a shared_ptr. A weak_ptr can only exist in context to an existing shared_ptr. Through the weak_ptr you can check if the object is still alive and also retrieve a shared_ptr to that object.

unique_ptr is like a shared_ptr but the reference count is locked at 1. It cannot be copied, only moved. When the unique_ptr goes out of scope, the object it references is destroyed. You can also release ownership to get a raw pointer. Then, if you wish, you can hand that raw pointer to a shared_ptr to let it manage the memory for you.

In my engine I’m only using shared_ptr and weak_ptr so far, so those are the only two I’m going to implement for now. I leave unique_ptr as an exercise to the reader :)

The problem with `std::shared_ptr`

The above diagram is a simplified depiction of how a shared_ptr works in C++. A shared_ptr has to data members:

A pointer to the reference count.
A pointer to the shared data.

Notice the ... between each piece of data. That is intended to depict some sort of heap-allocation that was created and later freed. Fragmentation is the bane of memory managers because it increases the amount of time it takes to find a suitable location to allocate space and limits the maximum size allowed for an allocation.

std::shared_ptr is prone to increase fragmentation like any other standard library dynamic memory allocation in C/C++. The only way I know of to eliminate fragmentation is to have a memory manager that periodically will shuffle the memory around such that all free memory is completely contiguous.

Performing this “shuffle” would be completely impossible with std::shared_ptr. The memory manager would need to be able to go to every instance of a std::shared_ptr and change the pointer they’re storing to the new address.

I’ve decided to go with a handle-based approach where a Smart Pointer doesn’t actually reference the shared data directly but instead references a shared handle which references the data. This way, when the memory manager defragments the heap it can just go through all of the handles and update their addresses.

My SharedPtr only needs one data member: the pointer to the Handle. A Handle stores both the reference count and the pointer to the shared memory. The Handles are guaranteed to never be shuffled by the memory manager.

There’s one glaring issue with my approach: Accessing the data requires two dereferences. This has the potential to thrash the cache and completely ruin performance. However, I’m not so certain this is the end of the world. Higher-level languages such as C# which implement even more complex systems must surely have at least two dereferences under the hood. I won’t know for sure how much I’m paying for this defragmentation until I’ve finished implementing and benchmarking it, so we’ll have to wait and see.

Something I do really like about my system is that my SharedPtr is half the size of std::shared_ptr. We’re only talking 8 bytes verses 16 bytes, but I’d like to think that with how much I’m using Smart Pointers in my engine that these will add up.

The Code

Something I’d first like to point out is that, for now, my Smart Pointers are using C++’s standard allocation with operator new. In the future I will integrate my own memory manager and all Handles will go into a static data structure. The challenge of how to implement Smart Pointers is separate from where exactly they allocate memory, so this article goes on.

std::weak_ptr cannot directly modify the referenced object without first acquiring a std::shared_ptr by calling std::weak_ptr::lock(). For convenience, I decided that I’d like to be able to dereference my own WeakPtr. With this change, there’s a lot SharedPtr and WeakPtr have in common. As such, I’ve decided to make them share a common base class.

template<typename T>
class SmartPtr
{
protected:
	struct Handle final
	{
		T* ptr;
		uint32_t sharedCount;
		uint32_t weakCount;
		
		explicit Handle(T* ptr, const uint32_t count) noexcept :
			ptr(ptr),
			sharedCount(count),
			weakCount(0) {}					
		
		Handle() = default;
		Handle(const Handle&) = delete;
		Handle(Handle&&) noexcept = delete;
		Handle& operator=(const Handle&) = delete;
		Handle& operator=(Handle&&) noexcept = delete;
		~Handle() = default;
	};

	Handle* handle;

	explicit SmartPtr(Handle* handle = nullptr) noexcept :
		handle(handle) {}

	SmartPtr(const SmartPtr& other) noexcept :
		handle(other.handle) {}

	SmartPtr(SmartPtr&& other) noexcept :
		handle(other.handle)
	{
		other.handle = nullptr;
	}

	~SmartPtr() = default;
};

An odd thing you might notice is that Handle has two counts, one for SharedPtrs and WeakPtrs. We need to keep track of both because we can’t destroy the Handle itself until no more SmartPtrs reference it at all. Let describe an example of something that could happen without the weakCount:

We have a SharedPtr and a WeakPtr that reference the same object.
The last SharedPtr is destroyed. As such, the shared object as well as the Handle are destroyed.
The WeakPtr now references stale memory. But it gets worse.
A new SharedPtr is created and its Handle happens to go in the same place the previous Handle was.
The WeakPtr now references a completely different object from what it originally referenced.

template<typename T>
T& SmartPtr<T>::operator*()
{
	if (handle && handle->ptr)
	{
		return *handle->ptr;
	}
	throw NullReferenceException();
}

template<typename T>
const T& SmartPtr<T>::operator*() const
{
	return const_cast<SmartPtr<T>*>(this)->operator*();
}

template<typename T>
T* SmartPtr<T>::operator->()
{
	return &operator*();
}

template<typename T>
const T* SmartPtr<T>::operator->() const
{
	return const_cast<SmartPtr<T>*>(this)->operator->();
}

template<typename T>
SmartPtr<T>::operator bool() const noexcept
{
	return handle && handle->ptr;
}

template<typename T>
size_t SmartPtr<T>::ReferenceCount() noexcept
{
	return handle ? handle->sharedCount : 0;
}

template<typename T>
T* SmartPtr<T>::Raw() noexcept
{
	return handle ? handle->ptr : nullptr;
}

The remaining operators and methods are exactly what you’d expect. They perform a double-dereference to access the data and wisely call each-other to reduce duplicate code. Similar to the standard library, I provide an operator bool because many people (including myself) like to treat pointers as booleans rather than comparing to nullptr.

With the base class established, the derived class SharedPtr hardly needs any methods, only constructors and assignment operators.

template<typename T>
class SharedPtr : public SmartPtr<T>
{
	using Base = SmartPtr<T>;

public:
	explicit SharedPtr(T* ptr) noexcept;

	template<Concept::Related<T> U>
	SharedPtr(const SharedPtr<U>& other) noexcept;
	template<Concept::Related<T> U>
	SharedPtr(SharedPtr<U>&& other) noexcept;
	
	template<Concept::Related<T> U>
	SharedPtr& operator=(const SharedPtr<U>& other) noexcept;
	template<Concept::Related<T> U>
	SharedPtr& operator=(SharedPtr<U>&& other) noexcept;

	SharedPtr() noexcept = default;
	SharedPtr(const SharedPtr& other) noexcept;
	SharedPtr(SharedPtr&& other) noexcept;
	SharedPtr& operator=(const SharedPtr& other) noexcept;
	SharedPtr& operator=(SharedPtr&& other) noexcept;
	~SharedPtr();

	template<typename... Args>
	static SharedPtr Make(Args... args)
	{
		return SharedPtr(new T(std::forward<Args>(args)...));
	}
};

The standard library has a std::make_shared free function for creating a std::shared_ptr. I’ve decided to move this inside of my class as a static method. But why exactly do we have this method anyway? Why not just use the constructor?

The issue comes with default construction. If SharedPtr’s constructor could take variadic arguments to pass along to the T’s constructor, then what happens to the default constructor? Do we construct a default-constructed T? Or do we create a SharedPtr to nullptr?

To avoid this ambiguity, std::make_shared was created. My alternative is SharedPtr::Make.

template<typename T>
SharedPtr<T>::SharedPtr(T* ptr) noexcept :
	Base(new typename Base::Handle(ptr, 1)) {}

The constructor which takes a raw pointer is useful for if you have a raw pointer (perhaps taken from a unique ptr) and want to let it be managed by RAII reference counting.

Notably, the pointer must be allocated in the same way SharedPtr::Make would allocate it. Meaning that if it was allocated outside of the memory manager then this will eventually lead to undefined behavior either when it comes time to defragment or when the object must be destroyed.

This actually isn’t quite the full implementation of this constructor. See the section on EnableSharedFromThis below.

template<typename T>
SharedPtr<T>::SharedPtr(const SharedPtr& other) noexcept :
	Base(other)
{
	if (this->handle)
	{
		++this->handle->sharedCount;
	}
}

template<typename T>
SharedPtr<T>::SharedPtr(SharedPtr&& other) noexcept :
	Base(std::move(other)) {}

The move/copy constructors are simple. It’s important we increment the reference count when copying. That’s the whole point of reference counted smart objects.

template<typename T>
SharedPtr<T>& SharedPtr<T>::operator=(const SharedPtr& other) noexcept
{
	if (this != &other)
	{
		this->~SharedPtr();
		this->handle = other.handle;
		if (this->handle)
		{
			++this->handle->sharedCount;
		}
	}
	return *this;
}

template<typename T>
SharedPtr<T>& SharedPtr<T>::operator=(SharedPtr&& other) noexcept
{
	if (this != &other)
	{
		this->~SharedPtr();
		this->handle = other.handle;
		other.handle = nullptr;
	}
	return *this;
}

The assignment operators do nearly the same thing except they invoke the destructor before taking on the other’s Handle. This is because we’re essentially destroying this reference to the handle. The destructor happens to contain all the logic necessary for handling what happens when we do that.

Something interesting is that we must invoke the destructor with this->~SharedPtr() rather than simply ~SharedPtr(). This is because we have an implicit operator bool. So what the compiler actually evaluates ~SharedPtr() to is calling the default constructor, calling operator bool and then performing a bitwise not on the result.

template<typename T>
SharedPtr<T>::~SharedPtr()
{
	if (this->handle)
	{
		--this->handle->sharedCount;
		if (this->handle->sharedCount == 0)
		{
			if (this->handle->weakCount == 0)
			{
				delete this->handle;
				this->handle = nullptr;
			}
			else
			{
				delete this->handle->ptr;
				this->handle->ptr = nullptr;
			}
		}
	}		
}

The destructor has numerous checks to make sure it’s safe to destroy the shared reference. If there’s no more weak pointers to the Handle either, we can also destroy the handle itself.

template<typename T>
template<Concept::Related<T> U>
SharedPtr<T>::SharedPtr(const SharedPtr<U>& other) noexcept :
	Base(reinterpret_cast<const SharedPtr&>(other))
{
	if (this->handle)
	{
		++this->handle->sharedCount;
	}
}

template<typename T>
template<Concept::Related<T> U>
SharedPtr<T>::SharedPtr(SharedPtr<U>&& other) noexcept :
	Base(std::move(reinterpret_cast<SharedPtr&&>(other))) {}

template<typename T>
template<Concept::Related<T> U>
SharedPtr<T>& SharedPtr<T>::operator=(const SharedPtr<U>& other) noexcept
{
	this->~SharedPtr();
	this->handle = reinterpret_cast<const SharedPtr&>(other).handle;
	if (this->handle)
	{
		++this->handle->sharedCount;
	}
	return *this;
}

template<typename T>
template<Concept::Related<T> U>
SharedPtr<T>& SharedPtr<T>::operator=(SharedPtr<U>&& other) noexcept
{
	this->~SharedPtr();
	auto& handle = reinterpret_cast<const SharedPtr&>(other).handle;		
	this->handle = handle;
	handle = nullptr;
	return *this;
}

It’s absolutely essential that we have copy/move semantics with SharedPtrs of different types. This is so we can instantiate a SharedPtr of a derived type and assign it to a SharedPtr of a base type. However, we shouldn’t allow just any conversion. Only related types. I’m using C++20 concepts to constrain the types allowed.

template<typename T, typename U>
concept Related = std::derived_from<T, U> ||
				  std::is_base_of_v<T, U> ||
				  std::is_same_v<T, U>;

There’s plenty of ways a Related concept could be defined, this is the one I chose. Keep in mind that performance with concepts is typically not something to worry about at all. These are evaluated at compile-time so they have no effect on the runtime performance of your application. That said, they could have the potential to slow down compilations. If this were a project being built frequently and it was taking an inordinate amount of time to compile, I might look at simplifying this concept. Realistically, it would be much more likely for a build to be taking longer for reasons other than a 3-boolean type-check.

WeakPtr turns out to be nearly the same code, just with the destructor not destroying the shared reference, only the Handle when this happens to be the last WeakPtr being destroyed.

template<typename T>
class WeakPtr : public SmartPtr<T>
{
	using Base = SmartPtr<T>;

public:
	WeakPtr(const SharedPtr<T>& shared) noexcept :
		Base(reinterpret_cast<const WeakPtr&>(shared))
	{
		if (this->handle)
		{
			++this->handle->weakCount;
		}
	}
	
	WeakPtr() noexcept = default;

	WeakPtr(const WeakPtr& other) noexcept :
		Base(other.handle)
	{
		if (this->handle)
		{
			++this->handle->weakCount;
		}
	}

	WeakPtr(WeakPtr&& other) noexcept :
		Base(std::move(other)) {}

	WeakPtr& operator=(const WeakPtr& other) noexcept
	{
		if (this != &other)
		{
			this->~WeakPtr();
			this->handle = other.handle;
			if (this->handle)
			{
				++this->handle->weakCount;
			}
		}
		return *this;
	}

	WeakPtr& operator=(WeakPtr&& other) noexcept
	{
		if (this != &other)
		{
			this->~WeakPtr();
			this->handle = other.handle;
			other.handle = nullptr;
		}
		return *this;
	}

	~WeakPtr() noexcept
	{
		if (this->handle)
		{
			--this->handle->weakCount;
			if (this->handle->sharedCount == 0 && this->handle->weakCount == 0)
			{
				delete this->handle;
				this->handle = nullptr;
			}
		}
	}

	operator SharedPtr<T>() noexcept
	{
		return *reinterpret_cast<SharedPtr<T>*>(this);
	}

	bool Expired() const noexcept
	{
		return !this->handle || this->handle->sharedCount == 0; 
	}
};

Something neat to point out is that the implicit conversion operator for getting a SharedPtr from a WeakPtr only needs to do a reinterpret_cast. This is because on return the copy constructor of SharedPtr will be invoked and the reference count will end up being properly incremented.

EnableSharedFromThis

If you read my post where I created an Entity you would know that I used std::enable_shared_from_this to be able to get a std::shared_ptr from the this pointer. To be able to transition the rest of my engine to using my own smart pointers, I would need to re-implement this as well. My implementation works exactly the same as the standard library’s.

EnableSharedFromThis works by having a single data member: a WeakPtr. That WeakPtr references the same handle that all SharedPtr to that object reference. The object must be instantiated as a SharedPtr such as through SharedPtr::Make. If the object was not, such as if it were stack allocated, then the WeakPtr has nothing to reference and the behavior is undefined.

template<typename T>
SharedPtr<T>::SharedPtr(T* ptr) noexcept :
	Base(new typename Base::Handle(ptr, 1))
{
	if constexpr (std::derived_from<T, EnableSharedFromThis<T>>)
	{
		reinterpret_cast<SharedPtr&>(ptr->weakThis).handle = this->handle;
		++this->handle->weakCount;
	}
}

EnableSharedFromThis actually doesn’t do anything itself to set up the WeakPtr. All of the work is actually done inside of SharedPtr’s constructor. At compile-time, we can determine if the type we’re storing inherits from EnableSharedFromThis. If it does, then we can assign the WeakPtr properly.

template<typename Derived>
class EnableSharedFromThis
{	
	WeakPtr<Derived> weakThis{};

protected:
	constexpr EnableSharedFromThis() noexcept = default;

	EnableSharedFromThis(const EnableSharedFromThis&) noexcept :
		weakThis() {}

	EnableSharedFromThis& operator=(const EnableSharedFromThis&) noexcept
	{
		return *this;
	}

	EnableSharedFromThis(EnableSharedFromThis&&) noexcept = default;
	EnableSharedFromThis& operator=(EnableSharedFromThis&&) noexcept = default;
	~EnableSharedFromThis() noexcept = default;

public:
	SharedPtr<Derived> SharedFromThis()
	{
		return SharedPtr<Derived>(WeakFromThis());
	}

	SharedPtr<const Derived> SharedFromThis() const
	{
		return const_cast<EnableSharedFromThis*>(this)->SharedFromThis();
	}

	WeakPtr<Derived> WeakFromThis()
	{
		return weakThis;
	}

	WeakPtr<const Derived> WeakFromThis() const
	{
		return const_cast<EnableSharedFromThis*>(this)->WeakFromThis();
	}
};

The only funky thing about EnableSharedFromThis itself is that on copy we don’t touch the WeakPtr. It may seem strange, but it’s actually not the job of EnableSharedFromThis to modify its own data member at all. That’s because inside of EnableSharedFromThis’s constructors, the SharedPtr we need to reference isn’t yet created. It’s not until SharedPtr’s constructors that we can think about initializing the WeakPtr.

The principal of reference-counted smart pointers is pretty simple. Just increment the count on construction and copy while decrementing it on destruction. What makes smart pointers tricky are all the edge cases. I had 90% of the code finished in under and hour but it wasn’t until I started integrating it into my engine that the bugs started popping up. These edge cases weren’t obvious enough for me to know to explicitly unit test them beforehand, so debugging them within my engine was a chore, especially since I’m instantiating a lot of smart pointers at static initialization time and Visual Studio’s debugger seems to have trouble knowing the values of things before main.

Overall, it wasn’t too difficult though and now I’ve got a piece of the puzzle for my memory manager ready.

_{You can find all the source code in this article on my repo.}

Types of Smart Pointers

The problem with std::shared_ptr

The Code

EnableSharedFromThis

The problem with `std::shared_ptr`