Getting right back at it
Alright, last time we finished writing the test case. It revealed to us an interesting behavior. Namely — sometimes when we think we deleted something from the storage, it actually still lingered in there. At least partially. Really, please read through Part 1.
Now, let’s analyze what’s going on! Don’t worry, I’ll guide you through most of it so it’s not NEARly as complicated as it might seem.
We shall investigate it by looking at the raw storage. Adding this lines at the end of our test function should help us:
let storage = contract.view_state().await?;
println!("{storage:#?}");
This should print a HashMap that is holding the whole contract’s storage.
Official documentation on BorshSerialization can be found here, but long story short — contracts store data as key-value pairs. You can think of it as a (Hash)Map (which is quite literally what the view_state function returns).
Now, we should see a bunch of different entries. There should be one with the key equal to: [83, 84, 65, 84, 69]. If we convert those bytes to ASCII we can see that this key is a serialized word: “STATE”.
Wait. What?! ASCII?! Rust is UTF-8, you NEARsighted fool of an author! Yeah, yeah, I know it is. Fun fact — UTF-8 is compliant with ASCII (or is it the ASCII that’s compliant with UTF-8?). Anyway, the characters that are defined in ASCII are also defined in UTF-8. And they do encode to the same values, so… although it technically is UTF-8, functionally it’s going to be the same for our example. For the sake of this article ASCII and UTF-8 will be used interchangeably.
The value for this key should be [3, 0, 0, 0, 109, 97, 112] (if you used the code from this article). What is this? Well, converting [109, 97, 112] to ASCII/UTF-8 gives us “map”, which is the prefix we used for the LookupMap in our storage! What is that “[3, 0, 0, 0]” at the beginning? Well, this is just the length of our prefix. Remember, NEAR uses wasm32-unknown-unknown target architecture for the smart contracts. wasm32 means that it is 32-bit, so the integers, longs, and pointers are 32 bit values. So, if we serialize “3” as a 32-bit little-endian integer, we get 0x030000. And as a vector of bytes that is [3, 0, 0, 0]. Perfect, now we know what the “STATE” key means. What about the others? Well, since all we have is a LookupMap that is pointing to various UnorderedSets then it must be related to that. And that is precisely the case.
Now, we’ve been using a value of 1 as a key for our LookupMap. If you look closely at the storage, you should see that the map also contains a key equal to [109, 97, 112, 1]. That is just our “map” prefix concatenated with a byte value of 1. The value for that key appears to be not as straightforward as the one for “STATE”, although it is
related to the UnorderedSet. To understand this, we need to take a peek at the UnorderedSet’s implementation. Yes, we go beyond the documentation. It’s time to read the library code itself.
NEAR SDK’s definition is as follows:
#[derive(BorshDeserialize, BorshSerialize)]
pub struct UnorderedSet<T, H = Sha256>
where
T: BorshSerialize + Ord,
H: ToKey,
{
elements: FreeList<T>,
index: LookupMap<T, FreeListIndex, H>,
}
Fun fact, UnorderedSet is using a LookupMap under the hood! We already have some idea of how LookupMap might impact that value, but what is a FreeList?
Well, it is a struct that is also defined within the NEAR SDK and it looks like this:
/// Index for value within a bucket.
#[derive(BorshSerialize, BorshDeserialize, Debug, Hash, PartialEq, Eq, Clone, Copy)]
pub struct FreeListIndex(pub(crate) u32);
/// Unordered container of values. This is similar to [`Vector`] except that values are not
/// re-arranged on removal, keeping the indices consistent. When an element is removed, it will
/// be replaced with an empty cell which will be populated on the next insertion.
pub struct FreeList<T>
where
T: BorshSerialize,
{
first_free: Option<FreeListIndex>,
occupied_count: u32,
elements: Vector<Slot<T>>,
}
//? Manual implementations needed only because borsh derive is leaking field types
// https://github.com/near/borsh-rs/issues/41
impl<T> BorshSerialize for FreeList<T>
where
T: BorshSerialize,
{
fn serialize<W: borsh::maybestd::io::Write>(
&self,
writer: &mut W,
) -> Result<(), borsh::maybestd::io::Error> {
BorshSerialize::serialize(&self.first_free, writer)?;
BorshSerialize::serialize(&self.occupied_count, writer)?;
BorshSerialize::serialize(&self.elements, writer)?;
Ok(())
}
}
Okay, so it contains some Optional values, a u32 and a Vector. If we look at its serialize implementation, we can see that its serialized version is simply serializing every field and concatenating them together (that’s what borsh does anyway, but go ahead and read through the linked issue to know why the manual implementation was preferred).
Now, you can check for yourself, or you can just believe me (I highly encourage you to check for yourself, though). But Option<T> is serialized as 1 byte for Option itself (0 is None and 1 is Some) and then follows nothing if the variant is None or the serialized T if it is Some. I know, technically you’d only need a single bit to serialize Option, but it is serialized to a byte (don’t kill the messenger, we’ll learn why later). Then we have a u32, we already know that’s going to be serialized to a 4-byte long array (or vector, or slice, however you want to think of it at the moment). And then the Vector! The Vector contains values of type Slot<T>. For now we don’t need to know anything about it, though. The Vectors are serialized like this:
// pseudo-code(?)
[vector's lenght, length of prefix to find vector's elements, the prefix itself]
We are actually using the new function in our contract, but it’s just a wrapper for with_hasher. Our prefix for the UnorderedSet was just a byte 1. Go ahead and analyze the code, but to keep things short (in already a pretty lengthy article), FreeList’s new function takes its prefix which is equal to [1, 118] in our case (ASCII “v” is 118). This prefix is called vec_key, which gives us the hint that it is going to be the prefix for that Vector mentioned before. The map_key used as a prefix for a LookupMap is equal to [1, 109](again — “m” as a byte is 109).
Alright, let’s try to interpret the serialized UnorderedSet. In my case (which should also be your case), it’s like this:
[
0, // FreeList.first_free is None
1, // occupied_count = 1
0,
0,
0,
1, // The element's vector length is 1
0,
0,
0,
2, // To find it's content look for a 2byte long prefix
0,
0,
0,
1, // prefix is 1
118, // v = 1v
// This is the LookupMap part, analogous to what we discussed with `STATE`
2, // 2-byte long prefix
0,
0,
0,
1, // prefix is 1
109, // m = 1m
]
Something interesting already? Well, yes. This serialized value tells us that there is only one element in our UnorderedSet! But hey, we saw in our test that the contract claims that every AccountId we previously added is also present there. So what gives? When we get all of the entries from the UnorderedSet we get only that last AccountId we added (as expected). Did we interpret something wrong? Nope, we did it correctly. We can even verify it. Now, we know that if we’d like to access the elements of that Vec in the UnorderedSet we need to look for the [1, 118] prefix!
But before we do that, it’s already been a pretty long article (and there’s still more). Now would be a good time to take a break if you need one. Go ahead, stretch your legs, have some water. And enjoy this picture of a cat.
The cat is not judging you, that’s only its face!
Now, this would probably be a good place to wrap this part and point you to the next one… but who wants to wait? I won’t force you to.
Don’t worry, the article NEARs its end. Let’s keep grinding!
What’s next? Ah, right. You should see an entry with the key [1, 118, 0, 0, 0, 0]. Why is that the key? It is the prefix for the Vec containing the values from the UnorderedSet along with 0 serialized as a 32-bit number. It is the key for the first value in the Vec (since they are zero-indexed)!
It is a quite long entry, but it should look like this: [0, 33, 0, 0, 0, more_bytes_here]. Go ahead and copy the more_bytes_here part. Treat every byte as an UTF-8 (in that case it’s the same as ASCII) and convert it to string. You will get the AccountId that we added last, that “new” one after the deletion. The [33, 0, 0, 0] part is… the length of the AccountId. AccountId is essentially a wrapper around a String, and Strings are serialized as their length concatenated with the actual String. Alright, but what is that 0 in front? It’s totally not relevant to what we want to get to… but I won’t leave you hanging! If you recall, the Vec that is a member of a FreeList (which is a member of UnorderedSet) is not a Vec<T>, instead it is Vec<Slot<T>> . I said it wasn’t relevant then. Well, it is now. The Slot is an enum:
#[derive(BorshDeserialize, BorshSerialize, Debug)]
enum Slot<T> {
/// Represents a filled cell of a value in the collection.
Occupied(T),
/// Representing that the cell has been removed, points to next empty cell, if one previously
/// existed.
Empty { next_free: Option<FreeListIndex> },
}
Now, enums serialize very similarly to Options. They also serialize into one byte and whatever is inside a variant (if at all) is appended after this. Well, it is actually the other way around, because Option is in fact an enum. Why only one byte since we can have more than 256 variants in an enum? The answer is simple — BorshSerialization library simply implemented it to support up to 256 variants (and let’s face it… if you need more, then something is probably wrong with your software’s architecture). So, now we know that the AccountId in the Vec is an Occupied variant.
Everything should make sense for now. So, let’s address the elephant in the room. If you’ve looked through our key-value pairs in the storage, you should notice that there is also a key [1, 118, 1, 0, 0 ,0]. As we now know, this means that it’s the value for the second element in the Vec (with index of 1). Heeey! It was supposed to be of length equal to 1! Alright, let’s analyze what happens.
First, we deleted the UnorderedSet by calling the remove function on the LookupMap that it contains.
pub fn remove<Q: ?Sized>(&mut self, k: &Q) -> Option<V>
where
K: Borrow<Q>,
Q: BorshSerialize + ToOwned<Owned = K>,
{
self.get_mut_inner(k).replace(None)
}
Essentially, we get the UnorderedSet and we replace it (quite literally with core::mem::replace, go ahead, see for yourself!) with None. It means that we make the LookupMap “think” it is now empty, but nothing was deleted. It is actually quite an efficient approach and is honestly not a bad way of doing it!
What happens next is that we simply create a new set using the same prefix by indirectly calling that with_hasher function we saw earlier. Since we used the same prefix, the calculated prefixes for the internal FreeList and LookupMap are also the same as before, and calling new on it creates a new struct that thinks it does not have any elements:
impl<T> FreeList<T>
where
T: BorshSerialize,
{
pub fn new<S: IntoStorageKey>(prefix: S) -> Self {
Self { first_free: None, occupied_count: 0, elements: Vector::new(prefix) }
}
// (...)
}
Now, it is a wrapper around contains_key function, that is called on… a LookupMap! The contains_key function is slightly more complex:
impl<K, V, H> LookupMap<K, V, H>
where
K: BorshSerialize + Ord,
V: BorshSerialize + BorshDeserialize,
H: ToKey,
{
pub fn contains_key<Q: ?Sized>(&self, k: &Q) -> bool
where
K: Borrow<Q>,
Q: BorshSerialize + ToOwned<Owned = K> + Ord,
{
// Check cache before checking storage
let contains = self
.cache
.map_value_ref(k, |v| v.value.get().and_then(|s| s.value().as_ref()).is_some());
if let Some(is_some) = contains {
return is_some;
}
// Value is not in cache, check if storage has value for given key.
let storage_key = H::to_key(&self.prefix, k, &mut Vec::new());
let contains = env::storage_has_key(storage_key.as_ref());
if !contains {
// If value not in cache and not in storage, can set a cached `None`
let cache = self.cache.get(k.to_owned());
let _ = cache.value.set(CacheEntry::new_cached(None));
let _ = cache.hash.set(storage_key);
}
contains
}
// (...)
}
The critical part here is the env::storage_has_key call. It looks directly at the storage (essentially what we have been doing so far). If it finds a specific key, then it means value is present in the LookupMap and, in this case, in our UnorderedSet as well. This storage key is constructed in a specific manner, however. It uses the prefix for the LookupMap, along with the key we want, to see what is present (in our case that’s going to be one of the AccountIds) and an empty mutable Vec. Remember the UnorderedMap definition? Here is a refresher:
#[derive(BorshDeserialize, BorshSerialize)]
pub struct UnorderedSet<T, H = Sha256>
where
T: BorshSerialize + Ord,
H: ToKey,
{
elements: FreeList<T>,
index: LookupMap<T, FreeListIndex, H>,
}
The concrete type for the generic type H in the case of an UnorderedSet is a Sha256. Alright, then. The implementation is, as we could have anticipated, as follows:
#[derive(Debug, Clone, Copy, Eq, PartialEq)]
pub enum Sha256 {}
impl ToKey for Sha256 {
type KeyType = [u8; 32];
fn to_key<Q: ?Sized>(prefix: &[u8], key: &Q, buffer: &mut Vec<u8>) -> Self::KeyType
where
Q: BorshSerialize,
{
// Prefix the serialized bytes, then hash the combined value.
buffer.extend(prefix);
key.serialize(buffer).unwrap_or_else(|_| env::abort());
env::sha256_array(buffer)
}
}
So, we simply consider the empty Vec as a buffer to use (which we could have guessed because of the mutable reference). Then, we add the LookupMaps prefix to the buffer. And then, by calling serialize we also add a key to it (it will simply add the serialized key after the prefix). After that, we calculate a sha256 on that data.
But hey! During this lengthy adventure through the storage internals, we actually figured out what those prefixes and keys are! We also still have some entries in our storage HashMap that we did not clarify yet. Could that be…? Let’s see!
Consider the following helper function:
fn get_hash() {
use near_sdk::store::key::Sha256;
use near_sdk::AccountId;
let acc = AccountId::from_str("dev-20240610161054-44872460494479").unwrap();
let vec_key: Vec<u8> = vec![1];
let prefix = [vec_key.as_slice(), b"m"].concat();
let storage_key = Sha256::to_key(&prefix, &acc, &mut Vec::<u8>::new());
println!("{storage_key:?}");
// or return it and use it programmatically!
}
Implementation is basically calculating the storage key in the same manner as the contains_key function. However, our short research revealed that we know the prefix of that internal LookupMap, which is a byte “m”. The account I’ve hardcoded there is the one we are absolutely sure needs to be in the UnorderedSet. That is the one we added after recreating it. If you run this test, you will see that the storage_key variable contains a vector of bytes. If you look through our storage HashMap you should notice that this value is indeed one of the keys present there. Its corresponding value is equal to [0, 0, 0, 0] (which, as you now know, is just a serialized 32-bit 0). It actually makes sense! This key is in the storage only to make sure something exists. We do not need its value to mean anything. However, that specific value does, we’ll see that in the moment.
Now, go ahead and calculate storage_key values for each of the three AccountIds and look through the storage HashMap. Take your time, I’ll wait.
Did you notice something? Two of those keys should have a value of [0, 0, 0, 0] and one of them should be [1, 0, 0, 0]. You might already have an educated guess about why that’s the case. Let’s not guess, though. Let’s simply check what happens. Thankfully, that’s our last dive into the library’s code itself. Here is the UnodrderedSets insert function:
pub fn insert(&mut self, value: T) -> bool
where
T: Clone + BorshDeserialize,
{
let entry = self.index.get_mut_inner(&value);
if entry.value_mut().is_some() {
false
} else {
let element_index = self.elements.insert(value);
entry.replace(Some(element_index));
true
}
}
Considering that the entry in this case is the storage key (go ahead and analyze the get_mut_inner function), if it’s already there — we do nothing. However, if it’s the first time our internal LookupMap sees that value, we modify its entry ( its related storage key) to contain its index in the Vec from the FreeList. Hence, we have two keys with values corresponding to the first element in that Vec — the one we had originally and one we added after recreating the UnorderedSet.
Fun fact! You can replace LookupMap with UnorderedMap and/or UnorderedSet with a LookupSet and similar behaviors can be observed!
Summary
What we did here was a (hopefully) not too complicated NEAR storage walkthrough (I tried to keep the difficulty curve as liNEAR as possible). The goal of this article, apart from being educational, is also to let you know about this “reappearing storage” behavior. It is not a secret that this happens, it’s just how it was implemented. It is important to note, however, that such a reappearing storage might lead to unexpected errors during execution and in extreme cases to exploitable vulnerabilities. Thankfully, it is quite rare that a protocol’s functionality requires deleting and then recreating the Sets inside of Maps with the same prefixes. If you don’t do that — don’t worry.
If your protocol requires deleting, then make sure that you first delete all values from the inner Set before deleting it from the Map itself. Or, try to revise your architecture. It is possible to implement recreating the Set that wouldn’t be impacted by it, though it would require the contract’s state to also hold additional information (exercise for the reader — how would you implement it?).
Note
The exemplary code was created using NEAR SDK version 4.1.1. However, those structures are also present in the version 5.1.0 and they also are impacted by this behavior! However, the Unordered… structs are marked as deprecated due to suboptimal iteration performance. The Lookups…are not deprecated!
About The Author
Michal Bajor is a Senior Security Engineer at Resonance. He specializes in Web3 security. He has exposure to numerous different protocols. He has experience both in Solidity and Rust programming languages with interest in other, more exotic (at least in blockchain) technologies. His favorite protocol is, unsurprisingly, NEAR.
His role at Resonance is providing top-notch security services to the clients. That most often means auditing the client’s code looking for security flaws, possible optimizations, and potential design or architectural issues.