A Hashtable is an unordered collection of values, with each value indexed by a "key," which is a value of any type that's used to look up a value stored in the collection. In effect, this class provides what some programming languages call an associative array, because it allows a value to be associated with an arbitrary key, and then efficiently found given the same key.
To use hash tables, you must define the Hashtable intrinsic class in your source code. The easiest way to do this is to include the system header file "hashtab.h", which is included with the compiler.
The term "hash table" refers to the design of the internal implementation of this class. For the most part, there is no practical need for you to know that the class is implemented as a hash table, because this fact is largely invisible when you're using Hashtable objects. You might therefore wonder why the class is named as it is, rather than something like "AssociativeArray" that is descriptive of its function rather than of its implementation. The reason is that, even though you don't need to know that the class is implemented as a hash table, it is sometimes useful to know it, because hash tables have certain performance trade-offs that some programmers might want to consider when deciding how to store particular collections.
A hash table operates by computing a hash value for each key, and then using the hash as an index into an array of "buckets," each of which contains one or more key/value pairs. A hash value is an integer calculated from a key value to take the place of the key value. Ideally, each distinct key value would have a unique hash value, but this obviously cannot be the case, because we're reducing the amount of information used to represent a key down to a single integer value with a fixed precision. Clearly, there are more possible strings than there are different integers you can store in an n-bit integer for any given n, and then we have lists and all the other types to hash as well. Therefore, a hash table must be able to cope with "hash collisions," where two distinct key values stored in the table have the same hash value. Hash tables deal with collisions by using the hash value only as a first-order approximation to finding the key; once the hash value is computed, a hash table must look at each key stored in the table at that hash value to find the particular key it's seeking.
A hash table is most efficient when most of the keys stored in the table have hash values different from those of all the other keys, in which case most of the hash value slots will have no more than one key associated with them. Finding a key in such a table is fast: compute the hash value, look at the slot for the hash value, and confirm that the first key matches the key being sought. A hash table is less efficient when several keys have the same hash, because the table might have to look at several keys stored in a given hash value slot before finding the one being sought.
It isn't possible in general to guarantee that a hash table will have no collisions, if for no other reason than that you usually can't know in advance what data the table would contain. However, you frequently will have at least some idea of how much data the table might contain, and this lets you set up a Hashtable to make it more likely that there will be an efficient distribution of hash values. In particular, you should choose the number of buckets so that it is a reasonably high fraction of the number of values you expect. A bucket is a possible hash value; if a table has ten buckets, it can store ten unique hash values. (For any hash value that doesn't fit in the number of buckets available, the hash table simply takes the remainder after dividing by the bucket count: for a ten-bucket table, a hash value of 25 would turn into 5.)
You create a hash table with the "new" operator. The hash table constructor takes two arguments: the number of buckets for the hash table, and the initial number of entries.
A bucket is a unique hash value; if you create a table with ten buckets, it can store ten unique hash values. A high number of buckets will reduce the likelihood of hash collisions and thus will generally make searches faster, but will use more memory. You should choose a bucket count that's a reasonably high fraction of the total number of entries you expect the table to contain, although just how high a fraction depends on the total number of entries as well. If you only expect ten or twenty values to be stored in the table, you should probably use a bucket count of about the same number; if you expect a thousand values, 500 buckets would probably be reasonable.
The bucket count is fixed over the life of the hash table.
Note that your choice of bucket count won't affect the functioning of the table, apart from efficiency. A hash table with two buckets and ten thousand entries will work the same way as a table with a thousand buckets and the same ten thousand entries, but the thousand-bucket version will be a lot faster. A table with a thousand buckets and five entries will work the same as a table with five buckets and the same five entries, but the thousand-bucket table will take up a lot more memory for no good reason.
The initial number of entries is purely advisory, and sets the amount of memory the table initially uses so that it can accommodate the given number of stored values. If you eventually add more values to the table than the initial entry count, the hash table will automatically expand to accommodate more entries. For maximum efficiency, you should choose a value that's about the same as the number of entries you ultimately expect; this will ensure that you don't use excessive memory by allocating many more initial slots than you end up using, while at the same time avoiding repeated expansion of the table.
// create a hash table with 16 buckets and 32 initial entries
local symtab = new Hashtable(16, 32);
You use a hash table as though it were an array, except that you can use any value as an index. For example, we could use a hash table to associate objects with strings:
symtab['book'] = book;
symtab['ball'] = ball;
symtab['table'] = table;
Now, if we want to look up the object associated with one of these words, we simply index the hash table:
str = 'ball';
obj = symtab[str];
If you store a value in the hash table, and a value was already stored at the same key, the old value at the key is discarded and replaced by the new value. If you look up a key, but the key was never stored in the table, the result is nil.
Like the Dictionary intrinsic class, the Hashtable class stores "weak" references to the values stored in the table. This means that storage of a value in a hashtable will not by itself prevent the garbage collector from discarding the object. If you want to ensure that an object stored as a value in a hash table is not discarded, you must store the value somewhere else as well, such as in a list stored in a property of an object.
If an object referenced as a value in a Hashtable becomes otherwise unreachable, and the garbage collector discards the object, the hash table entry referring to the object is automatically removed from the hash table.
Whereas the values in a hash table are weakly referenced, the key values are strongly referenced, so the garbage collector will never discard any key value stored in a hash table.
There's a good reason for the disparity between values (weak references) and keys (strong). A hash table is usually used as a way of indexing a set of objects (or other values) that have some other meaning in the program; the hash table is simply a fast way of accessing the values. Since the hash table is simply an index, it doesn't make sense for the index alone to force an object to stay in memory, if the object is otherwise unusable. Keys, on the other hand, might exist only to provide the index, so they're needed as long as the hash table itself is needed. (An example would be a string that's constructed simply to act as a key; the string probably wouldn't be stored anywhere else after being added as a key to the hash table, but clearly shouldn't be discarded until the hash table itself is discarded.)
Hashtable is a subclass of Collection, so you can use the createIterator() method to create an Iterator to iterate over the elements of the hash table. The Iterator that a Hashtable creates is called a HashtableIterator; it visits the values stored in the hash table in an arbitrary order.
Hashtable is a subclass of Collection, and thus includes all Collection methods. In addition, Hashtable defines the methods listed below.
applyAll(func) – for each element in the table, invokes the function func(key, value), and then changes the element's value to the return value of func. This allows you to modify all of the values in the table according to the given function. The keys are not changed. The entries are visited in arbitrary order. No return value.
forEach(func) – for each element in the table, invokes the function func(key, value). Any return value from func is ignored; the values and keys stored in the table are not changed. The entries are visited in arbitrary order. No return value.
isKeyPresent(key) – checks to see if an entry with the given key is present in the table. Returns true if the key is present, nil if not.
removeElement(key) – removes the element with the given key, if any. If there is no element with the given key in the table, this makes no change to the table and simply returns nil. If an element is found, the element is removed, and the value associated with the key (before the removal) is returned.