ByteArray
Most TADS programs work with the T3 VM's high-level types - integers, strings, lists, objects, and so on. In some cases, though, it's necessary to manipulate the raw bytes that form the basic units of storage on modern computers. The ByteArray class provides a structured way of working directly with bytes.
A ByteArray looks superficially similar to a Vector object, in that you can access the individual byte elements of a ByteArray using the square bracket indexing operator:
local arr = new ByteArray(100);
arr[5] = 12;
The difference is that the elements of a ByteArray can only store byte values, which are represented as integers in the range 0 to 255.
Creating a ByteArray
You create a ByteArray object using the new operator. You must pass to the constructor the number of bytes you want to allocate for the new object; this can be any value from 1 to approximately 2 billion. For example, to create a byte array with 1,024 elements, you would write this:
local arr = new ByteArray(1024);
The size of a ByteArray is fixed at creation; the size cannot change after the object is created.
You can also create a ByteArray as a copy of another byte array or a portion of another byte array:
arr = new ByteArray(otherArray, startIndex, len);
The startIndex and len parameters are optional; if they're missing, the new byte array will simply be a complete copy of the existing byte array. If startIndex and len are provided, the new array will be a copy of the region of the other byte array starting at index startIndex and continuing for len bytes. If startIndex is specified but len is missing, the new array will consist of all of the bytes from the original starting with startIndex and continuing to the end of the original array.
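The three forms of the copy constructor can be compared side by side. The following is only a sketch: the function name copyDemo is hypothetical, the header names tads.h and bytearr.h are assumed to be the standard system headers, and indices are assumed to start at 1, as they do for lists and Vectors.

```tads3
#include <tads.h>
#include <bytearr.h>

/* Hypothetical demo function; the name is illustrative only. */
copyDemo()
{
    /* build a source array holding 10, 20, ..., 80 */
    local src = new ByteArray(8);
    for (local i = 1 ; i <= src.length() ; ++i)
        src[i] = i * 10;

    /* no startIndex or len: a complete copy (8 bytes) */
    local whole = new ByteArray(src);

    /* startIndex only: bytes 5 through the end (4 bytes) */
    local tail = new ByteArray(src, 5);

    /* startIndex and len: bytes 3 and 4 (2 bytes) */
    local mid = new ByteArray(src, 3, 2);

    /* under the assumptions above, this is [8, 4, 2, 30] */
    return [whole.length(), tail.length(), mid.length(), mid[1]];
}
```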
Reference Semantics
Like Vector objects, a ByteArray has reference semantics: when you change a value in a byte array, any other variables that refer to the same ByteArray will refer to the modified version of the array.
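A short sketch of what reference semantics means in practice, contrasted with an explicit copy made through the constructor. The function name refDemo is hypothetical, and the header names and 1-based indexing are assumptions.

```tads3
#include <tads.h>
#include <bytearr.h>

/* Hypothetical demo function; the name is illustrative only. */
refDemo()
{
    local a = new ByteArray(4);

    /* b is a second reference to the same object, not a copy */
    local b = a;
    b[1] = 200;

    /* c is a separate object that starts as a copy of a */
    local c = new ByteArray(a);
    c[1] = 7;

    /* the change through b is visible through a; the change to c is not */
    return [a[1], b[1], c[1]];
}
```

Here refDemo() would return [200, 200, 7]: writing through b changes the array that a refers to, while writing to the copy c leaves it alone.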
Reading and Writing Raw Files
One of the tasks for which ByteArray objects are uniquely suited is working with files stored in a third-party data format. Using ByteArray objects, you can work directly with the exact bytes stored in an external file, allowing you to process data in arbitrary binary formats.
To read or write a file using ByteArray objects, you must open the file in "raw" mode. Once a file is opened in raw mode, you can use the fileRead() and fileWrite() methods of the File object to read bytes from the file into a ByteArray, and to write bytes from a ByteArray into the file. Refer to the File class for information on file input/output.
ByteArray methods
copyFrom(sourceArray, sourceStartIndex, destStartIndex, length)
Copies length bytes from sourceArray, starting at index sourceStartIndex, into this array, starting at index destStartIndex. This routine is safe to use even if sourceArray is the same as the target object, and even if the ranges overlap. When copying bytes between overlapping regions of the same array, this routine is careful to move the bytes without overwriting any source bytes before they've been moved.
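The overlap guarantee is easiest to see when an array is copied onto itself. The sketch below assumes, from the parameter names, that copyFrom() copies length bytes from sourceArray starting at sourceStartIndex into the target array starting at destStartIndex. The function name overlapDemo is hypothetical, and the header names and 1-based indexing are assumptions.

```tads3
#include <tads.h>
#include <bytearr.h>

/* Hypothetical demo function; the name is illustrative only. */
overlapDemo()
{
    /* fill the array with 1, 2, 3, 4, 5, 6 */
    local arr = new ByteArray(6);
    for (local i = 1 ; i <= 6 ; ++i)
        arr[i] = i;

    /* copy bytes 1..4 onto bytes 3..6 of the same array (overlapping) */
    arr.copyFrom(arr, 1, 3, 4);

    /* collect the contents into a list for inspection */
    local result = [];
    for (local i = 1 ; i <= 6 ; ++i)
        result += arr[i];

    return result;
}
```

Because the source bytes are not overwritten before they are moved, the result would be [1, 2, 1, 2, 3, 4] rather than a smeared repetition of the first two bytes.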
fillValue(val, startIndex?, length?)
length()
mapToString(charset, startIndex?, length?)
The character set given by charset must be known. If the character set is not known, an UnknownCharSetException is thrown. You can determine if the character set is known using the isMappingKnown() method of charset.
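A defensive way to call mapToString() is to check the mapping first instead of handling the exception. This sketch assumes that a character set object is created with new CharacterSet(name), that the name 'us-ascii' is recognized, that the header charset.h declares the class, and that mapToString() returns the mapped string. The function name asciiDemo is hypothetical.

```tads3
#include <tads.h>
#include <bytearr.h>
#include <charset.h>

/* Hypothetical demo function; the name is illustrative only. */
asciiDemo()
{
    /* two bytes holding the ASCII codes for 'H' and 'i' */
    local arr = new ByteArray(2);
    arr[1] = 72;
    arr[2] = 105;

    /* assumed constructor form and character set name */
    local cs = new CharacterSet('us-ascii');

    /* avoid UnknownCharSetException by testing the mapping first */
    if (!cs.isMappingKnown())
        return nil;

    /* map the whole array; startIndex and length are omitted */
    return arr.mapToString(cs);
}
```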
readInt(startIndex, format)
The format code given by format is a bit-wise combination of three parts: a size, a byte order, and a signedness:
- The size gives the number of bits in the integer; this can be one of the values FmtSize8, FmtSize16, or FmtSize32, indicating 8-bit, 16-bit, and 32-bit values, respectively.
- The byte order can be FmtBigEndian or FmtLittleEndian. A big-endian value is stored with its most significant byte first, followed by the second-most significant byte, and so on. A little-endian value is stored in the opposite order, with its least significant byte first. The readInt() method makes it possible to specify the desired byte ordering because the native byte ordering of different hardware platforms varies, and as a result, the ordering of bytes in data fields in file formats specified by third-party applications can vary. Note that the byte order is irrelevant in the case of 8-bit values, since an 8-bit value requires only one byte in the byte array.
- The signedness indicates whether the integer is to be interpreted as signed or unsigned; this can be FmtSigned or FmtUnsigned. Note that the T3 VM doesn't have an unsigned 32-bit datatype, so FmtUnsigned isn't meaningful with FmtSize32.
So, to specify a signed 16-bit value in big-endian byte order, you'd use (FmtSize16 | FmtSigned | FmtBigEndian).
It's a lot of typing to specify all three parts of a data format, so the byte array system header file defines all of the useful combinations as individual macros:
- FmtInt8 (signed 8-bit integer)
- FmtUInt8 (unsigned 8-bit integer)
- FmtInt16LE (signed 16-bit integer in little-endian byte order)
- FmtUInt16LE (unsigned 16-bit integer in little-endian byte order)
- FmtInt16BE (signed 16-bit integer in big-endian byte order)
- FmtUInt16BE (unsigned 16-bit integer in big-endian byte order)
- FmtInt32LE (signed 32-bit integer in little-endian byte order)
- FmtInt32BE (signed 32-bit integer in big-endian byte order)
This function simply reads the bytes out of the byte array and translates them according to the format specification. There is no information in the byte array itself that indicates how the bytes are to be interpreted into an integer, so it is up to your program to specify the correct format translation. You'll get strange results if you attempt to read values in a format different from the format that was used to write them.
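To make the effect of the format code concrete, the sketch below reads the same two bytes under several formats. The function name readDemo is hypothetical, and the header names and 1-based indexing are assumptions.

```tads3
#include <tads.h>
#include <bytearr.h>

/* Hypothetical demo function; the name is illustrative only. */
readDemo()
{
    local arr = new ByteArray(2);

    /* bytes 0x12, 0x34 */
    arr[1] = 0x12;
    arr[2] = 0x34;

    /* most significant byte first: 0x1234 = 4660 */
    local be = arr.readInt(1, FmtUInt16BE);

    /* least significant byte first: 0x3412 = 13330 */
    local le = arr.readInt(1, FmtUInt16LE);

    /* the macro is shorthand for the three-part combination */
    local same = arr.readInt(1, FmtSize16 | FmtUnsigned | FmtBigEndian);

    /* bytes 0xFF, 0xFE: signedness changes the interpretation */
    arr[1] = 0xFF;
    arr[2] = 0xFE;
    local s = arr.readInt(1, FmtInt16BE);    /* -2 */
    local u = arr.readInt(1, FmtUInt16BE);   /* 65534 */

    return [be, le, same, s, u];
}
```

The bytes in the array never change between the paired reads; only the format code does, which is why a mismatched format produces a plausible-looking but wrong number rather than an error.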
subarray(startIndex, length?)
writeInt(startIndex, format, val)
The format code in format has the same meaning as the format code in readInt().
Note that this method doesn't perform any range checking on val. If val is outside of the limits that can be represented with the specified format code, this method will simply truncate the value stored to its low-order portion, discarding any high-order bits that won't fit the format. For example, if you attempt to store 1000 in an unsigned 8-bit format, the value stored would be 232; we can see this more easily by noting that 1000 is 0x3E8 in hexadecimal, so when we truncate this to 8 bits, we get E8 in hex, which is 232 in decimal. Note also that if you later attempted to read this value back as a signed 8-bit value, the result would be even stranger: it would be -24. This is because E8 is negative when interpreted as signed, so it would be interpreted as the integer 0xFFFFFFE8, which is -24. If you need range checking, your program must provide it. Here are the limits of the different types:
- Signed 8-bit: -128 to +127
- Unsigned 8-bit: 0 to +255
- Signed 16-bit: -32768 to +32767
- Unsigned 16-bit: 0 to +65535
- Signed 32-bit: -2147483648 to +2147483647
The capacity of a type doesn't depend on its byte order. Note that there should be no need for range checking on a 32-bit value, since the T3 VM's internal integer type itself is a 32-bit signed value and thus can't exceed this range to begin with.
This method stores only the bytes of the translated integer value. It doesn't store any information on the format code used to generate the value; this means that if you later want to read the integer value back out of the byte array, it will be up to your program to specify the correct format code.
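The truncation example above, together with one possible range check, can be sketched as follows. The helper writeUInt8Checked and the function truncDemo are hypothetical names, not part of the ByteArray class, and the header names and 1-based indexing are assumptions.

```tads3
#include <tads.h>
#include <bytearr.h>

/*
 *   Hypothetical helper: write val as an unsigned 8-bit value only if
 *   it fits the 0 to 255 range.  Returns true on success, nil if the
 *   value is out of range (in which case nothing is written).
 */
writeUInt8Checked(arr, idx, val)
{
    if (val < 0 || val > 255)
        return nil;

    arr.writeInt(idx, FmtUInt8, val);
    return true;
}

/* Hypothetical demo function; the name is illustrative only. */
truncDemo()
{
    local arr = new ByteArray(1);

    /* no range check: 1000 (0x3E8) is truncated to 0xE8 */
    arr.writeInt(1, FmtUInt8, 1000);

    /* read back with the format used for writing: 232 */
    local u = arr.readInt(1, FmtUInt8);

    /* read back as signed instead: -24 */
    local s = arr.readInt(1, FmtInt8);

    /* the checked helper rejects the same value */
    local ok = writeUInt8Checked(arr, 1, 1000);

    return [u, s, ok];
}
```

Returning nil from the helper is just one design choice; a program could equally throw its own exception or clamp the value, depending on how it wants to treat out-of-range data.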