Excel
Excel is a binary data format for storing relational data. Contains info for EXL, EXH and EXD files.
Excel is a binary data format for storing relational data. It's composed of 3 different files:
Excel List (.exl) is used to provide the excel index for the game, and allows certain sheets to be accessed by an ID rather than a name.
Excel Header (.exh) is used to define the data layout, variant, segmentation and available languages in the event of localisation being present.
Excel Data (.exd) is the raw data itself and just contains an offset table and then data for each row. The structure of this file changes depending on which variant is defined in the header.
Excel List (.exl)
Probably the simplest format in the entire game given it's just a text file.
The first line of the file is its 'header' and defines its type, EXLT
and it's version following that. Each sequential line after the 'header' is a path, relative to the exd/
category, and its immutable ID. The immutable ID is optional and in cases where not relevant (eg. quest dialogue files) it's value is -1
.
For example, here's an example of the content of a list file:
From these entries, you can build something like the following:
ID | Relative Path/Sheet Name | Header Path |
---|---|---|
209 |
|
|
4 |
|
|
|
| |
|
|
From here, the only thing you can do is parse headers as you will not be able to accurately figure out data paths without first reading the header due to language and row segmentation.
As a note, any excel entry in this file with an ID set get their header pre-cached by the game on startup.
Excel Header (.exh)
The Excel Header defines the schema and how you read data files. The first structure in this file is it's header which contains information you'll need to parse the rest of the file.
This file in its entirely is in big endian. You will need to convert this file to little endian on applicable systems.
For example, in C#, BitConverter.IsLittleEndian
will return whether or not a conversion needs to take place. See the Lumina source code for this file for a working example.
Magic
The magic is always EXHF
. If it's not, the file is probably not the file you're trying to read.
Data Offset
DataOffset
isn't relevant to this file at all, but is required when loading certain data from Excel Data files such as strings. It points to the end of the fixed size data of a row. For example, a row can be made of a bunch of integral types which have a known length at compile time, however a string length is variable. This offset allows you to then seek to the end of the row and access any additional data that may be on the end. If this is weird, don't worry, because it'll make sense later.
Column Count
ColumnCount
is how many columns exist within the file and is the first thing you'll read after the header. In other words, you'll need to read sizeof( ColumnDefinition ) * ColumnCount
immediately after the header.
Offset
is relative to the row offset, which you don't have until you read the offset table from a data file.
Type
is the type of data stored. Most don't have any special handling outside of String
and PackedBoolX
but we'll cover that later.
Page Count
PageCount
is the number of 'pages' an excel data sheet is split into. Many sheets only have a single page and every row will exist in a single page, however larger sheets such as Quest
or Item
have many pages. This information is another structure and immediately follows the column definition data.
startId
is the row id where a page starts. You'll need this to build file paths for data files.
rowCount
is how many parent rows a sheet contains. As a quick example before we cover it more in depth later is that given a row id, you can calculate which page a row is on with row >= startId && row < startId + rowCount - 1
.
Language Count
Finally, the last thing you'll need to read out of a header file is the languages. These are needed to generate paths to data files along with the paging information.
You can read out the languages as is.
Variant
Used when reading data. Default
requires no extra processing and you can just iterate over the offset list inside a data file. SubRows
makes each row contain it's own rows. As a better example, it's like having a compound key on a database table. Instead of one column being the identifier for a row, you have two instead.
Row Count
This is the total count of all rows across every page. The game uses this field when it internally queries for a row count.
Generating Data File Paths
Once you have the 3 pieces of critical information from the headers, namely:
The column count
The page count
The languages
You have everything you need to find data files! How exciting. The files follow one of 2 path formats, which makes this pretty easy. Make sure not to discard the data from the header, there's still some other information you'll need from it.
If the no language is set, your paths follow the following format:
Otherwise, if one or many languages are set (as in, not None
), the following format applies:
This should be relatively obvious, but here's a few examples:
Name | Page Start ID | Language | Path |
---|---|---|---|
Item | 0 | English (en) |
|
Item | 10000 | Japanese (ja) |
|
Mount | 0 | French (fr) |
|
Quest | 65536 | German (de) |
|
The above also applies for quest sheets and so on which exist in subfolders.
Excel Data (.exd)
The data file contains a single page of sheet data and as mentioned before, you will need the header file to read data correctly.
Similarly to Excel Headers, excel data files are entirely in big endian and will need to be converted to little endian on applicable systems.
The value of magic
should always be EXDF
.
indexSize
is how large the row offset index is, in terms of total size. To convert that to a number of entries, you'd do indexSize / sizeof( ExcelDataOffset )
.
Data Offset Entries
Immediately following the ExcelDataHeader
, the root row offsets are stored. The reason it's called the 'root row offset' is because on variant 2 sheets (or sheets with subrows), this offset won't point to data that can be read following the column data.
rowId
is the absolute row id, so a simple way to map these in whatever you're doing is to convert the list of data offsets in this file to a key value map, where the key is the rowId
and the value is the offset
. Then you can directly index rowId
s on any given data page.
offset
is the absolute offset to where the data is located in the file. It can be used as is.
Row Header
Once you seek to a row offset by following the offset list after the header, the first thing you'll encounter is the row header.
The dataSize
is the entire size of the row, including any data for subrows (if they exist). You can use this field to copy out the exact amount of data for a row (or subrows) to then later parse if you choose not to do it in place.
rowCount
is always 1 on variant 1 sheets and you can ignore that field entirely if you choose to. However, on variant 2 sheets, the rowCount
is how many subrows belong to a row.
Therefore, on variant 1 sheets, immediately after the row header is your row data. You can then read columns directly out of the data given a column offset that you read from the header.
On the other hand, variant 2 sheets, following the header is the first subrow, or subrow 0. You can calculate the offset to n subrow with the following:
To clear up a few things:
The
rowOffset
is the offset from the offset list.rowOffset + 6
is the skipping the size of the row header. You could just store the offset once you read the row header and use that.header.dataOffset
points to the end of the raw data of the row, or is basically the 'fixed' size of a row, not including strings. This lets you seek to the end of a row and immediately go to the next subrow.
There are some intricacies to reading certain types of columns which we'll cover in the next section.
Reading Row Data
Reading data out of a row depends a bit on the type of the column that you're reading. Most don't require any additional logic and you can read them out of the row and use them as is.
All of the types are as follows:
String
As mentioned previously, variable length data is stored past the end of the fixed length data segment of a row. So to access a string, you need to read a uint32
where there string column is. You can then obtain the offset of the string by doing the following:
Where rowOffset
is the offset of the first column of a row, the dataOffset
from the header and the uint32
that you read out from the column.
Packed Bools
Packed bools are always 8 bits/1 byte long. To each type is basically which bit you need to bitwise and against to read the correct bool out.
A simple way of doing this for all of them is something like the following:
Alternatively, you can handle each one indivdually, but it's better to be lazy and do it the lazy way instead.
Everything Else
Read and use it as is. It just works™ - no magic required. Just make sure the endianness is correct.
Last updated