The HDF-EOS5 Augmentation Tool is a program that augments an existing HDF-EOS5 file so that the netCDF-4 library can read it. The program mainly adds HDF5 dimension scales, which the netCDF-4 library recognizes, and associates them with the corresponding HDF5 datasets (each of which corresponds to both a netCDF-4 variable and an HDF-EOS5 field).
The augmented file can still be read by HDF-EOS5.
As of April 2010, the latest version of netCDF-4 is 4.1.1, and it can access the augmented file.
This program requires two libraries, HDF5 and HDF-EOS5.
A configure script is included in the package. The paths to HDF5 and HDF-EOS5 should be passed to configure. If HDF5 or HDF-EOS5 was built with the SZIP library, one needs to pass the path to SZIP, too. Depending on the system, one may also need to specify the path to ZLIB.
Running configure generates a Makefile. With the Makefile, one can build this program by issuing the make command. The generated binary is named aug_eos5 and can be found under the src directory.
This program takes two arguments: an optional input mapping file and the name of the existing HDF-EOS5 file to augment. Please see README for more details about the usage of the augmentation tool. Because this program modifies and overwrites the specified HDF-EOS5 file in place, that file must be writable by the user running the program. One may want to back up the original file first.
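Because the file is overwritten in place, it helps to check writability and make a backup before running aug_eos5. The helper below is a minimal sketch of that precaution; its name and the ".orig" backup suffix are hypothetical choices for illustration, not something the augmentation tool requires.

```python
import os
import shutil

def prepare_for_augmentation(he5_path, backup_suffix=".orig"):
    """Check that an HDF-EOS5 file can be modified in place, then copy it aside.

    backup_suffix is a hypothetical naming convention chosen for this sketch;
    it is not something the augmentation tool requires.
    Returns the path of the backup copy.
    """
    if not os.path.isfile(he5_path):
        raise FileNotFoundError(he5_path)
    # The tool overwrites the file, so the current user needs write permission.
    if not os.access(he5_path, os.W_OK):
        raise PermissionError(f"{he5_path} is not writable by the current user")
    backup_path = he5_path + backup_suffix
    if os.path.exists(backup_path):
        # Refuse to overwrite an earlier backup: it may be the only pristine copy.
        raise FileExistsError(backup_path)
    shutil.copy2(he5_path, backup_path)  # copies contents plus timestamps/permissions
    return backup_path
```

Once the backup exists, aug_eos5 can be run on the original path as described in README; if anything goes wrong, the backup can simply be copied back.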
In the new release, a dimension can have up to four different mapping choices, which can be specified in the input mapping file. Please see README for more details about the usage of the input file.
One can verify that netCDF-4 can read the augmented file by using ncdump, check_c, check_f or check_za_f. ncdump is the dumper tool distributed with the netCDF-4 library; it is useful for this check because it recursively reads and dumps every object in the file. check_c is a validator written in C on top of the netCDF-4 library, and check_f and check_za_f are validators written in Fortran. Please see README for more details about validation. The three validators are distributed with the new release.
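A quick way to use ncdump for this check is to run `ncdump -h` (header only) on the augmented file and confirm that the expected dimensions, such as XDim and YDim, now appear. The sketch below parses that CDL header text and collects every declared dimension with its group path. It is a simplified parser written for illustration (the function name is hypothetical), and it handles only the common declaration forms.

```python
import re

_GROUP_OPEN = re.compile(r"^group:\s*(.+?)\s*\{$")
# "XDim = 14 ;" or "time = UNLIMITED ; // (0 currently)"; names may contain "\ " escapes.
_DIM_DECL = re.compile(r"^((?:\\.|[^\s=\\])+)\s*=\s*(UNLIMITED|\d+)\s*;")

def _unescape(name):
    # CDL escapes special characters (for example spaces) with a backslash.
    return re.sub(r"\\(.)", r"\1", name)

def cdl_dimensions(cdl_text):
    """Map full dimension paths to sizes (None for UNLIMITED) from `ncdump -h` output."""
    dims = {}
    groups = []        # stack of enclosing group names
    in_dims = False    # inside a "dimensions:" section?
    for raw in cdl_text.splitlines():
        line = raw.strip()
        opened = _GROUP_OPEN.match(line)
        if opened:
            groups.append(_unescape(opened.group(1)))
            in_dims = False
        elif line.startswith("}"):
            if groups:  # the final "}" closes the root, not a group
                groups.pop()
            in_dims = False
        elif line == "dimensions:":
            in_dims = True
        elif line.endswith(":"):
            # "variables:", "data:", "// global attributes:" end the section
            in_dims = False
        elif in_dims:
            decl = _DIM_DECL.match(line)
            if decl:
                path = "/" + "/".join(groups + [_unescape(decl.group(1))])
                size = decl.group(2)
                dims[path] = None if size == "UNLIMITED" else int(size)
    return dims
```

An empty result for an HDF-EOS5 grid file is a sign that the file has not been augmented (or that augmentation failed).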
To understand how this program works, one needs some knowledge of HDF5, HDF-EOS5 and netCDF-4.
The HDF5 library defines two primary types of objects: groups and datasets. An HDF5 Group is a structure containing zero or more HDF5 objects, and an HDF5 Dataset is a multidimensional array of data elements. Both HDF5 Datasets and HDF5 Groups can have HDF5 Attributes. An HDF5 Attribute is a small metadata object describing the nature and/or usage of an HDF5 object.
An HDF5 Dimension Scale is a special HDF5 Dataset that is associated with a dimension of another HDF5 Dataset to provide temporal or spatial information such as time, latitude and longitude.
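The object model above can be mirrored with a few small classes. This is a toy, in-memory sketch (the class and function names are hypothetical, not the HDF5 API) showing that a Dimension Scale is just an ordinary dataset distinguished by its attributes: the Dimension Scale specification tags it with CLASS="DIMENSION_SCALE" and, optionally, a NAME. In the HDF5 C library this tagging is done by H5DSset_scale from the high-level H5DS interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

@dataclass
class Dataset:
    """A multidimensional array of data elements plus its attributes."""
    shape: Tuple[int, ...]
    attrs: Dict[str, object] = field(default_factory=dict)

@dataclass
class Group:
    """A container of zero or more HDF5 objects (groups or datasets) plus attributes."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, object] = field(default_factory=dict)

def make_scale(ds: Dataset, dim_name: str) -> None:
    # A scale is an ordinary dataset tagged with CLASS="DIMENSION_SCALE"
    # and, optionally, a NAME attribute naming the dimension.
    ds.attrs["CLASS"] = "DIMENSION_SCALE"
    ds.attrs["NAME"] = dim_name

def is_dimension_scale(obj) -> bool:
    return isinstance(obj, Dataset) and obj.attrs.get("CLASS") == "DIMENSION_SCALE"
```

Nothing about the data array itself changes when a dataset becomes a scale; only its attributes do.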
Both HDF-EOS5 and netCDF-4 are built on HDF5. The HDF-EOS5 library defines four data types (grid, swath, point and zonal average) specialized for the Earth Observing System. The HDF-EOS5 library also defines groups, fields, attributes and dimensions. For example, an instance of the grid data type consists of one group, multiple fields, multiple attributes and multiple dimensions. An HDF-EOS5 Group, an HDF-EOS5 Field and an HDF-EOS5 Attribute are represented using an HDF5 Group, an HDF5 Dataset and an HDF5 Attribute, respectively. However, an HDF-EOS5 Dimension is not implemented according to the HDF5 Dimension Scale Specification and Design Notes[1].
Figure 2 shows the file structure of a typical HDF-EOS5 file viewed as an HDF5 file. In this figure, CloudFraction and CloudPressure are HDF5 Datasets corresponding to two HDF-EOS5 Fields, and CloudFractionAndPressure is an HDF5 Group corresponding to an HDF-EOS5 Group. The other HDF5 Groups, including HDFEOS, GRIDS and Data Fields, are created by the HDF-EOS5 library but are invisible to HDF-EOS5 applications.
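The relationship between the two views can be expressed as a path translation. The sketch below assumes the grid layout shown in Figure 2 (HDFEOS, then GRIDS, then the grid group, then Data Fields); the function names are hypothetical, and other data types such as swaths use different library groups that this sketch ignores.

```python
from typing import Optional, Tuple

# Library-created groups on the path to a grid, as in Figure 2.
_GRID_PREFIX = ("", "HDFEOS", "GRIDS")
_FIELDS_GROUP = "Data Fields"

def grid_field_path(grid: str, field: str) -> str:
    """HDF5 path of an HDF-EOS5 grid field (HDF5 view)."""
    return "/".join(_GRID_PREFIX + (grid, _FIELDS_GROUP, field))

def hdfeos_view(hdf5_path: str) -> Optional[Tuple[str, str]]:
    """(grid, field) as an HDF-EOS5 application sees it, or None if not a grid field."""
    parts = hdf5_path.split("/")
    if (len(parts) == 6 and tuple(parts[:3]) == _GRID_PREFIX
            and parts[4] == _FIELDS_GROUP):
        return parts[3], parts[5]
    return None
```

An HDF-EOS5 application only ever names the grid and the field; the intermediate groups are supplied by the library.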
Figure 3 shows typical HDF-EOS5 grid data. In this example, the grid is two-dimensional and is defined by XDim and YDim. Since both XDim and YDim are standard dimensions defined by HDF-EOS5, all grid objects have these dimensions, and most data fields in grid objects refer to them.
The netCDF-4 library defines groups, variables, attributes and dimensions. A netCDF-4 Group, a netCDF-4 Variable and a netCDF-4 Attribute are implemented using an HDF5 Group, an HDF5 Dataset and an HDF5 Attribute, respectively. A netCDF-4 Dimension is implemented as an HDF5 Dimension Scale, following the HDF5 Dimension Scale Specification and Design Notes. The netCDF-4 library requires that every dimension of every netCDF-4 Variable be associated with a netCDF-4 Dimension.
Although two HDF-EOS5 Dimensions, XDim and YDim, are actually associated with both HDF-EOS5 Fields, they are invisible in Figure 2 because they are not HDF5 Dimension Scales.
Since the file contains no HDF5 Dimension Scales, netCDF-4 cannot see any netCDF-4 Dimensions, either. (Recall that a netCDF-4 Dimension is implemented as an HDF5 Dimension Scale.)
The lack of dimension information is the main reason why netCDF-4 cannot read an HDF-EOS5 file: netCDF-4 requires dimension information for every dimension of every variable.
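The requirement can be phrased as a simple check: every dimension of every dataset must have at least one Dimension Scale attached through its DIMENSION_LIST. The sketch below runs that check on a hypothetical summary of a file (dataset path mapped to its rank and its DIMENSION_LIST, or None when the attribute is absent); gathering that summary from a real file is left out.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical file summary: dataset path -> (rank, DIMENSION_LIST or None).
# A DIMENSION_LIST holds, for each dimension, the Dimension Scales attached to it.
FileSummary = Dict[str, Tuple[int, Optional[List[List[str]]]]]

def unassociated_dimensions(summary: FileSummary) -> List[Tuple[str, int]]:
    """Return (dataset path, dimension index) for every dimension with no scale."""
    missing = []
    for path, (rank, dim_list) in sorted(summary.items()):
        for idx in range(rank):
            has_scale = (dim_list is not None and idx < len(dim_list)
                         and len(dim_list[idx]) > 0)
            if not has_scale:
                missing.append((path, idx))
    return missing
```

For the unaugmented file of Figure 2, both dimensions of CloudFraction and CloudPressure would be reported, which is exactly the gap the augmentation tool fills.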
Given that the problem is mainly caused by the lack of dimension information, the HDF-EOS5 Augmentation Tool adds dimension information to existing HDF-EOS5 files. The result of augmentation is shown in Figure 4.
The result has two additional HDF5 objects, XDim and YDim, and two existing HDF5 Datasets, CloudFraction and CloudPressure, now have arrows to XDim and YDim. We will briefly explain how it augments existing HDF-EOS5 files.
As a netCDF-4 Dimension is an HDF5 Dimension Scale, creating HDF5 Dimension Scales makes netCDF-4 recognize netCDF-4 Dimensions. For each HDF-EOS5 Dimension, the augmentation program creates an HDF5 Dimension Scale under the HDF5 Group that corresponds to the HDF-EOS5 grid object.
In Figure 4, XDim and YDim are the created HDF5 Dimension Scales. They are created under CloudFractionAndPressure, which corresponds to the HDF-EOS5 grid object.
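The placement rule (one scale per HDF-EOS5 Dimension, directly under the grid's HDF5 Group) can be sketched as a planning step. The input here is assumed to be each field's dimension list as a comma-separated string such as "YDim,XDim"; the function name and that input form are assumptions for illustration.

```python
from typing import Dict

def plan_dimension_scales(grid_group: str, field_dims: Dict[str, str]) -> Dict[str, str]:
    """Map each dimension used by the grid's fields to the HDF5 path of the
    Dimension Scale to create for it, directly under the grid group.

    field_dims: field name -> comma-separated dimension list, e.g. "YDim,XDim".
    """
    scales: Dict[str, str] = {}
    for dimlist in field_dims.values():
        for dim in dimlist.split(","):
            dim = dim.strip()
            # A dimension shared by several fields still gets only one scale.
            if dim and dim not in scales:
                scales[dim] = f"{grid_group.rstrip('/')}/{dim}"
    return scales
```

For Figure 4, both fields share YDim and XDim, so the plan contains just two scales.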
Creating HDF5 Dimension Scales alone does not make netCDF-4 associate them with netCDF-4 Variables; each scale must also be explicitly associated with the datasets that use it. The association is made through HDF5 Attributes, as the HDF5 Dimension Scale Specification and Design Notes define: an HDF5 Dataset that refers to HDF5 Dimension Scales has an attribute named DIMENSION_LIST, and an HDF5 Dimension Scale associated with HDF5 Datasets has an attribute named REFERENCE_LIST.
The augmentation tool attaches these attributes, and the arrows in Figure 4 depict the resulting association. netCDF-4 then recognizes the association, and the file becomes structurally correct.
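The two-sided bookkeeping can be illustrated with a toy, in-memory file. This is a sketch, not the HDF5 API: object references are modelled as path strings, and the class and method names are hypothetical. In the HDF5 C library the corresponding high-level calls are H5DSset_scale and H5DSattach_scale, which maintain both attributes for the caller.

```python
from typing import Dict, List, Tuple

class ToyFile:
    """In-memory stand-in for an HDF5 file that tracks only dataset ranks and attributes.

    Object references are modelled as path strings.
    """

    def __init__(self) -> None:
        self.rank: Dict[str, int] = {}
        self.attrs: Dict[str, dict] = {}

    def create_dataset(self, path: str, rank: int) -> None:
        self.rank[path] = rank
        self.attrs[path] = {}

    def create_scale(self, path: str) -> None:
        # A Dimension Scale is a 1-D dataset tagged with CLASS="DIMENSION_SCALE".
        self.create_dataset(path, 1)
        self.attrs[path]["CLASS"] = "DIMENSION_SCALE"

    def attach_scale(self, dset: str, scale: str, idx: int) -> None:
        if not 0 <= idx < self.rank[dset]:
            raise IndexError(f"{dset} has no dimension {idx}")
        if self.attrs[scale].get("CLASS") != "DIMENSION_SCALE":
            raise ValueError(f"{scale} is not a Dimension Scale")
        # DIMENSION_LIST on the dataset: one list of scale references per dimension.
        dim_list: List[List[str]] = self.attrs[dset].setdefault(
            "DIMENSION_LIST", [[] for _ in range(self.rank[dset])])
        # REFERENCE_LIST on the scale: (dataset reference, dimension index) pairs.
        ref_list: List[Tuple[str, int]] = self.attrs[scale].setdefault(
            "REFERENCE_LIST", [])
        if scale not in dim_list[idx]:  # attaching the same scale twice is a no-op
            dim_list[idx].append(scale)
            ref_list.append((dset, idx))
```

Keeping both directions in sync is the point of the design: a reader starting from the dataset finds its scales through DIMENSION_LIST, and a reader starting from a scale finds its users through REFERENCE_LIST.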
Although the previous steps make the augmented file structurally correct, the file is not very useful because it does not contain any longitude or latitude values. These values are mostly missing in grid objects because they can be calculated from several parameters contained in one HDF5 Dataset predefined by the HDF-EOS5 library. In fact, this is the way HDF-EOS5 keeps the grid file compact.
To make the augmented file more useful, the augmentation program calculates longitude and latitude values using an HDF-EOS5 API and stores them in the XDim and YDim Dimension Scales shown in Figure 4. For the grid file shown in Figure 2, XDim will have 14 longitude values, and YDim will have 8 latitude values.
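For the simplest case, the calculation reduces to evenly spacing values between the grid's corner points. The sketch below assumes a geographic (latitude/longitude) projection, corner points already converted to decimal degrees, and values at cell centres; HDF-EOS5 itself stores geographic corner points in packed degrees-minutes-seconds form and also supports other projections and pixel registrations, which is why the tool relies on the HDF-EOS5 API instead of a formula like this one. The function names are hypothetical.

```python
from typing import List, Tuple

def axis_centers(start_deg: float, end_deg: float, n: int) -> List[float]:
    """Centres of n equal cells spanning [start_deg, end_deg]."""
    step = (end_deg - start_deg) / n
    return [start_deg + (i + 0.5) * step for i in range(n)]

def grid_lon_lat(upleft: Tuple[float, float], lowright: Tuple[float, float],
                 xdim: int, ydim: int) -> Tuple[List[float], List[float]]:
    """Longitudes for XDim and latitudes for YDim of a geographic grid.

    upleft and lowright are (longitude, latitude) corner points in decimal degrees.
    """
    lons = axis_centers(upleft[0], lowright[0], xdim)
    lats = axis_centers(upleft[1], lowright[1], ydim)  # decreasing from north to south
    return lons, lats
```

With xdim=14 and ydim=8 this yields 14 longitudes and 8 latitudes, matching the sizes of XDim and YDim in the example.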
This program assumes that all HDF5 Datasets are accessible via the HDF-EOS5 API. This limitation exists because the program collects all of its information through the HDF-EOS5 API. If an HDF-EOS5 file contains an HDF5 Dataset that cannot be accessed by the HDF-EOS5 API, this program cannot add HDF5 Dimension Scales for that dataset, which may cause the netCDF-4 library to fail to read the file even after augmentation.
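Whether a file is affected by this limitation comes down to a set difference: datasets that exist in the HDF5 view but are not reported as fields by the HDF-EOS5 API. The sketch below assumes both lists have already been gathered (for example, all datasets by a plain HDF5 traversal and the fields by the HDF-EOS5 inquiry routines). The function name and the excluded "/HDFEOS INFORMATION/" prefix, used here for the library's own metadata, are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical default: a group the HDF-EOS5 library uses for its own
# bookkeeping (structural metadata), which holds no fields.
INTERNAL_PREFIXES: Tuple[str, ...] = ("/HDFEOS INFORMATION/",)

def datasets_outside_api(all_datasets: Iterable[str], api_fields: Iterable[str],
                         internal_prefixes: Tuple[str, ...] = INTERNAL_PREFIXES) -> List[str]:
    """Datasets that the HDF-EOS5 API does not expose as fields.

    all_datasets: every dataset path found by walking the file with plain HDF5.
    api_fields: HDF5 paths of the fields reported through the HDF-EOS5 API.
    """
    known = set(api_fields)
    return sorted(p for p in set(all_datasets)
                  if p not in known and not p.startswith(internal_prefixes))
```

A non-empty result flags datasets that would receive no Dimension Scales and may still prevent netCDF-4 from reading the file.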
To the best of our knowledge, object names in HDF-EOS5 can contain white space and some special characters, as long as HDF5 allows them. Some of these characters are forbidden in the name of a netCDF-4 variable, but the augmentation tool does not try to make such names safe, because it aims to preserve the original file as much as possible.
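Since the tool leaves names unchanged, it can be useful to flag problematic names in advance. The checker below is an approximation of the netCDF naming rules as we understand them (no "/", no control characters, no trailing white space, and a first character that is an ASCII letter, digit or underscore, or any non-ASCII character); the function name is hypothetical, and the netCDF documentation remains the authority on the exact rules.

```python
from typing import List

def netcdf_name_problems(name: str) -> List[str]:
    """Reasons why `name` may be rejected as a netCDF name (empty list if none).

    An approximation of the netCDF naming rules, not a copy of the library's check.
    """
    if not name:
        return ["name is empty"]
    problems = []
    first = name[0]
    if first.isascii() and not (first.isalnum() or first == "_"):
        problems.append(f"starts with {first!r}")
    if "/" in name:
        problems.append("contains '/'")
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in name):
        problems.append("contains a control character")
    if name != name.rstrip():
        problems.append("has trailing white space")
    return problems
```

Interior spaces, as in "Data Fields", pass this check; leading or trailing white space and "/" do not.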
An HDF-EOS5 swath can have dimension maps. The augmentation tool does not handle them and reports an error if the file contains any dimension maps.
If this limitation matters, one can use the HDF-EOS5 to netCDF-4 Converter, which handles dimension maps correctly. However, a file generated by the HDF-EOS5 to netCDF-4 Converter can no longer be accessed through the HDF-EOS5 API.