HDF-EOS5 Augmentation Tool

The HDF-EOS5 Augmentation Tool is a program that augments an existing HDF-EOS5 file so that the netCDF-4 library can read the augmented file. It mainly adds HDF5 dimension scales, which the netCDF-4 library recognizes, and associates them with the corresponding HDF5 datasets, which correspond to both netCDF-4 variables and HDF-EOS5 fields. The augmented file can still be read by HDF-EOS5.

As of April 2010, the latest version of netCDF-4 is 4.1.1, and it can access the augmented file. (Download)

Installation

This program requires two libraries: HDF5 and HDF-EOS5 (version 1.12).

HDF-EOS5 1.12 was released on July 8, 2009, so some readers may have an older HDF-EOS5. This particular version is required because augmented files contain attributes that make older versions of the HDF-EOS5 library raise an error. This will be explained later.

A configure script is included in the package. The paths to HDF5 and HDF-EOS5 should be passed to configure. If HDF5 or HDF-EOS5 was built with the SZIP library, one needs to pass the path to SZIP, too. One may need to specify the path to ZLIB depending on the system.

Figure 1 Configure the package
$ ./configure \
--with-zlib=<zlib-path> \
--with-szlib=<szlib-path> \
--with-hdf5=<hdf5-path> \
--with-hdfeos5=<hdfeos5-path>
The above command should succeed and generate a Makefile. With the Makefile, one can build the program by issuing the make command. The generated binary is named aug_eos5 and can be found under the src directory.

Usage

This program takes two arguments: an optional input mapping file and the name of the existing HDF-EOS5 file to augment. Please see README for more details about the usage of the augmentation tool. As this program augments and overwrites the existing HDF-EOS5 file, the specified file must be writable by the user running the program. One may want to create a backup first to preserve the original file.
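Because the tool overwrites its input, a backup step can be scripted before running it. The following is a minimal sketch in Python, not part of the tool itself; the function name backup_before_augmenting and the ".orig" suffix are hypothetical choices made for illustration.

```python
import shutil
from pathlib import Path


def backup_before_augmenting(path, suffix=".orig"):
    """Copy `path` to `path + suffix` and return the backup location.

    The helper name and the default suffix are illustrative assumptions.
    """
    source = Path(path)
    backup = source.with_name(source.name + suffix)
    if backup.exists():
        # Refuse to clobber an earlier backup of the original file.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.copy2(source, backup)  # copy2 also preserves timestamps
    return backup


# Example (hypothetical file name):
#   backup_before_augmenting("grid.he5")   # creates grid.he5.orig
#   ...then run aug_eos5 on grid.he5 as described in README
```

Refusing to overwrite an existing backup is a deliberate choice here: running the script twice would otherwise replace the pristine copy with an already augmented file.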

In the new release, a dimension can have up to 4 different mapping choices. They can be specified in the input file. Please see README for more details about the usage of the input file.

One can verify whether the augmented file can be read by netCDF-4 using ncdump, check_c, check_f or check_za_f. ncdump is a dumper tool distributed with the netCDF-4 library; it is useful for checking because it recursively reads and dumps all objects in the file. check_c is a validator written in C and built upon netCDF-4. check_f and check_za_f are validators written in Fortran. Please see README for more details about validation. The validators are distributed with the new release.
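The ncdump check can be wrapped in a small script. The sketch below is an illustration in Python, assuming that ncdump is on PATH and that it exits with a non-zero status when it cannot read a file; the helper names are hypothetical.

```python
import shutil
import subprocess


def ncdump_command(path):
    """Build the ncdump invocation that dumps every object in `path`."""
    return ["ncdump", path]


def readable_by_netcdf4(path):
    """Return True if ncdump reads the whole file without reporting an error."""
    if shutil.which("ncdump") is None:
        raise RuntimeError("ncdump was not found on PATH")
    result = subprocess.run(
        ncdump_command(path),
        stdout=subprocess.DEVNULL,  # the dump itself is not needed here
        stderr=subprocess.PIPE,
        text=True,
    )
    if result.returncode != 0:
        print(result.stderr.strip())
    return result.returncode == 0
```

The full dump is requested rather than only the header, since reading every object is what makes ncdump useful as a check.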

How it works

To understand how this program works, one needs some knowledge of HDF5, HDF-EOS5 and netCDF-4.

Background

The HDF5 library defines two primary types of objects: groups and datasets. An HDF5 Group is a structure containing zero or more HDF5 objects, and an HDF5 Dataset is a multidimensional array of data elements. Both HDF5 Datasets and HDF5 Groups can have HDF5 Attributes. An HDF5 Attribute is a small metadata object describing the nature and/or usage of HDF5 objects.

An HDF5 Dimension Scale is a special HDF5 Dataset that is associated with a dimension of another HDF5 Dataset to provide temporal or spatial information such as time, latitude and longitude.

Both HDF-EOS5 and netCDF-4 are built on HDF5. The HDF-EOS5 library defines a few data types (grid, swath, point and zonal average) specialized for the earth observing system. The HDF-EOS5 library also defines groups, fields, attributes and dimensions. For example, an instance of the grid data type consists of one group, multiple fields, multiple attributes, and multiple dimensions. An HDF-EOS5 Group, an HDF-EOS5 Field and an HDF-EOS5 Attribute are represented using an HDF5 Group, an HDF5 Dataset and an HDF5 Attribute, respectively. However, an HDF-EOS5 Dimension is not implemented according to the HDF5 Dimension Scale Specification and Design Notes[1].

Figure 2 shows the file structure of a typical HDF-EOS5 file viewed as an HDF5 file. In this figure, CloudFraction and CloudPressure are HDF5 Datasets corresponding to two HDF-EOS5 Fields. CloudFractionAndPressure is an HDF5 Group corresponding to an HDF-EOS5 Group. Other HDF5 Groups, including HDFEOS, GRIDS and Data Fields, are created by the HDF-EOS5 library but are invisible to HDF-EOS5 applications.

Figure 3 shows a typical HDF-EOS5 grid. In this example, the grid is two-dimensional and is defined by XDim and YDim. Since both XDim and YDim are standard dimensions defined by HDF-EOS5, all grid objects have these dimensions and most data fields in grid objects refer to them.

The netCDF-4 library defines groups, variables, attributes and dimensions. A netCDF-4 Group, a netCDF-4 Variable and a netCDF-4 Attribute are implemented using an HDF5 Group, an HDF5 Dataset and an HDF5 Attribute, respectively. A netCDF-4 Dimension is implemented as an HDF5 Dimension Scale, following the HDF5 Dimension Scale Specification and Design Notes. The netCDF-4 library requires that every dimension of every netCDF-4 Variable be associated with a netCDF-4 Dimension.

The Issue

Although two HDF-EOS5 Dimensions, XDim and YDim, are actually associated with both HDF-EOS5 Fields, they are invisible in Figure 2 because they are not HDF5 Dimension Scales.

Since HDF5 cannot see any HDF5 Dimension Scales in the file, netCDF-4 cannot see any netCDF-4 Dimensions, either. (Recall that a netCDF-4 Dimension is implemented as an HDF5 Dimension Scale.) The lack of dimension information is the main reason why netCDF-4 cannot read an HDF-EOS5 file: having dimension information is one of netCDF-4's requirements.

Augmentation

Given that the problem is mainly caused by the lack of dimension information, the HDF-EOS5 Augmentation Tool adds dimension information to existing HDF-EOS5 files. The result of augmentation is shown in Figure 4.

The result has two additional HDF5 objects, XDim and YDim, and the two existing HDF5 Datasets, CloudFraction and CloudPressure, now have arrows to XDim and YDim. The following sections briefly explain how the tool augments existing HDF-EOS5 files.

Creating an HDF5 Dimension Scale for each HDF-EOS5 Dimension

As a netCDF-4 Dimension is an HDF5 Dimension Scale, creating HDF5 Dimension Scales makes netCDF-4 recognize them as netCDF-4 Dimensions. For each HDF-EOS5 Dimension, the augmentation program creates an HDF5 Dimension Scale under the HDF5 Group that corresponds to the HDF-EOS5 grid object.

In Figure 4, XDim and YDim are the newly created HDF5 Dimension Scales. They are created under CloudFractionAndPressure, which corresponds to the HDF-EOS5 grid object.

Associating created HDF5 Dimension Scales with corresponding HDF5 Datasets

Creating HDF5 Dimension Scales alone does not make netCDF-4 associate them with netCDF-4 Variables; the association must be made explicitly. It is made through HDF5 Attributes, as the HDF5 Dimension Scale Specification and Design Notes defines. An HDF5 Dataset that refers to HDF5 Dimension Scales should have an attribute named DIMENSION_LIST. Conversely, an HDF5 Dimension Scale associated with HDF5 Datasets should have an attribute named REFERENCE_LIST.

These attributes are attached during augmentation. The association is depicted as arrows in Figure 4. netCDF-4 then recognizes the association, and the file becomes structurally correct.
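The two-way bookkeeping can be illustrated with a toy model. The sketch below uses plain Python objects instead of the HDF5 library, so it shows only the shape of the DIMENSION_LIST and REFERENCE_LIST attributes and is not the tool's implementation. The Dataset class, the helper functions, and the (8, 14) field shape with YDim first are assumptions made for illustration; real HDF5 files store object references rather than names.

```python
class Dataset:
    """Toy stand-in for an HDF5 Dataset: a name, a shape, and attributes."""

    def __init__(self, name, shape):
        self.name = name
        self.shape = shape
        self.attrs = {}


def make_scale(dataset):
    """Mark a dataset as a dimension scale with an empty REFERENCE_LIST."""
    dataset.attrs["CLASS"] = "DIMENSION_SCALE"
    dataset.attrs.setdefault("REFERENCE_LIST", [])


def attach_scale(field, dim_index, scale):
    """Record the association on both sides, as the specification requires."""
    # DIMENSION_LIST holds one list of scales per dimension of the field.
    dim_list = field.attrs.setdefault(
        "DIMENSION_LIST", [[] for _ in field.shape]
    )
    dim_list[dim_index].append(scale.name)
    # REFERENCE_LIST holds (dataset, dimension index) pairs on the scale.
    scale.attrs["REFERENCE_LIST"].append((field.name, dim_index))


def fully_dimensioned(field):
    """True if every dimension of the field has at least one scale attached."""
    dim_list = field.attrs.get("DIMENSION_LIST")
    return dim_list is not None and all(dim_list)


xdim = Dataset("XDim", (14,))
ydim = Dataset("YDim", (8,))
cloud_fraction = Dataset("CloudFraction", (8, 14))  # assumed (YDim, XDim) order

for scale in (xdim, ydim):
    make_scale(scale)
attach_scale(cloud_fraction, 0, ydim)
attach_scale(cloud_fraction, 1, xdim)
```

In this model, fully_dimensioned(cloud_fraction) is False before the two attach_scale calls and True afterwards, which mirrors the netCDF-4 requirement that every dimension of a variable be associated with a dimension.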

Fill two special HDF5 Dimension Scales, XDim and YDim

Although the previous steps make the augmented file structurally correct, the file is not very useful because it does not contain any longitude or latitude values. These values are mostly missing in grid objects because they can be calculated from several parameters contained in one HDF5 Dataset predefined by the HDF-EOS5 library. In fact, this is the way HDF-EOS5 keeps the grid file compact.

To make the augmented file more useful, the augmentation program calculates longitude and latitude values using an HDF-EOS5 API, and stores them in XDim and YDim in Figure 4. For the grid file shown in Figure 2, XDim will have 14 longitude values, and YDim will have 8 latitude values.
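The kind of calculation involved can be sketched in a few lines of Python. The tool itself obtains the values through an HDF-EOS5 API; the sketch below instead assumes a simple geographic (latitude/longitude) grid whose values are cell centers evenly spaced between two corner points. The corner coordinates and the function name cell_centers are hypothetical.

```python
def cell_centers(start, stop, count):
    """Return `count` evenly spaced cell-center coordinates between two edges."""
    step = (stop - start) / count
    return [start + (i + 0.5) * step for i in range(count)]


# Hypothetical corner points of a global grid, in degrees (lon, lat).
UPPER_LEFT = (-180.0, 90.0)
LOWER_RIGHT = (180.0, -90.0)

# 14 longitude values for XDim and 8 latitude values for YDim.
longitudes = cell_centers(UPPER_LEFT[0], LOWER_RIGHT[0], 14)
latitudes = cell_centers(UPPER_LEFT[1], LOWER_RIGHT[1], 8)
```

Under these assumptions the latitudes run from 78.75 down to -78.75 in steps of 22.5 degrees. Other projections need the projection parameters stored in the file, which is why the tool relies on the HDF-EOS5 API rather than on arithmetic like this.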

Limitation

Invisible HDF5 objects are not augmented

This program assumes that all HDF5 Datasets are accessible via the HDF-EOS5 API. This limitation is caused by the fact that this program uses the HDF-EOS5 API to collect all information. If an HDF-EOS5 file contains an HDF5 Dataset that cannot be accessed by the HDF-EOS5 API, this program cannot add HDF5 Dimension Scales for that dataset, which may cause the netCDF-4 library to fail to read the file even after augmentation.

Object names

To the best of our knowledge, object names in HDF-EOS5 can contain white space and some special characters, as long as HDF5 allows them. Some of these characters are forbidden in the name of a netCDF-4 variable, but the augmentation tool does not try to make the names safe, because it keeps the original file intact as much as possible.
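Since the tool does not rename anything, a user may want to look for potentially troublesome names before augmentation. The sketch below is a deliberately conservative check in Python: it flags any name that is not a letter followed by letters, digits or underscores. This pattern is an assumption chosen for illustration and is stricter than what the netCDF-4 library actually rejects; the function name risky_names is hypothetical.

```python
import re

# Conservative pattern (an assumption, not the netCDF-4 library's own rule):
# a letter followed by letters, digits or underscores.
_SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def risky_names(names):
    """Return the names that fall outside the conservative pattern."""
    return [name for name in names if not _SAFE_NAME.fullmatch(name)]


# Example with names that appear in this document:
flagged = risky_names(["CloudFraction", "Data Fields", "XDim"])
```

Here only "Data Fields" is flagged, because it contains a space. A flagged name is not necessarily rejected by netCDF-4; it is merely worth checking.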

HDF-EOS5 Swath Dimension Map is not supported

An HDF-EOS5 swath can have dimension maps. The augmentation tool does not handle them, and the program will generate an error if the file contains any dimension maps.

If this limitation matters, one can use the HDF-EOS5 to netCDF-4 Converter, which handles dimension maps correctly. However, a file generated by the HDF-EOS5 to netCDF-4 Converter can no longer be accessed through the HDF-EOS5 API.

Download

Download the HDF-EOS5 Augmentation Tool.

Reference

[1] HDF5 Dimension Scale Specification and Design Notes


Last modified: 02/17/2011
Sponsored by NASA Cooperative Agreement Grant Number NNX08AO77A / Maintained by The HDF Group