Benchmark systems for ALMA data processing
ALMA data represents a significant challenge to process. General considerations for computing hardware on which to run CASA may be found on the CASA hardware considerations page, which includes a detailed discussion of the trade-offs between processing power and I/O, along with some alternative (though similar) system specifications. Note that the CASA software package is rapidly evolving, and our recommendations may change in the near future once the performance of CASA in parallel mode (which will start to become available with the CASA 4.1 release) is fully understood.
As a very rough guide to data volume, Cycle 0 execution blocks (EBs) from ALMA contain about 1GB of raw data if taken in continuum mode (128 channels per baseband), or 10GB of raw data if taken in spectral line mode (3840 channels per baseband), with a typical project containing 3-10 EBs (though a few have many more). In Cycle 1, the increased number of antennas will inflate these volumes, though the ability to use mixed modes and channel averaging will counteract this to some extent; the likely net result is an increase of about a factor of two over Cycle 0 (i.e. 2GB per continuum EB, 20GB per spectral line EB), and volumes will continue to grow in later cycles as more antennas come online. Experience suggests that typical processing will temporarily inflate these numbers by about a factor of ten. Furthermore, Cycle 0 data deliveries often contain multiple EBs, and also include measurement sets, which inflate the volume of the deliveries by a factor of about four compared to the raw ALMA Science Data Models (ASDMs). (The Cycle 1 packages do not include measurement sets.) The largest Cycle 0 delivery is about 1.4TB in size.
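The rules of thumb above can be combined into a rough disk-space estimate. The sketch below is only illustrative: the per-EB sizes and the inflation factors are the approximate figures quoted in the text, not official ALMA numbers, and the function name is our own.

```python
# Rough ALMA disk-space estimator using the rules of thumb above.
# All figures are the approximate factors quoted in the text.

EB_SIZE_GB = {             # raw ASDM size per execution block
    ("continuum", 0): 1,   # Cycle 0, 128 channels per baseband
    ("line", 0): 10,       # Cycle 0, 3840 channels per baseband
    ("continuum", 1): 2,   # Cycle 1: roughly 2x Cycle 0
    ("line", 1): 20,
}

PROCESSING_INFLATION = 10  # temporary working space during reduction
MS_INFLATION = 4           # Cycle 0 deliveries include measurement sets

def disk_needed_gb(mode, cycle, n_ebs):
    """Estimate peak disk space (GB) needed to process a project."""
    raw = EB_SIZE_GB[(mode, cycle)] * n_ebs
    # Cycle 0 deliveries ship with measurement sets; Cycle 1 ones do not.
    delivered = raw * MS_INFLATION if cycle == 0 else raw
    return delivered + raw * PROCESSING_INFLATION

# Example: a 5-EB Cycle 0 spectral line project
print(disk_needed_gb("line", 0, 5))  # 200 GB delivered + 500 GB working = 700
```

By this estimate even a modest spectral line project can transiently occupy most of a single 1.5TB disk, which is why the larger sample systems below carry several terabytes of storage.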
Currently, only the smallest continuum datasets can be effectively processed on a laptop. For desktop computers, 8GB is probably the minimum memory needed for data reduction, and substantial improvements are seen when additional memory is added (up to at least 128GB).
The systems on this page are representative of the machines the regional centres are purchasing to support visitors. We have been benchmarking these systems in order to assist users with their purchasing decisions should they wish to perform data processing and analysis at their home institutions.
Our current benchmark systems are as follows:
Sample System 1: "Low end" (<3k USD) Linux machine (adequate for small Cycle 0 datasets, e.g. continuum observations, or single-spectral-window spectral line observations)
- Dual core Intel Xeon 2.27GHz processor
- 12GB 1333MHz DDR3 SDRAM
- 1x1.5TB disk
Sample System 2: "Mid-range" (~5k USD) Linux machine (adequate for most Cycle 0 data processing)
- Dual quad core 2.26GHz 8M L3 Cache processors
- 24GB 1333MHz DDR3 SDRAM
- 3x2.0TB disks
Sample System 3: "High-end" (~10k USD) Linux machine (adequate for most Cycle 1 data processing)
- Dual 8-core 3.1GHz Xeon E5-2687W processors
- 128GB 1600MHz RAM
- 1x3.0TB disks
Sample System 4: "Mid-spec" Mac Pro (~8k USD)
- 2x2.4GHz 6-core Intel Xeon processors
- 32GB 1333MHz DDR3 ECC
- 4x3.0TB disks (RAID0 configuration)
Cluster solution for large or complex datasets (~65k USD). Note that these clusters require setup, tuning and maintenance by an experienced sysadmin, in addition to the capital cost.
Compute nodes (~40k USD)
- 8 servers with dual 2.6GHz eight-core E5-2670 Xeon processors with 20M cache, 8.0GT/s QPI, Turbo
- 64GB memory (8x8GB)
- Local OS disk only
- 40Gb QDR Mellanox NT26428 PCI-E 8x HCAs for fast access to Lustre filesystem
Lustre filesystem (~25k USD)
- 1 server with two 250GB internal disks (RAID-1 mirror holding the OS and the metadata target), 8GB memory and a 40Gb QDR Mellanox NT26428 PCI-E 8x HCA, acting as the metadata server
- 2x 4U 24-disk chassis with Superlogics X8DTH-I motherboards, 7 PCI-E 8x slots, dual E5520 Xeon processors, 4GB memory, 40Gb QDR Mellanox NT26428 PCI-E 8x HCAs and redundant hot-swappable 1200W power supplies, acting as Object Storage Servers
- 8x Object Storage Targets (RAID arrays), each with an eight-port 3ware 9650SE RAID controller attached to one of the PCI-E 8x slots and six Western Digital WD2003FYYS 2TB hard drives
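The storage arithmetic implied by the list above is easy to check: eight Object Storage Targets of six 2TB drives each fill the two 24-bay chassis exactly. The sketch below computes only the raw capacity; the RAID level of the OSTs is not stated above, so usable capacity after parity overhead is left out.

```python
# Raw (pre-RAID) capacity of the Lustre object-storage pool described above.
n_osts = 8          # Object Storage Targets
drives_per_ost = 6  # WD2003FYYS drives per RAID array
drive_tb = 2.0      # capacity per drive, TB

total_drives = n_osts * drives_per_ost
raw_tb = total_drives * drive_tb
print(total_drives)  # 48 drives, matching the 2x 24-bay chassis
print(raw_tb)        # 96.0 TB raw; usable space is lower after RAID overhead
```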
Our runtime benchmarking has been performed using the Antennae Band 7 Science Verification dataset. The benchmarking scripts are based on those in the CASA Guides for calibration and imaging, with the interactive parts removed. These data were taken with eleven antennas, compared to the 16+ offered in Cycle 0 and 32+ in Cycle 1, so these times are lower bounds on a typical Early Science program. These tests used CASA 4.0. (Note that none of these tests used parallel CASA, and thus effectively ran CASA on only a single core. We expect that when parallelization is enabled both the absolute and relative performances will change significantly, favouring single machines with large numbers of cores, and clusters.)
| System | Antennae Band 7 Calibration | Antennae Band 7 Imaging | Antennae Band 7 Both |
|---|---|---|---|
| System 1 | 3.0 hr | 1.1 hr | 4.1 hr |
| System 2 | 2.2 hr | 1.1 hr | 3.3 hr |
| System 3 | 1.7 hr | 0.4 hr | 2.1 hr |
| System 4 | 1.6 hr | 1.1 hr | 2.7 hr |
| Single Cluster Node | 1.9 hr | 1.4 hr | 3.5 hr |
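Timings like those above are straightforward to collect for your own scripts. The pattern below is a generic wall-clock wrapper, not the actual benchmark harness; the `calibrate` and `image` functions are hypothetical placeholders standing in for the non-interactive CASA Guide scripts.

```python
# Generic pattern for timing the stages of a reduction script.
# The stage functions are placeholders, not real CASA tasks.
import time

def timed(label, func, *args, **kwargs):
    """Run func, print its wall-clock time in hours, and return its result."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    hours = (time.perf_counter() - start) / 3600.0
    print(f"{label}: {hours:.2f} hr")
    return result

def calibrate():   # placeholder for the calibration script
    pass

def image():       # placeholder for the imaging script
    pass

timed("Calibration", calibrate)
timed("Imaging", image)
```

Because these runs used a single core, wall-clock time is a fair comparison between systems; once parallel CASA is in use, per-stage timings on the same hardware will depend on the number of processes launched as well.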