

The Globally Interconnected Object Databases (GIOD) Project

A data thunderstorm is gathering on the horizon with the next generation of particle physics experiments. The amount of data is overwhelming. Even though the prime data from the CERN CMS detector will be reduced by a factor of more than 10^7, it will still amount to over a Petabyte (10^15 bytes) of data per year accumulated for scientific analysis.
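The scale these figures imply can be sketched with a back-of-envelope calculation. Only the two numbers quoted above (roughly 1 Petabyte per year retained, after a reduction factor of more than 10^7) come from the project description; treating the reduction factor as acting directly on data volume is a simplification for illustration.

```python
# Back-of-envelope scale check using the figures quoted above:
# ~1 Petabyte/year stored for analysis, after an online reduction
# factor of more than 10**7 (simplified here as a pure volume ratio).

STORED_BYTES_PER_YEAR = 1e15   # ~1 Petabyte/year kept for analysis
REDUCTION_FACTOR = 1e7         # selection keeps roughly 1 part in 10^7

SECONDS_PER_YEAR = 365 * 24 * 3600

# Sustained rate actually written to the archive:
stored_rate = STORED_BYTES_PER_YEAR / SECONDS_PER_YEAR   # bytes/second

# Raw detector output implied before the reduction:
raw_rate = stored_rate * REDUCTION_FACTOR                # bytes/second

print(f"archived rate: ~{stored_rate / 1e6:.0f} MB/s")
print(f"implied raw rate: ~{raw_rate / 1e12:.0f} TB/s")
```

Even the reduced stream is a sustained ~32 MB/s written to storage year-round, while the implied raw stream is hundreds of terabytes per second, which is why the online reduction must happen in real time at the detector.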

The task of finding rare events resulting from the decays of massive new particles against a dominating background is even more formidable. Particle physicists have been at the vanguard of data-handling technology, beginning in the 1940s with eye scanning of bubble-chamber photographs and emulsions, through decades of electronic data acquisition systems employing real-time pattern recognition, filtering and formatting, and continuing on to the Petabyte archives generated by modern experiments. In the future, CMS and other experiments now being built to run at CERN's Large Hadron Collider (LHC) expect to accumulate of order 100 Petabytes within the next decade.

The scientific goals and discovery potential of the experiments will only be realized if efficient worldwide access to the data is made possible. Particle physicists are thus engaged in large national and international projects that address this massive data challenge, with special emphasis on distributed data access. There is an acute awareness that the ability to analyze data has not kept up with its increased flow. The traditional approach of extracting data subsets across the Internet, storing them locally, and processing them with home-brewed tools has reached its limits. Something drastically different is required. Indeed, without new modes of data access and of remote collaboration we will not be able to effectively “mine” the intellectual resources represented in our distributed collaborations.


Julian Bunn
CERN and Caltech


Caltech, Hewlett-Packard Corporation

