Canadian Astronomical Society (CASCA)
Société canadienne d'astronomie (CASCA)


David SCHADE
Canadian Astronomy Data Centre, Herzberg Institute of Astrophysics, National Research Council Canada

Data Mining and the Virtual Observatory


Extracting scientific understanding from large, multi-wavelength data collections is the primary motivation of the Virtual Observatory movement. Scientific data mining can be done either on catalogues of derived parameters, such as flux, spectral energy distribution shape, and source morphology, or by applying user-defined algorithms directly to the pixel data. In either case, existing data collections are large enough to require massive processing power. In addition, the data must be very carefully engineered and organized so that database queries or pixel-level processing can be executed across multi-wavelength datasets. This engineering and organization task is the primary challenge of the Virtual Observatory. At present, even large-scale science projects typically develop their own information technology infrastructure. But at some point in the near future, the scale and complexity of available datasets will exceed the capabilities of individuals or small groups of researchers to handle them. At that point, the Virtual Observatory's capabilities for enabling data mining will become a necessity.