
Problems and Costs of Data Preservation.
In today’s New York Times, there is an article on the problem of long-term data preservation, and on some of the cost factors involved in any scheme to do this. Librarians are very familiar with the problems of preservation, and for a while a number of articles appeared in our press and in other venues, discussing what was involved. Then the topic seemed to disappear, at least in regard to preserving the scholarly record as we have come to understand it. But the same animal has returned, wearing a somewhat different hide and coloration. Interest is now focused on the enormous quantities of data produced by research in the physical and social sciences. And some investigators want to “repurpose” data generated in earlier experiments, their own or someone else’s, with different endpoints or outcome measures, different analytical techniques and so forth. But as the enthusiasm for such efforts grows, the facts about data preservation emerge, or rather, re-emerge with discouraging force: long-term data preservation, in an accessible and useful format, is a real technological challenge. Moreover, whatever measures might be suggested as solutions, it’s all going to cost a great deal of money. Preservation also means more than merely dumping files someplace, even with something like intelligent and conscientious curation. A good deal of what we would call subject analysis and description (OK, metadata in today’s lingo) will be necessary. Institutional repositories, “Long Tail” marketing, Publishing on Demand (POD), digital publication in general and a lot of other goodies existing now or promised all depend on a reliable and secure “data base”, and I’m wondering if we have it or can get it at a price our institutions can, or will, pay. It’s probably time for a serious assessment of what is possible and what the price tag will be.

