The Scholar’s Space

Communicating research findings in a networked world

Problems and Costs of Data Preservation.

Posted by Alex Bienkowski on Apr 9th, 2008

In today’s New York Times, there is an article on the problem of long-term data preservation, and on some of the cost factors involved in any scheme to do this. Librarians are very familiar with the problems of preservation, and for a while a number of articles appeared in our press and in other venues, discussing what was involved. Then the topic seemed to disappear, at least in regard to preserving the scholarly record as we have come to understand it. But the same animal has returned, wearing a somewhat different hide and coloration. Interest is now focused on the enormous quantities of data produced by research in the physical and social sciences. And some investigators want to “repurpose” data generated in earlier experiments, their own or someone else’s, with different endpoints or outcome measures, different analytical techniques and so forth. But as the enthusiasm for such efforts grows, the facts about data preservation emerge, or rather re-emerge, with discouraging force: long-term data preservation, in an accessible and useful format, is a real technological challenge. Moreover, whatever measures might be suggested as solutions, it’s all going to cost a great deal of money. Preservation also means more than merely dumping files someplace, even with intelligent and conscientious curation. A good deal of what we would call subject analysis and description (OK, metadata in today’s lingo) will be necessary. Institutional repositories, “Long Tail” marketing, Publishing on Demand (POD), digital publication in general and a lot of other goodies existing now or promised all depend on a reliable and secure “data base”, and I’m wondering if we have it or can get it at a price our institutions can, or will, pay. It’s probably time for a serious assessment of what is possible and what the price tag will be.

Encrypted Data Not Safe After All?

Posted by Alex Bienkowski on Feb 27th, 2008

In the continuing arms race between data protectors and those who don’t like things that way, encryption has been the trick-taking card. Packages that allow users to encrypt data are really quite capable, more than enough to scare off the casual cracker and quite difficult to break through even for powerful and dedicated systems. But it seems that experiments at Princeton have shown that skilled use of some simple tools can allow a hacker to recover the codes used as encryption keys from the DRAM chips in the machine. The thought used to be that the chips lost data as soon as power was shut off. But some DRAMs retain the information for seconds or even minutes after the loss of power. And freezing the chip with a blast of air cleaner or some commercial freeze spray can extend this period even longer, long enough for a hacker to tap the chip and recover the keys. With the keys, the hacker can read the cipher with relative ease. I’m sure there will be more comment on this in the future. It all adds to the drama.

Georgia Harper

CTWatch Quarterly » 2007 » August

Posted by Georgia Harper on Aug 18th, 2007

The latest issue of CTWatch Quarterly (August 2007) is completely devoted to the future of scholarly communication and cyberinfrastructure. It appears to be a gold mine of insight and information, with articles on changes in the form of the scientific article, the use and reuse of data, and incentives for the open access research web, among others. Perhaps I’ll find some time to peruse a couple of them later this weekend and report back.

Georgia Harper

Today I found an animation online, courtesy of the New England Journal of Medicine, that shows the changes in the Framingham social networks over the 32 years of the heart disease study, focusing, of course, on the weight of those who were part of the changing social networks. The animation is accompanied by a narration that explains what you’re seeing. It’s very well done, and raises several interesting points. Since this kind of interpretive tool is only available online, it illustrates smart exploitation of the digital networked environment for far more than just posting a paper online. It may suggest roles for intermediaries that will help to distinguish them from public repositories, unless public repositories can also host (and perhaps create?) such non-text data. And it could explain, given the current services that public repositories offer, why report authors would not be rushing to post their text-only PDFs on their Websites. Note, however, that I did learn from report author James Fowler that he has indeed posted text and supplementary materials on his Website. He commented on the original post, below (thanks, James).

Multimedia-enriched reports might seem to many authors and their libraries “better” and worth paying for. The ability to imagine and deliver new services built on the corpus of publicly available research data and reports will put to rest any expectation we might have that some day, when all the research is available online, publishers won’t have us over a barrel anymore. A snowball’s chance…

Georgia Harper

Obesity study built on old heart disease data

Posted by Georgia Harper on Jul 28th, 2007
[Image: NYT graphic about obesity research]

I, like most of my friends, listened to and read about the study that reported this week that obesity was, in a sense, contagious. If you managed to miss this story, you only need to Google “new england journal obesity contagious” (or other similar key-wordy combinations) to come up with dozens of reports of the story. See, for example, the New York Times’ report. That same query, however, does not yield the actual report, at least not within the first two pages of either Google or Google Scholar results. It was published in the New England Journal of Medicine. Luckily for me, I can pull it up if I go online to my Libraries’ content stores, but what about those who are not so fortunate as to be affiliated with an institution like mine, and what about linking to the report in this blog entry? Let’s see what SHERPA’s RoMEO says the NEJM’s policy is on open access.

Hmm. The publisher, Massachusetts Medical Society, is considered a blue publisher, in this case authorizing its authors to place PDFs of their published articles on their Websites or institutional servers immediately, and into PubMed Central (assuming NIH research support) within 6 months of publication. So it would appear that the report’s authors may not be taking advantage of this opportunity their publisher generously affords them to make their research report accessible to the public. For this research in particular, that’s pretty ironic, because what piqued my interest in this story was the fact that the research was built on publicly accessible data!

“[Dr. Christakis] got the idea for [the study] from all the talk of an obesity epidemic.

‘One day I said: ‘Maybe it really is an epidemic. Maybe it spreads from person to person,’ ‘ Dr. Christakis recalled.

It was only by chance that he discovered a way to find out. He learned that the data he needed were in a large federal study of heart disease, the Framingham Heart Study, that had followed the population of Framingham, Mass., for decades, keeping track of nearly every one of its participants.”

Dr. Christakis and his co-investigator, James H. Fowler, repurposed other researchers’ data for their obesity study. That’s one of the most exciting aspects of public repositories like TDL — their potential to make large datasets gathered for research that has already been reported available to others for who knows what kinds of analysis to answer questions that the original researcher never contemplated. A lot more bang for the data-collection dollar, and a wonderful way to contribute to the progress of science. Contributing our datasets also allows us as educators to model for our students the benefits of a shared knowledge environment. Go for it!