Since diamonds are supposed to be forever, data encoded in them should be forever too. That seems to be the logic behind a fascinating new way to store the mind-boggling amount of data the world now creates every year.
The Smithsonian magazine reports on an extraordinary new way to store this ever-increasing data, pioneered by researchers Siddharth Dhomkar and Jacob Henshaw of the City College of New York.
The abstract of their peer-reviewed paper in Science Advances is quite intriguing. It says, “The negatively charged nitrogen vacancy (NV−) center in diamond is the focus of widespread attention for applications ranging from quantum information processing to nanoscale metrology. Although most work so far has focused on the NV− optical and spin properties, control of the charge state promises complementary opportunities. One intriguing possibility is the long-term storage of information, a notion we hereby introduce using NV-rich, type 1b diamond. As a proof of principle, we use multicolor optical microscopy to read, write, and reset arbitrary data sets with two-dimensional (2D) binary bit density comparable to present digital-video-disk (DVD) technology. Leveraging on the singular dynamics of NV− ionization, we encode information on different planes of the diamond crystal with no cross-talk, hence extending the storage capacity to three dimensions. Furthermore, we correlate the center’s charge state and the nuclear spin polarization of the nitrogen host and show that the latter is robust to a cycle of NV−ionization and recharge. In combination with super-resolution microscopy techniques, these observations provide a route toward subdiffraction NV charge control, a regime where the storage capacity could exceed present technologies.”
The Smithsonian magazine piece by Jason Daley describes it thus: “The system uses minute defects in the diamond's crystal structure, which can be found in even the most visually flawless of these gems. These imperfections occasionally create voids in the structure where a carbon atom is supposed to sit. Nitrogen atoms also occasionally slip into the structure. When a nitrogen atom is positioned next to this missing carbon atom, a so-called nitrogen vacancy (NV) occurs, which often traps electrons. Dhomkar uses these nitrogen vacancies as a substitute for the binary ones and zeros. If the vacancy has an electron in place, it’s a one; if it's empty, it's a zero. Using a green laser pulse, the researchers can trap an electron in the NV. A red laser pulse can pop an electron out of an NV, allowing researchers to write binary code within the diamond structure.”
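Daley's description can be caricatured in a few lines of code. This is purely an illustrative sketch, not a physical model: each NV site is treated as one bit, and the hypothetical `green_pulse` and `red_pulse` methods stand in for the lasers that trap or eject an electron.

```python
# Toy model of the NV-center scheme described above (illustration only).
# Each "site" holds one bit: electron trapped (1) or vacancy empty (0).

class DiamondPlane:
    def __init__(self, n_sites):
        self.sites = [0] * n_sites   # all vacancies start empty

    def green_pulse(self, i):
        self.sites[i] = 1            # trap an electron -> binary 1

    def red_pulse(self, i):
        self.sites[i] = 0            # pop the electron out -> binary 0

    def write(self, bits):
        for i, b in enumerate(bits):
            self.green_pulse(i) if b else self.red_pulse(i)

    def read(self):
        return self.sites.copy()

plane = DiamondPlane(8)
plane.write([1, 0, 1, 1, 0, 0, 1, 0])
print(plane.read())   # [1, 0, 1, 1, 0, 0, 1, 0]
```

The paper's extra trick, of course, is that the same scheme works on several planes of the crystal at once, turning a DVD-like surface into a 3D store.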
I grant that all this is nearly impossible for most of us to understand, but what matters is that it works.
By the way, just so you know how much a zettabyte is: it is a 1 followed by 21 zeroes. That is roughly what the world now produces every year, and the figure is only expected to grow.
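For a sense of scale, here is a quick back-of-the-envelope calculation, taking a single-layer DVD at its nominal 4.7 GB:

```python
ZETTABYTE = 10**21   # bytes: a 1 followed by 21 zeroes
DVD = 4.7e9          # bytes on a single-layer DVD

print(f"{ZETTABYTE / DVD:.2e} DVDs per zettabyte")   # 2.13e+11 DVDs per zettabyte
```

That is over two hundred billion DVDs for a single zettabyte.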
I am reminded of two pieces I wrote on August 17 and 19, 2012, about a technique for encoding data in DNA. I republish them today.
August 17, 2012
DNA as a data storage medium
I am as guilty of crowding and cluttering the digital space as the 2,267,233,742 other Internet users worldwide. This blog is just one example of the useless content creation that occupies an ever-increasing digital space. After all, what I write is stored digitally somewhere.
This information/content overload is beginning to pose a serious challenge to the digital storage of data. It is in this context that I note with great interest the remarkable new technology of storing data in DNA. A team of three scientists, George M. Church, Yuan Gao and Sriram Kosuri, report on this new technology in the journal Science.
An extract on the journal’s website says, “Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing make DNA an increasingly feasible digital storage medium. Here, we develop a strategy to encode arbitrary digital information in DNA, write a 5.27-megabit book using DNA microchips, and read the book using next-generation DNA sequencing.”
Intrigued by the idea, I reached out to Dr. Kosuri, a co-author of the Science piece, who works at Harvard’s Wyss Institute. This morning I emailed him about my intention to report the story for the IANS wire. Once I have spoken with him, I will report our conversation on the wire as well as here.
In writing about the new technology Geraint Jones of The Guardian touches upon a thought that instantly crossed my mind on reading the extract in Science. Could a living organism be used for encoding data? While I wait for Dr. Kosuri’s response to me on the question, Jones says, “The work did not involve living organisms, which would have introduced unnecessary complications and some risks. The biological function of a cell could be affected and portions of DNA not used by the cell could be removed or mutated. "If the goal is information storage, there's no need to use a cell," said Kosuri.”
Perhaps I am getting ahead of myself, but the idea of encoding data onto a living organism, myself for instance, is fraught with intimations of the technological singularity.
Although I will carry Dr. Kosuri’s full interview tomorrow, here is the bit that relates to whether it is theoretically possible to encode data to oneself. He says, “It is possible, but there are a lot of disadvantages. First, some of the sequences could be detrimental to the cell and could be lost. Second, because the information is extraneous to the cell's functioning, it's likely to be mutated or deleted wholesale in the long term. Finally, there is no real use to keep it inside the cell, as the cell or a body wouldn't really know what to do with it. It's not like we can read our own genomes inside ourselves and learn from it.”
Speaking of data and living organisms, one little nugget that has been making the rounds on the Net lately concerns how much data a single sperm carries. I do not at all vouch for its accuracy, but I have received several emails from friends containing this bit of information. According to it, one sperm holds 37.5 MB of DNA, which in turn means that one normal ejaculation represents a data transfer of 1,587 GB in about three seconds. Massive amounts of data are being flushed away every minute of every day.
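I make no more claim for these figures than the chain emails do, but the arithmetic they imply is easy to check:

```python
MB, GB = 10**6, 10**9

per_sperm = 37.5 * MB    # claimed DNA content of one sperm
transfer = 1587 * GB     # claimed data in one ejaculation
seconds = 3

implied_sperm = transfer / per_sperm
rate = transfer / seconds

print(f"implied sperm count: {implied_sperm:,.0f}")   # implied sperm count: 42,320
print(f"transfer rate: {rate / GB:.0f} GB/s")         # transfer rate: 529 GB/s
```

An implied count of about 42,000 sperm is, for what it is worth, far below the tens of millions in a typical sample, which tells you something about how far to trust chain emails.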
August 19, 2012
Storing zettabytes on few grams of DNA

Dr. Sriram Kosuri of Harvard University’s Wyss Institute (Pic: http://openwetware.org/wiki/Sriram_Kosuri)
Archival storage of the world’s ever-increasing volume of information is reaching a point where a breakthrough technology will be required to meet the enormous need.
Although flash drives and external hard drives have become remarkably efficient and eminently affordable, the sheer pace of daily data creation could make them impractical. So, over the long term, does their physical wear and tear.
As the world confronts 2.7 zettabytes of information (a projection by the International Data Corporation; 1 zettabyte is a 1 followed by 21 zeroes), it makes perfect sense to look for alternative ways to archive it. A team of three scientists, George M. Church, Yuan Gao and Sriram Kosuri, has done precisely that. They wrote data to deoxyribonucleic acid, or DNA, nature’s astonishingly large data storage system. DNA, as we know, contains the genetic instructions for living organisms.
Hard as I may try, I am still unable to fully picture or comprehend how DNA data storage works. Interviewing one of the three researchers at Harvard University’s Wyss Institute seemed one way to cover some ground. So I decided to contact Dr. Kosuri, who responded rather expeditiously to my questions via email. Reproduced below is the written Q and A between Dr. Kosuri and me. I am writing a separate news story for the IANS wire.
Q: Would you mind describing the concept of DNA data storage as opposed to, say, on a flash drive?
Sriram Kosuri (SK): In terms of the paper (that they wrote for Science), on its face DNA storage is quite simple and has been done on small scales previously. DNA is a four-base code; individual bits (1s and 0s) can be encoded onto the DNA using an arbitrary encoding scheme. We chose a simple code where two bases (A and C) correspond to 1 and the other two (G and T) to 0.
There are two main advantages of DNA storage as compared to other forms. First, it's an extremely dense form of information storage mostly because you can store the DNA in 3D. So it's a 3D storage technology and when compared to other technologies (including experimental ones like positioning individual atoms on a surface, or holographic storage), it's much more dense. This is despite major efforts to increase data storage densities in these other technologies over decades. The other main advantage is that DNA is very long-lived compared to existing data storage mechanisms, and sequencing DNA is probably going to be a readable standard for years to come (unlike for example magnetic tape).
The major disadvantages are that it is immutable (can't be modified once written), not random-access (have to read the whole thing to get any part), and slow (right now), and thus is more of an archival storage mechanism than directly comparable to flash or hard drives.
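Dr. Kosuri's bit-to-base code is simple enough to sketch directly. The choice within each pair is arbitrary; in the sketch below I pick at random, since decoding depends only on which pair a base belongs to (the paper exploits this freedom to avoid troublesome sequences, which I do not attempt here):

```python
import random

def encode(bits):
    """Map each bit to a base: 1 -> A or C, 0 -> G or T."""
    return "".join(random.choice("AC" if b else "GT") for b in bits)

def decode(dna):
    """Map each base back to a bit: A/C -> 1, G/T -> 0."""
    return [1 if base in "AC" else 0 for base in dna]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
dna = encode(bits)
assert decode(dna) == bits
print(dna)   # e.g. "AGCCTGAT" -- one of many valid encodings
```

Note that one bit per base is deliberately conservative: a four-letter alphabet could carry two bits per base, but the looser code buys flexibility in choosing sequences.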
Q: The common perception of DNA is some sort of a life code sitting inside a fuzzy, stringy cell. For those would you care to explain what it means to write data to DNA?
SK: Sure. DNA is a chemical polymer, essentially a chain of nucleotides: adenine (A), guanine (G), cytosine (C), and thymine (T) in some order. To build it up, you have to string together the sequence you want one base at a time. The process we use comes from a company called Agilent Technologies (a spinoff of HP) that uses ink-jet printing to build up such a sequence. Simplified, it's like replacing the four colors of your ink-jet printer with the four nucleotides, and then building up sequences on individual spots. The limitation of chemically synthesized DNA is that there are always errors, and you can't make more than short stretches (~200 bases at a time), but you can do it in a highly parallel manner.
Q: For instance, is there a cell involved?
SK: No cell involved. It's all pretty well established chemistry. The DNA never sees the inside of a cell.
Q: Can you picture for us what a DNA data storage device, if there is any such thing, would look like?
SK: We do this exercise in the supplement, as shown below:
Large-scale Storage Considerations
At some point, storing DNA as a single large mass with extremely large barcodes is both unrealistic and cumbersome no matter the future sequencing and synthesis technologies. To understand where this trade-off lies, we hypothetically imagine such a data store without constraining ourselves on sequencing/synthesis costs.
A larger 48-bit address block (2.8e14 unique addresses) with a 128-bit data block would require 216-nt oligo synthesis, which is already available. Such a scheme would give 1.85e-19 g/bit, which would give 1.48 mg of DNA for storage of 1 petabyte of information (at 100x coverage). This DNA, stored in a 1,536-well plate, would give ~1.5 exabytes with dimensions of 128 mm x 86 mm x 13 mm.
Reading each well (petabyte) would require ~1 exabase of sequencing, or 1.8e6 HiSeq runs (600e9 bases per run). Thus, we would need ~6 orders of magnitude improvement in sequencing technologies for routinely reading petabytes of DNA information. For synthesis, current Agilent arrays top out at 1e6 features (we need 6.25e13 features for a petabyte); so ~7-8 orders of magnitude improvement in synthesis technologies is required. However, reading and writing costs do not dominate in very long-term storage applications (e.g., century or longer archival storage), and could currently be cost competitive due to the expected lower maintenance costs.
Q: Extraordinary longevity of stored data is one of the major benefits of DNA data storing. Is that one of the considerations here?
SK: Yes. It's one of the major reasons this is really good for archival information. DNA can last for thousands of years in dried form, and is very stable.
Q: What about reading back the stored data? How is that accomplished?
SK: We use the same instruments that people use to sequence human genomes. The process involves using what people call next-generation sequencers in order to read out all the data cheaply.
Q: Is it theoretically possible to encode the data to a living organism, say for instance, oneself? If yes, what would be the implications? To me that would seem like a strange form of technological singularity.
SK: It is possible, but there are a lot of disadvantages. First, some of the sequences could be detrimental to the cell and could be lost. Second, because the information is extraneous to the cell's functioning, it's likely to be mutated or deleted wholesale in the long term. Finally, there is no real use to keep it inside the cell, as the cell or a body wouldn't really know what to do with it. It's not like we can read our own genomes inside ourselves and learn from it.
Q: How soon would DNA data storage become a market reality?
SK: It's possible that it is doable now for very specialized long-term archival information needs (>100 years). However, for large scale storage, as we estimated in that pasted text, we are 6 and 8 orders of magnitude away for sequencing and synthesis technology scales respectively. To give perspective, we have seen a similar order drop in the last 10 years or so for both of those technologies. There are many reasons why we might not continue to increase these scales at the torrid pace we have, but if we do keep pace, about a decade.
Q: Your work led to the storage of 5.27 megabits of data, which is 600 times more than had been achieved earlier. How was that accomplished?
SK: The main thing we do is that we are able to leverage the massive cost and scale advantages of next-generation synthesis and sequencing technologies while overcoming their limitations, which is that they can only read and write short pieces. Thus we make many short pieces, and each short piece has an address that lets us know where that data goes in the whole data stream.
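The address-plus-payload idea Dr. Kosuri describes is easy to demonstrate. In this sketch (illustrative only; the real scheme works in DNA bases and must also tolerate synthesis and sequencing errors), a long byte stream is cut into short addressed pieces, scrambled, and then reassembled by sorting on the address:

```python
import random

def split_into_pieces(data, payload_len):
    """Cut a byte string into (address, payload) pieces."""
    return [(i, data[i:i + payload_len])
            for i in range(0, len(data), payload_len)]

def reassemble(pieces):
    """Recover the stream regardless of the order the pieces were read in."""
    return b"".join(payload for _, payload in sorted(pieces))

message = b"DNA is among the most dense and stable information media known."
pieces = split_into_pieces(message, 8)
random.shuffle(pieces)   # sequencers return pieces in no particular order
assert reassemble(pieces) == message
```

The address is what makes short synthesis workable: each ~200-base oligo carries enough context to locate its payload in the full data stream.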
Q: How does this technology potentially change computing in general?
SK: Well it's mostly a concept paper right now. The real concept that this paper brings up is that we can continue on a path to try to improve data densities on existing technologies to get to the densities we display with DNA, or we can work on getting DNA storage technologies scaled and cheaper as an alternative path.