We will now discuss some basic concepts in molecular biology and introduce the main players, the biological macromolecules, that are responsible for the precise duplication of the genetic information when the cell grows and divides, and for the flow of information from the genes to the molecules that regulate or control cellular activity.
What differentiates a living cell from an inanimate object is its ability to grow and divide, and in the process, replicate its genetic information and pass it on to the daughter cells. Thus, cells transmit hereditary properties from one generation to another.
By the mid-1800s, microscopists had observed that all plants and animals are constructed from small fundamental units called cells. All organisms can be divided into two main classes:
- Prokaryotes, which are single-celled organisms (e.g. bacteria and blue-green algae), with cell sizes ~ 1-10 μm, and whose genetic material is mixed in with all the other cellular components.
- Eukaryotes, which are usually multicellular organisms (e.g. plants, animals, fungi), with larger cell sizes ~ 10-100 μm. The primary distinction between prokaryotes and eukaryotes is that the cells in the latter contain an inner body, the nucleus, surrounded by a nuclear membrane.
What are genes and where are they located?
The hereditary properties of an organism are controlled by genes, which are located on the chromosomes.
That genetic information is transmitted from one generation to another was first discovered by Gregor Mendel, who reported his results in 1865.
The origin of genetic variability through mutations
Genes are normally copied exactly during chromosome duplication. Sometimes, however, there are errors in duplication (also called mutations) which change the property of the gene. Most of the time, the mutations are silent and do not affect the organism in any way, and sometimes they are deleterious, in which case the cell dies. Rarely, but not insignificantly, the mutations lead to an improvement in the ability of the organism to adapt to a constantly changing physical and biological environment. The origin of genetic variability through mutations is the basis behind the theory of evolution, proposed by Charles Darwin ~ 1859.
The idea that mutations can be spontaneous was first confirmed in genetic experiments using the tiny fruit fly Drosophila, which offered a distinct advantage over Mendel’s experiments on pea plants. The fruit flies multiply very rapidly, with a new generation being produced every 14 days (approximately 25 times more rapidly than peas). The first mutant found was a male with white eyes that spontaneously appeared in a culture-bottle of red-eyed flies. The physical characteristic (white eyes) is called the phenotype, and the underlying genetic makeup (here, the version of the gene that controls that characteristic) is called the genotype.
What are chromosomes made of?
There are two kinds of biological macromolecules that make up the chromosomes, DNA and proteins.
Before ~ 1950, the structures of neither DNA nor proteins had been established.
What was known about proteins:
- Proteins are made up of a chain of repeating units (called amino acids).
- There are 20 different kinds of amino acids that make up the proteins of all the organisms.
- One important function of proteins was as enzymes, molecules that catalyze chemical reactions inside the cell. (All known enzymes at that time were proteins).
- A specific enzyme has a unique sequence in which these amino acids are strung together, that is, the linear sequence of amino acids determines its function uniquely.
What was known about DNA:
- DNA molecules are also long polymers made up of a string of repeating units called nucleotides.
- There are 4 different kinds of nucleotides (A,G,C,T) that make up the DNA of all the organisms.
- The exact composition of the different nucleotides varies widely from one organism to another; however [A] was always found in the same proportion as [T] and [G] was always found in the same proportion as [C].
- In eukaryotes, the DNA is always found inside the nucleus, and never where there are no chromosomes.
- There was a strong suspicion that DNA was in fact the carrier of the genetic code, although how it managed to replicate, almost error free, was still a mystery.
- The genetic information in DNA somehow controlled the synthesis of proteins.
The key players in the triumph of applying x-ray crystallography to solve the structures of biomolecules were Max Perutz and John Kendrew, who were trying to solve the structures of proteins; Rosalind Franklin and Maurice Wilkins, who were at the same time working on the structure of DNA and had high quality x-ray diffraction pictures of strands of DNA; Linus Pauling, who suggested, from stereo-chemical arguments that a helical structure is a common pattern to be expected for a polymer chain with roughly identical repeating units; Francis Crick and James Watson, who put all the information together to correctly predict that the x-ray diffraction pictures of DNA were consistent with a double-stranded helix with each strand exactly complementary to the other strand.
The Double Helix
In the double helix, the two DNA chains are held together by weak noncovalent bonds called hydrogen bonds between pairs of bases on the opposite strands.
Watson and Crick realized that if A (adenine) always paired with T (thymine), and G (guanine) always paired with C (cytosine), then the distance between the two strands would be identical for all base-pairs (~ 11 Å). So, although the molecular dimensions of the four bases are very different, with A and G almost twice as big as T and C, the pairs A-T and C-G happen to be exactly the same size across. This ladder-like structure then satisfied all the requirements that Pauling showed should result in a helical structure.
The specific base-pairing also explains how the molecule can replicate:
The sequence of nucleotides on one chain is exactly complementary to the sequence of nucleotides on the second strand, and each strand can act as a template to synthesize a new strand during replication with no loss of information.
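The pairing rules can be checked with a few lines of Python. This is only a minimal sketch: the sequence `top` is a made-up example, and the names `PAIRS` and `complement` are illustrative choices, not part of any standard library. It shows that copying a strand twice recovers the original, and that the base counts over both strands come out with [A] = [T] and [G] = [C].

```python
from collections import Counter

# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base partner of `strand`, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)

# Hypothetical example sequence, chosen only for illustration.
top = "ATGGCATTCGA"
bottom = complement(top)

# Replication: each strand templates its partner, so the complement of the
# complement is the original strand, with no loss of information.
recovered = complement(bottom)

# Base composition counted over both strands of the double helix.
counts = Counter(top + bottom)

print("top:   ", top)
print("bottom:", bottom)
print("A == T:", counts["A"] == counts["T"])
print("G == C:", counts["G"] == counts["C"])
```

Because every A faces a T and every G faces a C, the equal proportions fall out of the structure regardless of which sequence is chosen.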
How long is a DNA strand?
A typical DNA strand in our cells can be 50 – 250 million base-pairs long. The separation between the base-pairs (or the rungs of the ladder) is ~ 3.4 Angstroms. Therefore, the linear dimension of our longest DNA molecules is (250 x 10^6 base-pairs) x (3.4 x 10^-8 cm/base-pair) = 8.5 cm. If we add up all the DNA in one of our cells, it can be about a meter long! The width of a double-helix is only 2 nm. DNA molecules are therefore very long molecular threads.
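The length estimate is simple arithmetic and can be reproduced as follows. This is a rough sketch using only the numbers quoted above (3.4 Angstroms per base-pair, 2 nm width); the function and constant names are illustrative.

```python
# Numbers quoted in the text, converted to centimeters.
RISE_PER_BASE_PAIR_CM = 3.4e-8   # 3.4 Angstroms = 3.4 x 10^-8 cm
HELIX_WIDTH_CM = 2e-7            # 2 nm = 2 x 10^-7 cm

def contour_length_cm(base_pairs: float) -> float:
    """End-to-end length of a fully stretched double helix with this many base-pairs."""
    return base_pairs * RISE_PER_BASE_PAIR_CM

shortest = contour_length_cm(50e6)    # a 50 million base-pair molecule
longest = contour_length_cm(250e6)    # a 250 million base-pair molecule

# How many times longer than it is wide is the longest molecule?
aspect_ratio = longest / HELIX_WIDTH_CM

print(f"50 million base-pairs:  {shortest:.1f} cm")
print(f"250 million base-pairs: {longest:.1f} cm")
print(f"length / width for the longest molecule: {aspect_ratio:.2e}")
```

The ratio of length to width, tens of millions to one, is what justifies calling DNA a molecular thread.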
The Genetic Code
The sequence of nucleotides A, G, C, T along a particular strand of DNA specifies the genetic information. A gene is a sequence of nucleotides along the DNA that codes for one protein chain.
Even with only 4 letters, the number of potential DNA sequences of N letters is 4^N, a very large number for even the smallest of DNA molecules. For example, a gene containing ~ 1500 base-pairs has 4^1500 ~ 10^903 possible sequences, a number that is virtually infinite!
Since there are 4 letters in the DNA alphabet and there are 20 different kinds of amino acids which make up all known proteins, it is necessary to generate at least 20 words from the 4 letters in the DNA chain.
What is the minimum size of the ‘word’ in the DNA sequence that is necessary to code for all 20 amino acids? Clearly 1-letter words are not sufficient. 2-letter words also fall short since they could code for only 4^2 = 16 amino acids. 3-letter words give rise to 4^3 = 64 possible combinations, which are more than sufficient. These 3-letter words are called codons. It turns out that 3 out of the 64 possible codons are reserved for stop signals to specify the end of the gene, and the remaining 61 are used to code for amino acids. Therefore a particular amino acid can have more than one codon, and some have as many as six.
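The size of this number is easy to verify, since the count of sequences of N letters is 4 raised to the power N. The sketch below (function name chosen for illustration) computes the power of ten both from logarithms and from Python's exact integer arithmetic.

```python
import math

def log10_sequence_count(n_bases: int, alphabet_size: int = 4) -> float:
    """log10 of the number of distinct sequences of length n_bases."""
    return n_bases * math.log10(alphabet_size)

# Power of ten for a gene of ~1500 base-pairs.
exponent = log10_sequence_count(1500)

# Python integers have arbitrary precision, so the exact value can be formed
# and its decimal digits counted as a cross-check.
exact = 4 ** 1500
digits = len(str(exact))

print(f"4^1500 is about 10^{exponent:.0f}")
print(f"the exact integer has {digits} decimal digits")
```

A number with 904 digits lies between 10^903 and 10^904, in agreement with the logarithm.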
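The counting argument for the codon length can be written out directly. In this sketch the helper `min_word_length` is a hypothetical name; it simply finds the shortest word length whose number of combinations reaches the number of amino acids, and then enumerates all 3-letter words.

```python
from itertools import product

BASES = "ACGT"          # the 4-letter DNA alphabet
N_AMINO_ACIDS = 20
N_STOP_CODONS = 3       # as stated in the text

def min_word_length(alphabet_size: int, n_targets: int) -> int:
    """Smallest word length L such that alphabet_size**L >= n_targets."""
    length = 1
    while alphabet_size ** length < n_targets:
        length += 1
    return length

word_length = min_word_length(len(BASES), N_AMINO_ACIDS)

# Enumerate every possible word of that length.
codons = ["".join(letters) for letters in product(BASES, repeat=word_length)]
coding_codons = len(codons) - N_STOP_CODONS

for length in (1, 2, 3):
    print(f"{length}-letter words: {len(BASES) ** length} combinations")
print(f"codons left for {N_AMINO_ACIDS} amino acids: {coding_codons}")
```

With more coding codons than amino acids, several codons must map to the same amino acid, which is why the code is redundant.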
Each strand of DNA has a direction along which the sequence is read, referred to as the 5’ to 3’ direction. The two strands of the double helix run in opposite directions.
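Because the two strands run in opposite directions, writing the partner strand in its own reading direction (5’ to 3’) means complementing each base and also reversing the order. A minimal sketch, with a made-up sequence and an illustrative function name:

```python
# Translation table implementing the pairing rules A-T and G-C.
PAIR_TABLE = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand_5_to_3: str) -> str:
    """Partner strand of a DNA sequence, written in its own 5'-to-3' direction."""
    complemented = strand_5_to_3.translate(PAIR_TABLE)  # pair each base
    return complemented[::-1]                           # flip to read 5'-to-3'

# Hypothetical example sequence, written 5' to 3'.
top = "ATGGCATTCGA"
partner = reverse_complement(top)

print("top     (5'->3'):", top)
print("partner (5'->3'):", partner)
```

Applying the function twice returns the original strand, which is one quick way to check that the pairing and the reversal are both handled.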
There is no special start sequence. The codon AUG, which codes for the amino acid ‘methionine’ (designated as Met), also serves as the start codon if it is preceded by a string of letters, much longer than a single codon, called the promoter sequence.
The Central Dogma of Molecular Biology
The flow of information from the DNA to protein synthesis
The DNA double helix does not act as a template for direct protein synthesis. In eukaryotes the DNA is located inside the nucleus, whereas protein synthesis occurs in the cytoplasm which is outside the nucleus.
Therefore, there must be another information-containing molecule that can transfer the genetic information from the DNA inside the nucleus to the protein synthesis site in the cytoplasm. This molecule is RNA, which is chemically very similar to DNA. It is also a long chain made of 4 types of nucleotides.
The main differences between RNA and DNA are:
- In RNA the 4 bases are A, G, C, U; the T in DNA is replaced by U (uracil) in RNA which is also capable of base-pairing with A.
- The sugar molecule in the sugar-phosphate backbone differs slightly between DNA and RNA, with one oxygen missing in the sugars of DNA (thus the name deoxyribonucleic acid). RNA stands for ribonucleic acid.
- RNA molecules are single-stranded. The chain can and does fold back upon itself so that base-pairs can form between complementary regions of the same chain.
The arrows indicate the directions for the transfer of genetic information.
DNA serves as the template for its self-replication or duplication. All cellular RNA molecules are ‘transcribed’ from DNA templates. All protein sequences are determined by ‘translation’ of the RNA nucleic acid sequence into a corresponding amino acid sequence.
The arrows are unidirectional; that is, RNA sequences are never determined by protein templates, and RNA chains seldom act as templates for DNA chains.
The reverse flow of information from RNA to DNA, called reverse transcription, can happen but is very rare. As an example, when certain viruses that contain only RNA infect a cell, the viral RNA acts as a template for a single-stranded DNA chain, which then acts as a template for its own complementary strand. This double-stranded copy of the viral RNA is then incorporated into the DNA of the host cell, resulting in the multiplication of the viral RNA using the cell’s machinery.
In the replication process, the two strands of DNA separate and each strand acts as a template for its complementary strand, as shown in the figure. The replication is catalyzed by an enzyme called DNA polymerase.
Transcription refers to the process in which a single gene located on one of the strands of the double-stranded DNA is copied over into an RNA strand. The RNA strand thus produced is called a messenger RNA or mRNA. Which strand acts as a template for the RNA synthesis varies for different genes along the DNA molecule. The strand on which the gene is located and whose 5’ to 3’ direction correctly reads off the amino acid sequence for the corresponding protein is the non-template strand for RNA synthesis.
As an example, consider the following DNA strand where the top strand is read from left to right and the bottom strand is read from right to left. If the gene is located on the top strand, then the bottom strand acts as a template for mRNA synthesis. The mRNA sequence thus generated will have a sequence identical to the strand that has the gene.
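The relationship between the gene strand, the template strand and the mRNA can be sketched in Python. The sequence below is a made-up example, not the one from the original figure, and the names are illustrative. The bottom strand is written left to right so that it lines up base by base under the top strand, that is, in its 3’ to 5’ direction.

```python
# DNA-DNA pairing (to build the bottom strand) and DNA-RNA pairing
# (used by RNA polymerase; U takes the place of T in RNA).
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """mRNA (5'-to-3') built by pairing RNA nucleotides against the template strand."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

# Hypothetical gene on the top (non-template) strand, written 5' to 3'.
gene_strand = "ATGGCATTCTGA"

# Bottom strand aligned under the top strand, i.e. written 3' to 5'.
template_strand = "".join(DNA_PAIRS[base] for base in gene_strand)

mrna = transcribe(template_strand)

print("gene strand     5'->3':", gene_strand)
print("template strand 3'->5':", template_strand)
print("mRNA            5'->3':", mrna)
```

The mRNA comes out identical to the gene strand except that every T is replaced by U, which is the sense in which the two sequences are "identical" in the text.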
During RNA synthesis, an enzyme called RNA polymerase (an oval shaped protein ~ 20 nm across) binds to the promoter sequence along the DNA chain and initiates the transcription process. As the RNA polymerase moves along the DNA chain, it unwinds the DNA in front, thus exposing the bases of the two strands; the exposed template strand is used to form base-pairs with RNA nucleotides, one nucleotide at a time, and when the RNA polymerase encounters a stop signal, the DNA as well as the newly synthesized RNA are released.
The process by which the information encoded in the mRNA is used to synthesize a protein with the appropriate amino acid sequence is called translation. In eukaryotes, the mRNA has to be first transported from the nucleus to the cytoplasm before protein synthesis can occur.
Although the mRNA has all the information necessary to make a specific protein, the amino acids do not bind directly to the nucleotides on the mRNA. Instead, there is another adapter molecule necessary to ensure that the protein synthesis is true to the mRNA sequence and reduce the possibility of errors.
This adapter molecule is also an RNA molecule, called the transfer RNA (tRNA) because of the role it plays in transferring the appropriate amino acid to the growing protein chain. There is at least one tRNA for each amino acid and often more than one. The tRNA consists of about 80-100 nucleotides. The single-stranded chain is folded up into an L-shaped structure. One arm of the L attaches to the appropriate amino acid. The other arm has a loop with 3 unpaired nucleotides which are complementary to the codon on the mRNA that codes for that specific amino acid. Hence, this loop is also referred to as the anti-codon loop of the tRNA molecule. The tRNAs have their respective amino acids covalently attached to one end. Only the tRNA whose anti-codon loop makes a complementary match with the codon on the mRNA will position itself long enough on the mRNA to have its amino acid incorporated into the polypeptide chain. An impersonator whose anti-codon does not have a good fit with the codon will be shaken off by thermal motion.
Proteins are synthesized on the ribosome, a complex of more than 50 proteins and several (2-4) ribosomal RNAs (rRNA). The ribosome binds to the mRNA at a specific site that sets the reading frame, and the tRNA molecules position themselves on the ribosome so as to bind to the specific codon. The ribosome then catalyzes the bond formation between successive amino acids on the growing protein chain.
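The logic of translation can be illustrated with a toy example. This sketch makes several simplifying assumptions: the codon table holds only a handful of the 64 standard codons, the reading frame is set by the first AUG found (a stand-in for the ribosome binding site), and the mRNA is a made-up sequence. The `anticodon` helper shows the 3 nucleotides a matching tRNA would carry.

```python
# Small excerpt of the standard genetic code; a full table has 64 entries.
CODON_TABLE = {
    "AUG": "Met", "GCA": "Ala", "UUC": "Phe", "GGU": "Gly",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

RNA_PAIR_TABLE = str.maketrans("ACGU", "UGCA")

def anticodon(codon: str) -> str:
    """Anticodon that base-pairs with `codon`, written 5' to 3'."""
    return codon.translate(RNA_PAIR_TABLE)[::-1]

def translate(mrna: str) -> list[str]:
    """Read codons from the first AUG until a stop codon; return the amino acids."""
    start = mrna.find("AUG")          # simplification: first AUG sets the frame
    if start == -1:
        return []
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]   # KeyError if codon not in excerpt
        if residue == "Stop":
            break
        protein.append(residue)
    return protein

# Hypothetical mRNA with two untranslated bases before the start codon.
mrna = "GGAUGGCAUUCUGAGG"
protein = translate(mrna)

print("protein:", "-".join(protein))
print("anticodon for AUG:", anticodon("AUG"))
```

Shifting the starting position by one or two bases would split the same mRNA into entirely different codons, which is why setting the reading frame matters.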
RNAs can be Enzymes
In the last decade or so, RNA molecules have been discovered that act as enzymes and can catalyze reactions, long considered the domain of proteins only. In fact, when the 3-dimensional structure of the ribosome was finally completed (within the last two years) what came as a pleasant surprise to the RNA people is the fact that at the active site of this massive enzyme (where the bonds between amino acids are formed) no protein molecules are to be found. It appears that proteins are primarily used as packing material or cement to hold the structure together, and that the rRNAs embedded in the structure do the catalysis.
The discovery that RNAs can catalyze reactions has led to some interesting evolutionary speculation that early organisms thrived in an RNA world. As we will see in the following weeks, the RNA molecule sits in between DNA and protein: it can store information in its linear sequence of nucleotides, like DNA, and it can fold into complicated 3-dimensional structures and catalyze reactions, like proteins. Therefore, the early organisms could well have survived on RNA alone. As the organisms became more complex, it became necessary and more efficient to separate the information storage (in the form of DNA) from enzyme catalysis (in the form of proteins) with the RNA left as a go-between. However, as a remnant of the very early stages of evolution, protein synthesis has survived as a reaction still catalyzed only by ribosomal RNA.
Best Wishes: Dr. Ehab Aboueladab