
The Asian Student Medical Journal

©Asian Student Medical Journal 2002-2005 All Rights Reserved

 

 

Asian Student Medical Journal [10/21/03]

Recent Techniques in Biological Research: Bioinformatics
Pankaj Sohaney

Introduction

Successful and productive research in any discipline depends on the quality of its hypothesis. A good theoretical model includes all the known and relevant data so that it closely approximates real life and is free of mistaken conjecture that can cost crucial time. To use, store, maintain, analyse and manipulate the heaps of research data generated every day, there is therefore an urgent need for better computational models. In a way, the enormous volume of data that was a challenge for biology has become a challenge for computing as well. Bioinformatics is conceptualising biology in terms of molecules (chemistry) and applying "informatics" techniques (mathematics, computer science, statistics) to understand, organise and predict the information associated with these molecules on a large scale.

Bioinformatics plays a very important role in the progress of biology. Without it, the transformation of biology into an industrial-scale, comprehensive science with significant medical and thus economic impact would be inconceivable. In many ways, current techniques of mathematics and computer science have not been sufficient to support life science research. Bioinformatics, as a result, is not only bringing existing methods to bear on new problems but also developing, or catalyzing the development of, novel techniques in the biological sciences. With its increased importance, the fostering of bioinformatics has become a crucial part of efforts to promote biotechnology and the life sciences in general.

Over the past few decades, major advances in the field of Biology, coupled with advances in genomics and proteomics, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize and index the data, and for specialized tools to view and analyze the data.

Bioinformatics

Bioinformatics can be defined as the interface between biotechnology and information technology (IT). People working in this field therefore usually have training in either biology or information technology, and have learned about the other field by dealing with its problems or using its tools. Although the term 'bioinformatics' is not really well defined, this scientific field can be said to deal with the computational management of all kinds of biological information. Most of the bioinformatics work being done can be described as analyzing biological data.

Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned. At the beginning of the "genomic revolution," bioinformatics was mainly concerned with the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences. Development of this type of database involved not only design issues, but also the development of complex interfaces through which researchers could both access existing data and submit new or revised data.

Eventually, all this data must be combined to form a comprehensive picture of normal cellular activities so that researchers can study how these activities are altered in different disease states. The field of bioinformatics has therefore evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. The actual process of analysing and interpreting data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include: the development and implementation of tools that enable efficient access to, and use and management of, various types of information; and the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences.
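As an illustration of what "locating a gene within a sequence" can mean computationally, the sketch below scans a DNA string for open reading frames (ORFs). It is a deliberately naive example and not a method described in this article: it assumes that a candidate gene starts at ATG and ends at the first in-frame stop codon, it looks only at the forward strand, and the function name, the `min_codons` parameter and the toy sequence are all hypothetical.

```python
# Naive open-reading-frame (ORF) scan on the forward strand only.
# Assumption: an ORF starts at ATG and runs to the first in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) index pairs of candidate ORFs, end exclusive.

    min_codons counts every codon in the ORF, including start and stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                    # close it and keep scanning
    return sorted(orfs)


# Hypothetical toy sequence, not real data.
toy = "CCATGAAATTTGGGTAACC"
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

Real gene-finding programs also consider the reverse strand, introns and statistical signals, so this is only a starting point for experimentation.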

Biological Database

A biological database is a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system. A simple database might be a single file containing many records, each of which includes the same set of information. For example, a record in a nucleotide sequence database typically contains information such as a contact name; the input sequence with a description of the type of molecule; the scientific name of the source organism from which it was isolated; and, often, literature citations associated with the sequence. One such database, built on the International Mycological Institute's Index of Fungi, is available. It can be searched by genus or species of fungus and gives the reference (volume and page) to the Index of Fungi.

For researchers to benefit from the data stored in a database, two additional requirements must be met: easy access to the information; and a method for extracting only that information needed to answer a specific biological question.
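The idea of a single file of uniform records, together with a way of extracting only the records that answer a question, can be sketched in a few lines of Python. The file layout ("key: value" lines, blank line between records), the field names and the two example records below are assumptions made for illustration; they do not correspond to the format of any real sequence database.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """One record, with the fields listed in the text above."""
    contact_name: str
    sequence: str
    molecule_type: str
    organism: str
    citations: list = field(default_factory=list)


def parse_flat_file(text):
    """Parse a hypothetical 'key: value' flat file; blank lines separate records."""
    records = []
    for block in text.strip().split("\n\n"):
        fields = dict(line.split(": ", 1) for line in block.splitlines())
        records.append(SequenceRecord(
            contact_name=fields["contact"],
            sequence=fields["sequence"],
            molecule_type=fields["molecule"],
            organism=fields["organism"],
            citations=[c for c in fields.get("citations", "").split("; ") if c],
        ))
    return records


def query_by_organism(records, organism):
    """Extract only the records for one source organism (case-insensitive)."""
    return [r for r in records if r.organism.lower() == organism.lower()]


# Made-up example data, not taken from any real database.
FLAT_FILE = """\
contact: A. Researcher
sequence: ATGGCCATTGTAATG
molecule: DNA
organism: Aspergillus niger
citations: Example citation 1

contact: B. Researcher
sequence: AUGGCCAUUGUAAUG
molecule: mRNA
organism: Homo sapiens
"""

records = parse_flat_file(FLAT_FILE)
hits = query_by_organism(records, "aspergillus niger")
print([(r.contact_name, r.molecule_type) for r in hits])
```

Production databases add indexing, validation and a query language on top of this idea, but the two requirements named above, access and selective extraction, are already visible here.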

Evolutionary Biology

Bioinformatics is an evolution that has joined the muscle of math and computing to the heart of the life sciences. It is essential to the use of genomic information in understanding human diseases and in identifying new molecular targets for drug discovery. In recognition of this, many universities, government institutions and pharmaceutical firms have formed bioinformatics groups consisting of computational biologists and bioinformatics computer scientists.

Equally exciting is the potential for uncovering evolutionary relationships and patterns between different forms of life. With the aid of nucleotide and protein sequences, it should be possible to find the ancestral ties between different organisms. So far, experience has taught us that closely related organisms have similar sequences and that more distantly related organisms have more dissimilar sequences. Proteins that show significant sequence conservation, indicating a clear evolutionary relationship, are said to be from the same protein family. By studying protein folds (distinct protein building blocks) and families, scientists are able to reconstruct the evolutionary relationship between two species and to estimate the time of divergence between two organisms since they last shared a common ancestor.
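The observation that closely related organisms have more similar sequences can be made concrete with a simple similarity measure. The sketch below computes percent identity between sequences that are assumed to be already aligned and of equal length; the three protein fragments and the species labels are invented for illustration and are not real data.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percentage of matching positions between two equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # A gap character '-' never counts as a match.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Hypothetical, already-aligned toy protein fragments.
species = {
    "species_A": "MKTAYIAKQR",
    "species_B": "MKTAYIAKQK",  # one difference from species_A
    "species_C": "MRSAYLGKHK",  # several differences from species_A
}

for name_1, name_2 in combinations(species, 2):
    score = percent_identity(species[name_1], species[name_2])
    print(f"{name_1} vs {name_2}: {score:.1f}% identical")
```

Under this measure species_A and species_B would be judged the closer pair. Real phylogenetic studies first compute the alignment and use substitution models rather than raw identity.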

The human genome, found in every cell of a human being, consists of 23 pairs of chromosomes. These chromosomes constitute the 3 billion letters of chemical code that specify the blueprint for a human being. The worldwide Human Genome Project, a vast endeavor aimed at reading this entire DNA code, will completely transform biology, medicine and biotechnology. The entire code will be available on our computers; all 30,000 human genes will be identified; all 5000 inherited diseases will become diagnosable and potentially curable; drug design will be completely transformed; and our understanding of ourselves will move into a new dimension. The Genome Project focuses on two main objectives: mapping, that is, pinpointing the genomic location of all genes and markers; and DNA sequencing, that is, reading the chemical "text" of all the genes and their intervening sequences. DNA sequences are entered into large databases, where they can be compared with known genes, including inter-species comparisons. The explosion of publicly available genomic information resulting from the Human Genome Project has created the need for bioinformatics capabilities. The science of bioinformatics, which is the melding of molecular biology with computer science, is essential to the use of genomic information in understanding human diseases and in the identification of new molecular targets for drug discovery.

Protein modelling

The term proteome refers to all the proteins expressed by a genome, and thus proteomics involves the identification of proteins in the body and the determination of their role in physiological and pathophysiological functions. The ~30,000 genes defined by the Human Genome Project translate into 300,000 to 1 million proteins when alternative splicing and post-translational modifications are considered. While a genome remains unchanged to a large extent, the proteins in any particular cell change dramatically as genes are turned on and off in response to the cell's environment.

Sequence comparison is a very powerful tool in molecular biology, genetics and protein chemistry. Frequently it is not known which protein a new DNA sequence codes for, or whether it codes for any protein at all. If a new coding sequence is compared with all known sequences, there is a high probability of finding a similar one. Often the role that the matching protein in the data bank plays in the cell is already known.
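The comparison of a new sequence against all known sequences can be sketched as a small database search. The example below is a simplification and not how production search tools work: it assumes that similarity can be approximated by the overlap of short subsequences (k-mers) and scores each database entry with the Jaccard index. The database entries, their names and their annotated functions are hypothetical.

```python
def kmers(seq, k=3):
    """Set of all overlapping subsequences of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared elements divided by total distinct elements of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query, database, k=3):
    """Return (name, score) of the database entry most similar to query."""
    query_kmers = kmers(query, k)
    scored = [
        (jaccard(query_kmers, kmers(entry["sequence"], k)), name)
        for name, entry in database.items()
    ]
    score, name = max(scored)
    return name, score


# Hypothetical database of annotated sequences, not real data.
database = {
    "toy_kinase": {"sequence": "ATGGCGAAGCTGACC", "function": "hypothetical kinase"},
    "toy_transporter": {"sequence": "ATGTTTCCCGGGAAA", "function": "hypothetical transporter"},
}

query = "ATGGCGAAGCTGTCC"  # new sequence of unknown function
name, score = best_match(query, database)
print(f"best hit: {name} (similarity {score:.2f}), "
      f"suggested role: {database[name]['function']}")
```

The annotated function of the best hit becomes a testable hypothesis about the new sequence, which is the use of sequence comparison described above.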

The process of evolution has resulted in the production of DNA sequences that encode proteins with specific functions. In the absence of a protein structure that has been determined by X-ray crystallography or NMR spectroscopy, researchers can try to predict the three-dimensional structure using protein or molecular modeling. This method uses experimentally determined protein structures (templates) to predict the structure of another protein that has a similar amino acid sequence (target).

Although molecular modeling may not be as accurate at determining a protein's structure as experimental methods, it is still extremely helpful in proposing and testing various biological hypotheses. Molecular modeling also provides a starting point for researchers wishing to confirm a structure through X-ray crystallography and NMR spectroscopy. As the different genome projects are producing more sequences, and because novel protein folds and families are being determined, protein modeling will become an increasingly important tool for scientists working to understand normal and disease-related processes in living organisms.

Genome Mapping

The job of locating genes got a lot easier in 1971, when scientists devised a method to cut the large pieces of DNA on each chromosome into smaller, more manageable pieces. Within each of these smaller pieces, scientists were finally able to locate the regions containing genes. As the positions of more and more genes were found, a "genetic map" was constructed which showed the positions of the genes relative to each other, and relative to the ends and center of the chromosomes. Genomic maps serve as a scaffold for orienting sequence information. A few years ago, a researcher wishing to localize a gene, or nucleotide sequence, was forced to manually map the genomic region of interest, a time-consuming and often painstaking process.
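Cutting a long DNA molecule into smaller pieces can be imitated on a string. The sketch below assumes that the cutting method works the way restriction enzymes do, by cutting at every occurrence of a fixed recognition site; the article itself does not name the method. The site GAATTC with a cut after the first base is the well-known pattern of the enzyme EcoRI and is used here only as an illustration, and the toy sequence is invented.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site, cut_offset bases into the site.

    Returns the list of fragments in their original order.
    """
    dna = dna.upper()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])   # piece up to the cut point
        start = cut
        pos = dna.find(site, pos + 1)      # look for the next site
    fragments.append(dna[start:])          # remainder after the last cut
    return fragments


# Hypothetical toy sequence containing two recognition sites.
toy_dna = "AAGAATTCCCGGGAATTCTT"
fragments = digest(toy_dna)
for piece in fragments:
    print(len(piece), piece)
```

Joining the fragments in order gives back the original sequence, which is why the sizes and order of such pieces can be used to build a map.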

The science of locating these genes is called "genetic mapping", and although we now know the location of a number of very important genes, the map is far from complete. Today, thanks to new technologies and the influx of sequence data, a number of high-quality, genome-wide maps are available to the scientific community for use in research.

Computerized maps make gene hunting faster, cheaper and more practical for almost any scientist. In a nutshell, a scientist would first use a genetic map to assign a gene to a relatively small area of a chromosome. In light of these advances, a researcher's burden has shifted from mapping a genome or genomic region of interest, to navigating a vast number of Web sites and databases.

The rapidly emerging field of bioinformatics promises to lead to advances in understanding basic biological processes and, in turn, advances in the diagnosis, treatment, and prevention of many genetic diseases. Bioinformatics has transformed the discipline of biology from a purely lab-based science to an information science as well. Increasingly, biological studies begin with a scientist conducting vast numbers of database and Web site searches to formulate specific hypotheses or design large-scale experiments. The implications of this change, for both science and medicine, are staggering.

Importance

The justification for applying computational approaches to facilitate the understanding of various biological processes includes: a more global perspective in experimental design; and the ability to capitalize on the emerging technology of database-mining: the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms.

The contribution of bioinformatics to drug discovery is twofold. First, the computer may help to optimize the pharmacological profile of existing drugs by guiding the synthesis of new and "better" compounds. Second, as more and more structural information on possible protein targets and their biochemical role in the cell becomes available, completely new therapeutic concepts can be developed. The computer helps in both steps: to find out about possible biological functions of a protein by comparing its amino acid sequence to databases of proteins with known function, and to understand the molecular workings of a given protein structure. Understanding the biological or biochemical mechanism of a disease then often suggests the types of molecules needed for new drugs. The effective integration and use of information will become the single biggest differentiator of competitive advantage in pharmaceutical R&D in the next decade.

IT Scenario

Accurate predictions help scientists and researchers to prepare in advance for a potential calamity. They also help them to decide on alternative methods in advance.

However, the step from theoretical biomathematics to applied bioinformatics, which aims to produce software from an algorithm, is not an easy one. It requires a supportive R&D climate that generates a local need for such research, and an appropriate computer science infrastructure. At present, bioinformatics has been applied successfully only in those developing countries where these requirements are met, such as Brazil, China, India, Mexico and South Africa.

Much of the work currently available in bioinformatics involves the design and implementation of programs and systems for the storage, management and analysis of vast amounts of DNA sequence data. This requires in-depth programming and relational database skills, which very few biologists possess, and so it is largely computational specialists who are filling these roles. This is not to say that the computer-savvy biologist does not play an important role. As the bioinformatics field matures there will be a huge demand for outreach to the biological community to sift through gigabases of genomic sequence in search of novel targets.

Programming skills: in addition to extensive knowledge of mycological packages, one will need to learn web and programming skills, including HTML, Perl, Java and C++, and be familiar with a variety of operating systems (especially UNIX and Linux). Relational database skills such as SQL, and experience with database applications such as Sybase or Oracle, will be highly advantageous.

 

 

THE AUTHOR
The author is Vice President of the "Soochana" IT Cooperative Society, Jabalpur, and is doing research on "Development of Computer Based Epitome as an Aid in the Identification of Important Deuteromycetes Fungi" with Dr. A. K. Pandey, Department of Biological Sciences, RD University, Jabalpur (MP). He is also the author of Computer Ek Sahayak (in Hindi).

 

 

ASMJ has moved to www.asmj.org This site supported by www.aippg.com