Bioinformatics course


Textbooks:

Lengauer: Bioinformatics - from genomes to therapies.. Vol.1-3 (Wiley, 2007)
Isaev A. Introduction to mathematical methods in bioinformatics (Springer, 2006)


Handouts, course materials:

Public links and references: course materials:

For lecture 8: Chapter 5 of the Lengauer-book: David C. Kulp: Finding Protein-coding Genes

For lecture 9: Rabiner's tutorial: http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf ,  Chapter 3 of Isaev's book.



In-house electronic handouts for our students

Syllabus:

 

Course in theory

Laboratory practice

1. What is covered in this course, and why?

Brief introduction, definition of basic, frequently referred phrases like "DNA", "protein", "sequencing" etc.Introduction on algorithmic complexity: big O notation, P and NP classes, approximation algorithms; why are these relevant to us (examples).Basic databases (only a brief overview; they will be mentioned in detail at the beginning of the relevant "Part"). (GV)

 

1, Linux basics, commands, piping, archiving, updating. (RT, IG)

Part 1: Genome sequencing

 2., Genome sequencing: Related problems in bioinformatics: Physical mapping. Physical mapping with backtrack algorithm.
Physical mapping using hybridization (algorithm sketch: PQ-trees) (IG)

 

2, Shell scripts, compilation, ssh tunneling, (RT, IG)

3., Shotgun sequencing. Motivation, calculation of sequence overlaps (later: with suffix trees). Greedy algorithm. (IG)

3, Python script language (ÖR)

Part 2: Sequence analysis

4., Strings, basic problems related to string search: pattern matching, alignments.Brief introduction on data structures.

Two approaches for pattern matching: preprocessing the text or the searched pattern.Distance functions on strings: Hamming-distance, Levenshtein-distance, Levenshtein-distance with different costs.

Dynamic programming: main ideas and applicability to sequence comparisons. "Scoring functions" and their motivations. Brief introduction on scoring matrices (PAM, BLOSUM). Databases of amino acid sequences: UniProt = (SwissProt U TrEMBL); SwissProt and TrEMBL difference; RefSeq (also nucleotide sequences); corresponding websites; download hints (IG)

4, Python continued (ÖR)

5., Pattern matching using pattern preprocessing: the Boyer-Moore algorithm. Pattern matching using text preprocessing: Suffix trees and suffix arrays. Quickly solvable tasks using Suffix trees.
Efficient construction of suffix arrays: radix quicksort, (sketch: Sadakane-algorithm, linear method). Live demonstration of comparing the different search methods. (IG)  

5, Sequencing, sequence alignments (IG)

6., Sequence alignment algorithms

The concept of local, global, "glocal" alignments. Gap penalty types. Alignment with different overlap requirements / different gap penalty types. Motivation behind scoring functions: statistical considerations. Sequence alignment: statistical parameters and their interpretations. Live demonstration of the different alignment algorithms. (IG)

6, Sequencing, sequence alignment (IG)

7., Multiple alignments, heuristic sequence alignment algorithms.

Basic idea: BLAST (in detail), improvement possibilities: PSI-BLAST (sketch). Phylogeny, evolution trees. The NCBI taxonomy tree. Different methods for constructing phylogenetic trees. (IG)

7, Sequencing, sequence alignment (IG)

8, From genes to proteins: finding protein coding genes (GV) Transcription and translation. CDS, ORF, gene finding.

 

8,  System administration for beginners (R

9, AI and Bioinformatics. Markov models. Hidden Markov models. The forward algorithm. Viterbi algorithm. The Viterbi learning algorithm. (GV)

9, cancelled

Part 3: Structure prediction              

10, Molecular structure primer; Molecular structure prediction (GV)

 

10,

11, Drug-protein and protein-protein docking (GV)

11, Docking on a webserver (??)

Part 4: Interaction networks

12, Molecular networks: metabolic and physical interaction networks

Source of information: online databases (DIP, MINT, Intact, HPRD) (GV)

12. PPI download and update (BD)

13, Molecular networks: gene regulatory and cell signaling networks (GV)

13, PPI generation (BD)

 

14, Protein function prediction and similarity (GV)

 

14, PPI analysis (BD)