Thursday, May 18, 2017, 12:20PM, NAC 6/113

Basilis Gidas (Brown University), Finding Genes and Towards a Mathematical Framework for Artificial Intelligence and Biological Systems 

The first half of the lecture will be on a statistical model for finding genes in the human genome. The model contains two parts: (a) a finite network (graph) that represents the overall architecture of a gene. The vertices in the network represent DNA signals (small patterns) that are associated with a gene and are recognized by the proteins and enzymes involved in the transcription and translation of genes. The edges of the network correspond to interactions among these signals and represent statistical variability in the architecture across genes; (b) each signal and each part of a gene is a piece of DNA with a random length as well as random variability in its nucleotide sequence. The second part of the model articulates these variabilities.
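As a rough illustration of the two-part structure described above, here is a minimal generative sketch. It is not the speaker's model: the vertex names, transition probabilities, length ranges, and base weights are all invented for illustration, and the uniform length distribution is an assumption made only to keep the example short.

```python
import random

# Part (a): a hypothetical toy network. Vertices are signals or gene parts;
# each edge carries an assumed transition probability.
EDGES = {
    "start_codon": {"exon": 1.0},
    "exon": {"donor_site": 0.6, "stop_codon": 0.4},
    "donor_site": {"intron": 1.0},
    "intron": {"acceptor_site": 1.0},
    "acceptor_site": {"exon": 1.0},
    "stop_codon": {},  # no outgoing edges: the gene ends here
}

# Part (b): each vertex emits DNA of random length (uniform on an assumed
# inclusive range) with an assumed nucleotide composition over A, C, G, T.
LENGTH_RANGE = {
    "start_codon": (3, 3),
    "exon": (30, 60),
    "donor_site": (2, 2),
    "intron": (20, 80),
    "acceptor_site": (2, 2),
    "stop_codon": (3, 3),
}
BASE_WEIGHTS = {
    "start_codon": (1, 1, 1, 1),
    "exon": (2, 3, 3, 2),      # toy GC-rich coding region
    "donor_site": (1, 1, 1, 1),
    "intron": (3, 2, 2, 3),    # toy AT-rich non-coding region
    "acceptor_site": (1, 1, 1, 1),
    "stop_codon": (1, 1, 1, 1),
}


def sample_gene(rng):
    """Walk the network from start to stop, emitting one DNA segment per vertex."""
    state = "start_codon"
    segments = []
    while True:
        low, high = LENGTH_RANGE[state]
        length = rng.randint(low, high)  # random length of this piece
        dna = "".join(rng.choices("ACGT", weights=BASE_WEIGHTS[state], k=length))
        segments.append((state, dna))
        successors = EDGES[state]
        if not successors:
            return segments
        # Random choice of the next vertex models architectural variability.
        state = rng.choices(list(successors), weights=list(successors.values()))[0]


if __name__ == "__main__":
    for name, dna in sample_gene(random.Random(0)):
        print(f"{name:14s} {len(dna):3d} {dna}")
```

Gene finding runs in the opposite direction: given an observed DNA sequence, one seeks the most probable path through such a network, which this sketch does not attempt.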

The above gene-finding procedure is conceptually similar to what is believed to underlie speech recognition, where recognition involves two types of information: the acoustic signal, represented by a concatenation of phonemes, and global regularities articulated by grammars (or syntax). The process underpinning visual recognition is undoubtedly similar. So too, many practitioners believe, is the functioning of biological processes, in which two principles are at work: physics (biochemistry) and evolution. Physics controls the biochemical interaction of macromolecules, but it is evolution that produced the perfect “code” or “syntactic language” for the collective behavior of genes (Gene Regulatory Networks), or the collective behavior of proteins in Signal Transduction Pathways in cell growth, cell division or immunology. While specific questions and applications in speech, vision, and biology have seen impressive advances and have led to a great deal of mathematical innovation (e.g. modern statistical learning), an underpinning mathematical framework is missing. Though we do not have the framework, we know quite a bit about some of the problems the framework needs to articulate and some of the properties it needs to have. Building on the gene-finding process, the second part of the talk will aim at identifying some key sources of difficulty in information processing in cognition and biology, and will hint towards a coherent hierarchical/grammatical framework.