Researchers in the laboratory of Frederick Alt at the Howard Hughes Medical Institute and Program in Cellular and Molecular Medicine (PCMM) at Boston Children's Hospital, led by Feilong Meng and in collaboration with teams at several other major centers working on the genetics of immunology and cancer, have identified a relationship between sites of convergent gene transcription, the presence of intragenic super-enhancers, and the mis-targeting of the mutagenic activity of activation-induced cytidine deaminase (AID).  These findings, published online in the journal Cell on December 4, 2014, have great potential impact on our understanding of mechanisms that lead to human B cell lymphomas.

Left to Right: Drs. Feilong Meng, Fred Alt, Zhou Du, and Jiazhi Hu

The Alt lab has a long history of work on the two main antibody gene diversification processes that occur in response to B lymphocyte activation by antigens. Somatic hypermutation (SHM) allows B cells to greatly increase antibody specificity by introducing point mutations into the DNA of gene exons that encode antigen-binding variable regions. Class switch recombination (CSR) changes the antibody being produced from one class to another through an antibody gene DNA breakage-and-joining mechanism, which customizes the way in which the antibody attacks and eliminates a pathogen.

Sometimes referred to as a master regulator of antibody diversification, the enzyme AID initiates both SHM and CSR by causing mutations and DNA breaks in antibody gene DNA.  Specialized features, such as a high density of preferred AID target motifs or ability to form requisite DNA secondary structures to allow AID access, evolved to allow antibody genes to recruit AID and serve as AID targets.  However, AID mutational activity can also act on a small subset of other cellular genes (called off-targets), in some cases promoting activation of cellular genes that can cause cancer (oncogenes). One major mechanism of such aberrant B-cell lymphoma-promoting AID activity is generation of DNA breaks in or around oncogenes that lead to their fusion to other genes in the form of chromosomal translocations.  How AID is recruited to a small subset of off-target genes that appears to be enriched in potential oncogenes had long been a mystery.

Finding AID off-targets in primary B cells has recently been made much easier by large-scale techniques, including high-throughput genome-wide translocation sequencing (HTGTS), which the Alt lab originated. They had previously identified 15 non-immunoglobulin (non-Ig) genes as recurrent targets of DNA double-strand breaks (DSBs) made by AID. Realizing that more data would be necessary for the depth of analysis they were after, the Alt team applied a more sensitive version of HTGTS to a specially tailored set of activated B cells. The increased sensitivity of this approach allowed them to identify 51 AID off-target genes via their recurrent translocation. A key finding was that AID targeting was confined to narrow regions within these off-target genes, which they set about analyzing for possible common features.

Co-first author Feilong Meng of PCMM first sought to figure out whether the off-targets shared any obvious features with antibody genes. AID targeting of antibody genes requires transcription, and the team noted that all of the off-target genes were highly transcribed from their normal upstream promoter. However, many other genes are similarly transcribed but are not AID targets. Another possibility was that the off-targets shared DNA features with antibody genes; however, analysis of the AID off-targets did not pinpoint any culprit DNA sequences that were not also found in many other transcribed genes that are not off-targets.

Model of AID "off-targeting".
Top: At AID off-targets, SEs overlap with gene bodies and this combination generates regions of sense/antisense convergent transcription due to sense gene transcription encountering the enhancer antisense transcription.
Bottom: Stalled RNA polymerase, with the help of Spt5, recruits AID and generates regions of ssDNA. RNA Exosome or other RNases degrade the aborted sense and antisense transcripts and work together with RPA to help AID access the ssDNA substrates.

(Courtesy of Cell, Publication Ahead of Print, DOI: 

Transcription of genes generates a messenger RNA copy of one strand of the duplex DNA molecule that encodes their protein product.  The type of transcription through genes leading to the generation of their messenger RNA is referred to as "sense" transcription.  However, it is now known that many genes undergo transcription in the opposite direction (referred to as anti-sense transcription) to form an RNA copy of the complementary non-coding DNA strand.  Anti-sense transcription may play various roles, but a major one is in regulating gene expression.  Dr. Meng sought to look for more specialized aspects of gene transcription, such as anti-sense transcription, in the narrow AID target regions.

Genome-wide nuclear run-on sequencing (GRO-seq) is another large-scale method capable of revealing the detailed transcription pattern of every gene in the genome. The Alt team applied that technique and, by visual inspection, found that off-target genes had normal sense transcription running through them but, in the key target areas, also had anti-sense transcription running in the opposite direction. Areas where sense and anti-sense transcription are both enriched and overlapping are said to harbor 'convergent transcription.'

Proving the significance of convergent transcription in AID off-targeting requires comparing the convergent transcription patterns of off-target genes with those of all other genes genome-wide that are not AID off-targets. Realizing the complexity of the transcription analysis they were undertaking, Dr. Alt's lab established a collaboration with Shirley Liu of Dana-Farber's Department of Biostatistics and Computational Biology and her colleague Zhou Du (now in the Alt lab). They worked for a year or more to develop the algorithms and other bioinformatics tools to study transcription across the whole genome in both directions, identifying not just where it is convergent (which is a lot of places), but where it is convergent at the highest levels. These studies showed that nearly all AID off-target regions occurred in areas with high-level convergent transcription. However, a large fraction of convergently transcribed genomic regions were not targeted.
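To make the idea of "convergent at the highest levels" concrete, here is a minimal sketch of one way such regions could be called from strand-specific GRO-seq signal. It is an illustration only, not the algorithm developed by the Alt and Liu labs: the function name `convergent_regions`, the fixed-size binning, the `min_signal` threshold, and the scoring rule (summing the weaker strand's signal) are all assumptions introduced here.

```python
from typing import List, Tuple


def convergent_regions(
    sense: List[float],
    antisense: List[float],
    min_signal: float = 1.0,   # hypothetical per-bin threshold
    bin_size: int = 100,       # hypothetical bin width in base pairs
) -> List[Tuple[int, int, float]]:
    """Merge consecutive bins where BOTH strands reach min_signal.

    Returns (start_bp, end_bp, score) tuples. The score sums the weaker
    strand's signal per bin, so a region ranks highly only when sense and
    anti-sense transcription are both strong.
    """
    n = min(len(sense), len(antisense))
    regions: List[Tuple[int, int, float]] = []
    start = None
    score = 0.0
    for i in range(n):
        s, a = sense[i], antisense[i]
        if s >= min_signal and a >= min_signal:
            if start is None:          # open a new convergent region
                start, score = i, 0.0
            score += min(s, a)
        elif start is not None:        # close the current region
            regions.append((start * bin_size, i * bin_size, score))
            start = None
    if start is not None:              # region runs to the end of the track
        regions.append((start * bin_size, n * bin_size, score))
    return regions


# Toy signal, not data from the study.
toy_sense = [5, 5, 0, 3, 4, 4]
toy_antisense = [0, 2, 2, 3, 1, 0]
toy_regions = convergent_regions(toy_sense, toy_antisense)
```

Sorting the returned regions by score would then separate the many places where transcription is merely convergent from the smaller set where it is convergent at high levels.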

In another line of analysis for AID off-target mechanisms that dramatically intersected with the transcription work, the Alt lab was engaged in a collaboration with James Bradner and his colleague Alexander Federation in the Dept. of Medical Oncology at the Dana-Farber to study potential roles of so-called 'super-enhancers' in AID targeting.  Typical transcriptional enhancers are made up of single or grouped transcription factor binding sequences that increase the expression of target genes.  Recently, super-enhancers (SEs) were identified: clusters of enhancers ten times the length of 'regular' enhancers and much more robust in effect. SEs tend to be on lineage-specific genes, i.e., in a B-cell, they would be in or around genes that would be expressed in an activated B-cell (a subset of those being oncogenes), and they have high levels of sense and anti-sense transcriptional activity.

The Bradner lab has been deeply involved in elucidating SE functions and particularly their roles as therapeutic targets in cancer.  Their preliminary analyses of the initial subset of AID off-targets identified by the Alt lab suggested a link between SEs and AID off-target regions that became highly significant when the enlarged set was similarly analyzed—with nearly all AID target regions lying within SEs within genes (intragenic SEs).  However, more than 400 intragenic SEs were identified in CSR-activated B cells, and again only a subset were AID targets.

Based on the convergent transcription and SE analyses, the Alt team arranged the 400-plus intragenic SEs into deciles based on levels of convergent transcription.  On this basis, they found a dramatically clear direct association between these factors and the AID off-targets defined by HTGTS: the higher the rate of convergent transcription, the more likely the intragenic SEs were to be AID off-targets. Dr. Meng then looked at how long the convergently transcribed regions were; among the genes with the very longest convergent regions, 80% were high-frequency translocation targets.
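The decile comparison described above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the team's actual pipeline: `decile_target_fractions` is a hypothetical helper, each intragenic SE is assumed to carry a single convergent-transcription score, and the input values are toy numbers rather than results from the paper.

```python
from typing import List, Optional


def decile_target_fractions(
    scores: List[float],
    is_target: List[bool],
) -> List[Optional[float]]:
    """Rank regions by score, split them into ten near-equal groups, and
    return the fraction of AID off-targets in each group (lowest decile
    first). A group with no regions yields None.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    fractions: List[Optional[float]] = []
    for d in range(10):
        lo, hi = d * n // 10, (d + 1) * n // 10
        group = order[lo:hi]
        if not group:
            fractions.append(None)
        else:
            hits = sum(1 for i in group if is_target[i])
            fractions.append(hits / len(group))
    return fractions


# Toy example: 20 regions, with only the two highest-scoring ones marked
# as off-targets. These values are invented for illustration.
toy_scores = [float(i) for i in range(20)]
toy_targets = [i >= 18 for i in range(20)]
toy_fractions = decile_target_fractions(toy_scores, toy_targets)
```

In a plot of these fractions against decile rank, the association reported in the study would appear as a rise toward the top deciles.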

Thus, the Alt team, along with an impressive interdisciplinary set of collaborators, showed that AID off-targeting occurs in areas of convergent transcription created by intragenic SEs. While these studies clearly implicate this mechanism for AID off-targeting in CSR-activated B cells, there is still more to do, as a major class of human B-cell lymphomas derives from B cells activated to undergo AID-initiated SHM in specialized structures referred to as germinal centers (GCs). Work in the current study also implicated a similar mechanism for AID activity on a limited set of known mouse GC B-cell AID off-targets and also showed that at least some recurrent oncogene translocations in human B-cell lymphomas occurred within intragenic SEs observed in human GC B cells. Now, with the substantial predictive ability suggested by the CSR-activated B-cell studies, future work will focus on further defining AID off-targets in mouse and human GC B cells.

AID and related enzymes also have been implicated as causing mutations associated with other cancers.  In collaboration with Michel Nussenzweig (Rockefeller University) and Rafael Casellas, who together have a related back-to-back companion paper in Cell, the Alt team also showed in their study that the convergent transcription/intragenic SE correlation extended to a different set of AID off-target genes when AID was ectopically expressed at high levels in mouse fibroblasts.  These findings motivate studies of potential similar mechanisms of AID or related enzyme targeting that may contribute to non-lymphoid cancers.

Finally, as Alt explained, "Another important question raised by our study is, 'How can you make AID work more actively on the antibody genes you want it to work on, without increasing oncogenic off-target activity?'  Many workers feel that if we were able to address this question, we might find useful strategies for better generating certain vaccines, such as vaccines to elicit HIV broadly neutralizing antibodies that carry a very high load of SHM. Beyond relevance to cancer biology, such goals also drive us to continue this line of studies."