Results
Existing informatics platforms and software have been assessed, and substantial progress has been made in hospital patient data-warehousing and genomic sequence databasing. Integration with a web-accessible informatics platform is underway. Another major achievement has been obtaining Research Ethics Committee (REC) and National Information Governance Board (NIGB) approval to link whole genome data from C. difficile, S. aureus and Norovirus to hospital data from the partner NHS Trusts without individual consent.
C. difficile
Large archives of EIA-toxin-positive stools from Oxford and cultured isolates from the Leeds NHS Trust are in place. Oxfordshire transmission is currently being analysed using C. difficile epidemiological data from the Oxford University Hospitals (OUH) Infection Research Database (IORD). All hospitalised patients with C. difficile infection (CDI) now undergo structured case review, and data are increasingly being entered electronically using wireless devices. An analysis describing the association between pathogenicity locus (PaLoc) type and host sequence has been published (PLoS One, 2011). Trends in CDI over 2006-2010 and the association between sequence type (ST) and mortality were presented at ICAAC 2010 (papers submitted and in late draft). An improved MLST scheme for C. difficile (http://pubmlst.org/cdifficile; JClinMicro 2010) has been the basis for the first network-based investigation of C. difficile transmission at the ward level, which indicated that only ~25% of new cases had plausible nearby symptomatic ‘donors’. This implies the existence of alternative (non-hospital) routes of transmission, to be investigated in a new study. Approximately 2200 samples from the Oxford and Leeds NHS Trusts, chosen on the basis of time and place, are in the Illumina sequencing and analysis pipeline, and a further ~500 are planned this year. Initial findings (presented at the Wellcome Trust Sanger Institute (WTSI), Sept 2010) confirm that whole genome sequencing (WGS) is unrivalled in transmission investigation.
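As an illustration only, the ward-level 'donor' search described above can be sketched in a few lines of Python. The linkage rule used here (same ward, same sequence type, donor sampled up to 28 days earlier) is an assumption made for the sketch and is not the criterion used in the study; the `Case` record, its field names and the toy cases are likewise hypothetical and are not study data.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Case:
    """Hypothetical minimal case record; field names are illustrative."""
    patient_id: str
    ward: str
    sequence_type: str  # MLST sequence type, e.g. "ST1"
    sampled: date       # date of the positive sample


def plausible_donors(case, cases, window_days=28):
    """Earlier cases on the same ward with the same ST, within the window.

    The 28-day default is an assumed value, not taken from the report.
    """
    window = timedelta(days=window_days)
    return [
        other for other in cases
        if other.patient_id != case.patient_id
        and other.ward == case.ward
        and other.sequence_type == case.sequence_type
        and timedelta(0) < case.sampled - other.sampled <= window
    ]


def fraction_with_donor(cases, window_days=28):
    """Proportion of cases with at least one plausible donor."""
    if not cases:
        return 0.0
    linked = sum(1 for c in cases if plausible_donors(c, cases, window_days))
    return linked / len(cases)


# Toy data, invented purely to exercise the functions.
toy_cases = [
    Case("p1", "A", "ST1", date(2010, 1, 1)),
    Case("p2", "A", "ST1", date(2010, 1, 10)),  # linked to p1
    Case("p3", "A", "ST3", date(2010, 1, 12)),  # different ST
    Case("p4", "B", "ST1", date(2010, 1, 15)),  # different ward
    Case("p5", "B", "ST1", date(2010, 1, 20)),  # linked to p4
]
print(fraction_with_donor(toy_cases))
```

A rule of this kind is deliberately simple: it yields a yes/no answer per case and cannot weigh competing donors against each other, which is one motivation for moving to formal statistical models.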
Considerable effort has been devoted to improving on the heuristic transmission analyses by using formal statistical models fitted with MCMC algorithms (presented at the Hierarchical Models and Markov Chain Monte Carlo workshop, June 2011). This approach has the advantages of estimating probabilities for several different plausible donors and of allowing direct estimation of epidemiological model parameters; it can also be extended more naturally to include whole genome sequence data. A second major analytical development underway is the extension of the widely used “ClonalFrame” by the senior UKCRC Research Scientist (Dr Xavier Didelot) to estimate the time-to-ancestor of highly similar genomes and thereby provide date estimates for putative transmission events. Analysis of an initial set of 486 whole C. difficile genomes has recently been completed (paper submitted).
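To make the idea of a time-to-ancestor estimate concrete, the following is a minimal sketch of a strict molecular clock point estimate for two genomes. It is not the ClonalFrame extension described above, which is a full statistical model that accounts for recombination and uncertainty. The function name and the substitution rate passed in the example are hypothetical placeholders, not estimates from this project.

```python
def tmrca_strict_clock(t1, t2, snp_diff, rate_per_genome_per_year):
    """Point estimate of the date of the most recent common ancestor.

    Under a strict clock, the two lineages together span
    (t1 - T) + (t2 - T) years and accumulate snp_diff differences at
    rate_per_genome_per_year, so T = (t1 + t2 - snp_diff / rate) / 2.

    t1, t2: sampling dates as decimal years (e.g. 2010.5).
    snp_diff: number of single nucleotide differences between the genomes.
    rate_per_genome_per_year: assumed clock rate, supplied by the caller.
    """
    if rate_per_genome_per_year <= 0:
        raise ValueError("rate must be positive")
    if snp_diff < 0:
        raise ValueError("snp_diff must be non-negative")
    total_branch_years = snp_diff / rate_per_genome_per_year
    return (t1 + t2 - total_branch_years) / 2.0


# Illustrative call: the rate of 1.0 SNP per genome per year is a
# placeholder chosen for easy arithmetic, not a measured value.
estimate = tmrca_strict_clock(2010.5, 2011.0, snp_diff=3,
                              rate_per_genome_per_year=1.0)
print(estimate)
```

A point estimate of this kind ignores the stochastic nature of mutation; a model-based approach would instead report a distribution of plausible ancestor dates for each putative transmission event.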
S. aureus
To date, 3600 samples have been collected in Oxford and 500 in Brighton via REC-approved studies (a cohort carriage study in GP attendees, and studies of invasive disease and transmission in hospitals). An adult ITU transmission study in Oxford has been expanded to Trauma and Gerontology wards using NIHR Flexibility and Sustainability Funding (FSF). All Oxford S. aureus isolates are initially spa-typed using automated systems optimised to detect mixed carriage (~5000 S. aureus samples spa-typed to date), and these data can be linked to IORD for epidemiological analysis. Approximately 1365 samples from retrospective and prospective collections in Oxford and Brighton have been Illumina sequenced. New automated whole genome analysis methods are being developed in association with the senior UKCRC Research Scientist (Dr Xavier Didelot), building on ClonalFrame to give statistically robust phylogenetic reconstruction of evolution with recombination and mutation, allowing for variable sampling times.
M. tuberculosis (Mtb)
A total of 450 Mtb samples collected in the Midlands by the HPA have been selected for WGS, and so far 400 have been sequenced or sent to the Wellcome Trust Sanger Institute (WTSI) for sequencing. In the long term, this part of the project will sequence ~4500 genomes from the ~13500 available in the reference laboratory archives. Variable Number Tandem Repeat (VNTR), person and some location-based data are available for the majority of samples and are being collated. As WGS data become available, they are being used to investigate aspects of the biology, epidemiology and local transmission of Mtb. Protocols are currently being optimised to improve DNA extraction from these samples, which are more difficult to process than those of the other pathogens in MMM because of their slow growth and infection risk.
Norovirus
A total of 180 Norovirus stool samples have been collected from ORH outbreaks, the Southampton reference laboratory and an ocean liner outbreak. C. difficile study samples have ethics approval for use in Norovirus research, and further samples from the Southampton Norovirus testing laboratory are also available. Data collection by wireless hand-held device has been piloted in all outbreaks since the start of the project, in conjunction with the Infection Control teams. Methodology for extraction, purification and amplification of Norovirus has been optimised. WGS has been performed by Sanger sequencing in 20-25 samples and by Roche 454 in 20 samples. Owing to technical difficulties with 454, Sanger sequencing is being used while Illumina protocols are developed.

