WiseScaffolder

Function

WiseScaffolder is a stand-alone, semi-automatic application for the genome scaffolding of pre-assembled contigs using mate-pair data. It also produces editable scaffold maps, which can be used either to build gapped scaffolds or as a common thread for the manual improvement of scaffolds.

Description

WiseScaffolder includes four subcommands:

  • dumpconfig generates a configuration file that notably specifies the average insert size of the mate-pair library.
  • preprocess detects and corrects chimerae, estimates contig copy number and produces outputs useful for the manual improvement of scaffolds.
  • scaffold is the central scaffold builder and comprises two modules: (i) the iterative_scaffold_extender, which works with big, unambiguous contigs or, once these run out, with single-copy contigs, and (ii) the small_contig_inserter, which inserts small contigs within scaffolds.
  • buildfasta converts the scaffold map(s) into Fasta sequences.
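To give a concrete idea of what turning a scaffold map into a gapped scaffold involves, here is a minimal sketch that joins oriented contigs with runs of N. The map layout used here (contig identifier, orientation, estimated gap after the contig) and the function names are hypothetical illustrations, not WiseScaffolder's actual file format or code.

```python
# Hypothetical scaffold map: a list of (contig_id, orientation, gap_after) entries,
# where orientation is "+" or "-" and gap_after is the estimated gap in bp.
# This layout is an assumption for illustration, not the real scaffold map format.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(scaffold_map, contigs, min_gap=1):
    """Concatenate oriented contigs, filling each gap with a run of Ns.

    Gap estimates smaller than min_gap (including negative ones, i.e. putative
    overlaps) are padded with min_gap Ns so adjacent contigs stay separated.
    """
    parts = []
    for i, (contig_id, orientation, gap_after) in enumerate(scaffold_map):
        seq = contigs[contig_id]
        parts.append(reverse_complement(seq) if orientation == "-" else seq)
        if i < len(scaffold_map) - 1:  # no trailing gap after the last contig
            parts.append("N" * max(gap_after, min_gap))
    return "".join(parts)


contigs = {"ctg1": "ACGTAC", "ctg2": "GGTTA", "ctg3": "CCC"}
scaffold_map = [("ctg1", "+", 3), ("ctg2", "-", -10), ("ctg3", "+", 0)]
print(build_scaffold(scaffold_map, contigs))  # ACGTACNNNTAACCNCCC
```

Padding overlaps with a single N is only one possible convention; a real tool may instead merge overlapping ends or use a fixed gap length.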

Classification

Category:

NGS > Scaffolding

User Interface:

Command line, GALAXY wrapper

Operating system:

Any (Python application)

Usage

The four subcommands above can be run sequentially as follows:

wisca.py (-p) (-d) (-h) dumpconfig --configout "wisca.conf" (-i 5000) (-b 5000)

→ Output: an editable “wisca.conf” configuration file

wisca.py (-p) (-d) (-h) preprocess --configin "wisca.conf" -c "contigs.info" -m "reads_mapping.sam" (--dumpfiles)

→ Outputs: chimerae resolution file “chimera.csv”, contig coverage/copy number file “coverage.csv”, and additional files dedicated to chimera resolution and manual scaffolding
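The coverage.csv output relates each contig's coverage to a copy number. The exact method WiseScaffolder uses is not described here; as an illustration only, the sketch below applies a common heuristic: take the median coverage of long contigs as the single-copy baseline and divide each contig's coverage by it. The length threshold, the function name and the toy values are assumptions.

```python
from statistics import median


def estimate_copy_numbers(contigs, min_length=5000):
    """Estimate copy number as coverage relative to the median coverage of long contigs.

    contigs: dict mapping contig id -> (coverage, length).
    Long contigs (>= min_length bp) are assumed to be mostly single-copy, so their
    median coverage is used as the single-copy baseline (an assumed heuristic).
    """
    long_covs = [cov for cov, length in contigs.values() if length >= min_length]
    if not long_covs:
        raise ValueError("no contig reaches min_length; cannot set a baseline")
    baseline = median(long_covs)
    # Round to the nearest integer, with at least one copy per contig.
    return {cid: max(1, round(cov / baseline)) for cid, (cov, _) in contigs.items()}


contigs = {
    "ctg1": (50.0, 120000),
    "ctg2": (52.0, 80000),
    "ctg3": (48.0, 30000),
    "ctg4": (149.0, 1500),  # roughly three times the baseline
    "ctg5": (20.0, 800),    # low coverage, still counted as one copy
}
print(estimate_copy_numbers(contigs))
# {'ctg1': 1, 'ctg2': 1, 'ctg3': 1, 'ctg4': 3, 'ctg5': 1}
```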

wisca.py (-p) (-d) (-h) scaffold --configin "wisca.conf" -c "contigs.info" -m "reads_mapping.sam" --scaffoldout "scaffolds_maps.txt" (-k "chimera.csv") (-v "coverage.csv")

→ Output: an editable “scaffolds_maps.txt” file

wisca.py (-p) (-d) (-h) buildfasta -f "contigs.fasta" -s "scaffolds_maps.txt" -r "wisca_scaffolds" (-k "chimera.csv")

→ Output: a “wisca_scaffolds” folder containing Fasta-formatted scaffolds
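The four commands above can also be chained from a small driver script. This is a sketch under assumptions: it only uses the flags shown in the usage lines, includes the optional -k and -v arguments, leaves out the global options (-p), (-d) and (-h) whose meaning is not described here, and assumes wisca.py is launched as `python wisca.py`. The helper names are hypothetical.

```python
import subprocess

# File names mirror the usage lines above; replace them with your own.
CONFIG = "wisca.conf"
CONTIG_INFO = "contigs.info"
MAPPING = "reads_mapping.sam"
CONTIGS_FASTA = "contigs.fasta"
SCAFFOLD_MAP = "scaffolds_maps.txt"
OUT_DIR = "wisca_scaffolds"


def pipeline_commands(launcher=("python", "wisca.py")):
    """Return the four subcommand invocations, in order, as argument lists.

    `launcher` is an assumption: adapt it to how wisca.py is installed and to
    the Python version it requires.
    """
    base = list(launcher)
    return [
        base + ["dumpconfig", "--configout", CONFIG],
        base + ["preprocess", "--configin", CONFIG, "-c", CONTIG_INFO, "-m", MAPPING],
        base + ["scaffold", "--configin", CONFIG, "-c", CONTIG_INFO, "-m", MAPPING,
                "--scaffoldout", SCAFFOLD_MAP, "-k", "chimera.csv", "-v", "coverage.csv"],
        base + ["buildfasta", "-f", CONTIGS_FASTA, "-s", SCAFFOLD_MAP,
                "-r", OUT_DIR, "-k", "chimera.csv"],
    ]


def run_pipeline(dry_run=True):
    """Print each command; when dry_run is False, also execute it."""
    for cmd in pipeline_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Since wisca.conf is described as editable, running the first step on its own, checking the configuration, and only then running the remaining steps is usually safer than chaining all four blindly.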

Command line arguments

X: parameter required to run the given subcommand
(X): optional parameter. For “insertsize” and “bigcontigminimalsize”, a value given on the command line takes priority over the corresponding parameter in the configuration file.
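The precedence rule for “insertsize” and “bigcontigminimalsize” amounts to a simple merge: start from the configuration-file values and overwrite them with any value actually supplied on the command line. In the minimal sketch below, both sources are represented as plain dictionaries keyed by those parameter names; this is an assumption, since the syntax of wisca.conf is not described here, and `effective_settings` is a hypothetical helper.

```python
def effective_settings(config, cli):
    """Merge configuration-file values with command-line values.

    Options given on the command line (not None) take priority over the
    configuration file; options left unset keep the configuration-file value.
    The input dictionaries are not modified.
    """
    merged = dict(config)
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged


config = {"insertsize": 4500, "bigcontigminimalsize": 5000}
cli = {"insertsize": 5000, "bigcontigminimalsize": None}  # only the insert size was given
print(effective_settings(config, cli))
# {'insertsize': 5000, 'bigcontigminimalsize': 5000}
```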

Input file format

WiseScaffolder requires three inputs:

  • Contig info file: a tab-delimited file specifying the identifier, coverage and length of each contig
  • Mate-pair mapping file, either in SAM format or in a custom tab-delimited format
  • Multi-Fasta file of the contigs
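As an illustration of reading the first two inputs, the sketch below parses a contig info file and extracts mate-pair links from a SAM file. The column order of the contig info file (identifier, coverage, length) is an assumption, the custom tab-delimited mapping format is not covered because it is not described here, and the choice of SAM flags to filter on is ours. The SAM fields used (FLAG, RNAME, POS, RNEXT, PNEXT) are standard SAM columns.

```python
import csv
import io


def read_contig_info(handle):
    """Parse a tab-delimited contig info file into {id: (coverage, length)}.

    Assumed column order: identifier, coverage, length. Empty rows and lines
    starting with '#' are skipped.
    """
    info = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        info[row[0]] = (float(row[1]), int(row[2]))
    return info


def read_mate_links(handle):
    """Yield (contig, pos, mate_contig, mate_pos) once per mapped read pair."""
    for line in handle:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (0x4 | 0x8 | 0x100 | 0x800):
            continue  # read or mate unmapped, or secondary/supplementary alignment
        if not flag & 0x40:
            continue  # keep only the first read of each pair to avoid double counting
        rname, pos = fields[2], int(fields[3])
        rnext = rname if fields[6] == "=" else fields[6]  # "=" means same reference
        yield rname, pos, rnext, int(fields[7])


sam_text = (
    "@HD\tVN:1.6\n"
    "r1\t65\tctg1\t100\t60\t50M\tctg2\t700\t0\t*\t*\n"
    "r1\t129\tctg2\t700\t60\t50M\tctg1\t100\t0\t*\t*\n"
    "r2\t67\tctg1\t300\t60\t50M\t=\t4800\t4550\t*\t*\n"
)
for link in read_mate_links(io.StringIO(sam_text)):
    print(link)
```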

Outputs

WiseScaffolder produces the following outputs:

  • Configuration file
  • Chimerae resolution file
  • Contig coverage/copy number file
  • Outputs for manual scaffolding
    • Mate-pair insert size graph: shows the distribution of mate-pair insert sizes, as determined from mate pairs whose reads both map to the same contig
    • Global link map: a symmetric matrix giving, for each contig, the number of mate-pair reads linking it to every other contig
    • Neighborhood link map: similar to the global link map, but also indicating where the mate-pair reads map on the contig (5' or 3' end) and their orientation relative to the contig
    • Linkage location map: a symmetric matrix giving, for a given contig, the average location of the mate pairs linking it to each other contig
  • Scaffold maps
  • Scaffold fasta
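Two of the manual-scaffolding outputs lend themselves to a short illustration: an insert size histogram built from pairs whose reads map to the same contig, and a symmetric matrix counting links between contigs. The sketch below works on (contig, position, mate contig, mate position) tuples and approximates the insert size as the distance between the two mapping positions, ignoring read length and orientation. It is a simplified stand-in, not WiseScaffolder's own computation, and the function names are hypothetical.

```python
from collections import Counter


def insert_size_histogram(links, bin_size=500):
    """Bin approximate insert sizes of pairs whose reads map to the same contig.

    Insert size is approximated as |mate_pos - pos| (a simplification).
    """
    sizes = [abs(mpos - pos) for ctg, pos, mctg, mpos in links if ctg == mctg]
    return Counter((size // bin_size) * bin_size for size in sizes)


def global_link_map(links, contig_ids):
    """Symmetric matrix of the number of mate pairs linking each pair of contigs."""
    index = {cid: i for i, cid in enumerate(contig_ids)}
    n = len(contig_ids)
    matrix = [[0] * n for _ in range(n)]
    for ctg, _, mctg, _ in links:
        if ctg != mctg:
            i, j = index[ctg], index[mctg]
            matrix[i][j] += 1
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


links = [
    ("ctg1", 100, "ctg2", 700),
    ("ctg1", 300, "ctg1", 4800),
    ("ctg2", 50, "ctg1", 9000),
    ("ctg3", 10, "ctg2", 20),
    ("ctg2", 1000, "ctg2", 6200),
]
print(insert_size_histogram(links))
for row in global_link_map(links, ["ctg1", "ctg2", "ctg3"]):
    print(row)
```

Recording which end of each contig the reads fall on, and their strand, would turn the same loop into something closer to the neighborhood link map.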

Downloads

Application & Handbook

 

GALAXY wrapper

wrapper v1.0 (5.8 KB)
 

Test dataset: Synechococcus sp. WH8103 assembly and subset of the mate-pair mapping data

Complementary scripts

 

Python & BioPython

Reference

Authors

Marine Phototrophic Prokaryotes (MaPP) Team (CNRS-UPMC - UMR7144): Gregory K. Farrant, Frédéric Partensky, Laurence Garczarek

ABiMS Platform (CNRS-UPMC - FR2424): Mark Hoebeke, Gwendoline Andres, Erwan Corre

Please cite

Farrant, G.K., Hoebeke, M., Partensky, F., Andres, G., Corre, E. and Garczarek, L., 2015. WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data, in revision for BMC Bioinformatics.