The Schneider Suite

A Graphically Driven Program and Application Programming Interface for Molecular Biology and Bioinformatic Sequence Analysis.

THE SCHNEIDER MANIFESTO

The motivation for Schneider is to make the tools of the Molecular Biologist and Bioinformatician available to as wide an audience as possible. Schneider will provide easy to use tools completely free of licensing fees. In light of the fact that similar commercial packages have greatly reduced feature sets, cost thousands of dollars, are subject to maintenance fees, and go out of date quickly, it is hoped that the Schneider model will help economically disadvantaged countries and communities address the particular biomedical research needs unique to their geographical regions.

Schneider aims to provide both an API and a stand-alone suite of programs for the graphical manipulation of biological sequences for the purposes of molecular biology and bioinformatics. The key philosophy of Schneider is to provide an intuitive working environment for biological researchers who must perform rudimentary sequence manipulation and data management tasks. It is hoped that, by providing a friendly and intuitive environment for conducting these rudimentary tasks, familiarity with the Schneider environment will invite the researcher to explore higher-level sequence manipulation and bioinformatic analyses. Thus, the initial focus will be to provide a solid core of common molecular biology tools, and then to expand these tools and integrate them with bioinformatic tools as the project matures.

Because molecular biologists have traditionally been somewhat alienated from computer programming and command line software, Schneider seeks to provide a world class graphical environment for sequence manipulation. Several GUI elements will contribute to this environment. An incomplete list is:

  1. A full featured and dynamic sequence editor that behaves appropriately depending on the type of sequence edited (e.g. RNA, DNA, protein). The editor will be able to search based on regular expressions and fuzzy search criteria.

  2. A full featured sequence mapping canvas dynamically integrated with the sequence editor

  3. A full featured sequence database that can represent hierarchical and inheritance relationships between sequences

  4. A full featured primer database that can represent relationships between primers and target sequences

  5. A full set of common calculators that can be summoned contextually. Such calculators will include but not be limited to:

    1. A calculator for determining concentrations and volumes depending on absorbance data
    2. A primer calculator for analyzing primer stability and undesirable duplex formation
    3. A restriction digestion calculator dynamically integrated with the mapping canvas and sequence editor

  6. A sequence alignment utility that integrates with ClustalW/ClustalX if these free programs are installed on the user’s computer

  7. Graphical tools for creating searches for major repositories such as the PDB and GenBank

  8. A graphically driven preference database
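
As a rough illustration of the absorbance calculator listed under item 5 above, the following sketch converts an A260 reading into a mass concentration and then into a pipetting volume. It is not taken from the Schneider code base: the function names, the dictionary, and the parameters are hypothetical, and the conversion factors are the commonly quoted rule-of-thumb values for a 1 cm path length (about 50 ug/mL per A260 unit for double-stranded DNA, 33 for single-stranded DNA, 40 for RNA).

```python
# Rule-of-thumb mass concentrations (ug/mL) that give A260 = 1.0 in a 1 cm
# path. These are widely quoted approximations, assumed here for illustration;
# they are not values taken from Schneider itself.
A260_FACTORS = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}


def concentration_ug_per_ml(a260, kind="dsDNA", dilution=1.0, path_cm=1.0):
    """Estimate the stock concentration (ug/mL) from an A260 reading.

    a260     -- absorbance of the (possibly diluted) sample at 260 nm
    kind     -- key into A260_FACTORS
    dilution -- fold dilution applied before reading (10.0 for 1:10)
    path_cm  -- optical path length of the cuvette in cm
    """
    if path_cm <= 0 or dilution <= 0:
        raise ValueError("path_cm and dilution must be positive")
    # Normalize to a 1 cm path, convert to ug/mL, then undo the dilution.
    return a260 / path_cm * A260_FACTORS[kind] * dilution


def volume_ul_for_mass(mass_ug, conc_ug_per_ml):
    """Return the volume (uL) of stock that contains mass_ug of nucleic acid."""
    if conc_ug_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / conc_ug_per_ml * 1000.0  # mL -> uL
```

For example, an A260 of 0.5 read from a 10-fold dilution of a double-stranded DNA stock corresponds to 250 ug/mL under these assumptions. A contextual calculator in the GUI would presumably wrap this kind of arithmetic behind entry fields.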

Schneider also aims to be an unrestricted Open Source package that can be run on any operating system and will require a minimum of hardware investment. Toward this aim, the primary language for development will be Python. However, where needed, performance enhancements will be done in strict ANSI C/C++. Practically, then, Schneider will run on any computer capable of running Python and gcc. Schneider will make use of the standard Python distribution as much as possible. For example, Tkinter will be used for the user interface design. However, to provide complete and world-class functionality, Schneider will require a number of additional open source libraries and programs. The full set of requirements includes but may not be limited to Python, Tkinter, PIL, Biopython, SQLObject, and ClustalW/ClustalX.

Fortunately, by using Biopython and SQLObject, many technical problems have already been addressed. Biopython contains ready-made file parsers and writers for a number of popular file formats, including PDB, FASTA, GenBank, and mmCIF. Biopython also has interfaces for a number of public databases such as the PDB, REBASE, and GenBank. In short, use of Biopython will solve 80-90% of the difficult technical issues of this project, at least for what can reasonably be foreseen. Because we will be using Biopython, its data structures will be used extensively in Schneider, and we will rely on it to read and write common file formats. SQLObject will allow easy mapping between the data structures used by Schneider and a persistent relational database. The sequence editor has already been prototyped and is undergoing streamlining and optimization. Most technical difficulties we can foresee in the short term center around optimizing the sequence editor and the mapping canvas. We hope to provide a very responsive and dynamic environment for the user, and we see these two elements as the cornerstones of that environment.
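
To make the idea of a persistent store for sequences and primers more concrete, here is a minimal sketch of one way the relationships could be laid out. It deliberately uses the standard library sqlite3 module instead of SQLObject so that it runs without extra dependencies, and every table, column, and function name in it is hypothetical; it should not be read as Schneider's actual schema.

```python
import sqlite3

# Hypothetical schema: a sequence may derive from a parent sequence
# (hierarchy), and a primer may be linked to any number of target sequences.
SCHEMA = """
CREATE TABLE sequence (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    residues  TEXT NOT NULL,
    parent_id INTEGER REFERENCES sequence(id)
);
CREATE TABLE primer (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    residues TEXT NOT NULL
);
CREATE TABLE primer_target (
    primer_id   INTEGER NOT NULL REFERENCES primer(id),
    sequence_id INTEGER NOT NULL REFERENCES sequence(id),
    PRIMARY KEY (primer_id, sequence_id)
);
"""


def open_store(path=":memory:"):
    """Open (or create) a store; defaults to a throwaway in-memory database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_sequence(conn, name, residues, parent_id=None):
    """Insert a sequence, optionally derived from a parent; return its id."""
    cur = conn.execute(
        "INSERT INTO sequence (name, residues, parent_id) VALUES (?, ?, ?)",
        (name, residues, parent_id),
    )
    return cur.lastrowid


def add_primer(conn, name, residues, target_ids):
    """Insert a primer and link it to each target sequence id; return its id."""
    cur = conn.execute(
        "INSERT INTO primer (name, residues) VALUES (?, ?)", (name, residues)
    )
    primer_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO primer_target (primer_id, sequence_id) VALUES (?, ?)",
        [(primer_id, target_id) for target_id in target_ids],
    )
    return primer_id


def lineage(conn, sequence_id):
    """Return sequence names from the given record up to its root ancestor."""
    names = []
    while sequence_id is not None:
        row = conn.execute(
            "SELECT name, parent_id FROM sequence WHERE id = ?", (sequence_id,)
        ).fetchone()
        if row is None:
            break
        names.append(row[0])
        sequence_id = row[1]
    return names


def primers_for(conn, sequence_id):
    """Return the names of all primers linked to a sequence, sorted by name."""
    rows = conn.execute(
        "SELECT p.name FROM primer p "
        "JOIN primer_target pt ON pt.primer_id = p.id "
        "WHERE pt.sequence_id = ? ORDER BY p.name",
        (sequence_id,),
    )
    return [row[0] for row in rows]
```

An object-relational mapper such as SQLObject would hide the SQL behind Python classes, but the underlying relationships (a self-referencing parent link and a primer-to-target join table) would likely be similar.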

Several novel algorithms have already been authored as part of the Schneider Suite. These include a very fast algorithm for finding positions where silent restriction sites can be introduced and a module that calculates primer duplex stability using the latest nearest-neighbor parameters. To our knowledge, this will be the first publicly available implementation of these recent data for calculating the stability of nucleic acid duplexes. Moreover, the graphical DNA calculator has already been prototyped.
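
For readers unfamiliar with the term, a silent restriction site is a recognition site that can be introduced into a coding sequence without changing the encoded protein. The sketch below shows the idea with a naive brute-force search; it is not the fast algorithm mentioned above, whose details are not described here. The function names are hypothetical, the standard genetic code is assumed, the reading frame is assumed to start at the first base, and only unambiguous recognition sites (no degenerate bases) are handled.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG codon order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def silent_sites(coding, site):
    """List (position, mutated_sequence) pairs where `site` can be written
    into `coding` without altering the translated protein.

    Naive approach: overwrite every window with the site and re-translate
    the whole sequence, so the cost grows with len(coding) squared.
    """
    coding, site = coding.upper(), site.upper()
    protein = translate(coding)
    hits = []
    for start in range(len(coding) - len(site) + 1):
        end = start + len(site)
        if coding[start:end] == site:
            continue  # the site is already present at this position
        mutated = coding[:start] + site + coding[end:]
        if translate(mutated) == protein:
            hits.append((start, mutated))
    return hits
```

For instance, the coding sequence ATGGAGTTTTAA (Met-Glu-Phe-stop) admits an EcoRI site (GAATTC) at offset 3, because GAG to GAA and TTT to TTC are both synonymous changes. A production implementation would restrict the re-translation to the affected codons and support degenerate recognition sequences.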

In short, Schneider will be far superior to other molecular biology suites because no other package currently provides this powerful combination of planned features:

  1. Open Source & Free License
  2. GUI driven environment
  3. A complete set of molecular biology features utilizing the latest biophysical and biological data
  4. Full, dynamic integration between components
  5. Integration of sequence manipulation tools with bioinformatic tools
  6. Available on the widest possible set of software and hardware platforms
  7. Development guided by a career molecular biologist & biophysicist (James C. Stroud) and a professional GUI developer (Robert Cadena)
  8. Modular tools that can be easily integrated as GUI elements in other software

LICENSE FOR GREATER PARTS OF SCHNEIDER

Copyright © 2005 James C. Stroud.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

Where indicated, portions of The Schneider Suite are instead protected under the copyrights of the respective authors or sponsoring institutions. The license for the University of California will be used in some cases. It is open-source and included below. All code in The Schneider Suite is released as open-source software under Version 2 of the GNU General Public License provided below or a nearly equivalent license.

LICENSE FOR SEGMENTS OF CODE OWNED BY UCLA

Copyright © 2005 The Regents of the University of California. All Rights Reserved.

Permission to use, copy, modify, and distribute this software and its documentation for educational, research and non-profit purposes, without fee, and without a written agreement is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies.

Permission to incorporate this software into proprietary products may be obtained by contacting the University of California through Lorelei de Larena, Office of Intellectual Property Administration, 10920 Wilshire Blvd., Suite 1200, Los Angeles, CA 90024-1406, 310-794-0558.

This software program and documentation are copyrighted by The Regents of the University of California. The software program and documentation are supplied "as is", without any accompanying services from The Regents. The Regents does not warrant that the operation of the program will be uninterrupted or error-free. The end-user understands that the program was developed for research purposes and is advised not to rely exclusively on the program for any reason.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.