
New Input Generator Framework in Avogadro 2

Avogadro 1.x had quite a large number of input generators that came from very humble beginnings. They were designed to be easy to write, and to give a simple path from a structure in Avogadro to something that could be used as an input file in one of many codes. Our basic approach was to add a C++ class per program we targeted, with one or two special cases. This meant that to develop an input generator it was necessary to learn some of the Avogadro API, and to at least compile a plugin (matching our compiler, Qt and library versions, etc.). It also led to minor differences between the different input generators, and a lot of copying and pasting of boilerplate code.

Avogadro 2 showing an ethane molecule

When developing the input generators for Avogadro 2 as part of the Open Chemistry project, we wanted to make it easier to add new generators. We put a lot of thought into how to make this possible, and how to maintain a native look and feel without requiring input generator developers to learn C++, Qt, Avogadro and everything that goes along with setting up a development environment. The new input generator framework is largely language agnostic, with a minimum of assumptions. It currently executes the Python interpreter, but only because all of the input generators we have developed so far are written in Python.

Avogadro 2 NWChem input generator with syntax highlighting

The input generators are executed in a separate process, using several passes to get the display name, the supported options, the syntax highlighting rules and finally to generate the input itself. The current pass is communicated using command-line arguments, and input is passed to the program on standard input, formatted as JSON. Results are passed back on standard output and, depending on the pass, are either JSON or the actual input file. We also do some post-processing of the input file, where the molecular geometry can be inserted following a specified format. This command-line API is documented here. The NWChem input generator is the first to add syntax highlighting in an external plugin, while the GAMESS input generator shows an approach using C++ code ported from Avogadro 1.x.
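
To make the pass structure concrete, here is a minimal sketch of the host side in Python. It assumes the generator is a Python script and uses the flag names from the NWChem generator's main block (`--display-name`, `--print-options`, `--generate-input`). Avogadro 2 does this from its own C++ code, so the function names, the JSON payload and the timeout value below are illustrative assumptions rather than Avogadro's API. Running each pass in a child process with a timeout is what stops a crashing or hanging generator from taking the interface down with it.

```python
import json
import subprocess
import sys


def run_pass(script, flag, payload=None, timeout=10.0):
    """Run one pass of a generator script in a separate process.

    `payload`, if given, is serialized to JSON and written to the child's
    standard input; the child's standard output is returned as text.
    """
    result = subprocess.run(
        [sys.executable, script, flag],
        input=json.dumps(payload) if payload is not None else "",
        capture_output=True,
        text=True,
        timeout=timeout,  # a hung generator raises TimeoutExpired instead of blocking
        check=True,       # a crashing generator raises CalledProcessError
    )
    return result.stdout


def query_generator(script, options):
    """Hypothetical driver: get the name and options, then generate the input."""
    name = run_pass(script, "--display-name").strip()
    option_spec = json.loads(run_pass(script, "--print-options"))
    raw_input = run_pass(script, "--generate-input", payload=options)
    return name, option_spec, raw_input
```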

This approach ensures that an input generator cannot crash or hang the interface, avoids licensing issues (the generator runs as a separate process), and gives input generator developers the freedom to concentrate on turning options into the appropriate input file without worrying about the details of the application it is being used in. With relatively minor modifications, Avogadro 2 could look for other file extensions and execute the appropriate interpreter, or simply execute the programs found in a given path. The generator files can be modified directly: if the options change it is currently necessary to restart Avogadro, but changes to the input generation itself are picked up the next time the generator is run. Menu entries are added dynamically at program startup, and this concept could be extended to more of Avogadro. The main block of the NWChem input generator is shown below:

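import argparse
import json

if __name__ == "__main__":
  parser = argparse.ArgumentParser('Generate a NWChem input file.')
  parser.add_argument('--debug', action='store_true')
  parser.add_argument('--print-options', action='store_true')
  parser.add_argument('--generate-input', action='store_true')
  parser.add_argument('--display-name', action='store_true')
  args = vars(parser.parse_args())

  debug = args['debug']

  # getOptions() and generateInput() stand for helpers defined elsewhere in
  # the script (names assumed); each pass writes its result to stdout.
  if args['display_name']:
    print("NWChem")
  if args['print_options']:
    print(json.dumps(getOptions()))
  elif args['generate_input']:
    # Assumed: generateInput() reads the JSON options from stdin and calls generateInputFile().
    print(json.dumps(generateInput()))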

A snippet of the input generation code is shown below; it builds up a string containing the raw input file that will be passed to NWChem.

def generateInputFile(opts):
  # Extract options:
  title = opts['Title']
  calculate = opts['Calculation Type']
  theory = opts['Theory']
  basis = opts['Basis']
  multiplicity = opts['Multiplicity']
  charge = opts['Charge']
  # Preamble
  nwfile = ""
  nwfile += "echo\n\n"
  nwfile += "start molecule\n\n"
  nwfile += "title \"%s\"\n"%title
  # Coordinates
  nwfile += "geometry units angstroms print xyz autosym\n"
  nwfile += "$$coords:Sxyz$$\n"
  nwfile += "end\n\n"
  # More stuff here...
  return nwfile
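
The `$$coords:Sxyz$$` line above is a placeholder that Avogadro fills in during post-processing. The sketch below shows one way such a substitution could work, assuming `S` stands for the element symbol and `x`, `y`, `z` for Cartesian coordinates in Ångström (as the `Sxyz` example and the `units angstroms` line suggest). The function names, the `(symbol, x, y, z)` atom tuples and the column widths are our assumptions, and Avogadro's own format specifier may support other characters.

```python
import re

# Matches a placeholder such as $$coords:Sxyz$$ and captures the spec "Sxyz".
COORD_PATTERN = re.compile(r"\$\$coords:([A-Za-z]+)\$\$")


def format_atom(spec, atom):
    """Format one (symbol, x, y, z) atom according to a spec such as 'Sxyz'."""
    symbol, x, y, z = atom
    fields = {
        "S": "%-3s" % symbol,  # element symbol (assumed meaning of 'S')
        "x": "%12.6f" % x,     # Cartesian coordinates in Angstrom
        "y": "%12.6f" % y,
        "z": "%12.6f" % z,
    }
    # Unknown spec characters raise KeyError in this sketch.
    return " ".join(fields[c] for c in spec)


def insert_coordinates(text, atoms):
    """Replace every $$coords:SPEC$$ placeholder with one line per atom."""
    def expand(match):
        spec = match.group(1)
        return "\n".join(format_atom(spec, atom) for atom in atoms)
    return COORD_PATTERN.sub(expand, text)
```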

We hope that this framework will make it much easier for researchers to customize their input generator scripts to their needs, and we would welcome your feedback on how we could make it even easier. If there are other languages of interest we could add examples; the major requirement is that the language can create a self-contained script or executable that can use standard input/output, has some string handling capabilities, and supports JSON.

Using VTK's Image Regression Tests in Avogadro 2

One of the really nice features of VTK's testing framework is the use of image-based regression tests. These allow developers to write tests that produce a final image, which can be recorded and compared to known baseline images in order to verify that the OpenGL rendering code renders the same (or a similar) image on all platforms. If a comparison fails, CDash will display the image the test produced, the baseline image it was compared to, and a difference image. Any project that performs rendering or visualization needs tests like these in addition to unit tests if it is to ensure that its visualization code continues to function as expected across a range of platforms.

We recently extracted the relevant code from the VTK testing framework to perform image-based regression tests in Avogadro 2, with the bulk of that code living in utilities/vtktesting/imageregressiontest.h. It is currently used in one test, with plans to extend it to cover all major types of rendering. It can be seen in action in tests/qtopengl/glwidgettest.cpp, where the important lines that take the snapshot and do the image comparison are:

  // Grab the frame buffer of the GLWidget and save it to a QImage.
  QImage image = widget.grabFrameBuffer(false);
  // Set up the image regression test.
  Avogadro::VtkTesting::ImageRegressionTest test(argc, argv);
  // Do the image threshold test, printing output to the std::cout.
  return test.imageThresholdTest(image, std::cout);

The CMake code that passes in the command-line arguments and ensures the test runs correctly is in tests/qtopengl/CMakeLists.txt. It largely involves passing in the paths to the baseline directory and a temporary directory, along with the test name (using the standard CMake-generated test driver).

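  # ${test} and ${testname} are set elsewhere in the CMakeLists.txt
  # (for example, in a loop over the test names).
  add_test(NAME "QtOpenGL-${test}"
    COMMAND AvogadroQtOpenGLTests "${testname}test"
      "--baseline" "${AVOGADRO_DATA_ROOT}/baselines/avogadro/qtopengl"
      "--temporary" "${PROJECT_BINARY_DIR}/Testing/Temporary")
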
Valid baseline image

The image above is the baseline, which is stored in a known location and compared with the image produced by the test (shown below).

Test image produced

If the images don't match, a difference image is produced and uploaded (shown below). In this case you can see that an extra sphere was rendered, and it stands out clearly in the difference image. The test also returns a numerical difference, which measures how much the images differ. The tolerance can be tweaked per test to allow some minor pixel differences, although care must be taken not to raise it too high.
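
The idea behind a threshold test fits in a few lines. This is a deliberately simplified stand-in and not VTK's algorithm: VTK's `vtkImageDifference` is more forgiving, for example by also comparing against neighbouring pixels so that one-pixel shifts from anti-aliasing do not count as failures. Here images are assumed to be lists of rows of `(r, g, b)` tuples of equal size, the error is the mean over pixels of the summed channel differences, and the default threshold is an arbitrary example value.

```python
def difference_image(test, baseline):
    """Per-pixel absolute difference of two equally sized RGB images."""
    if len(test) != len(baseline) or any(
        len(a) != len(b) for a, b in zip(test, baseline)
    ):
        raise ValueError("images must have the same dimensions")
    return [
        [tuple(abs(p - q) for p, q in zip(px, qx)) for px, qx in zip(row_t, row_b)]
        for row_t, row_b in zip(test, baseline)
    ]


def image_error(diff):
    """Scalar error: mean over all pixels of the summed channel differences."""
    pixels = [sum(px) for row in diff for px in row]
    return sum(pixels) / len(pixels)


def threshold_test(test, baseline, threshold=10.0):
    """Return (passed, error, diff); passes when the error is within tolerance."""
    diff = difference_image(test, baseline)
    error = image_error(diff)
    return error <= threshold, error, diff
```

A stray sphere shows up as a cluster of non-zero pixels in `diff`, and the size of that cluster drives `error` up; raising `threshold` to make such a test pass would hide exactly this kind of regression.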

Image difference from test to valid

We have not implemented it in Avogadro 2 yet, but VTK can use multiple baselines and returns the smallest image difference. This allows OS- or GPU-specific baselines to be uploaded where necessary, as an alternative to increasing the tolerance. Special tags printed by the tests on standard output prompt ctest to upload the image files when necessary (when the baseline image cannot be found, or the image comparison fails).
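
Multiple baselines amount to keeping the best match. The sketch below uses a mean absolute difference as its error measure, picks the baseline with the smallest error, and on failure prints `<DartMeasurement>`/`<DartMeasurementFile>` tags of the kind CTest scans test output for, so the images get attached to the dashboard submission. The tag names (`TestImage`, `ValidImage`, `ImageError`) are modelled on those VTK prints, while the function names, the baseline dictionary and the paths are hypothetical; check the CTest documentation for the exact tag syntax your version accepts.

```python
def mean_abs_error(test, baseline):
    """Mean absolute difference over every channel of two images.

    Images are lists of rows of (r, g, b) tuples.
    """
    flat_t = [c for row in test for px in row for c in px]
    flat_b = [c for row in baseline for px in row for c in px]
    if len(flat_t) != len(flat_b):
        return float("inf")  # treat a size mismatch as the worst possible match
    if not flat_t:
        return 0.0
    return sum(abs(a - b) for a, b in zip(flat_t, flat_b)) / len(flat_t)


def best_baseline(test, baselines):
    """Return (name, error) for the closest baseline; `baselines` maps name -> image."""
    return min(
        ((name, mean_abs_error(test, image)) for name, image in baselines.items()),
        key=lambda pair: pair[1],
    )


def compare_with_baselines(test, baselines, threshold, test_png, baseline_dir):
    """Pass if any baseline is within `threshold`; print CTest tags otherwise."""
    test_tag = ('<DartMeasurementFile name="TestImage" type="image/png">'
                f"{test_png}</DartMeasurementFile>")
    if not baselines:
        print(test_tag)  # no baseline yet: upload the image so one can be made
        return False
    name, error = best_baseline(test, baselines)
    print('<DartMeasurement name="ImageError" type="numeric/double">'
          f"{error}</DartMeasurement>")
    if error > threshold:
        print(test_tag)
        print('<DartMeasurementFile name="ValidImage" type="image/png">'
              f"{baseline_dir}/{name}</DartMeasurementFile>")
        return False
    return True
```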

First Open Chemistry Beta Release

Open Chemistry

We are pleased to announce the first beta release of the Open Chemistry suite of cross platform, open-source, BSD-licensed tools and libraries - Avogadro 2, MoleQueue and MongoChem. They are being released in beta, before all planned features are complete, to get feedback from the community following the open-source mantra of “release early, release often”. We will be making regular releases over the coming months, as well as automatically generating nightly binaries. A Source article from 2011 introduced the project, slides from FOSDEM describe it more recently, and the 0.5.0 release binaries can be downloaded here.

Open Chemistry workflow

These three desktop applications can each be used independently, but also have the capability of working together. Avogadro 2 is a rewrite of Avogadro that addresses many of the limitations we saw. This includes things such as the rendering code, scalability, scriptability, and increased flexibility, enabling us to effectively address the current and upcoming challenges in computational chemistry and related fields. MoleQueue provides desktop services for executing standalone programs both locally and on remote batch schedulers, such as Sun Grid Engine, PBS and SLURM. MongoChem provides chemically-aware search, storage, and informatics visualization using MongoDB and VTK.

Open Chemistry library organization

Avogadro 2

Avogadro 2 is a rewrite of Avogadro; please see the recently-published paper for more details on Avogadro 1. Avogadro has been very successful over the years, and we would like to thank all of our contributors and supporters, including the core development team: Geoff Hutchison, Donald Curtis, David Lonie, Tim Vandermeersch, Benoit Jacob, Carsten Niehaus, and Marcus Hanwell. We also recently obtained permission from almost all authors to relicense the existing code under the 3-clause BSD license, which will make migration of code to the new architecture much easier.

Avogadro 2 rendering a molecular orbital

Some notable new features of Avogadro 2 include:

  • Scalable data structures capable of addressing the needs of large molecular systems.
  • A flexible file I/O API supporting seamless addition of formats at runtime.
  • A Python-based input generator API, for generating input for a range of quantum codes.
  • A specialized scene graph for supporting scalable molecular rendering.
  • OpenGL 2.1/GLSL based rendering, employing point sprites, VBOs, etc.
  • Unit tests for core classes, with ongoing work to improve coverage.
  • Binary installers generated nightly.
  • Use of MoleQueue to run computational codes such as NWChem, MOPAC, GAMESS, etc.

Avogadro is not yet feature complete, but we invite you to try it out along with the suite of applications as we continue to improve it. The new Avogadro libraries feature much finer granularity; whereas before we provided a single library containing the whole API, there is now a layered API spread across multiple libraries. The Core and IO libraries have minimal dependencies, with the rendering library adding a dependence on OpenGL, and the Qt libraries adding Qt 4 dependencies. This allows us to reuse the code in many more places than was possible before, with rendering possible on a server without Qt/X, and the Core/IO libraries being suitable for command line use or integration into non-graphical applications.

MoleQueue

MoleQueue is a new application developed to satisfy the need to execute computational chemistry codes locally and remotely. Rather than adding this functionality directly to Avogadro 2, we developed it as a standalone, system-tray resident application that runs a graphical application and a local server (using local sockets for communication). It supports the configuration of multiple queues (local and remote), each containing one or more programs to be executed. Applications communicate with MoleQueue using JSON-RPC 2.0 over a local socket, and receive updates as the job state changes. A recent Source article describes MoleQueue in more detail.

MoleQueue queue configuration

In addition to the system-tray resident application, MoleQueue provides a Qt 4-based client library that can easily be integrated into Qt applications, providing a familiar signal-slot based API for job submission, monitoring, and retrieval. The project has remained general in its approach, containing no chemistry specific API, and has already been used by several other projects at Kitware in different application domains. Communicating with the MoleQueue server from other languages is quite simple, with the client code having minimal requirements for connecting to a named local socket and constructing JSON strings conforming to the JSON-RPC 2.0 specification.
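
Because the protocol is plain JSON-RPC 2.0, a client in another language mostly needs to build request objects and match replies by `id`. The sketch below does just that in Python. The method name `listQueues` is used for illustration and should be checked against the MoleQueue documentation, and the transport (connecting to the named local socket, plus any message framing MoleQueue expects) is deliberately left out.

```python
import itertools
import json

_ids = itertools.count(1)  # request ids only need to be unique per connection


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request string; returns (id, text)."""
    request_id = next(_ids)
    request = {"jsonrpc": "2.0", "method": method, "id": request_id}
    if params is not None:
        request["params"] = params  # "params" is optional in JSON-RPC 2.0
    return request_id, json.dumps(request)


def parse_response(text, expected_id):
    """Return the result of a JSON-RPC 2.0 response, or raise on error."""
    response = json.loads(text)
    if response.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 response")
    if response.get("id") != expected_id:
        raise ValueError("response id does not match the request")
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"RPC error {err.get('code')}: {err.get('message')}")
    return response["result"]
```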

MongoChem

MongoChem is another new application developed as part of the Open Chemistry suite of tools, leveraging MongoDB, VTK, and AvogadroLibs to provide chemical informatics on the desktop. It seeks to address the need for researchers and groups to be able to effectively store, index, search and retrieve relevant chemical data. It supports the use of a central database server where all data can be housed, and enables the significant feature set of MongoDB to be leveraged, such as sharding, replication and efficient storage of large data files. We have been able to reuse several powerful cheminformatics libraries such as Open Babel and Chemkit to generate identifiers, molecular fingerprints and other artifacts, as well as developing features in the Avogadro libraries to support approaches to large datasets involving many files.
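
Stored fingerprints make similarity search possible, which is what lets Avogadro 2 list molecules similar to the one on screen, sorted by similarity. A common score for this is the Tanimoto coefficient on bit fingerprints. The source does not say which fingerprint type or similarity measure MongoChem uses, so treat the sketch below, which represents each fingerprint as a Python set of "on" bit indices, as an illustration only; the function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of set bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query, fingerprints, limit=10):
    """Return up to `limit` (name, score) pairs, most similar first.

    `fingerprints` maps a molecule identifier to its fingerprint set.
    """
    scored = [(name, tanimoto(query, fp)) for name, fp in fingerprints.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```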


We have taken advantage of the charts developed in VTK and the 2D chemical structure depiction in Open Babel to deliver immersive charts that are capable of displaying multiple dimensions of the data. Linked selection means that a selection made in one view, such as the parallel coordinates plot, is also shown in the other views, such as the scatter plot matrix and the table view. The detail dialog for a given molecule shows a 2D structure depiction, an interactive 3D visualization when geometry is available, and support for tagging and/or annotation. We have also developed an early preview of a web interface to the same data using ParaViewWeb, enabling you to share data more widely if desired. This also features an interactive 3D view using the ParaViewWeb image streaming technology, which works in almost all modern browsers.
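
Linked selection is essentially the observer pattern: every view shares one selection model and redraws when it changes. The minimal sketch below assumes views are callbacks that receive the set of selected row indices; the class is hypothetical and does not reflect how VTK or MongoChem implement this (VTK has its own classes for it, such as `vtkAnnotationLink`).

```python
class SelectionModel:
    """A selection shared by several views (rows identified by index)."""

    def __init__(self):
        self._selected = frozenset()
        self._listeners = []

    def subscribe(self, callback):
        """Register a view; `callback(selection)` is called on every change."""
        self._listeners.append(callback)

    def select(self, rows):
        """Replace the selection and notify every registered view."""
        self._selected = frozenset(rows)
        for callback in self._listeners:
            callback(self._selected)

    @property
    def selected(self):
        return self._selected
```

A selection made in the parallel coordinates view would call `model.select(...)`, and the scatter plot matrix and table view would update through their callbacks.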

Putting Them Together

Each of the applications in the Open Chemistry suite listens for connections on a named local socket, and provides a simple JSON-RPC 2.0 based API. Avogadro 2 is capable of generating input files for several computational chemistry codes, including GAMESS and NWChem, and can use MoleQueue to execute these programs and keep track of the job states. Avogadro 2 can also query MongoChem for similar molecules to the one currently displayed, and see a listing sorted by similarity. MongoChem is capable of searching large collections of molecules, and can use the RPC API to open any selected molecule in the active Avogadro 2 session.
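
On the receiving end, each application only needs to map method names to handlers and wrap the results in JSON-RPC 2.0 responses. The sketch below shows that dispatch step using the standard error codes from the JSON-RPC 2.0 specification. The `openFile` method and its `fileName` parameter are hypothetical stand-ins for an "open this molecule in Avogadro 2" call, socket handling is omitted, and exceptions raised by handlers are not caught.

```python
import json

PARSE_ERROR, INVALID_REQUEST, METHOD_NOT_FOUND = -32700, -32600, -32601


def error_response(request_id, code, message):
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def dispatch(text, handlers):
    """Handle one JSON-RPC 2.0 message and return the response text.

    Returns None for notifications (requests without an "id" member),
    which must not be answered.
    """
    try:
        request = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return json.dumps(error_response(None, PARSE_ERROR, "Parse error"))
    if not isinstance(request, dict):
        return json.dumps(error_response(None, INVALID_REQUEST, "Invalid Request"))
    request_id = request.get("id")
    method = request.get("method")
    if request.get("jsonrpc") != "2.0" or not isinstance(method, str):
        return json.dumps(error_response(request_id, INVALID_REQUEST, "Invalid Request"))
    handler = handlers.get(method)
    if handler is None:
        response = error_response(request_id, METHOD_NOT_FOUND, "Method not found")
    else:
        response = {"jsonrpc": "2.0", "id": request_id,
                    "result": handler(request.get("params", {}))}
    return None if "id" not in request else json.dumps(response)
```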


The development of the Open Chemistry workbench has been funded by a US Army SBIR with the Engineering Research Development Center under contract (W912HZ-12-C-0005) at Kitware, Inc.

Originally published on the Kitware blog

FOSDEM: Open Science and Open Chemistry

I will be talking about the Open Chemistry Project at FOSDEM this year in the FOSS for scientists devroom at 12:30pm on Saturday. I will discuss the development of a suite of tools for computational chemists and related disciplines, which includes the development of three desktop applications addressing 3D molecular structure editing, input preparation, output analysis, cheminformatics and integration with high-performance computing resources.

Open Chemistry

Bill Hoffman will be speaking in the main track about Open Science, Open Software, and Reproducible Code at 3pm on Sunday. Bill and Alexander Neundorf will also be talking about Modern CMake in the cross desktop devroom on Saturday.

FOSDEM is one of the first conferences I attended (possibly the first, I can't remember if I went to a science conference before this). It will be great to return after so many years, and hopefully meet old colleagues and a few new ones. Please find me, Bill or Alex if you would like to discuss any of this work with us. I fly out tomorrow, and hope to get over jet lag quickly. Once FOSDEM is over we will be visiting Kitware SAS in Lyon, France for a couple of days (this is my first trip to our new office).

Then I have a few days in England visiting friends and family before heading back to the US.

Avogadro Paper Published Open Access

In January of last year I was invited to attend the Semantic Physical Science Workshop in Cambridge, England. That was a great meeting where I met like-minded scientists and developers working on adding semantic structure to data in the physical sciences. Peter managed to bring together a varied group with many backgrounds, and so the discussions were especially useful. I was there to think about how our work with Avogadro, and the wider Open Chemistry project might benefit from and contribute to this area.

Avogadro graphical abstract

My thanks go out to Peter Murray-Rust for inviting me to the Semantic Physical Science meeting and helping us to get the Avogadro paper published in the Journal of Cheminformatics as part of the Semantic Physical Science collection. Noel O'Boyle wrote up a blog post summarizing the Avogadro paper accesses in the first month (shown below - thanks Noel) compared to the Blue Obelisk paper and the Open Babel paper. We only just got the final version of the PDF/HTML published in early January, but the paper already has 12 citations according to Google Scholar, and is showing as the second most viewed article in the last 30 days and the most viewed article in the last year. The paper made the Chemistry Central most accessed articles list in October and November.

I made a guest blog post talking about open access and the Avogadro paper, which was later republished for a different audience. I would like to thank Geoffrey Hutchison, Donald Curtis, David Lonie, Tim Vandermeersch and Eva Zurek for the work they put into the article, along with our contributors, collaborators and the users of Avogadro. If you use Avogadro in your work please cite our paper, and get in touch to let us know what you are doing with it. As we develop the next generation of Avogadro we would appreciate your input, feedback and suggestions on how we can make it more useful to the wider community.

Open Chemistry, VTK and ParaViewWeb

Last year David Lonie, now a new Kitware employee, worked on a Google Summer of Code project to add better support for chemical structure visualization to VTK. More recently, Kyle Lutz added representations to ParaView to expose some of this new functionality for ParaView users. Once that was in place we were able to work with Sébastien Jourdain to expose this functionality in ParaViewWeb, along with parts of the MongoDB database we have been working on as part of the Open Chemistry project. You can check out the live demo here, or take a look at the screenshot below.

ParaViewWeb and Open Chemistry live demo

It was up and running within a day, and within another day we had a query page and summaries exposed in ParaViewWeb with some simple queries. ChemData exposes more complex searches and 2D visualizations of the data it contains. The 2D images are created using Open Babel's SVG rendering and saved to the database as PNGs for speed, while the 3D structure is currently rendered using ParaViewWeb's image-based delivery. You can interact with the 3D geometry both inline and full screen. We will be extending this to show electronic structure, and adding other features, in the near future.

Open Science, Open Access and Open Source

I have been thinking this over for quite a while, and have written this post several times over in my mind. As an undergraduate student I remember admiring scientists and imagining how amazing it must be to have a job where you got to discover new things, think of better solutions to problems facing our society and make the world a better place. As my studies continued I aspired to become one of those researchers, and decided to take my studies further by applying to do a PhD.

As a PhD student I enjoyed learning more about materials, and was excited to be working with gold nanoparticles in the Nanomaterial Engineering Group, researching how we might make real devices out of this novel material. It was exciting, challenging and fascinating using techniques such as X-ray and neutron reflectometry, electron and atomic force microscopy and Langmuir-Blodgett troughs. As I learned more through my work I became frustrated with the quality of the software I used; I had always imagined that "real scientists" had better tools available to them. It became even more frustrating when I realized how bad some of the instrument control software was, and how many of the file formats could only be used in one or two expensive, hard-to-use programs that only worked on one or two platforms.

Towards the end of my PhD I decided I would like to take some action. I had been trying to draw and render images of molecular structures, and wanted a way to do simple geometry optimizations for posters, papers and web pages. At first I tried to do some of this using an existing commercial package, but it only worked on Windows and we only had one license for the department. The training provided to me as a researcher in areas such as programming and analysis was disappointing, and all too often generic tools such as Word, PowerPoint and Excel were the most viable choice for preparing, analyzing and presenting our work. I began writing more software, but much of it was written from scratch with little guidance. As I searched for a better way I came across some open source libraries and tools.

I found a program run by Google called "Summer of Code" where they offered me the opportunity to "flip bits not burgers". I was extremely lucky to find an idea on KDE's idea page for a molecule editor in Kalzium. I was very excited, and had been using KDE for many years. This was a pivotal moment for me, where my life and career took a twist I never expected into the world of open science - and I have loved every minute of it.

It was through that work that I became involved in the Avogadro project, and later Open Babel, and met Geoff, who later that year offered me a position in his new research group. This was an exciting opportunity: not only did we share a passion for correlating experimental and computational techniques, but Geoff was also very active in open chemistry. After I moved out to Pittsburgh, Geoff introduced me to the Blue Obelisk, and I now proudly count myself as one of their un-members. Last year, five years on, we published an open access paper on the Blue Obelisk.

After a two year postdoctoral position with Geoff, who was extremely supportive of my work in open chemistry, I met Bill Hoffman from Kitware. I knew that Kitware developed CMake, but beyond that was not really aware of what they did. It turned out that they were involved in much more than just CMake, with open source tools and frameworks such as VTK, ParaView, ITK, CDash and more. They had been working on open scientific software for over a decade, and they were hiring! They weren't just making applications either, they were tackling the whole problem including development, testing and validation of open-source, cross-platform applications and frameworks.

After accepting a position with Kitware in 2009, I came to appreciate something I never really had before: just how poor access to publicly funded research is. I can no longer access scientific papers that I and others wrote, funded with taxpayer money from both the UK and the US! I think that is terrible, and I later realized I had become part of the scholarly poor; Peter wrote a follow-up detailing the plight of those of us in industry. There is currently raging debate on open access, and campaigns such as The Cost of Knowledge need our support. The products of publicly funded research should be available to everyone, whether in academia, industry, government or anywhere else.

There are too many black boxes in science today, too much published work that is not available to all or reproduced by others. Mathematics used to be the language of science, but more and more it is computer software that is needed to learn more, and too much of this code is closed, unpublished and poorly shared. Papers must include mathematical proofs, or refer to proofs already published, but it is common to see work published that used closed, proprietary package X to conduct a simulation. This is changing, and Scientific American recently published an article on how "Secret Computer Code Threatens Science". Science also published an article about "Shining Light into Black Boxes", detailing the growing problem of withheld source code preventing meaningful peer review and reproducibility of research.

Michael Nielsen published a book called "Reinventing Discovery" that talks about the value of networked science, and is well worth a read if you have not yet had a chance. The Panton Principles outline the need to make scientific data open, and the Science Code Manifesto calls for openly available code in science. The core goals of the Blue Obelisk are open data, open standards and open source. I think that for science to progress we must embrace openness and sharing, and resist the urge to hoard data and build up small empires on proprietary code and data.

One thing I hope to see come from all of the controversy over the Research Works Act is a clarification that publicly funded research should be available to all, whether you think they will understand it or not. Scientists need to get better at communicating with the general public, and at being more transparent about how research is done. I think open science will give us a chance to increase public engagement in science; a lack of engagement seems to be a growing problem, even in an age where we can all access the internet and the wealth of knowledge available on it.

I think that we need to figure out sustainable ways to fund the development of open software platforms to enable the next generation of researchers to push back the frontiers of science. We need to remember that we are publishing to share the results of (often publicly funded) research, and so we should be using liberal licenses such as CC-BY and CC0 that allow reuse and further analysis. We also need liberally licensed software that allows the same things, with simple licenses such as BSD and Apache 2.0. These libraries should contain well-tested implementations of data structures, algorithms and best practices, along with training for researchers to help them take advantage of these resources. If there is a better way to do something, contributions and integration should be encouraged, as is the case in most open source communities.

Our Open Chemistry project recently got Phase II SBIR funding, and I am very excited to be leading that work at Kitware. It is part of a collaborative, open effort to improve the tools and frameworks available in the area leveraging new software processes to enable wider community involvement.

Leap Day: Never Enough Time

What a busy year it has been so far; a leap day hardly seems enough to help me catch up! I started off the year with a meeting in Cambridge, England on Semantic Physical Science, which was hosted by Peter Murray-Rust. I ended up leading the working group on CML and developing a roadmap to move forward. Peter blogged about this on my birthday (by chance) and you can see the video of my summing up of the results, along with all the other videos from the final day.

While I was back in England I took the opportunity to visit friends and family, along with a day trip to Liverpool to meet with Abbie and Jens. While I was there we discussed some plans around alternate inputs for Avogadro for an upcoming MP visit at the end of January. I found some time to blog about that on the Kitware blog, and Abbie wrote up the visit on their site. I think engaging more people in chemistry is important, and whilst I don't think the interaction is ideal at the moment I was pleased to see them enjoying it. The Kinect is something that many groups can purchase, and if it helps engage a wider audience in science I think that is a great thing.

I am very excited about the work we are doing in Open Chemistry at Kitware. We have been bringing web sites and testing online, and have begun engaging more people in the development process. The official announcement of our Phase II funding went out in January too, and I set up an Open Chemistry group on Google+ if you would like to follow new developments there.

I am especially excited about the possibilities of working with NWChem more in the future, after meeting some people from EMSL at the Semantic Physical Science meeting in Cambridge. The open source license they switched to last year is of a very similar liberal nature to that of many of the open source projects we work on at Kitware. There is a large array of techniques available in NWChem, and interest in correlating computational and experimental observables.

We have also been extending Gerrit to support topic branch reviews, and switched VTK to use it for all code submissions. You can see proposed topics and they will trigger automatic build tests using CDash@Home for members of the core group. The Open Chemistry projects are also using the same Gerrit server for code review, and I am adding automated build testing of topics as I find time (any more leap days would help).

As my extra day draws to a close I realize there is still so much more I should write down. I will aim for more discipline in adding regular entries here; you can also follow my Google+ updates if you would like more on open source, open science and the life of a scientific software developer.

Conferences: Talking Open Science at OSCON, Desktop Summit and Chemical Databases Meeting

Over the last two months I have had one of my most hectic travel schedules ever. It started with OSCON, and a panel discussion about "Practicing Open Science". This one was a bit of a surprise, as Bill Hoffman was originally presenting with Will Schroeder, and Brian Wylie from Sandia National Laboratories. As Bill couldn't make it we decided to change the content of my section, and talk about the new open chemistry area that I have been working on for about four years now. Will went first, followed by me and a wrap up from Brian, with a nice flow between Kitware working on open science for over a decade, me growing a new area of open science (now at Kitware) and Brian giving a government perspective on open source and open science. The slides are below and on slideshare if you would like to take a look.

I thoroughly enjoyed OSCON, and would love to attend future events. The toughest thing was deciding which talks to attend, as there were often multiple tracks with talks of interest to me. This was also by far the largest and most commercialized open source event I have attended so far, in the beautiful city of Portland, OR. I couldn't stick around for long after the conference as I was flying out to England on the following Tuesday, and on to Berlin, Germany on the Friday to attend the Desktop Summit. This was my first time in Germany, and I was looking forward to exploring Berlin a little, along with some time to catch up with a few family and friends in England before and after the conference. I talked about "Open Source Visualization of Scientific Data" on the final day of the main conference, and was very pleased to have a large and interested audience. Here I also discussed my work in open chemistry, along with a lot of the other work we do at Kitware in the Scientific Computing group.

I stayed for the remainder of the conference, attending my first KDE e.V. meeting, and was joined by Bill Hoffman towards the end of the week. Bill gave a workshop on using CMake, and I helped out with that, along with taking part in several BoF sessions and meetings. It was a very hectic week, with a very different feel from OSCON and a lot of great presentations, BoFs and hacking sessions. I also had the opportunity to meet up with Alexander Neundorf, who was an intern at Kitware for half a year, and several other KDE developers interested in build systems, software process, testing, coverage and related areas.

Then I was back home for just over a week before braving the elements and heading straight into the path of Hurricane Irene. I was invited to the 5th Meeting on U.S. Government Chemical Databases and Open Chemistry, where I talked about "Chemical Databases and Open Chemistry on the Desktop". This meeting was very focused on chemical databases and the open chemistry I have been working so hard on for the last few years. It was a great experience to be able to see what others are working on, and to discuss possible points for future collaboration. There is some amazing work happening in this area, and this meeting helped me gain greater clarity on how my work at Kitware can fit into the larger picture to significantly improve the landscape in open chemistry.

Thanks to Kitware for allowing me to attend and funding my travel and other expenses, and to my wife and son for tolerating my long absences over the last couple of months. An even bigger thank you to my wife, Louise, for letting me off the hook on my first missed wedding anniversary so that I could present at OSCON! I had some great news about funding for the continued development of many of the ideas discussed in the slides, and so hope to have much more to talk about over the coming months (and years). This post is already pretty long, so I will just say that I hope to continue developing this work and promoting open science, especially in chemistry, materials science, physics and the biological sciences. There are lots of other amazing people working in these areas too, and I feel like we are getting to a point where we can create real change to improve the outlook in scientific research.

Talking About Open Source Visualization of Scientific Data at the Desktop Summit

I have begun my journey to the Desktop Summit, making the flight over from the US to Manchester yesterday. I am having a short stay in Sheffield to catch up with family before heading out to catch my flight to Berlin tomorrow. I will be talking about the work I have done both at and before joining Kitware, with the title "Open Source Visualization of Scientific Data". I plan to talk about a range of work, from my Google Summer of Code project on Kalzium back in 2007 through to some of the exciting work at Kitware in VTK, ParaView and Titan looking at the challenges of large data, remote visualization and how to integrate the web and smartphones/tablets into the scientific data visualization workflow.

Desktop Summit 2011

Bill Hoffman is also planning to attend, and we will be running a workshop introducing CMake on Thursday. This is my first Desktop Summit, although Bill and I have both attended previous aKademy and Camp KDE meetings. I should arrive in time for the pre-registration event, and will not be leaving until Saturday. I am looking forward to a great summit, catching up with some old friends and making some new ones. Now, I think I should try to get some sleep before my flight tomorrow!

Talking at OSCON 2011 about Open Science

I am currently on a plane bound for Portland, Oregon, enjoying the in-flight Wi-Fi. Will Schroeder, Brian Wylie and I will be talking about "Practicing Open Science" on Friday in the government track. I am standing in for Bill Hoffman, who unfortunately could not make it, and will be discussing the work I have been doing to grow open chemistry both at Kitware and outside of Kitware with many amazing collaborators scattered around the world. I am really excited to have the opportunity to talk at OSCON, and would be happy to meet up and discuss this work if you are there. Will and Brian are both very passionate about open science too, and will each give their own unique perspective on practicing open science. I will be there from this evening and don't fly out until early Saturday morning.

OSCON 2011

I am very much looking forward to OSCON, and the major difficulty I have had is choosing between talks that are happening at the same time. In some cases there are two or three I would like to see in a single slot. I am hoping to attend the KDE release party tomorrow too; please join us there if you would like to celebrate with us.

Avogadro 1.0.3 Released

I am pleased to be able to announce the availability of Avogadro 1.0.3! What happened to Avogadro 1.0.2, I hear you ask? Shortly after tagging it, Michael reported an issue with i18n building/installation. So 1.0.3 contains a couple of very small build system fixes; see the 1.0.2 release notes for details of most of the fixes.

As always, we appreciate your feedback. There are still a few issues outstanding, but many things were fixed. These binaries are also built against much newer versions of Qt and Open Babel where significant improvements have also been made. There may be one or two more releases of the 1.0 line if necessary (I have streamlined the release process with a view to making more releases), but I would like to focus our efforts on an unstable release for 1.1. Once 1.1 is stable, a 1.2.0 release will be cut and branched. There are lots of new features in master that we would love more feedback on.

Blue Obelisk Award

At the recent ACS Spring meeting I attended the Blue Obelisk dinner, where I was honored to receive a Blue Obelisk award, pictured below, for my contributions to Open Data, Open Standards and Open Source. This is largely due to the work I have done on Avogadro, Open Babel and other open source chemistry tools.

Blue Obelisk award

This was one of the biggest dinners I have had the opportunity to attend, and I got to meet many of the people I have worked with (or whose work I have used), along with several people I had not had the opportunity to work with yet, but hope to in the future. We presented the work we had been doing on the Quixote project at the chemical information symposium on chemistry and the internet, after attending the first Quixote meeting the previous week (thank you to the Hartree Centre for inviting me to speak there, and for sponsoring the event).

These are exciting times. Thank you very much to Peter Murray-Rust for presenting me with the award, and for all of the support he has shown, along with his relentless passion for open science. I have only been a part of this for a few years, but Peter has been working on opening up chemistry for decades now.

CMake External Projects: Building Project Dependencies

Historically, projects have attempted to minimize their dependency list, and often bundle in small third-party libraries in an attempt to make it easier for new developers and users to compile their code. In the Avogadro project we have bundled a few really small libraries, but on the whole have maintained a dependency list and tried to keep it short. As I work on new code, I see opportunities to break off bits of functionality, such as with OpenQube, but don't want to add yet another thing a new user or developer must download, compile and install somewhere.

Linux packagers, myself included, dislike the practice of bundling in libraries. It means that instead of patching one libxml2, we get to patch one plus the three or four copies bundled in our tree (often different versions, some with local patches). The problem is less pronounced on Linux, where package managers are ubiquitous and we are able to provide a list of packages to install, but even there we might be developing against versions not yet in the main distribution repository. This is one of the reasons I have always favored rolling release distributions over those with periodic releases.

CMake's external project module helps us to deal with this issue in quite an elegant fashion. Coupled with meta repositories to bring several source trees together, CMake is able to direct the build of several projects, passing locations between projects and expressing dependencies between the projects being built. This means that something like Open Babel can build zlib and libxml2 before building the main Open Babel library. External projects and CMake allow us to download the source, create the build trees and even direct the build of non-CMake based projects like libxml2.
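In CMake itself, the build ordering is expressed with the `DEPENDS` argument of `ExternalProject_Add`. The minimal Python sketch below only illustrates the underlying idea, not CMake's implementation: the superbuild is a dependency graph, and a topological sort yields an order in which every project is built after the projects it needs. The graph is an assumption for illustration, using the projects named above; the real edges in any given superbuild may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical superbuild dependency graph (an illustrative assumption, not
# the actual configuration of any project). Each key is an external project;
# its value is the set of projects that must be built before it.
SUPERBUILD_DEPENDS = {
    "zlib": set(),
    "libxml2": {"zlib"},
    "openbabel": {"zlib", "libxml2"},
    "avogadro": {"openbabel"},
}


def build_order(depends):
    """Return a list in which every project appears after all of its dependencies.

    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(depends).static_order())


if __name__ == "__main__":
    for step, project in enumerate(build_order(SUPERBUILD_DEPENDS), start=1):
        print(f"{step}: build {project}")
```

Because this particular graph is a simple chain, the order is fully determined: zlib, libxml2, openbabel, then avogadro. With independent branches, several valid orders exist, which is also what lets a superbuild build unrelated dependencies in parallel. A circular dependency cannot be ordered at all and is reported as an error.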

I have just put up a prototype of this that builds the core of Avogadro. Its working name is Avogadro Squared, as I was feeling geeky that day and had no good names. One thing you should note is that everything in there is an external project, and Avogadro is the last one to be built (it depends on all of the other projects). It requires minimal changes to the projects it contains, using git submodules for some of the source and CMake's download and tar functionality for zlib and libxml2. I will be adding options to simply use system versions of the libraries it can build, while Linux distributions and others can continue using the Avogadro repository directly.

As a new developer or user, I can check out the meta repository, have git download the submodules and have CMake download the source tarballs. I can then build the entire project, and continue to work in the Avogadro subdirectory of the build tree after that. That build tree is almost identical to the one I would have ended up with had I not used the meta repository, except that it points to the dependencies I just built. I can then use vim, an IDE or whatever I choose to work on the inner projects. This works across Linux, Mac and Windows to get new users and developers up and running very quickly, while only loosely coupling the dependencies to the Avogadro project.

I have worked on other, larger projects, such as Titan and ParaView, that use this approach to a greater or lesser extent. Titan can actually build Qt, Boost, VTK, protobuf, Trilinos and a host of other dependencies before building the Titan libraries and applications. I think Avogadro Squared shows just how minimal a meta repository can be; although I will be extending it with more dependencies, it really is just a glue repository.

Volume Rendering in Avogadro

Since joining Kitware I have had limited spare time to work on Avogadro, and for various reasons my spare time has been more limited than usual too. Since the new year I have been able to start spending more time working on Avogadro, and open source chemistry in general, thanks to an SBIR phase I proposal that was funded last year with the US Army Corps of Engineers. This is exciting for a number of reasons, including the fact that I have the opportunity to prototype exciting new features for chemistry visualization, workflow and data management.

One of the new bits of work I have been doing is to use some of the advanced visualization techniques in VTK, such as GPU-accelerated volume rendering. The code is still pretty rough, and is more of a proof of concept. I wrote a simple external Avogadro extension that links to VTK and uses it to render the first volume found in the current Avogadro molecule. All of the parameters are currently fixed; I am hoping to find the time to add more options, along with some integration of the Avogadro-rendered molecule in the VTK render window. You can view the code here, but please bear in mind it is at a very early stage.
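To give a feel for what a ray-cast volume renderer computes for each pixel, here is a minimal CPU sketch of front-to-back alpha compositing along a single ray. It is the generic emission-absorption technique, not VTK's GPU implementation and not the extension's code. The linear `transfer_function` is a hypothetical stand-in for the fixed parameters mentioned above, and the samples are assumed to be scalar values already normalized to [0, 1] and ordered from the eye outward.

```python
def transfer_function(value):
    """Hypothetical transfer function: map a scalar to (intensity, opacity).

    The value is clamped to [0, 1]; opacity ramps linearly up to 0.5.
    """
    v = min(max(value, 0.0), 1.0)
    return v, 0.5 * v


def composite_ray(samples, early_exit=0.99):
    """Front-to-back compositing of scalar samples along one ray.

    Returns (accumulated intensity, accumulated opacity) for the pixel.
    """
    color = 0.0
    alpha = 0.0
    for value in samples:
        c, a = transfer_function(value)
        # A sample contributes only through the transparency still remaining
        # in front of it, i.e. (1 - alpha accumulated so far).
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        # Early ray termination: samples further back are almost fully hidden.
        if alpha >= early_exit:
            break
    return color, alpha


if __name__ == "__main__":
    print(composite_ray([0.2, 0.8, 1.0]))
```

A GPU renderer runs essentially this loop for every pixel in parallel, sampling a 3D texture along each ray. Editable parameters would largely amount to letting the user change the transfer function and the sampling distance.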

I have also been working on several other things, such as splitting out the quantum calculation code from the Avogadro plugins and putting it in a small library. I have called the library OpenQube. Right now it only has the base functionality that was in Avogadro, but I will be extending it with more features and regression tests. I am hoping that its decoupled nature and liberal BSD license will encourage wider collaboration in this field.

There is also the Quixote project, which I am very excited about: meaningfully storing the results of quantum calculations, annotating them and retrieving them within an open framework. This is a growing problem in today's world, and I am working on extensions to Avogadro to allow it to fully exploit the semantic chemical web. This includes some of the previous work to access the PDB and other public resources, as well as private databases within groups and organizations.

I think this is going to be a very exciting year for Avogadro, and open source chemistry in general.