
Open Science, Open Access and Open Source

I have been thinking this over for quite a while, and have written this post several times over in my mind. As an undergraduate student I remember admiring scientists and imagining how amazing it must be to have a job where you got to discover new things, think of better solutions to problems facing our society and make the world a better place. As my studies continued I aspired to become one of those researchers, so I decided to take my studies further and applied to do a PhD.

As a PhD student I enjoyed learning more about materials, and was excited to be working with gold nanoparticles, researching how we might make real devices out of this novel material in the Nanomaterial Engineering Group. It was exciting, challenging and fascinating to use techniques such as X-ray and neutron reflectometry, electron and atomic force microscopy and Langmuir-Blodgett troughs. As I learned more through my work I became frustrated with the quality of the software I used; I had always imagined that "real scientists" had better tools available to them. It became even more frustrating when I realized how bad some of the instrument control software was, and how many of the file formats could only be used in one or two expensive, hard-to-use programs that only worked on one or two platforms.

Towards the end of my PhD I decided I would like to take some action. I had been trying to draw and render images of molecular structures, and wanted a way to do simple geometry optimizations for posters, papers and web pages. At first I tried to do some of this using an existing commercial package, but it only worked on Windows and we only had one license for the department. The training provided to me as a researcher in areas such as programming and analysis was disappointing, and all too often generic tools such as Word, PowerPoint and Excel were the most viable choice for preparing, analyzing and presenting our work. I began writing more software, but much of it was written from scratch with little guidance. As I searched for a better way I came across some open source libraries and tools.

I found a program run by Google called "Summer of Code" where they offered me the opportunity to "flip bits not burgers". I was extremely lucky to find an idea on KDE's ideas page for a molecule editor in Kalzium. I was very excited, having used KDE for many years. This was a pivotal moment for me, where my life and career took a twist I never expected into the world of open science, and I have loved every minute of it.

It was through that work that I became involved in the Avogadro project, and later Open Babel, and met Geoff, who later that year offered me a position in his new research group. This was an exciting opportunity: not only did we share a passion for correlating experimental and computational techniques, but Geoff was also very active in open chemistry. After I moved out to Pittsburgh Geoff introduced me to the Blue Obelisk, and I now proudly count myself as one of their un-members. Last year we published an open access paper looking at the Blue Obelisk five years on.

After a two year postdoctoral position with Geoff, who was extremely supportive of my work in open chemistry, I met Bill Hoffman from Kitware. I knew that Kitware developed CMake, but beyond that was not really aware of what they did. It turned out that they were involved in much more than just CMake, with open source tools and frameworks such as VTK, ParaView, ITK, CDash and more. They had been working on open scientific software for over a decade, and they were hiring! They weren't just making applications either, they were tackling the whole problem including development, testing and validation of open-source, cross-platform applications and frameworks.

One thing I never really appreciated until I accepted a position with Kitware in 2009 was just how poor access to publicly funded research is. I can no longer access scientific papers that I and others wrote, funded with taxpayer money from both the UK and the US! I think that is terrible. I later realized I had become part of the scholarly poor, and Peter wrote a follow-up detailing the plight of those of us in industry. There is currently a raging debate on open access, and campaigns such as The Cost of Knowledge need our support. The products of publicly funded research should be available to all, whether they are in academia, industry, government or anywhere else.

There are too many black boxes in science today, and too much published work that is not available to all or reproduced by others. Mathematics used to be the language of science, but more and more it is computer software that is needed to learn more, and too much of this code is closed, unpublished and poorly shared. Papers must include mathematical proofs, or refer to proofs already published, yet it is common to see published work that used closed, proprietary package X to conduct a simulation. This is changing, and Scientific American recently published an article on how "Secret Computer Code Threatens Science". Science also published an article, "Shining Light into Black Boxes", detailing the growing problem of withheld source code preventing meaningful peer review and reproducibility of research.

Michael Nielsen published a book called "Reinventing Discovery" that talks about the value of networked science, and it is well worth a read if you have not yet had a chance. The Panton Principles outline the need to make scientific data open, and the Science Code Manifesto calls for openly available code in science. The core goals of the Blue Obelisk are open data, open standards and open source. I think for science to progress we must embrace openness and sharing, and resist the urge to hoard data and build small empires on proprietary code and data.

One thing I hope to see come from all of the controversy over the Research Works Act is a clarification that publicly funded research should be available to all, whether or not you think the public will understand it. Scientists need to get better at communicating with the general public, and at being more transparent about how research is done. I think open science will give us a chance to increase public engagement in science; poor engagement seems to be a growing problem, even in an age when we can all access the internet and the wealth of knowledge available on it.

I think that we need to figure out sustainable ways to fund the development of open software platforms to enable the next generation of researchers to push back the frontiers of science. We need to remember that we are publishing to share the results of (often publicly funded) research, and so we should be using liberal licenses such as CC-BY and CC0 that allow reuse and further analysis. We also need liberally licensed software that allows those same things, under simple licenses such as BSD and Apache 2.0. These libraries should contain well-tested implementations of data structures, algorithms and best practices, along with training for researchers to help them take advantage of these resources. If there is a better way to do something, contributions and integration should be encouraged, as is the case in most open source communities.

Our Open Chemistry project recently received Phase II SBIR funding, and I am very excited to be leading that work at Kitware. It is part of a collaborative, open effort to improve the tools and frameworks available in this area, leveraging new software processes to enable wider community involvement.

Another Post About Camp KDE 2010

There have been lots of posts about Camp KDE on Planet KDE, along with a stream of photos on Flickr. It has been a great event so far with some really interesting talks. I especially enjoyed Philip Bourne's talk on open access to data, a topic very close to my heart, although I noted that many parts of the stack used are still closed source. My background in physics and chemistry tells me that this needs to change. Open access data without open source tools to create, store and view that data only addresses one part of the problem. I hope to address other parts of this issue in the work I am doing at Kitware.

Celeste's talk was also interesting, and I found out that I may be an OCD interface design guy (many of the points she outlined bugged me in projects I had worked on, especially consistency in interfaces, grammar, etc.). It was a great talk, and illuminating for someone like me who has not worked with anyone in this field before. Then of course there was Till and Alexandra's talk on career opportunities in FOSS, which was great, and I found myself nodding along with them. My winding path was not quite as glamorous as rock star or opera singer, but I can certainly identify with them. I instead pursued a degree and a PhD in physics research (largely experimental too), only to find I was extremely passionate about developing software to edit and visualize the data, rather than spending months in the lab.

This was not even the end of the first day, so you can tell it was a great conference. Jos talked to us about marketing, and then Artur presented his take on KDE from the desktop to the pocket. I still really want my own N900 to experiment with taking scientific visualization to the pocket (I have the desktops, laptops and a netbook to play with already). The next morning began with Frank presenting his vision of open source in the cloud. I find myself using the cloud more and more (especially now I have a Droid), but I share his concerns and his wish to create AGPL-licensed alternatives that can be easily deployed by both companies and individuals.

I also really enjoyed Romain's talk on the state of KDE PIM/KDE Windows, with live demos (warts and all). It also segued nicely into the need for automated testing in order to improve the quality of KDE on other platforms, and to use our limited resources wisely. I presented my talk on CMake, CTest, CDash and improving the software process in KDE. I think the testing framework can really help KDE developers by providing continuous feedback about platforms not everyone has access to. There are already quite a few KDE projects on CDash, and I would like to increase that number and possibly use subprojects to divide the projects up into manageable pieces.

There were more great talks from Leo, and we ended the day with Plasma talks and demos from Marco and Chani. I don't want to reproduce the schedule, but needless to say we had a great set of talks (all of which were taped and should be available soon). Thanks go out to Jeff and the ground team here for organizing the event so well. Monday was taken up with more technical talks. Will's talk on the build service covered something I would like to use in the future, and I would like to see if we can get it contributing build/test results to KDE dashboards. The day concluded with CMake training run by me. I really enjoyed the dialog that was present in many of the talks (mine included), and got some great feedback about the training afterwards. I would love to do this again at future KDE events, and from the feedback I received it would seem others would like that too. It was very strange not to talk about any of the scientific visualization work I am doing; this was one of the first conferences in years where I have not.

Tuesday was the traditional trip day, and we checked out Stone Brewery, tried some excellent ales and then had dinner at one of the longest tables I have ever eaten at. William was of course in attendance, as the youngest attendee. After that we braved the driving rain and wind to get back to the UCSD campus. I took the opportunity to catch up on some work, and recharge my batteries a little ready for the Qt training being offered by Till Adam of KDAB today. Looking forward to a day of learning and admiring the sun this morning! The company has been great, and I am very glad I was able to make it along. This is my first business trip for Kitware, and I am very pleased they sent me along, and that NAMIC sponsored my attendance.

Disclaimer: The opinions and musings in this post are mine, and not those of my employer. Any mistakes or inaccuracies are also mine. That said, I would love to hear what people think of this new work.

Avogadro Auto Optimization Screencast

Geoff showed me a new screencast he created recently. It is made using the latest Avogadro, and is one of the first screencasts with our new and improved user interface. Geoff has also added some audio commentary with notes on the chemical relevance of the auto optimization tool. Check it out and let us know what you think - a new release of Avogadro is coming soon.

I will hopefully find the time to make a few new screencasts soon too. Between my one-month-old son, my day job and waiting on my visa application (which does not take any real time, just some mental energy), I have not had much spare time to code or blog. Remember that Avogadro was also nominated for the SourceForge Community Choice Awards; click on the link below to vote for us.

Avogadro Nominated for SourceForge Community Choice Awards

I am very pleased to announce that Avogadro has been nominated as a finalist in the SourceForge community choice awards this year. We are in the "Best Project for Academia" category, and I would like to encourage you to vote for Avogadro.

This is a real honour for all of us, and I appreciate all of you who nominated Avogadro. We are all pushing very hard on polishing Avogadro, getting ready for our 1.0 release. It would be absolutely amazing to see Avogadro win this award, so please vote for us.

Avogadro collage

There are some other really nice projects in there too, such as Lancelot, ClamAV, phpMyAdmin and RepRap. So please take a few moments to place your vote, and tell your friends!

Update: You can vote even without a SourceForge account - just enter your email address and verify your vote.

Vote for Avogadro

I just got an email from SourceForge about their community awards. If you are a user, fan or developer, please vote for Avogadro in the Best Project for Academia category. They even provided me with a nice graphic to put on the page; you can just click on it to register your vote.

In other news, lots of exciting things are happening in Avogadro, and hopefully I will find some time to blog about them soon!

Avogadro at the APS March Meeting and Q-Chem Workshop

Last week was extremely busy. The APS March Meeting was held in Pittsburgh, and Q-Chem held a workshop at the end of the week. I presented a poster on Avogadro (shown below), met lots of interesting people and got lots of new ideas for both research and Avogadro.

Avogadro poster

As we push towards making a 1.0 release of Avogadro, getting feedback from users in the scientific community is extremely important. As Q-Chem chose to use Avogadro as the builder/visualizer in their workshop I had the opportunity to observe new Avogadro users interact with our application for the first time. I also had the opportunity to help them overcome some initial issues and gained a few new insights.

I was very pleased to meet people at all stages of their careers who were very interested in having an open source application that can provide a framework for building and visualizing molecules. I also realized that two of the most sought-after features in Avogadro right now are the capability to easily make movies, and a z-matrix editor. People loved the ray-traced images of surfaces; coincidentally, I received a request from someone in the press wanting to use an image of ray-traced benzene molecules that I put up on my blog last year.

I look forward to hearing from some of the new users we gained in the last week. It is great to see Avogadro receiving more attention. I have started to work on the z-matrix editor and spent the weekend experimenting with movies - more to come soon!

ACS Avogadro Talk Slides and Poster

I kept meaning to put up the slides from my Avogadro talk and the poster I presented at the recent ACS meeting in Philadelphia. Things have been really hectic these last few weeks, but here they are. The talk, "Avogadro: An integrated approach to teach computational chemistry modeling, simulation and visualization", was presented in the chemical education section on Monday 18 August. The slides were made using LaTeX Beamer, and the talk itself focused on the use of Avogadro when teaching computational chemistry.

Avogadro poster presented at ACS meeting

I also presented a poster at Sci-Mix on Monday, 18 August, and at the main computational chemistry poster session on Tuesday, 19 August. You may have guessed already but I used LaTeX - this time the A0 poster package. The poster title was "Avogadro: A framework for quantum chemistry simulation and visualization". I really enjoyed the two poster sessions and met lots of interesting people during the sessions.

You can grab copies of the slides or poster by clicking on their titles. It was certainly a very interesting conference, although it was so big that at times it was difficult to choose where to go and what to see, especially as some of the hotels with talks I wanted to attend were thirty minutes apart on foot. It was a great opportunity to tell other scientists about the work we are doing, as well as to introduce some of the concepts of open source to the wide and varied group of attendees I had the pleasure of meeting.

ACS Meeting in Philadelphia: Two Talks and a Poster (Twice)

This was my first ACS meeting. It was relatively close, in Philadelphia, and so we decided to car share. Another postdoc on my floor was also going to the meeting and actually has a car. It was quite a long drive but we arrived on Saturday afternoon and checked into our hotels.

I gave my first talk, "Avogadro: An integrated approach to teach computational chemistry modeling, simulation and visualization", on Monday morning in the division of chemical education. It was an interesting track on the use of "computation, modeling and molecular visualization across the chemistry curriculum". I saw talks by everyone from high school teachers to professors, on topics ranging from the use of Second Life to WebMO and commercial packages such as CAChe.

On Monday evening I presented my poster, "Avogadro: A framework for quantum chemistry simulation and visualization", at the Sci-Mix event as part of the division of computers in chemistry. That went really well and I met lots of people who were very positive about Avogadro and the work I presented. I presented the poster again at the computers in chemistry poster session on Tuesday evening. I don't think I got a chance to stop either evening but really enjoyed talking to everyone about my work.

My final talk, on the experimental and computational work we have been doing, "Monolayer FETs: Metal terpyridine complexes as a model material", was at the end of quite a long session on Wednesday morning. This was in the division of inorganic chemistry, in the "Nanoscience: Characterization and applications" session. After all this I was feeling pretty exhausted!

In between talks and posters I managed to attend quite a few other sessions and visited the expo several times. I met the people at Asylum Research who make some amazing AFM hardware and software - I want one of their AFMs! I was very impressed at the power of the highly configurable software that was demonstrated to me, far more powerful than anything I currently have access to.

I got the opportunity to meet some of the people who have written various pieces of software I use, such as PyMOL and ChemAxon. NanoAndMore were there and gave me a sample of a new type of conductive probe AFM tip that will hopefully help with some of the CP-AFM work I am doing. I also got quite a bit of expo swag such as periodic tables, beaker mugs and pens galore.

I got lots of ideas from the meeting, met lots of interesting people, and have quite a few things I need to follow up on once I get a reasonable amount of sleep. It is a shame some of the hotels hosting sessions were so far apart, but other than that it was a great meeting. I even squeezed in a little time to see some of Philadelphia, and the bits I saw were really nice. I am back at work now, with new sources of inspiration and lots of new ideas and suggestions on my mind.

Chemistry Visualisation and Tools Meeting

Last week I was privileged to be invited to speak at a meeting about molecular modelling with a focus on tools, GUIs and visualisation. The meeting was held at the Holiday Inn in Runcorn and the Daresbury Laboratory (England). I wasn't expecting to be back in England quite so soon, having only just returned to Pittsburgh at the end of January.

The meeting was a great opportunity to present some of the latest work I and others in the Avogadro and OpenBabel communities have been doing to create tools that enable the building of molecules and structures, as well as their visualisation. It was also a great chance to hear some very interesting talks by the developers of other building tools and some quantum codes. Donald and I were also invited to Daresbury Laboratory to work with some of the CCP1GUI developers.

I presented my talk on Avogadro on Wednesday morning and have made the slides available here. Donald gave an introduction to Avogadro, some of the history and the architecture before I gave my presentation. We finished by taking questions while I demonstrated the Avogadro application. I think it was extremely productive. We had many more conversations over dinner and drinks later as well as in a workshop setting on Thursday afternoon.

It was great to be able to put a face to a few of the names and discuss current issues more informally in the evening. The talks were all of a very high quality and from a varied list of speakers from other open source projects, some of the free quantum codes as well as commercial products. I have come away from the meeting with a much better appreciation of the needs in the community and I feel that Avogadro is in a great position to fill the apparent void.

I am glad that we were able to get surface and orbital support working in Avogadro before the meeting. Right now we only support Gaussian cube files, but the implementation is general enough that I will be able to add support for further formats. I really think that if we can get enough people collaborating on a common project, everyone can get the tool they need to do their research effectively at a much lower investment than could be achieved by working on many separate projects.

I met Tristan Youngs, the developer of Aten, who had implemented some really nice features in his molecular builder, which is much more focussed on molecular mechanics. It is well worth checking out, as is Zeobuilder, developed by Toon Verstraelen. They both implement some great features and have strengths in different areas. Of course my dream is to integrate many of these features via Avogadro plugins and have one editor capable of being used in a diverse range of applications.

It was also great to speak to Mario Valle who is doing some very interesting work in the area of new visualisation methods and supports a large user base of computational chemistry users. There were of course so many other talks but you can look at the schedule yourself and I think the slides of all the talks should be available in the near future.

I feel sure that many good things will come out of this meeting and hope to be able to attend similar meetings in the future. I would like to thank Jens once again for hosting the meeting and taking care of everything. I hope to see some patches and/or commits from him in the near future ;-)

Postdoctoral Associate Position at University of Pittsburgh

At the start of October I began my first postdoctoral position, at the University of Pittsburgh in the Chemistry Department, working in the newly formed Hutchison Group for Geoffrey Hutchison. Life has been so hectic these last few months, finishing my doctorate and preparing for my biggest move ever.

Louise and I arrived on the 27th of September and had less than a week to find a suitable apartment out here ready for me to start work. I am now a legal non-resident alien (I think - someone correct me if I got it wrong). I am very excited to be starting this new job, the move has been really tough but the research looks very interesting and I am sure that my time here will be very productive.

I have already been to quite a few interesting talks in the department and am making the transition from physics to chemistry. It is OK though, as I get to do lots of physics and programming along with some more chemistry. These are exciting times and I am sure I will talk more about my work in future. I have already posted a tutorial article on installing Sun Grid Engine on a Mac OS X cluster.

Dax made it out here and we are both enjoying the recent snow. This entry sat in drafts for about a month or so as I have been so busy sorting things out, coding, reading and going to physio for my high ankle sprain which I am told is both unusual and the worst kind of sprain...