Wright Lab @LBRN

This week, the Wright Lab is at LSU for the LBRN annual meeting. Here are the talks and posters for the lab:

My talk: https://wrightaprilmblog.files.wordpress.com/2020/01/lbrn.pdf
Christina’s Talk: https://wrightaprilmblog.files.wordpress.com/2020/01/lbrna-2020-pdf.pdf

Christina’s Poster: https://wrightaprilmblog.files.wordpress.com/2020/01/lbrna-2020-poster.pdf

Basanta and Courtney’s poster: https://wrightaprilmblog.files.wordpress.com/2020/01/poster.pdf

RevBayes & the universe

One of the interesting things about having a blog is that you can see what people are interested in, and when. This week, I can see a lot of traffic going to a couple of blog posts, Teaching Phylogenetics in the Cloud and Plan C. This is pretty common around the start of the semester – people are interested in trying new things in their teaching, and particularly using cloud technologies to improve access to compute resources.

I want to emphasize some new developments on this front. Jeremy Brown and I taught a workshop at the SSB2020 meetings on developing a hands-on classroom using RevBayes. Slides are here.

We covered three main things: graphical models, and why they’re cool; how to build a graphical model; and some of the graphical interfaces that you might use to deliver RevBayes to your students. That last point is important – systematics isn’t a science in isolation. To use phylogenetic methods, students need to understand statistics, where their data come from, and what the biases in those data may be. Common tools like R and RStudio, or Python and the Jupyter notebook, are often used for “data science.” Given the integrative nature of systematics as a discipline, doesn’t it make sense to make our tools interoperate more smoothly with a broader universe of tools for working with data?

We have some tools to help RevBayes play nicer with tools like RStudio and Jupyter. Michael Landis and Simon Frost developed a Jupyter kernel for using Rev inside of Jupyter notebooks. Lately, David Bapst and I have been working on an RStudio interface. If you’re interested in any of these tools, please do see the install page for them, and give us feedback. A manuscript describing the classroom contexts for these tools is forthcoming; in the meantime, you might find some related tidbits in the paper described here.

If you were at the workshop, you ought to have received an email inviting you to take a survey on it, and inviting you to comment on the issue tracker for the RevKnitr repository. I’d like to extend that invite more widely – if you are teaching with RevBayes and would like to join the conversation with other educators, do feel free to open an issue on the RevKnitr issue tracker. It would be wonderful to have an active discussion on how to teach systematics expansively and inclusively.

Nantucket DevelopR 2019

This past week, we (Drs. Liam Revell, Klaus Schliep, Josef Uyeda, Claudia Solis-Lemus, and I) hosted the Nantucket DevelopR Phylogenetics workshop again.

This is a really interesting course because it’s aimed at intermediate learners. Intermediate is slippery to define. I often think of it as the point where questions stop having clear answers – i.e., when you google for an answer, you don’t just get back “How to initialize a list.” You have to start thinking about optimization, or making code clean to read for other contributors.

Basically, an intermediate learner is someone who might not have a clear path forward. And at many universities, they might not have someone more advanced to go to for help. For intermediates, we don’t just need skills/information transfer, we need network-building.

So we had a few goals for this workshop:

  • Get everyone a base of some basic intermediate skills: functional programming, efficient use of Git and GitHub, packaging R code, and phylogenetics in R. Materials here
  • Get motivated folks who might be on different ends of the “intermediate” spectrum together to work productively on R phylogenetics packages
  • Create a diverse network of people who now all know each other and are connected by work on packages. Build a community of R-phylo developers

What worked

Bear in mind, these are personal reflections; we’ve not yet done our surveys.

  • Diverse leadership. Diverse teams are known to produce better results. And it’s known that diverse faculty can assist in establishing and maintaining diverse communities. I think that is reflected in the make-up of our course, which is more gender-balanced than the previous offering. We also took other steps, like a more verbose course advert, since literature suggests minoritized students don’t apply for things unless they meet more of the criteria than white male students. Making it easier to see how you fit means you see how you fit.
  • More coordination on the front end. Last time we offered this, Klaus and I were really unsure what the students would already know and need to know. This time, we had a little more of a blueprint, so we decided in advance on a few topics we would cover.
  • Larger leadership team. Last time we did this, it was Klaus and me doing everything (+ my husband doing the cooking). This time, there were four of us with a more distributed knowledge base. This meant better lectures, and a wider array of things to accomplish.
  • Balance of work & lecture time. We only had four real days on the island. Two were mostly spent lecturing, two mostly working. The students got a lot done on the various projects.

What we could improve

  • More organization on the lead end. We had some last-minute upsets to infrastructure, which meant we did some last-minute scrambling. This probably won’t happen next time, but we could do a bit more polling on interests for lecture topics, and organize food purchases somewhat better.
  • Scalability. This workshop was great, and there was far more interest than we could accommodate. There were many great applicants we just couldn’t make room for. And for sustaining something, funders often want to see throughput. It would be great to keep the feel of everyone in one lodging, coding in the shared spaces, eating in the shared spaces. But we have little room to grow in the current location.
  • Something else will come up in evals, I’m sure.

In conclusion

What a wonderful week. I hope we can do it again. On a personal note, as much as I adore being PUI faculty, we do have fewer research-active faculty. It was really nice to go somewhere and be in the company of other researchers and new PIs for a week. I feel very much refreshed going into the last month of classes.

Our next steps are to put together a post-workshop survey + check-in schedule to keep people motivated to finish projects.

The why, when, and how of computing in biology classrooms

Along with Rachel Schwartz, Catherine Newman, Jamie Oaks, and Sarah Flanagan, I am an author on a preprint reviewing common technologies and practices for teaching computation to biologists. In it, we review some of the technologies that educators might choose to use to deliver a course in computational biology. We also review the evidence for various strategies for teaching, including various ways to incorporate live coding and active learning into class.

I’m immensely proud of this paper for a few reasons. Weirdly, the inspiration for this paper came from Twitter. In the linked thread, we really saw that a lot of early career folks are struggling to keep up with the glut of educational technologies on the market. What is RStudio Server? When is that what I want, compared to a JupyterHub? Do I need to pay for hosting to teach computational biology?

This year’s iEvoBio theme was “Enabling the next generation of computational biologists.” So I decided (as the current head of iEvoBio) to put a little money into getting speakers to have this discussion at the meeting. Organically, this discussion became a meeting, and the meeting became a paper.

Something that I think is really cool about this manuscript is that the authors are from different types of institutions (R1s, PUIs) that attract different types of students. And so we decided to pay special mind to the challenges our real students have faced. What happens for students who can’t afford the latest and greatest laptop? Or who might go on deployment over the weekend and be without their personal computer? All of the challenges we discuss in this paper are real. The solutions we cite are solutions we use.

This manuscript is currently a preprint. If you see things that you think should change, you can make a difference! On the right-hand side of the screen, you should see a link to post comments. We welcome your feedback! This is an F1000 preprint, and the reviews will be visible to readers as they become available, which is pretty cool.

Writing as training as writing is training

It’s been a little while since I blogged! I wanted to highlight a new paper I authored called “A Systematist’s Guide to Estimating Bayesian Phylogenies From Morphological Data.”

This paper was a long time coming. It started as the foreword to my dissertation, in fact! In the time since, one issue I’ve persistently come across is needing to onboard young systematists into research. I work at a primarily undergraduate institution, which means my students are, well, undergraduates. And getting them involved in research can be tough! In statistical phylogenetics, there’s no real equivalent of washing test tubes or feeding fish while you read papers and develop an independent project. Getting involved in our work means getting to work, right away. It’s like drinking from a firehose.

Computation is still not heavily incorporated in curricula basically anywhere. I have students take my computational biology course before starting in the lab, so that I don’t need to teach every student, personally, how to use Python or R.

But I do still need to work with students fairly intensively on systematics and mathematical modeling. This manuscript came from a need to have something accessible I could hand to each student, and say “Here, this is what we do in the Wright lab.” It’s a labor of love for the science. But it’s also a labor of love for me. Writing all this down in one place allowed me to reduce my training burden, and providing a solid overview of these methods allows the students to get a solid grounding on methods and be exposed to some of the literature.

I’ve already had multiple lab members tell me that the paper was clarifying for them to read. As I get older, as I train more students, that’s the only thing I really want to hear: that a paper helped them learn to be systematists, and helped them think through problems better. I hope it will for you, too.

Semester Wrap Up

Last semester, I taught computational biology for the first time at Southeastern (schedule, course materials). This is a little bit of a different ‘flavor’ of computational biology than a lot of the courses we see, since I’m not really a genomics person, but an evolutionary biologist, working in a department of mostly population (ecology, evolution, behavior) biologists. The audience was upper division undergrads and MS students, and one faculty member.

This semester, I decided to try something different from what I have done in the past: I decided to forgo installs at the start, and had them run everything in a JupyterHub. My blogpost on setting all that up is here. As I covered in that post, teaching undergraduates is different from teaching graduate students. With graduate students, “I need to install this so I can analyze my data and get my MS/PhD” is a powerful motivator. They’re captive. My course is an elective. If the students feel super shitty and incapable after a day of installs, they can leave. And when they do, this is how they’ll feel:

Ye Olde Darwin Chestnut: “But I am very poorly today and very stupid and hate everybody and everything” Image via NPR

Undergrads require a reframing of how we teach computation. The goal might not be that they have a laptop full of software ready to go, but that they learn something about computation, feel confident in those skills, and get to interact with some MS students and research-active peers. So I didn’t start with installs. We did them at the end, for students who wanted to keep working in Python on their personal computers or the state HPC. This was very smooth.

I used a combination of Jupyter Notebooks and the Hub’s command line to teach. I’ve documented a lot of my thoughts on this choice here. Fundamentally, to me, the argument for notebooks boils down to this: Our competitor isn’t C++ or MATLAB. It’s Excel. The retreat to the familiar. To get people working a little more reproducibly, and taking those first steps in computation, why take away all the nice interface bells and whistles they’re familiar with? Notebooks render well, they enable note taking, and data tables printed in a notebook look familiar.

Over the first month, we went through the Data Carpentry Python ecology materials. This by and large went great. I’m a maintainer on those materials, and using them in class led to new pull requests from me, and has informed my own thinking on some of the issues and pull requests raised during the Spanish translation of the lesson.

One piece of feedback that I got was that the first part of the course is very fast. I think next time, we’ll do 6 weeks on the really basic Python stuff. I’ll also split the first assessment into two pieces – one on the basic slicing operations, and one on functions and scripts. I kind of thought 4 weeks would be enough time to cover material that’s supposed to be covered in a 6-hour workshop. Alas.

For the rest of the course, we do some querying of data from the web (Open Tree of Life, BLAST), phylogenetic computing with DendroPy, project management, and Git & GitHub. We also talk about Louisiana-specific stuff, like using the state supercomputer.

The final assessment was building Python packages, and doing teach-ins. Everyone did really well within the parameters of the assignment. The idea was that they would implement a couple of functions in a package, document them, and then teach their lab (for the MS students) or the class (for the undergraduates) how to use the package. I doubted myself a little too much and let them re-use functions from earlier in class. For some reason, I thought I hadn’t shown them enough to do something totally novel. But it’s pretty clear from conversations after the fact that I could have aimed higher with this. Next time around, I’m going to structure my assessments like so:

1. Indexing, slicing, filtering data (Python with pandas)
2. Functions and scripts
3. Querying the web, visualization
4. Making a Python package and putting it on GitHub
5. Teach-In
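As a rough illustration of the scale that assessment 4 is aiming at, a packaged function might look something like the sketch below. The function name `gc_content` and its behavior are hypothetical, not taken from the course materials; the point is only that one small function with a clear docstring is enough to build a package and a teach-in around.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence.

    Parameters
    ----------
    sequence : str
        DNA sequence; case is ignored. Characters other than
        A, C, G, and T (gaps, N, whitespace) are skipped.

    Returns
    -------
    float
        GC fraction between 0 and 1.

    Raises
    ------
    ValueError
        If the sequence contains no A, C, G, or T characters.
    """
    # Keep only unambiguous bases so gaps and Ns don't dilute the fraction.
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T characters")
    return sum(base in "GC" for base in bases) / len(bases)


# Made-up sequence, just to show a call.
print(gc_content("ACGTTGCA"))
```

A docstring in this style also gives students something concrete to walk their lab or classmates through during the teach-in.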

I was really worried about overwhelming them too early with assessment, but paradoxically made those concerns worse by holding off on the first assessment until too late.

Overall, I’m really happy with how the course went, and student evaluations suggest the learners were, too. For a first pass, I’m immensely satisfied. My second round with this course next fall will probably involve coming up with more biological narrative for the package making and GitHub steps.

Get Involved!

If you’re interested in any of this: I’m working on a SciPy proposal right now with Jeet Sukumaran. One of the things we would like to do is develop some Carpentries-style materials focused more towards phylogenetic data science – querying and cleaning data from the web, assembling phylogenetic datasets, processing MCMC output, visualization. We’d love collaborators!
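To give a flavor of what “processing MCMC output” could mean in materials like these, here is a minimal sketch using only the Python standard library. It assumes a tab-delimited trace with a header row, which is a common layout for Bayesian phylogenetics log files; the column names, the 25% burn-in, and the function name are illustrative assumptions, not part of any existing lesson.

```python
import csv
import io
import statistics


def summarize_trace(handle, parameter, burnin_fraction=0.25):
    """Return (mean, n_kept) for one parameter after discarding burn-in."""
    reader = csv.DictReader(handle, delimiter="\t")
    values = [float(row[parameter]) for row in reader]
    # Drop the first chunk of samples, taken before the chain settled.
    n_burnin = int(len(values) * burnin_fraction)
    kept = values[n_burnin:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    return statistics.mean(kept), len(kept)


# A tiny made-up trace standing in for a real log file.
example_log = io.StringIO(
    "Iteration\tPosterior\talpha\n"
    "0\t-120.0\t5.0\n"
    "10\t-101.5\t1.2\n"
    "20\t-100.9\t1.0\n"
    "30\t-100.7\t0.8\n"
)

mean_alpha, n_kept = summarize_trace(example_log, "alpha")
print(f"alpha: mean {mean_alpha:.2f} from {n_kept} samples")
```

With a real analysis, the `io.StringIO` stand-in would be replaced by an open file handle, and a fuller lesson would add convergence checks rather than relying on a fixed burn-in.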

I’m one of the hosts of iEvoBio this year, and the afternoon session will be on teaching computational biology. We’ll get started with lightning talks – 10 minutes on what you’re teaching, to whom, how you’re doing it, and what’s working about content delivery. Then, we’ll have a birds of a feather session where we try to write some of that info down to demystify course content delivery tech for instructors. We’ll put out a call for lightning talks soon. Feel free to get in touch early if you’re really keen!

I’ve also been having some conversations with the state supercomputing complex about making JupyterHubs available to host courses for free on those servers. If you might be interested (and are at an LA institution!), please get in touch.

Teaching Phylogenetics in the Cloud

A few weeks ago, I wrote about using JupyterHubs to make computational biology education more accessible to my students. I know teaching with Jupyter Notebooks has its detractors, but I’ve always noticed a difference when teaching with notebooks, as opposed to a text editor + the interpreter. The conversations in the classroom stay more focused on the material, rather than what they missed when the interpreter moved too fast, or when I switched from the script to the terminal. And, in fact, multiple students have told me something similar – that they’ve felt lost in other courses, but not mine. I feel like there’s an education paper there that I don’t have the experience to write, but would happily collaborate on.

When we think about teaching computation, or phylogenetics, we often think of PhD students. Their questions look like this:

  • “I have data, can you please please please help me analyze it?”
  • “I have data and my advisor says I need a phylogeny like tomorrow, help???”
  • “I read about this technique, can you help me understand if it’s appropriate for my data?”

And so I’ve mostly switched over to a read-try-create model for teaching. Read an example, try to run the example and understand the output, create your own extension or apply the concept to novel data. I find this works better for early-stage MS students, and my undergraduate students. Their questions often look like this:

  • “I think I might find phylogeny interesting, can I try?”
  • “I’ve heard that computation is the wave of the future, but can I really do it?”
  • “I’m not sure this will be part of my career as a scientist, can you try it with me?”

Read-try-create in a notebook environment puts content over delivery.

I got to wondering if I could see similar shifts in phylogenetics if I adopted this framework for teaching phylogeny. I typically teach phylogeny with RevBayes. There are a few reasons for this – I’ve implemented things in RevBayes, I like the graphical model framework, the analyses that I want to do are implemented there. The tutorial materials are also wonderful. But RevBayes has a framework in which you specify almost all parts of a phylogenetic model, including some concepts that are quite an abstraction from empirical biology, like specifying MCMC moves. Learners get overwhelmed, fast, and switching back and forth between text editor and interpreter is a lot for many of them.
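For readers who haven’t met the term, a “move” is just the rule for proposing a new parameter value at each MCMC step. The sketch below is plain Python, not Rev, and every name in it is made up for illustration. It runs a Metropolis sampler with a single sliding-window move on a standard normal target, which stands in for a posterior. A real Rev analysis attaches moves like this to each parameter of the model, which is part of why the setup can feel abstract to new learners.

```python
import math
import random


def log_density(x):
    """Unnormalized log density of a standard normal (stand-in for a posterior)."""
    return -0.5 * x * x


def slide_move(x, window, rng):
    """Propose a new value uniformly within +/- window/2 of the current one."""
    return x + (rng.random() - 0.5) * window


def run_mcmc(n_iterations, window=1.0, seed=1):
    """Run a Metropolis sampler and return the list of sampled values."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_iterations):
        proposal = slide_move(x, window, rng)
        # The proposal is symmetric, so acceptance depends only on the density ratio.
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        # Record the current state whether or not the proposal was accepted.
        samples.append(x)
    return samples


samples = run_mcmc(20000, window=2.0)
print(sum(samples) / len(samples))
```

Changing `window` changes how bold each proposal is: very small windows accept nearly everything but crawl, while very large ones are rejected most of the time. That tuning trade-off is the kind of detail that has little to do with the biology itself.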

Simon Frost and Michael Landis created a Rev Jupyter Kernel. I recently contributed a pull request that fixes some core functionality of this, and it’s now ready for use. I’ve been using the RevNotebook in a Littlest JupyterHub instance to onboard undergraduates and MS students into phylogeny research, and I’m really happy with it. Here’s what I did:

  1. First, I set up a JupyterHub on Digital Ocean according to these instructions. There’s more on this in my Plan C post. I started an 8 GB RAM instance.
  2. In the terminal of the JupyterHub, I installed RevBayes, mostly according to the Linux instructions. The one difference is the build command: to build a Jupyter-ready RB, it is ./build.sh -jupyter true
  3. I cloned the kernel and followed the instructions. Note that for a JupyterHub, to make RevNotebook available to all your users, any sudo command will need to be sudo -E
  4. Then, I cloned in the repository where I’ve been stashing notebooks.
  5. Next, I added students.
  6. Finally, I created a link for students to click on to automatically sync their copy of the lessons with mine.
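Step 6 can be done with an nbgitpuller-style link, which is one common way to sync a Git repository into each user’s home directory on a JupyterHub. Whether that matches the exact setup used here is an assumption, and the hub address, repository URL, and function name below are placeholders. The sketch just assembles the link from its parts.

```python
from urllib.parse import urlencode


def make_sync_link(hub_url, repo_url, branch="main", open_path=""):
    """Build an nbgitpuller-style link that pulls repo_url into a user's home."""
    params = {"repo": repo_url, "branch": branch}
    if open_path:
        # urlpath tells the hub what to open once the pull has finished.
        params["urlpath"] = f"tree/{open_path}"
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


# Both URLs are placeholders, not the real hub or lesson repository.
link = make_sync_link(
    "https://hub.example.org",
    "https://github.com/example-user/example-lessons",
    open_path="example-lessons",
)
print(link)
```

Students who click a link like this are logged in to the hub, and their copy of the repository is updated, so they don’t need to run any Git commands themselves.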

So far, so good! For the most part, the students are working through the lessons on their own, and it’s going so much better than last year, when I was assigning them lessons and having them work from the PDF at the Rev interpreter. Anecdotally, I feel like comprehension, and crucially retention, is higher.

But don’t they need to learn command line???

Yes. Yes, they do. But I think if they really understand both phylogeny and Rev before I turn them loose on our friendly local HPC, it will be better. It’s not that hard to run a script. We’ve practiced running some Rev scripts in the terminal of the JupyterHub. I really don’t doubt that we can take those skills and port them to an HPC. I’ll probably make my second-year undergraduates give the new members the five-cent tour of the HPC. But understanding what’s in the script … that’s hard.

I wanna see the RevHub!

RevBayes is too memory-intense to compile in a MyBinder, but if you want to play around, I can give you access to my RevHub. Just drop me a line.

Additionally, I am unhappy with my workflow for converting our LaTeX and Markdown tutorials to Notebooks. If you’d like to help, I’d love a buddy. A couple of us have been floating the idea of a SciPy meeting sprint to develop a set of notebooks for teaching phylogenetics in Python and Rev. Get in touch, if you’re interested? No, that doesn’t look right. Get in touch!!! Diversity in contributorship is our strength.