Writing as training as writing is training

It’s been a little while since I blogged! I wanted to highlight a new paper I authored called “A Systematist’s Guide to Estimating Bayesian Phylogenies From Morphological Data.”

This paper was a long time coming. It started as the foreword to my dissertation, in fact! In the time since, one issue I’ve persistently come across is needing to onboard young systematists into research. I work at a primarily undergraduate institution, which means my students are, well, undergraduates. And getting them involved in research can be tough! In statistical phylogenetics, there’s no real equivalent of washing test tubes or feeding fish while you read papers and develop an independent project. Getting involved in our work means getting to work, right away. It’s like drinking from a firehose.

Computation is still not heavily incorporated in curricula basically anywhere. I have students take my computational biology course before starting in the lab, so that I don’t need to teach every student, personally, how to use Python or R.

But I do still need to work with students fairly intensively on systematics and mathematical modeling. This manuscript came from a need to have something accessible I could hand to each student, and say “Here, this is what we do in the Wright lab.” It’s a labor of love for the science. But it’s also a labor of love for me. Writing all this down in one place allowed me to reduce my training burden, and a solid overview of these methods gives students a firm grounding in them and exposes them to some of the literature.

I’ve already had multiple lab members tell me that the paper was clarifying for them to read. As I get older, as I train more students, that’s the only thing I really want to hear: that a paper helped them learn to be systematists, and helped them think through problems better. I hope it will for you, too.

Semester Wrap Up

Last semester, I taught computational biology for the first time at Southeastern (schedule, course materials). This is a little bit of a different ‘flavor’ of computational biology than a lot of the courses we see, since I’m not really a genomics person, but an evolutionary biologist, working in a department of mostly population (ecology, evolution, behavior) biologists. The audience was upper division undergrads and MS students, and one faculty member.

This semester, I decided to try something different than I have in the past: I decided to forgo installs at the start, and had students run everything in a JupyterHub. My blogpost on setting all that up is here. As I covered in that post, teaching undergraduates is different than teaching graduate students. With graduate students, “I need to install this so I can analyze my data and get my MS/PhD” is a powerful motivator. They’re captive. My course is an elective. If the students feel super shitty and incapable after a day of installs, they can leave. And when they do, this is how they’ll feel:

Ye Olde Darwin Chestnut: “But I am very poorly today and very stupid and hate everybody and everything” Image via NPR

Undergrads require a reframing of how we teach computation. The goal might not be that they have a laptop full of software ready to go, but that they learn something about computation, feel confident in those skills, and get to interact with some MS students and research-active peers. So I didn’t start with installs. We did them at the end, for students who wanted to keep working in Python on their personal computers or the state HPC. This was very smooth.

I used a combination of Jupyter Notebooks and the Hub’s command line to teach. I’ve documented a lot of my thoughts on this choice here. Fundamentally, to me, the argument for notebooks boils down to this: Our competitor isn’t C++ or MATLAB. It’s Excel. The retreat to the familiar. To get people working a little more reproducibly, and taking those first steps in computation, why take away all the nice interface bells and whistles they’re familiar with? Notebooks render well, they enable note taking, and data tables printed in a notebook look familiar.

Over the first month, we went through the Data Carpentry Python ecology materials. This by and large went great. I’m a maintainer on those materials, and using them in class led to new pull requests from me, and has informed my own thinking on some of the issues and pull requests raised during the Spanish translation of the lesson.

One piece of feedback that I got was that the first part of the course is very fast. I think next time, we’ll do 6 weeks on the really basic Python stuff. I’ll also split the first assessment into two pieces – one on the basic slicing operations, and one on functions and scripts. I kind of thought 4 weeks would be enough time to cover material that’s supposed to be covered in a 6-hour workshop. Alas.

The rest of the course, we do some querying of data from the web (Open Tree of Life, BLAST), phylogenetic computing with Dendropy, project management, and Git & GitHub. We also talk about Louisiana-specific stuff, like using the state supercomputer.

The final assessment was building Python packages, and doing teach-ins. Everyone did really well within the parameters of the assignment. The idea was that they would implement a couple functions in a package, document them, and then teach their lab (for the MS students) or the class (for the undergraduates) how to use the package. I self-doubted a little too much and let them re-use functions from earlier in class. For some reason, I thought I hadn’t shown them enough to do something totally novel. But it’s pretty clear from conversations after the fact that I could have aimed higher with this. Next time around, I’m going to structure my assessments like so:

1. Indexing, slicing, filtering data (Python in Pandas)
2. Functions and scripts
3. Querying the web, visualization
4. Making a Python package and putting it on GitHub
5. Teach-In

I was really worried about overwhelming them too early with assessment, but paradoxically made those concerns worse by holding off on the first assessment until too late.

Overall, I’m really happy with how the course went, and student evaluations suggest the learners were, too. For a first pass, I’m immensely satisfied. My second round with this course next fall will probably involve coming up with more biological narrative for the package making and GitHub steps.

Get Involved!

If you’re interested in any of this: I’m working on a SciPy proposal right now with Jeet Sukumaran. One of the things we would like to do is develop some Carpentries-style materials focused more towards phylogenetic data science – querying and cleaning data from the web, assembling phylogenetic datasets, processing MCMC output, visualization. We’d love collaborators!

I’m one of the hosts of iEvoBio this year, and the afternoon session will be on teaching computational biology. We’ll get started with lightning talks – 10 minutes on what you’re teaching, to whom, how you’re doing it, and what’s working about content delivery. Then, we’ll have a birds of a feather session where we try to write some of that info down to demystify course content delivery tech for instructors. We’ll put out a call for lightning talks soon. Feel free to get in touch early if you’re really keen!

I’ve also been having some conversations with the state supercomputing complex about making JupyterHubs available to host courses for free on those servers. If you might be interested (and are at an LA institution!), please get in touch.

Teaching Phylogenetics in the Cloud

A few weeks ago, I wrote about using JupyterHubs to make computational biology education more accessible to my students. I know teaching with Jupyter Notebooks has its detractors, but I’ve always noticed a difference when teaching with notebooks, as opposed to a text editor + the interpreter. The conversations in the classroom stay more focused on the material, rather than what they missed when the interpreter moved too fast, or when I switched from the script to the terminal. And, in fact, multiple students have told me similar – that they’ve felt lost in other courses, but not mine. I feel like there’s an education paper there that I don’t have the experience to write, but would happily collaborate on.

When we think about teaching computation, or phylogenetics, we often think of PhD students. Their questions look like this:

  • “I have data, can you please please please help me analyze it?”
  • “I have data and my advisor says I need a phylogeny like tomorrow, help???”
  • “I read about this technique, can you help me understand if it’s appropriate for my data?”

And so I’ve mostly switched over to a read-try-create model for teaching. Read an example, try to run the example and understand the output, create your own extension or apply the concept to novel data. I find this works better for early-stage MS students, and my undergraduate students. Their questions often look like this:

  • “I think I might find phylogeny interesting, can I try?”
  • “I’ve heard that computation is the wave of the future, but can I really do it?”
  • “I’m not sure this will be part of my career as a scientist, can you try it with me?”

Read-try-create in a notebook environment puts content over delivery.

I got to wondering if I could see similar shifts in phylogenetics if I adopted this framework for teaching phylogeny. I typically teach phylogeny with RevBayes. There are a few reasons for this – I’ve implemented things in RevBayes, I like the graphical model framework, the analyses that I want to do are implemented there. The tutorial materials are also wonderful. But RevBayes has a framework in which you specify almost all parts of a phylogenetic model, including some concepts that are quite an abstraction from empirical biology, like specifying MCMC moves. Learners get overwhelmed, fast, and switching back and forth between text editor and interpreter is a lot for many of them.

Simon Frost and Michael Landis created a Rev Jupyter Kernel. I recently contributed a pull request that fixes some core functionality of this, and it’s now ready for use. I’ve been using the RevNotebook in a Littlest JupyterHub instance to onboard undergraduates and MS students into phylogeny research, and I’m really happy with it. Here’s what I did:

  1. First, I set up a JupyterHub on Digital Ocean according to these instructions. There’s more on this in my Plan C post. I started an 8 GB RAM instance.
  2. In the terminal of the JupyterHub, I installed RevBayes according to the Linux instructions. Mostly – the build command to build a Jupyter-ready RB is ./build.sh -jupyter true
  3. I cloned the kernel and followed the instructions. Note that for a JupyterHub, to make RevNotebook available to all your users, any sudo command will need to be sudo -E
  4. Then, I cloned in the repository where I’ve been stashing notebooks.
  5. Next, I added students.
  6. Finally, I created a link for students to click on to automatically sync their copy of the lessons with mine.

So far, so good! For the most part, the students are working through the lessons on their own, and it’s going so much better than last year when I was assigning them lessons, and having them work from the PDF at the Rev interpreter. Anecdotally, I feel like comprehension, and crucially retention, is higher.

But don’t they need to learn command line???

Yes. Yes, they do. But I think if they really understand both phylogeny and Rev before I turn them loose on our friendly local HPC, it will be better. It’s not that hard to run a script. We’ve practiced running some Rev scripts in the terminal of the JupyterHub. I really don’t doubt that we can take those skills and port them to an HPC. I’ll probably make my second-year undergraduates give the new members the five-cent tour of the HPC. But understanding what’s in the script … that’s hard.

I wanna see the RevHub!

RevBayes is too memory-intense to compile in a MyBinder, but if you want to play around, I can give you access to my RevHub. Just drop me a line.

Additionally, I am unhappy with my workflow for converting our LaTeX and Markdown tutorials to Notebooks. If you’d like to help, I’d love a buddy. A couple of us have been floating the idea of a SciPy meeting sprint to develop a set of notebooks for teaching phylogenetics in Python and Rev. Get in touch, if you’re interested? No, that doesn’t look right. Get in touch!!! Diversity in contributorship is our strength.



Plan C

This semester, I have the pleasure of teaching a semester-long Computational Biology course for the first time at Southeastern. This is really exciting – I’m looking forward to helping a new generation of students learn to use computation to do their research. We have about 12 students – MS, undergraduates and even a faculty colleague.

It’s always a challenge to teach these sorts of courses. In terms of the material, computation is just different than the normal biology classes. It’s a different skill set, and many students have low awareness of scientific and technical computing.

But that’s not what I’m going to talk about today. Today, I’m going to talk about the technical aspects of a course like this. Computing courses aimed at graduate students typically go something like this:

  1. Show up; say hello; get a cup of coffee.
  2. Sit down and do installs.
  3. Do tutorials.
  4. Do additional installs later in the course, as needed.

This works OK. Graduate students are generally aware that they a) have research data and b) need to analyze it if they’re leaving with a PhD at some point. While it’s nice to make the installs not be horrifically painful, the audience is sort of captive.

Undergraduates aren’t. My class is an upper-division elective; they could do something else. If the installs are too torturous or their computer can’t run the software, they can leave. And that’s a particular issue for the students I serve. Southeastern is in southeastern Louisiana, a historically low-income region of a low-income state. Most students work 20+ hours a week outside class and end up doing homework on a work computer that isn’t their personal machine. Some might not have reasonable computers. Some might be military or Coast Guard and might need to keep up with homework on a deployment weekend.

Enter JupyterHub. JupyterHub allows instructors to serve multiple instances of Jupyter notebooks for their classes. The servers can then be hooked up to a custom domain, so students can navigate in their browser to a course website, log in, and start an interactive compute session without doing any installs. Need to do homework on a work computer? No problem! Your computer is 6 years old and you might need to finish the class out on a loaner laptop? No problem.

In particular, I used the Littlest JupyterHub, a JupyterHub variant for small classes. TLJH is meant to be used in a single-server setup. This works well for me for a couple of reasons:

  1. What we’re doing is fairly simple – install a few Python packages for working with data. We’re not doing a ton of complicated installs, or working with multiple languages, or compiling a whole lot of finicky FOSS software. I don’t have a real need for containers, either.
  2. I have a newborn at home, and I’m very tired and I need everything explained like I’m five. TLJH docs are very good at this.
  3. I’d like this course to work, and work smoothly, and work with easily-communicated technology so I can encourage other faculty to adopt the model and infrastructure.

I had originally intended to serve the course off of the State of Louisiana supercomputer, but a major equipment breakage has been taking up all the staff’s time, so they couldn’t set up a JupyterHub to their security specifications for me. I ordered a small server to run the course … and it didn’t arrive in time to start class. TLJH has well-written instructions for deploying on Digital Ocean, a provider of customizable cloud servers. The time from making the choice of server to having a working hub was about 5 minutes. I purchased a domain name, linked the nameserver and server address, and had a working web portal in another few minutes.

Once the hub was working, I logged into it, opened a terminal and followed these instructions to start installing packages.

Each week, I make my lectures in Jupyter Notebooks, make a homework notebook, and push them to GitHub. [We’re in week three of the class and still working our way through Data Carpentry’s Python materials. As a maintainer for these materials, I am slightly concerned that we’re in instructional hour 12 and still on these lessons … there’s a lot of stuff in there. We do every exercise, though.] Then, I use the nbgitpuller utility to generate a link to my repo. When a student clicks the link, the materials in their hub sync to my personal repo. This way, I can serve materials based on my GitHub and use version control in the way I’m used to, without inundating novice students with git lingo right away. I put the link on the schedule for that day. The students arrive, click the link, sync, and we get started.
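For anyone curious what such a link looks like under the hood, here is a minimal sketch in Python. It assumes nbgitpuller's `git-pull` link format with `repo`, `branch`, and `urlpath` query parameters. The hub address, the repository, and the function name `make_nbgitpuller_link` are made-up placeholders for illustration, not the actual course setup. For real use, the link generator in the nbgitpuller documentation is the safer route.

```python
from urllib.parse import urlencode


def make_nbgitpuller_link(hub_url, repo_url, branch="main", urlpath=None):
    """Build a link that asks nbgitpuller to sync repo_url into the hub
    account of whoever clicks it.

    All arguments are illustrative; "main" as the default branch is an
    assumption and should match the branch your repository really uses.
    """
    params = {"repo": repo_url, "branch": branch}
    if urlpath is not None:
        # Where the student lands after the sync finishes.
        params["urlpath"] = urlpath
    # urlencode percent-encodes the repository URL so it survives as a
    # query-string value.
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


# Hypothetical hub and repository, used only to show the shape of the link.
link = make_nbgitpuller_link(
    "https://hub.example.edu",
    "https://github.com/example-instructor/example-course",
    urlpath="tree/example-course/",
)
print(link)
```

Putting one such link on the schedule per class day keeps git entirely out of the students' view: the instructor pushes to the repository, and the click does the pull.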

Overall, this is very easy, and I’m very happy. I have 12 students + me working through the dc-ecology-py materials on a 2 CPU, 4 GB memory system. That seems sufficient, and my Digital Ocean server can be resized on the fly in under 2 minutes if I decide it isn’t. So far, the course has cost like $2.50 to run.

I’m the Bioinformatics and Computational Biology Core organizer for the Southeastern campus, under the state’s INBRE funding. I’m hoping to discuss these experiences more at the INBRE retreat in two weeks, and hopefully drive forward adoption of these types of course set-ups. TLJH is clearly a very important tool in the kit for serving the students that we have.

Since RevBayes also now has a Jupyter Kernel, this also seems like a potentially exciting way to serve systematics and macroevolution classes. Stay tuned for more on this.

Edit: Got a good question on Twitter:

And the answer is yes! There is a terminal in JupyterHub, so you can still practice the command line, command line based revision management, and running Python scripts at the command line. Below are screenshots of how you access it.




I started with the idea of navigating the file system from within Python last week, then did some shell navigation the following class period. I even had props!
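For readers who have not seen it, navigating the file system from within Python looks roughly like the sketch below. It is a minimal illustration using only the standard library, not the exact lesson from class, and each call is annotated with the shell command it mirrors.

```python
import os
from pathlib import Path

# Where am I? (shell: pwd)
start = Path.cwd()
print(start)

# What is here? (shell: ls)
entries = sorted(os.listdir(start))
print(entries[:5])  # show at most the first five names

# Move up one directory (shell: cd ..)
os.chdir(start.parent)
moved_to = Path.cwd()
print(moved_to)

# Go back to where we started (shell: cd back to the original directory)
os.chdir(start)
```

Seeing the same three ideas (where am I, what is here, how do I move) in Python first makes the shell versions feel like a translation rather than a new topic.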




Year One: Work Life Balance

I’ve tackled course prep and setting up a research lab. Now, that topic no one is good at, but only one gender gets pushed to talk about.

Just kidding, this is really important and I’m happy to talk about it. I’m going to start by admitting that this month, work-life is really skewed to work. I’m due to have a baby in between 1 (my due date) and 7 (scheduled C-Section) days. So I’m working pretty hard to try and get those last few things done. Or close to done. In the previous few months, it has been skewed to life, since I’m tired and need to go to bed quite early. In a not-hugely-pregnant person, this might mean getting up earlier. No such luck.

I feel like, on average, my work-life balance is reasonable. Typically, I get up about an hour earlier than my daughter, and work on correspondence triage, then writing and/or programming. Then, when my daughter gets up, I put her in the stroller and run her to school and take myself on a short run. I haven’t done that in about a month; it’ll resume in the fall. Get dressed, get in to the office. Pick Alice up at 5, my husband and I cook dinner and hang out with the kid. After bed time, I normally put in a couple hours doing fairly mindless tasks. Especially now, I’m very fried, so this time is often spent checking HPC jobs, email triage, posting reminders for classes, making todo lists and schedules, or downloading files to prepare for productive analyses in the morning. I try not to work while Alice is up, and succeed a majority of the time.

I sleep well, I eat well, I get lots of exercise. Here’s how I facilitate that in different parts of my life.


Teaching:

What worked:

  • Hard deadlines. My homeworks become available one week before they are due; they cover information that has already been covered in class. I have 50% of my office hours in the first half of the week. I make all this very clear to the students. I think it is more than reasonable to expect that a student could look at the homework, review it, and get in touch with me early. And I observe this – the vast majority of homeworks are turned in more than 24 hours early. Obviously, there is forethought that needs to happen here to get homework up a week in advance, but my homeworks largely come from my lecture outlines. So to plan the lectures is to plan the homeworks. I had very few students miss homework assignments, and in my undergraduate courses, spent mere moments negotiating late homework or arguing back for points. The key, with explanations, goes live the minute homework closes, so they can plan to meet me in office hours with questions.
  • Asking proactive questions about meetings. What are you confused about? What are you struggling with? That way, I can tell a student what to bring to the meeting, and what to do to prepare for it. I can also do my prep – a meeting with a student who is struggling with how to study is really different than a meeting with a student who wants to walk through some calculations in-depth. If we both prepare, we’re in and out quicker with fewer follow-ups.
  • Enjoying teaching. I really like teaching, so I don’t find the labor of preparing courses to be awful. Time-intensive, but if you hate teaching, you should go somewhere where this is not expected. On a really fundamental level, if you don’t enjoy the distribution of activities you do in a day, you will feel negatively about your job, and it will impact how you feel about your life, and how you interact with all parts of your life.

What didn’t:

  • First time around: I had three new preps this year, and that’s just hard. It’s hard however you slice it. I feel quite good about where each of those courses stand, in terms of being able to make incremental improvements that are much less time-intensive the second time around. But that first time with any course is a real bugbear.


Service:

What worked:

  • Choosing reviews wisely. I turn down a lot of reviews. I suggest reviewers in my stead, and feel no guilt about it. I also email editors frequently. I have my limitations – maybe the review will be late. Maybe I won’t be available for the re-review. If I’m not sure I should accept, I just ask.
  • Choosing service wisely. I’m on the council for the Society of Systematic Biologists. That’s my home society, and it’s important to me to take part. I’m also a maintainer for the Data Carpentry Python materials. I use that material regularly in my courses, so maintaining it, and having it maintained by others, inures to my benefit.
  • Saying no. I’ve applied for my first university committee. It’s one I feel strongly about, and I feel that my research program would benefit from my participating. Important work needs to be done, but not necessarily by you. I am slowly adding responsibilities. Life is long, and there will be time.

Research and Mentorship:

What worked:

  • Tracking my habits for a couple weeks. Are there certain times of the day when I am fresher for certain tasks? I schedule my day aggressively, and protect my schedule. There is an assertiveness to doing that that is hard to develop when you’re so used to being in “pleaser mode” from the job search. I use a variety of tactics to get time alone when I need it: turning off notifications on messaging apps, closing my door, and changing my venue.
  • Proactivity about meetings. Much like under teaching. What is the agenda? Is there reading, by either of us, that needs to be done before we meet? I got very assertive in my second semester: if I had asked for something to be read, and it clearly wasn’t, meeting over. We will reschedule, and I will use the remainder of the currently scheduled meeting however I see fit. Likewise, I will not give up time from the task after the meeting if you’re late. This is really important with undergrads, particularly, who don’t necessarily understand that when you’re just “on your computer”, you’re working. Especially as an evolutionary biologist who primarily uses computational tools to get at their questions.

What didn’t:

  • Proactivity about meetings. I was not proactive enough about agendas for some of my undergraduate meetings. I could have moved some projects forward more if I had a little more prep time. Which brings me to …
  • Being new faculty. Everything just takes a long time. SO much longer than you think – like that Stephen King story, The Jaunt. It’s an eternity in here.


What didn’t:

  • The first year is just hard, emotionally. You move somewhere new. That costs money. My spouse has to redo his professional certification because it doesn’t transfer. That costs money, and means he can’t bring in money. You need to make all new friends, when funds are tight (and I can’t drink beer). You don’t know where things are, you make a lot of choices from a distance without having full pro/con info. I think we have largely made good choices as a family, and we are on track to be where we want to be, but this stuff is hard, and that is inherently part of the process.
  • Just asking. I need to be more proactive about just asking. Purchasing? Someone knows what to do – ask before shopping so you don’t waste time when there’s an approved vendor for something. I’m getting better about it, but there’s that little voice inside that tells you not to be a bother. That voice is a jerk; strangle it.

What did:

  • The first year is just hard, but it would be impossible without my husband. He is the better parent, debatably the better cook (he cooks meat and I don’t – people get hung up on that point), and the absolute only person I could imagine doing this with. I don’t want to get too mushy here, but the good company of a true partner is an inestimable boon in this whole process.
  • Getting regular exercise. The 5 weeks since I stopped running (due more to poorly-managed allergies than pregnancy) have been not amazing. It turns out I feel better and more focused by a long shot when I’m getting a morning run. And by month 7, even in the heat, I was running a sub-7 minute mile. When you need a win, hit the gym! I worked exercise into my day – I run my kid to daycare. Because I can take a different route, it’s actually about as quick as driving her would be.
  • Cooking, at home. It’s hard to summon the energy to cook after a long day. But it’s time with my spouse and kid. Kids love to cook, and we all have to eat, so put ’em to work! I have never really eaten fast food extensively, but I think I would not feel as good, physically or mentally, if I were eating … cheese curds? what even is vegetarian fast food? daily. Walking to daycare, getting my daughter, and going home and making a meal as a family is a good point of disconnection from the day. It’s too hard to get work done with the toddler up, and trying to do it just makes everyone upset, so there needs to be a nice, clean cut at the end of the work day until she’s in bed.
    • Also, that first year, it’s hard to get out for lunch because (say it with me) everything takes so long as new faculty. So you need to pack some leftovers and put some snacks in your office. That’s not just my pregnant belly talking.


I’m still hugely pregnant, so I can’t do a what worked/what didn’t on this. Here are some disorganized thoughts:

  • Continuing to exercise was a good move. I didn’t do this well my first pregnancy, and I felt better and had better energy this pregnancy.
  • Listening to my body. There is an inflection point, every night, at 9:15 where my body tells me “No serious work can happen after this.” And at this point, I wrap up what I am doing and do the last mindless tasks I need to prep for the new day. Then it’s tea and reading in bed.
  • Scheduling courses that can be taught seated.
  • I’m not traveling without the baby while nursing this time. Last time around, I had a fairly seriously upsetting experience while traveling that led to me having to quit nursing 10 months early. It was shocking, and traumatizing, and I still struggle with a lot of negative emotions (anxiety, grief) about it … and I’m not someone who has a lot of negative emotions, so that’s really confusing. This time, I’m taking him with. It turns out that if someone wants you to be faculty at their thing, they can probably help you on the cost of a daytime nanny. It’s worth asking. I don’t think we win this one by contorting our bodies to be small and unobtrusive to a labor system that wasn’t invented for them.
  • Being honest about leave. What will happen? How will I monitor students’ research progress? When will I be reachable? I talk about diversity issues with my students pretty often, and I think it’s important for them to see this aspect of that, too. Undergraduates are trying out the identity of scientist as much as they are trying out science, so they should appreciate that “scientist” is not separate from “wife” or “mother” or “runner” or “avid reader” or “kitchen hermit”. You get to be any, or all, or completely different things, as well as “scientist.”

I’ll probably revisit this post in late summer when the baby is here and I’ve actually been working as a mom of a toddler and an infant.

Edit: One thing I forgot. Lower your standards for your house when you have kids. My standard is this: Imagine my husband and I get in a car wreck. Someone has to come take care of the kids at the house for a bit. They must find: enough food in the fridge and non-perishables that they can take care of them. Enough clean laundry that they don’t immediately have to start washing up. The kitchen and bathrooms clean and organized enough that they can find things to cook a meal, do bath time, etc, safely and efficiently. As long as those standards are met, fine. Anything else is lagniappe, and we treat it as bonus rounds.

Year One: Setting Up a Research Lab

When I started telling people I was starting a faculty position, one of the things I didn’t expect was how many people would chuckle and say “Oh, I remember that. I didn’t get anything done.”

Everyone who said that to me wins. I got virtually nothing done. Even a paper that is fairly close to completion is still limping on in some last simulations (really is almost there, though).

So what worked and what didn’t, year one?


I mentioned in yesterday’s post that Southeastern is a primarily-undergraduate institution. That means most of my research involves undergraduates. Louisiana is also a low-income state. Undergraduates don’t necessarily have access to the latest and greatest personal Mac laptop to compile cutting-edge software and run it at all hours of the night. This is a serious equity issue – if you try to do research and are stymied by your equipment, are you going to continue? Particularly when we’re talking about research that is really different than your training, as computation tends to be for biologists, this is a potentially huge bottleneck. But access to compute power is important to my research, so here’s what worked for solving this:

  • Keeping compute-intense tasks off of student laptops. Anything that runs a long time needs to be run either on a high-performance cluster computer or a lab server. I really do think interacting with remote machines is an important skill, and Louisiana has very nice compute infrastructure. I opted to solve this by applying right away for allocations on those resources, and this was very smooth. The resources are free, and they are eager to develop better relationships at the PUIs, so support is top-notch.
  • The students who are staying on for the summer have Linux laptops. I bought cheap (~$270 on sale) machines (4 core, 8 GB mem, 1 TB conventional hard drive) a couple weeks ago. One night, after my daughter went to bed, I sat down, made an Ubuntu boot disk, and installed Linux on all the machines inside of an hour. Then, I installed a minimal set of scientific programming tools (Python via Anaconda, DendroPy) and software (RevBayes, Tracer, FigTree), and a revision management system (git). Because we keep the heavy computations off the personal computers, the quality of the computer isn’t especially important. I also got one for myself so I can eat my own dogfood and be familiar with the operating system I’m asking them to use.

Here’s what didn’t work as well:

  • Not getting standardized laptops into their hands sooner. With graduate students, they have a little more resilience to things like minor interface differences, and they have a little more experience to try to solve errors on their own. With undergraduates, it’s more important to have that standardized environment. Confusion about absolute paths and copying data to and from a remote server continued to be an issue throughout the semester, particularly when students were collaborating. Standardizing the environment means everyone sees the same things when they fire up a terminal. When I set up the laptops, I set up a project bin that they’ll keep all the code, data and output in, so there’s no confusion when they look over and see their buddy working in /home/user/Documents/sciencestuff/myproject, but they’re in /home/user/projects.
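The absolute-path confusion described above is the kind of thing a small helper can take out of students’ hands. What follows is only a minimal sketch, not the setup actually used on the lab laptops: it assumes every machine keeps its work under ~/projects, and the LAB_PROJECT_ROOT environment variable, the function names, and the example file names are all hypothetical.

```python
import os
from pathlib import Path

# Assumed convention (hypothetical): all lab work lives under ~/projects,
# unless the LAB_PROJECT_ROOT environment variable points somewhere else,
# e.g. a scratch directory on a remote cluster.
DEFAULT_ROOT = Path.home() / "projects"


def project_root() -> Path:
    """Return the shared project directory for this machine."""
    return Path(os.environ.get("LAB_PROJECT_ROOT", DEFAULT_ROOT)).expanduser()


def project_path(*parts: str) -> Path:
    """Build a path inside the project directory from relative pieces."""
    return project_root().joinpath(*parts)


if __name__ == "__main__":
    # Scripts ask for "myproject/data/matrix.nex" and never hard-code /home/user/...
    print(project_path("myproject", "data", "matrix.nex"))
```

With something like this, two students pairing on a script refer to the same relative pieces, and only the root differs between a laptop and a remote machine.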


As those of you reading are probably aware, I’m at a PUI, so the emphasis on funding is different than it would be at an R1. But I still do need money to spend on equipment and, more importantly, labor.

What worked:

  • Focusing on the state-level. This was a suggestion from my chair – focus on funds with smaller applicant pools. They’re smaller dollar amounts, but that’s fine – I’m mostly paying undergrads. These types of funds are less competitive, but also allow you to build up a track record of getting lab funds, and doing stuff with them. I got some funds this way, and they’re more-or-less exactly the money that I need to spend over the term they’re for. I’m probably going to write about maternity leave and work-life balance in the first year later this week, and I think a good grant strategy was a key piece of this.
  • Asking for guidance. Doing everything is not sustainable. Talk to your chair and your dean about where to focus. If you have questions, just ask them.

What didn’t:

  • With planning my move, and the summer conference season last year, I sort of dropped the ball on seeing the deadlines for some of those state-level funds for particular purposes (like undergraduate fellowships or course development). This year, I’m more proactive. Two of my students have developed plans for applying for state-level funds for undergrads in the fall, and I’ll apply for some course-based research initiative funding in late summer, as well.


This is the hardest part because it’s not just a science problem, it’s a people problem. It’s a social problem when all you might want to do is shut your office door, put on your headphones and code. But your mentees depend on you.

What worked:

  • Scheduled standing meetings. My undergraduates had one meeting a week with me, and one lab meeting. I’ve now had three undergraduates continue with me for a full year. The first semester is run like a class: you learn some computation, and some Bayesian stats for phylogenetics, then move on to specific software and analyses (RevBayes, in this case, for phylogeny and divergence time estimation). Second semester, we really dig into playing with the data.
  • Pair work. Having students start in pairs and work together is much, much better than having them not. I suspected this, but allowed some students to work alone, and this was a bad move.

What didn’t:

  • I need to implement more structure next semester. Here’s what I’ll be doing:
    • I’m teaching a computational biology course that will focus on data management and good practices for reproducible science. I will teach this each fall, and students who want to work with me will register for this course. They will also register for a small number of research hours to start learning about phylogenetics.
    • Only pairs. Seriously, working in pairs is the best.
    • Office hours. Per the Carnegie classifications, each credit hour can be assigned 3-4 out-of-class preparation hours. I am going to insist that 50% of those hours be worked in the lab, during the work week. It’s not sustainable for students to be lumping research in with “homework time” in the evening or weekends. Undergrads often look at a Saturday afternoon and say “Oh wow, I can do my 6 hours of research work in one long, focused session!” And then they go to pull the updated scripts from GitHub, hit a conflict, and tank the whole afternoon because they can’t get a hold of me on Slack to help them solve it. I’m not going to sit there and watch them for those hours, but getting work sorted out while I’m just down the hall or active on Slack will be helpful.
    • More goal orientation. Undergraduates can be tricky, because you don’t know they’ll stick around. But putting conferences on the calendar with research benchmarks won’t hurt anything.


What worked:

  • Blocking off time. I make a to-do list every night. I block off time for specific tasks. For a few weeks, I tracked how I felt during the day. What tasks do I want to do, and when? That way, I can schedule time for programming when I know I typically have the brainspace to do that.
  • Using space effectively. I close my door more often now. It’s a good move.
  • Saying no. There has been so much ink spilled on saying no that I can’t possibly add to it. You can even think something is important, and needs to get done, and say no to being the one to do it. For real!

What didn’t:

  • Being new faculty. It turns out this is just hard. Everything takes longer because you are learning it. Tasks that, now that I know how they work, will only take a few moments took me forever. Course prep took forever. Fighting Moodle took forever. This is a process, and it will get easier.
  • Not grouping meetings effectively enough. Even though my teaching schedule was very clustered this semester, I didn’t get that many “deep thinking” blocks because I didn’t group meetings effectively. I feel much more productive this semester, because I was able to guard my time better. I’m getting very close to finishing a paper I’m passionate about and have been working on for a long time. But those thinking blocks are important, and I need to really look carefully at ensuring those blocks occur. Especially since I don’t have PhD students and postdocs, and handle more of the research stuff personally rather than delegating.
  • Better workflow for results documentation. This summer, I’m piloting a more checklist-based model for research tasks for undergraduates, with results deposition. That way, even if we can’t meet to discuss a result in short order, I can take a look and be prepared for the next meeting with a new set of tasks.

Even though I feel like I didn’t accomplish what I wanted before the baby comes (which is any day now), I do feel like I accomplished a lot. In particular, learning the institution and developing mentoring pipelines that will allow me to effectively leverage undergraduate enthusiasm into research. This is a process that looks different at PUIs than R1s, since the structure of delegation is really different and the labor pool is different. I can’t necessarily say “What would $R1_MENTOR do.” While I went to a PUI for undergrad, I didn’t have research-active role models there, so the learning curve has been steep. But with all I’ve learned, and excellent departmental support and colleagues, I feel very confident about the road ahead.

Year One: Courses

Been a little while since I blogged. Well, being first year faculty is hard.

In this blogpost, I’m going to tackle what worked and what didn’t about courses I offered. Particularly, I’m going to focus on Spring, since that’s when I added courses to the books (Systematics lab and a bioinformatics component of a genomics course).

A brief note about the set-up of our institution. I’m at a primarily undergraduate institution (PUI), which means I’m heavily incentivized towards course development. Course sizes are pretty small at a school like this, and so I can get away with a lot of personal help and personal feedback to students. Some of the materials (particularly the RevBayes tutorials in the Systematics lab) have been used in classes of >50, but beyond that, scalability is not something I have thought much about.

What worked:

  • Integration of good computational principles with the course content. Both courses started out going through the Carpentries UNIX shell and Git lessons. All the software we used this semester was command-line executable, and that is very much the norm in biological computation. It’s fine to not be a shell expert and do all this stuff, but things are much smoother if you do have some expertise in navigating directories and managing data.
  • Superb HPC support. Public institutions in Louisiana have access to LONI, a high-performance computer cluster run out of LSU in Baton Rouge. The facilities offered are varied and met our needs well, from having availability of interactive single nodes for live-typing demos to big mem nodes for running assemblies overnight. They’re really eager to see adoption of HPC at the state PUIs, and so we got really excellent customer support for getting things figured out on the fly.
  • Backending with GitHub. Using revision management for text documents is very standard practice in much of my training. For the plain text lab materials, data and scripts, it doesn’t really make sense to do anything but. Backending with GitHub also meant that every day, we would log into LONI, start an interactive session, and then do a git pull and get the day’s materials. No need to work out other storage bins or dissemination for materials. Each lab (example here) had its own directory in the course repo, and the learners would just copy that into their user directory.
  • Less is more. For both classes, I did fairly minimal slide decks. We spent a lot of time discussing in both classes. I’d estimate that each lab period for systematics, we spent about 1 of 3 hours just talking. Asking questions, answering them. And that’s where real learning occurs. I’m always clear on something: you learn as much from your peers, their questions, and their research, as you learn from your instructors. So we need to take the time to encourage that sort of interaction.
  • Plugging into existing structures. The last month of systematics was basically RevBayes. I work on and with RevBayes and maintain one of their tutorials. I hack on these tutorials, contribute to them, and use them. They’re precise, they work well, and are really amenable to whatever lecture stuff you want to add. It’s a better use of my time not to re-invent the wheel. And it’s good for the development team of RevBayes to be able to show classroom penetrance of the software and instructional materials. This also meant that I had community: I felt very comfortable in front of my systematics class, not just because I’m an expert in that topic, but because I had a support community to bounce things off of.
  • Great subject matter. I love phylogenetics. A lot. I like showing people the first steps of computation, and helping them not be afraid to become independent and try new things.
  • Great co-instructors. The bioinformatics component I taught was in a genomics course led by Dr. Raul Diaz. His lectures were really excellent, and I learned a lot about genomics from him. And he’s a true biologist. I don’t find genomics and bioinformatics inherently interesting, but the biology is right in my wheelhouse, so having him ground the course so strongly in the actual messy stuff of biology was excellent.
  • Great students. This material is not easy. This was my first time being at the helm. This could have been much harder if our learners weren’t engaged, interested, and (crucially) willing to roll with the punches.

What didn’t:

  • Backending with GitHub: I should have done several things:
    • Make sure I add more things to the .gitignore file so learners don’t develop conflicts
    • Review Git and GitHub more actively throughout the semester. Conflicts in particular arise, and it’s hard to recall how to resolve them if you’re not using Git outside of class. And new situations arise – suddenly, you might have large files that you probably shouldn’t commit. So I need to do a mid-semester revisit of Shell (for file transfer) + Git.
    • Figure out how to edit my template better. WHY AM I SO BAD AT THIS.
  • Software installs. This was fine for systematics – everything is some combo of git clone -> cd -> ./configure -> make. Most software builds in 5 minutes. And that’s a vital skill for people who will be doing cutting-edge research! But bioinformatics software is chaos! Everyone has some wild install thing, and it takes super long. I’m inclined to do more requesting of things to be installed as modules, but like I said, installing software is a crucial skill. I’m at a loss here, and would love to hear from all of you.
  • Less is more: I was better at living this in systematics than bioinformatics. Next time, I’ll start from a more reduced set of materials and add in more project time. I think 6/10 times I felt frustrated this semester, I was frustrated because I was putting too much pressure on myself to cover too much stuff. The other four I was installing software.
  • Debugging. I need to revamp how I discuss debugging and error-handling. There was a real patchwork in how independent students became. Some students were doing marvelous independent work, others were still struggling with figuring out how to deal with error messages. Kate Hertweck on Twitter posed a prompt about developing better materials for introducing googling and debugging. I think I have some funds to get people together for a short meeting on this; stay tuned. I think this will go a long way towards helping people be independent.
  • Learning from each other: Since homeworks were turned in via GitHub, they were public (feedback is not), and I should have reminded learners to read each other’s. Everyone comes from a different experiential and empirical background, and those contrasts are very informative.
  • Slides: I’m still really dependent on Keynote, and I’d like to break that habit. But it was just too much right now. Maybe next time.
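On the “large files that you probably shouldn’t commit” problem from the list above: one thing learners could run before a commit is a quick scan of the working directory. This is a rough sketch under my own assumptions, not something we used in the course. The function name and the 50 MB default threshold are hypothetical choices; pick whatever limit suits your repository and host.

```python
from pathlib import Path


def find_large_files(repo_dir, limit_bytes=50 * 1024 * 1024):
    """Return (relative path, size in bytes) for files above limit_bytes.

    Anything inside the .git directory is skipped, since Git manages
    those files itself. Results are sorted largest first.
    """
    repo = Path(repo_dir)
    hits = []
    for path in repo.rglob("*"):
        relative = path.relative_to(repo)
        if ".git" in relative.parts:
            continue
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                hits.append((relative, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Scan the current directory and report candidates for .gitignore.
    for relative, size in find_large_files("."):
        print(f"{size / 1e6:8.1f} MB  {relative}")
```

Anything the scan reports is a candidate for the .gitignore file, or for being moved with the shell file-transfer tools instead of Git.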

Overall, this was a great semester. I did a lot of work I’m proud of in both these courses, and I have a solid foundation to improve on in the future. Because I’m TT faculty, I have many course iterations to make those improvements. That’s the ideal situation, and the one learners deserve: one in which instructors are engaged in material development and have job security to know that we will be here to make improvements for the future.

I really can’t say enough good things about the learners. I had a lot of proud moments when students emailed to ask about pieces of software we hadn’t used that they had compiled on their own, and needed some help to get started. I don’t know every piece of software, but I can read a paper, understand something, and give advice. And giving that help and advice is much easier when we have a collaborative relationship – the student has done some reading, has tried something, has maybe failed, and is prepared to ask pointed questions. Watching students go from being stuck in point-and-click interfaces to being able to read the latest research, say “Hey, that’s a good approach and I’m going to try it”, then compile software, move data around on a remote machine, and actually do it (maybe with a little help) remains one of the great joys in my job. I’m not always the most demonstrative with that joy, particularly being so pregnant and so tired, but it’s there, and I feel it, and it keeps me going.

I might update this when my student evals come in. Hopefully I can share some thoughts this week on setting up a computational research lab in a low-income state, where student access to personal computers and laptops is a challenge. Or I might go into for-real labor and never think about this blog post again. No way to know.






Getting Settled

The Paleantology project is getting settled. I’ve accepted a position at Southeastern Louisiana University. I’ve been meaning to write a little bit about academic job searching. Since I’m starting to prepare my lab documents, now seems like the time.

My search was a little different than other accounts we often hear in the Evolution, Ecology and Behavior space, in that I always intended to end up at a primarily undergraduate institution (PUI) – a liberal arts school or small university without many graduate students.

By the time I leave for my job next month, I will have done a total of just shy of two years of postdoc. I started my postdoc with a six-week old toddler, who I am now watching eat yogurt at the coffee table. I will have to give up part of my NSF postdoc funding, but I found the job, and the place, and the school. I’m making the right choice.

Jeremy Yoder had a thread on Twitter this morning where he talked about the numbers behind his job search. I didn’t keep as detailed of notes as Jeremy did, but I probably applied to about 17 jobs, had phone interviews at 6, on-campus visits at 3, and received two offers. This reflects, very much, what Jeremy said about the number of applications sent being proportional to job openings. I’m an evolutionary biologist who uses computational methods to answer questions in statistical phylogenetics, particularly questions about how we incorporate fossil information into phylogenetic trees.

So I’m not applying to genomics jobs. I’m not applying to ecological physiology jobs. And my research might sound sort of esoteric. The number of jobs at primarily undergraduate institutions is already smaller than the number of R1 jobs (at least the years I applied). My interests put me in a smaller subset of those jobs.

On top of that, we have some tricky business. PUIs often write job ads that are a little broader than research schools, particularly big research schools. Big-name schools can rely on getting hundreds of applicants, even in specialized fields. Without the name recognition of big schools, PUIs may try to increase the depth of the applicant pool by making it wider. Many of the ads I applied to were something like “Faculty line in computational biology. May specialize in evolution, ecology, bioinformatics, genomics, ecoinformatics, cancer biology, neuroscience, or a combination thereof.” That’s a wide net being cast.

I applied to jobs for which I thought I would be a great fit, and never received a call. I applied to jobs where I thought I was a tangential fit, and had phone interviews. I had phone interviews that went awesome, only to see the seminar calendar get loaded up with genomics and neuroscience folks. And I get that – PUIs have smaller faculty rosters. Getting students into medical school is often a high priority. Having folks who can teach evolution, or bioinformatics, as well as a course in cancer genomics or neuroscience lets the department grow its course roster while increasing the number of courses about which the med school students will be excited. But I definitely did have moments, after everyone else in my house had gone to sleep, when I sat up wondering if there would ever be an opening for a theoretician who is passionate about working with students.

Next year would have been my last year of postdoc on my personal funding. I could probably scrounge some funds after. I would have hit the market harder, and maybe applied to some schools that weren’t PUI. I also would have applied to other postdoc fellowships, and jobs in industry. I have a skill set that would bring me success in industry. I have always known that card is in my back pocket. But I would have fought like hell to get a chance at the job I really want.

There is one other thing that I want to note about my PUI job search. Christie Bahlai wrote about her job search season, looking at the angle of challenges posed to people marginalized in science. And those challenges are real – having children so soon before moving for my postdoc, I was carrying some debt from medical and moving expenses that I was trying to pay down.

The money aspect can be pretty scary, and I’ve seen a smattering of comments to that effect from grad students and postdocs on Twitter. In my job search, with all three on-campus visits, I put less on my credit card than one conference, typically. You certainly can ask for a department to directly book flights, as opposed to reimbursing you. I did not have success with this, but I know others who have. Interview clothes don’t have to be the latest style. I bought everything I wore on clearance at Nordstrom Rack or TJ Maxx for under $100 the first year. Due to post-baby weight loss, I had to buy some new things the second year – I spent about $15 on a plain, white button down to freshen up my skirt-blazer outfit. I bought a new pair of dark jeans, not for the interview, but I ended up wearing them on the interview.

Right, the baby. The first year I was on the market, I had to kick back a schedule for not having enough nursing breaks. It might be illegal for them to use info about my marital status or children in their hiring decisions, but they sure knew about it. On my second year on the market, I was never asked about children … because I volunteered it. The two schools I had on-campus visits at were in areas where I wasn’t familiar with the school systems and daycare. It’s info I needed. I will say that I was treated with nothing but respect when I asked about nursing accommodations, schools, etc. More respect than I’ve been treated with at some workshops. I can’t promise that being honest about my kid didn’t hurt me (I don’t have that job where I had to push on the schedule, eh?). I can’t promise being honest wouldn’t hurt you. But I can say that I’m happy with my results. If being a mom cost me a job, that job probably would have eventually driven me out, anyway.

I read this piece out loud to my husband. He noted that I hadn’t mentioned him. And I didn’t know what he meant. And he said, “Well, the childcare.” On-campus PUI visits mean flying out one afternoon, being gone for a full day, and usually coming back the following afternoon. My husband works second shift, and had to rearrange his schedule on days I was gone so he could make the daycare drop-off and pick-up. He’s the best; I couldn’t do this without him. He did a lot of legwork to support me in this search. For us, this is a minor and annoying hardship. For someone without support, this can be a serious issue. If you’re reading this, and you’re on a search committee, do be aware that planning early and being flexible with dates is really helpful for folks who are juggling these responsibilities.


The primarily undergraduate institution search is a little different than many of the job searches we often hear about. I’ll follow up after the Evolution Meetings, where I will speak on Saturday (see event 8) about PUIs and why I chose that path, with a less personal essay and some nuts-and-bolts of these applications.


Edit0: Since I started writing, NSF announced they are ending the DDIG. The faculty position I have builds directly on the research I wrote my DDIG on. That award allowed me to be independent and establish my research program early. I am grateful for the opportunities afforded by that program, and I am saddened that others will not have this funding available to them.

Edit1: A couple of people have asked if I will publicly post job materials. My plan is not to do that, but if you want to see them, please do get in touch. I’m open to reconsidering this position, so if anyone thinks I’m wrong to not share, weigh in!

Edit2: A couple people have asked about bringing up nursing when scheduling an on-campus interview. I went with a simple:

I am presently still nursing an infant. I will need $X lactation breaks of $Y duration.

To push back on a schedule:

Thank you so much for sending this schedule. I have some concerns about the lactation breaks, which are scheduled for $X times. Would $Y alternative schedule be workable?

In my case, I just wanted a quick pump before my talk, so they slated me in an additional 20 minutes of prep time. No big deal.

End of the Semester!

This semester sure went by fast! Here, we have a couple wrap up statements from our two project undergraduates, Patrick Mendoza and Andre Flores. We were really lucky to have such great students working (project discussed here and here) with us this spring. In their words, here is what they learned, what more they hope to learn, and what is next for them.

Patrick Mendoza

One of the big takeaways from this semester will be the fact that biology is not as far removed from computing and statistics as I normally thought.  The biological sciences are an incredibly interdisciplinary branch of science that have many different areas and sub-disciplines.  I have learned that there are many ways to merge interests in math and computer science to complement biology, and particularly, phylogeny.

This past semester has been a great experience. Working for April Wright and Walker Pett has been especially gratifying, as both are extremely knowledgeable and took time to distill complex concepts through exercises, lectures, and applied research.

I would like to spend more time understanding the root mathematical models and their derivation along with their correlation to applied phylogenetic analysis.  Not being a computer science major hasn’t been a hindrance, but it could help to learn more fundamentals.

This summer I will be attending an internship at the Boyce Thompson Institute in Ithaca as part of their Plant Genome Project.  Solanaceous crops have been genetically modified via CRISPR/Cas9 regarding leaf polarity.  I will be using in situ hybridizations and histology to detect modified genes in plant progeny.  I am looking forward to expanding my skillsets to include microscopic laboratory experiments.

Andre Flores

Although I generally felt proficient for my age in biological knowledge when I first joined the lab, I quickly realized I had considerably more advanced concepts to learn. The beginning was mostly a breeze as I refreshed my knowledge in phylogenetics, but concepts began to get harder afterward. Because I was taking a Genetics course concurrently, I found it easier to understand my lab work. In particular, it was much easier to grasp the concepts of DNA evolution. Since I had a better understanding of these molecular structures and their types of mutation, I was able to apply it to the models of evolution. The simplest of these was the Jukes-Cantor substitution model, which was important to learn both as a basic model of evolution and for its importance in understanding statistical analysis.

When I first started to apply my programming skills as an analytical tool for biology, I had almost no clue how to summarize the data I was gathering. This started to change when I started learning about substitution models, as they explained mutation rates through methods in statistical probability. It was especially helpful when we worked through concepts and problems as a group while using the whiteboards, as it was much easier to grasp with relatively simpler material (like the JC model). Later on, I found it interesting to visualize the data using Tracer or the notebooks, especially as a visual learner. The same notebooks were also a great way to apply programming skills and try out new languages. Despite learning a considerable amount, I would still say I have much more to learn in statistical analysis before being confident in my abilities with it.

By far, the most trouble I had in the lab was trying to understand the data. Although I learned how different models worked, I still had trouble understanding what to do with the data. The first time I ran scripts on Rev, I remember being basically clueless when I saw all the terms like “marginal” or “prior” followed by random numbers. While these were quickly explained to me, it was a lot to take in at once, so I had to do a lot of extra work refreshing my knowledge in stats before I could progress. Looking back, I think it would have been helpful for us to have spent more time learning about statistical terminology and their roles in affecting a data plot. Essentially, I would have liked to learn more about the stats/computing side of the lab. This would give me a better understanding of how to improve upon these models, but I luckily have the opportunity to continue learning these topics through the summer.

My summer plans are to continue learning under April and Will to hopefully prepare for presenting at SACNAS in October. I hope to learn a lot more about statistical analysis and be able to apply my knowledge to more datasets with better understanding. Being able to apply my new skills to more datasets is important because it is basically the most important point of what I hope to present over – providing data to support the efficiency of using Bayesian methods to study phylogeny. I plan to start immediately after finals week and be able to continue at least into July, from where I’m hoping to spend 4 weeks in Costa Rica at a conservation facility. I would spend half my time working hands-on with endangered species, then spend the other half in the jungle helping record data as a bioinformaticist. Afterward, I’ll likely pick right back up on studying molecular evolution, whether I continue the fall semester at Iowa State or not. Regardless, I would have a summer with a lot of valuable practice in the bioinformatics field.

Overall, I had a great experience learning alongside Pat in the Heath Lab. April and Will did an excellent job teaching us new material and were helpful in making sure we understood key concepts. Early on, I refreshed knowledge on basic biological processes that led to evolution, then transitioned into the computational methods used to analyze those biological processes. Although I still have a lot to learn, under April and Will’s guidance I hope to gather valuable experience continuing summer research as I build my biological skillset.