This paper was a long time coming. It started as the foreword to my dissertation, in fact! In the time since, one issue I’ve persistently come across is needing to onboard young systematists into research. I work at a primarily undergraduate institution, which means my students are, well, undergraduates. And to get them involved in research can be tough! In statistical phylogenetics, there’s no real equivalent of washing test tubes or feeding fish while you read papers and develop an independent project. Getting involved in our work means getting to work, right away. It’s like drinking from a firehose.
Computation is still not heavily incorporated in curricula basically anywhere. I have students take my computational biology course before starting in the lab, so that I don’t need to teach every student, personally, how to use Python or R.
But I do still need to work with students fairly intensively on systematics and mathematical modeling. This manuscript came from a need to have something accessible I could hand to each student, and say “Here, this is what we do in the Wright lab.” It’s a labor of love for the science. But it’s also a labor of love for me. Writing all this down in one place allowed me to reduce my training burden, and a solid overview gives the students a grounding in these methods and exposes them to some of the literature.
I’ve already had multiple lab members tell me that the paper was clarifying for them to read. As I get older, as I train more students, that’s the only thing I really want to hear: that a paper helped them learn to be systematists, and helped them think through problems better. I hope it will for you, too.
Last semester, I taught computational biology for the first time at Southeastern (schedule, course materials). This is a little bit of a different ‘flavor’ of computational biology than a lot of the courses we see, since I’m not really a genomics person, but an evolutionary biologist, working in a department of mostly population (ecology, evolution, behavior) biologists. The audience was upper division undergrads and MS students, and one faculty member.
This semester, I decided to try something different from what I have done in the past: I decided to forgo installs at the start, and had them run everything in a JupyterHub. My blogpost on setting all that up is here. As I covered in that post, teaching undergraduates is different from teaching graduate students. With graduate students, “I need to install this so I can analyze my data and get my MS/PhD” is a powerful motivator. They’re captive. My course is an elective. If the students feel super shitty and incapable after a day of installs, they can leave. And when they do, this is how they’ll feel:
Undergrads require a reframing of how we teach computation. The goal might not be that they have a laptop full of software ready to go, but that they learn something about computation, feel confident in those skills, and get to interact with some MS students and research active peers. So I didn’t start with installs. We did them at the end, for students who wanted to keep working in Python on their personal computers or the state HPC. This was very smooth.
I used a combination of Jupyter Notebooks and the Hub’s command line to teach. I’ve documented a lot of my thoughts on this choice here. Fundamentally, to me, the argument for notebooks boils down to this: Our competitor isn’t C++ or MATLAB. It’s Excel. The retreat to the familiar. To get people working a little more reproducibly, and taking those first steps in computation, why take away all the nice interface bells and whistles they’re familiar with? Notebooks render well, they enable note taking, and data tables printed in a notebook look familiar.
Over the first month, we went through the Data Carpentry Python ecology materials. This by and large went great. I’m a maintainer on those materials, and using them in class led to new pull requests from me, and has informed my own thinking on some of the issues and pull requests raised during the Spanish translation of the lesson.
One piece of feedback I got was that the first part of the course is very fast. I think next time, we’ll do 6 weeks on the really basic Python stuff. I’ll also split the first assessment into two pieces – one on the basic slicing operations, and one on functions and scripts. I kind of thought 4 weeks would be enough time to cover material that’s supposed to be covered in a 6-hour workshop. Alas.
For the rest of the course, we do some querying of data from the web (Open Tree of Life, BLAST), phylogenetic computing with DendroPy, project management, and Git & GitHub. We also talk about Louisiana-specific stuff, like using the state supercomputer.
The final assessment was building Python packages, and doing teach-ins. Everyone did really well in the parameters of the assignment. The idea was that they would implement a couple functions in a package, document them, and then teach their lab (for the MS students) or the class (for the undergraduates) how to use the package. I self-doubted a little too much and let them re-use functions from earlier in class. For some reason, I thought I hadn’t shown them enough to do something totally novel. But it’s pretty clear from conversations after the fact that I could have aimed higher with this. Next time around, I’m going to structure my assessments like so:
1. Indexing, slicing, filtering data (Python in Pandas)
2. Functions and scripts
3. Querying the web, visualization
4. Making a Python package and putting it on GitHub
I was really worried about overwhelming them too early with assessment, but paradoxically made those concerns worse by holding off on the first assessment until too late.
Overall, I’m really happy with how the course went, and student evaluations suggest the learners were, too. For a first pass, I’m immensely satisfied. My second round with this course next fall will probably involve coming up with more biological narrative for the package making and GitHub steps.
If you’re interested in any of this: I’m working on a SciPy proposal right now with Jeet Sukumaran. One of the things we would like to do is develop some Carpentries-style materials focused more towards phylogenetic data science – querying and cleaning data from the web, assembling phylogenetic datasets, processing MCMC output, visualization. We’d love collaborators!
I’m one of the hosts of iEvoBio this year, and the afternoon session will be on teaching computational biology. We’ll get started with lightning talks – 10 minutes on what you’re teaching, to whom, how you’re doing it, and what’s working about content delivery. Then, we’ll have a birds of a feather session where we try to write some of that info down to demystify course content delivery tech for instructors. We’ll put out a call for lightning talks soon. Feel free to get in touch early if you’re really keen!
I’ve also been having some conversations with the state supercomputing complex about making JupyterHubs available to host courses for free on those servers. If you might be interested (and are at an LA institution!), please get in touch.
A few weeks ago, I wrote about using JupyterHubs to make computational biology education more accessible to my students. I know teaching with Jupyter Notebooks has its detractors, but I’ve always noticed a difference when teaching with notebooks, as opposed to a text editor + the interpreter. The conversations in the classroom stay more focused on the material, rather than what they missed when the interpreter moved too fast, or when I switched from the script to the terminal. And, in fact, multiple students have told me something similar – that they’ve felt lost in other courses, but not mine. I feel like there’s an education paper there that I don’t have the experience to write, but would happily collaborate on.
When we think about teaching computation, or phylogenetics, we often think of PhD students. Their questions look like this:
“I have data, can you please please please help me analyze it?”
“I have data and my advisor says I need a phylogeny like tomorrow, help???”
“I read about this technique, can you help me understand if it’s appropriate for my data?”
And so I’ve mostly switched over to a read-try-create model for teaching. Read an example, try to run the example and understand the output, create your own extension or apply the concept to novel data. I find this works better for early-stage MS students, and my undergraduate students. Their questions often look like this:
“I think I might find phylogeny interesting, can I try?”
“I’ve heard that computation is the wave of the future, but can I really do it?”
“I’m not sure this will be part of my career as a scientist, can you try it with me?”
Read-try-create in a notebook environment puts content over delivery.
I got to wondering if I could see similar shifts in phylogenetics if I adopted this framework for teaching phylogeny. I typically teach phylogeny with RevBayes. There are a few reasons for this – I’ve implemented things in RevBayes, I like the graphical model framework, the analyses that I want to do are implemented there. The tutorial materials are also wonderful. But RevBayes has a framework in which you specify almost all parts of a phylogenetic model, including some concepts that are quite an abstraction from empirical biology, like specifying MCMC moves. Learners get overwhelmed, fast, and switching back and forth between text editor and interpreter is a lot for many of them.
Finally, I created a link for students to click on to automatically sync their copy of the lessons with mine.
So far, so good! For the most part, the students are working through the lessons on their own, and it’s going so much better than last year when I was assigning them lessons, and having them work from the PDF at the Rev interpreter. Anecdotally, I feel like comprehension, and crucially retention, is higher.
But don’t they need to learn command line???
Yes. Yes, they do. But I think if they really understand both phylogeny and Rev before I turn them loose on our friendly local HPC, it will be better. It’s not that hard to run a script. We’ve practiced running some Rev scripts in the terminal of the JupyterHub. I really don’t doubt that we can take those skills and port them to an HPC. I’ll probably make my second-year undergraduates give the new members the five-cent tour of the HPC. But understanding what’s in the script … that’s hard.
I wanna see the RevHub!
RevBayes is too memory-intense to compile in a MyBinder, but if you want to play around, I can give you access to my RevHub. Just drop me a line.
Additionally, I am unhappy with my workflow for converting our LaTeX and Markdown tutorials to Notebooks. If you’d like to help, I’d love a buddy. A couple of us have been floating the idea of a SciPy meeting sprint to develop a set of notebooks for teaching phylogenetics in Python and Rev. Get in touch, if you’re interested? No, that doesn’t look right. Get in touch!!! Diversity in contributorship is our strength.
This semester, I have the pleasure of teaching a semester-long Computational Biology course for the first time at Southeastern. This is really exciting – I’m very excited to help a new generation of students learn to use computation to do their research. We have about 12 students – MS, undergraduates and even a faculty colleague.
It’s always a challenge to teach these sorts of courses. In terms of the material, computation is just different than the normal biology classes. It’s a different skill set, and many students have low awareness of scientific and technical computing.
But that’s not what I’m going to talk about today. Today, I’m going to talk about the technical aspects of a course like this. Computing courses aimed at graduate students typically go something like this:
Show up; say hello; get a cup of coffee.
Sit down and do installs.
Do additional installs later in the course, as needed.
This works OK. Graduate students are generally aware that they a) have research data and b) need to analyze it if they’re leaving with a PhD at some point. While it’s nice to make the installs not be horrifically painful, the audience is sort of captive.
Undergraduates aren’t. My class is an upper-division elective; they could do something else. If the installs are too torturous or their computer can’t run the software, they can leave. And that’s a particular issue for the students I serve. Southeastern is in southeastern Louisiana, a historically low-income region of a low-income state. Most students work 20+ hours a week outside class and end up doing homework on a work computer that isn’t their personal machine. Some might not have reasonable computers. Some might be military or Coast Guard and might need to keep up with homework on a deployment weekend.
Enter JupyterHub. JupyterHub allows instructors to serve multiple instances of Jupyter notebooks for their classes. The servers can then be hooked up to a custom domain, so students can navigate in their browser to a course website, log in, and start an interactive compute session without doing any installs. Need to do homework on a work computer? No problem! Your computer is 6 years old and you might need to finish the class out on a loaner laptop? No problem.
In particular, I used the Littlest JupyterHub, a JupyterHub variant for small classes. TLJH is meant to be used in a single-server set up. This works well for me for a couple of reasons:
What we’re doing is fairly simple – install a few Python packages for working with data. We’re not doing a ton of complicated installs, or working with multiple languages, or compiling a whole lot of finicky FOSS. I don’t have a real need for containers, either.
I have a newborn at home, and I’m very tired and I need everything explained like I’m five. TLJH docs are very good at this.
I’d like this course to work, and work smoothly, and work with easily-communicated technology so I can encourage other faculty to adopt the model and infrastructure.
I had originally intended to serve the course off of the State of Louisiana supercomputer, but a major equipment breakage has been taking up all the staff’s time, so they couldn’t set up a JupyterHub to their security specifications for me. I ordered a small server to run the course … and it didn’t arrive in time to start class. TLJH has well-written instructions for deploying on Digital Ocean, a provider of customizable cloud servers. The time from making the choice of server to having a working hub was about 5 minutes. I purchased a domain name, linked the nameserver and server address, and had a working web portal in another few minutes.
Once the hub was working, I logged into it, opened a terminal and followed these instructions to start installing packages.
Each week, I make my lectures in Jupyter Notebooks, make a homework notebook, and push them to GitHub. [We’re in week three of the class and still working our way through Data Carpentry’s Python materials. As a maintainer for these materials, I am slightly concerned that we’re in instructional hour 12 and still on these lessons … lot of stuff in there. We do every exercise, though.] Then, I use the nbgitpuller utility to generate a link to my repo. When a student clicks it, the materials in that student’s hub sync to my personal repo. This way, I can serve materials based on my GitHub and use version control in the way I’m used to, without inundating novice students with git lingo right away. I put the link on the schedule for that day. The students arrive, click the link, sync, and we get started.
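For anyone curious what that link looks like, here is a minimal sketch of building one by hand. It assumes the URL layout that nbgitpuller’s link generator produces as far as I understand it (a `git-pull` endpoint with `repo`, `branch` and `urlpath` query parameters), so check the result against the official generator before handing it to students. The hub domain, repository name and branch below are made-up placeholders, not my real course.

```python
from urllib.parse import urlencode


def make_nbgitpuller_link(hub_url, repo_url, branch="main", urlpath=None):
    """Build a link that asks a hub running nbgitpuller to sync a repo.

    The endpoint and parameter names are an assumption based on the
    links nbgitpuller's own generator emits; verify before relying on it.
    """
    params = {"repo": repo_url, "branch": branch}
    if urlpath is not None:
        # Where the student should land after the sync finishes.
        params["urlpath"] = urlpath
    # urlencode percent-escapes the repo URL so it survives as a query value.
    return hub_url.rstrip("/") + "/hub/user-redirect/git-pull?" + urlencode(params)


# Hypothetical hub and repository, for illustration only.
link = make_nbgitpuller_link(
    "https://hub.example.edu",
    "https://github.com/example-instructor/compbio-course",
    urlpath="tree/compbio-course/",
)
print(link)
```

The nice part of this design is that students never see a git command: the link carries the repository and branch, and the hub does the pull on their behalf.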
Overall, this is very easy, and I’m very happy. I have 12 students + me working through the dc-ecology-py materials on a 2 CPU, 4 GB memory system. That seems sufficient, and my Digital Ocean server can be resized on the fly in under 2 minutes if I decide it isn’t. So far, the course has cost like $2.50 to run.
I’m the Bioinformatics and Computational Biology Core organizer for the Southeastern campus, under the state’s INBRE funding. I’m hoping to discuss these experiences more at the INBRE retreat in two weeks, and hopefully drive forward adoption of these types of course set-ups. TLJH is clearly a very important tool in the kit for serving the students that we have.
Since RevBayes also now has a Jupyter Kernel, this also seems like a potentially exciting way to serve systematics and macroevolution classes. Stay tuned for more on this.
Edit: Got a good question on twitter:
Glad to hear the experience has been so positive! I was thinking about JupyterHub for my class, but opted for a Linux VM instead. Can students get the experience of navigating a *nix filesystem through a hub?
And the answer is yes! There is a terminal in JupyterHub, so you can still practice the command line, command-line-based revision management, and running Python scripts at the command line. Below are screenshots of how you access it.
I started with the idea of navigating the file system from within Python last week, then did some shell navigation the following class period. I even had props!
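To give a flavor of the “file system from within Python” idea, here is a small sketch using the standard library’s pathlib, with the rough shell equivalent noted in comments. It is not the exact exercise from class; the folder and file names are invented, and everything happens in a temporary directory so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Build a small throwaway directory tree so the example touches nothing real.
# The folder and file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "myproject"
    (project / "data").mkdir(parents=True)
    (project / "data" / "surveys.csv").write_text("record_id,species_id\n1,NL\n")
    (project / "analysis.py").write_text("print('hello')\n")

    # Shell: ls myproject
    top_level = sorted(p.name for p in project.iterdir())

    # Shell: ls myproject/data/*.csv
    csv_files = sorted(p.name for p in (project / "data").glob("*.csv"))

    # Shell: the parent directory, as in `cd ..` from myproject/data
    data_dir = project / "data"
    back_up_one = data_dir.parent

    # The same file, described relative to the project folder
    relative = (data_dir / "surveys.csv").relative_to(project)

print(top_level)
print(csv_files)
print(relative.as_posix())
```

Seeing the same tree through Python first means the later shell session is mostly a matter of learning new names for ideas the students have already met.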
Just kidding, this is really important and I’m happy to talk about it. I’m going to start by admitting that this month, work-life is really skewed to work. I’m due to have a baby in between 1 (my due date) and 7 (scheduled C-Section) days. So I’m working pretty hard to try and get those last few things done. Or close to done. In the previous few months, it has been skewed to life, since I’m tired and need to go to bed quite early. In a not-hugely-pregnant person, this might mean getting up earlier. No such luck.
I feel like, on average, my work-life balance is reasonable. Typically, I get up about an hour earlier than my daughter, and work on correspondence triage, then writing and/or programming. Then, when my daughter gets up, I put her in the stroller and run her to school and take myself on a short run. I haven’t done that in about a month; it’ll resume in the fall. Get dressed, get in to the office. Pick Alice up at 5, my husband and I cook dinner and hang out with the kid. After bed time, I normally put in a couple hours doing fairly mindless tasks. Especially now, I’m very fried, so this time is often spent checking HPC jobs, email triage, posting reminders for classes, making todo lists and schedules, or downloading files to prepare for productive analyses in the morning. I try not to work while Alice is up, and succeed a majority of the time.
I sleep well, I eat well, I get lots of exercise. Here’s how I facilitate that in different parts of my life.
Hard deadlines. My homeworks become available one week before due; they cover information that has already been covered in class. I have 50% of my office hours in the first half of the week. I make all this very clear to the students. I think it is more than reasonable to expect that a student could look at the homework, review it, get in touch with me early. And I observe this – the vast majority of homeworks are turned in more than 24 hours early. Obviously, there is forethought that needs to happen here to get homework up a week in advance, but my homeworks largely come from my lecture outlines. So to plan the lectures is to plan the homeworks. I had very few students miss homework assignments, and in my undergraduate courses, spent mere moments negotiating late homework or arguing back for points. Key with explanations goes live the minute homework closes, so they can plan to meet me in office hours with questions.
Asking proactive questions about meetings. What are you confused about? What are you struggling with? That way, I can tell a student what to bring to the meeting, and what to do to prepare for it. I can also do my prep – a meeting with a student who is struggling with how to study is really different than a meeting with a student who wants to walk through some calculations in-depth. If we both prepare, we’re in and out quicker with fewer follow-ups.
Enjoying teaching. I really like teaching, so I don’t find the labor of preparing courses to be awful. Time-intensive, but if you hate teaching, you should go somewhere where this is not expected. On a really fundamental level, if you don’t enjoy the distribution of activities you do in a day, you will feel negatively about your job, and it will impact how you feel about your life, and how you interact with all parts of your life.
First time around: I had three new preps this year, and that’s just hard. It’s hard however you slice it. I feel quite good about where each of those courses stand, in terms of being able to make incremental improvements that are much less time-intensive the second time around. But that first time with any course is a real bugbear.
Choosing reviews wisely. I turn down a lot of reviews. I suggest reviewers in my stead, and feel no guilt about it. I also email editors frequently. I have my limitations – maybe the review will be late. Maybe I won’t be available for the re-review. If I’m not sure I should accept, I just ask.
Choosing service wisely. I’m on the council for the Society of Systematic Biologists. That’s my home society, and it’s important to me to take part. I’m also a maintainer for the Data Carpentry Python materials. I use that material regularly in my courses, so maintaining it, and having it maintained by others, inures to my benefit.
Saying no. I’ve applied for my first university committee. It’s one I feel strongly about, and feel that my research program would be benefitted by participating. Important work needs to be done, but not necessarily by you. I am slowly adding responsibilities. Life is long, and there will be time.
Research and Mentorship:
Tracking my habits for a couple weeks. Are there certain times of the day when I am fresher for certain tasks? I schedule my day aggressively, and protect my schedule. There is an assertiveness to doing that that is hard to develop when you’re so used to being in “pleaser mode” from the job search. I use a variety of tactics to get time alone when I need it: turning off notifications on messaging apps, closing my door, and changing my venue.
Proactivity about meetings. Much like under teaching. What is the agenda? Is there reading, by either of us, that needs to be done before we meet? I got very assertive in my second semester: if I had asked for something to be read, and it clearly wasn’t, meeting over. We will reschedule, and I will use the remainder of the currently scheduled meeting however I see fit. Likewise, I will not give up time from the task after the meeting if you’re late. This is really important with undergrads, particularly, who don’t necessarily understand that when you’re just “on your computer”, you’re working. Especially as an evolutionary biologist who primarily uses computational tools to get at their questions.
Proactivity about meetings. I was not proactive enough about agendas for some of my undergraduate meetings. I could have moved some projects forward more if I had a little more prep time. Which brings me to …
Being new faculty. Everything just takes a long time. SO much longer than you think – like that Stephen King story, The Jaunt. It’s an eternity in here.
The first year is just hard, emotionally. You move somewhere new. That costs money. My spouse has to redo his professional certification because it doesn’t transfer. That costs money, and means he can’t bring in money. You need to make all new friends, when funds are tight (and I can’t drink beer). You don’t know where things are, you make a lot of choices from a distance without having full pro/con info. I think we have largely made good choices as a family, and we are on track to be where we want to be, but this stuff is hard, and that is inherently part of the process.
Just asking. I need to be more proactive about just asking. Purchasing? Someone knows what to do – ask before shopping so you don’t waste time when there’s an approved vendor for something. I’m getting better about it, but there’s that little voice inside that tells you not to be a bother. That voice is a jerk; strangle it.
The first year is just hard, but it would be impossible without my husband. He is the better parent, debatably the better cook (he cooks meat and I don’t – people get hung up on that point), and the absolute only person I could imagine doing this with. I don’t want to get too mushy here, but the good company of a true partner is an inestimable boon in this whole process.
Getting regular exercise. The 5 weeks since I stopped running (due more to poorly-managed allergies than pregnancy) have been not amazing. It turns out I feel better and more focused by a long shot when I’m getting a morning run. And by month 7, even in the heat, I was running a sub-7 minute mile. When you need a win, hit the gym! I worked exercise into my day – I run my kid to daycare. Because I can take a different route, it’s actually about as quick as driving her would be.
Cooking, at home. It’s hard to summon the energy to cook after a long day. But it’s time with my spouse and kid. Kids love to cook, and we all have to eat, so put ’em to work! I have never really eaten fast food extensively, but I think I would not feel as good, physically or mentally, if I were eating … cheese curds? what even is vegetarian fast food? daily. Walking to daycare, getting my daughter, and going home and making a meal as a family is a good point of disconnection from the day. It’s too hard to get work done with the toddler up, and trying to do it just makes everyone upset, so there needs to be a nice, clean cut at the end of the work day until she’s in bed.
Also, that first year, it’s hard to get out for lunch because (say it with me) everything takes so long as new faculty. So you need to pack some leftovers and put some snacks in your office. That’s not just my pregnant belly talking.
I’m still hugely pregnant, so I can’t do a what worked/what didn’t on this. Here are some disorganized thoughts:
Continuing to exercise was a good move. I didn’t do this well my first pregnancy, and I felt better and had better energy this pregnancy.
Listening to my body. There is an inflection point, every night, at 9:15 where my body tells me “No serious work can happen after this.” And at this point, I wrap up what I am doing and do the last mindless tasks I need to prep for the new day. Then it’s tea and reading in bed.
Scheduling courses that can be taught seated.
I’m not traveling without the baby while nursing this time. Last time around, I had a fairly seriously upsetting experience while traveling that led to me having to quit nursing 10 months early. It was shocking, and traumatizing, and I still struggle with a lot of negative emotions (anxiety, grief) about it … and I’m not someone who has a lot of negative emotions, so that’s really confusing. This time, I’m taking him with. It turns out that if someone wants you to be faculty at their thing, they can probably help you on the cost of a daytime nanny. It’s worth asking. I don’t think we win this one by contorting our bodies to be small and unobtrusive to a labor system that wasn’t invented for them.
Being honest about leave. What will happen? How will I monitor students’ research progress? When will I be reachable? I talk about diversity issues with my students pretty often, and I think it’s important for them to see this aspect of that, too. Undergraduates are trying out the identity of scientist as much as they are trying out science, so they should appreciate that “scientist” is not separate from “wife” or “mother” or “runner” or “avid reader” or “kitchen hermit”. You get to be any, or all, or completely different things, as well as “scientist.”
I’ll probably revisit this post in late summer when the baby is here and I’ve actually been working as a mom of a toddler and an infant.
Edit: One thing I forgot. Lower your standards for your house when you have kids. My standard is this: Imagine my husband and I get in a car wreck. Someone has to come take care of the kids at the house for a bit. They must find: enough food in the fridge and non-perishables that they can take care of them. Enough clean laundry that they don’t immediately have to start washing up. The kitchen and bathrooms clean and organized enough that they can find things to cook a meal, do bath time, etc, safely and efficiently. As long as those standards are met, fine. Anything else is lagniappe, and we treat it as bonus rounds.
When I started telling people I was starting a faculty position, one of the things I didn’t expect was how many people would chuckle and say “Oh, I remember that. I didn’t get anything done.”
Everyone who said that to me wins. I got virtually nothing done. Even a paper that is fairly close to completion is still limping on in some last simulations (really is almost there, though).
So what worked and what didn’t, year one?
I mentioned in yesterday’s post that Southeastern is a primarily-undergraduate institution. That means most of my research involves undergraduates. Louisiana is also a low-income state. Undergraduates don’t necessarily have access to the latest and greatest personal Mac laptop to compile cutting-edge software and run it at all hours of the night. This is a serious equity issue – if you try to do research and are stymied by your equipment, are you going to continue? Particularly when we’re talking about research that is really different than your training, as computation tends to be for biologists, this is a potentially huge bottleneck. But access to compute power is important to my research, so here’s what worked for solving this:
Keeping compute-intense tasks off of student laptops. Anything that runs a long time needs to be run either on a high-performance cluster computer or a lab server. I really do think interacting with remote machines is an important skill, and Louisiana has very nice compute infrastructure. I opted to solve this by applying right away for allocations on those resources, and this was very smooth. The resources are free, and they are eager to develop better relationships at the PUIs, so support is top-notch.
The students who are staying on for the summer have Linux laptops. I bought cheap (~$270 on sale) machines (4 core, 8 GB mem, 1 TB conventional hard drive) a couple weeks ago. One night, after my daughter went to bed, I sat down, made an Ubuntu boot disk, and installed Linux on all the machines inside of an hour. Then, I installed a minimal set of scientific programming tools (Python via Anaconda, DendroPy) and software (RevBayes, Tracer, FigTree), and a revision management system (git). Because we keep the heavy computations off the personal computers, the quality of the computer isn’t strongly important. I also got one for myself so I can eat my own dogfood and be familiar with the operating system I’m asking them to use.
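With a stack of identical machines, a quick sanity check that the command-line tools actually landed on the PATH saves a lot of hallway debugging. The sketch below is one way to do that with the standard library. The executable names are my assumptions about how these tools are usually invoked (`rb` for RevBayes, for instance) and may differ depending on how each was installed; the graphical programs are left out because their launcher names vary.

```python
import shutil

# Assumed executable names; adjust to match how the tools were installed.
EXPECTED_TOOLS = ("python3", "git", "rb")


def check_tools(tools=EXPECTED_TOOLS):
    """Return a dict mapping each tool name to whether it is on the PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


for tool, found in check_tools().items():
    status = "found" if found else "MISSING"
    print(f"{tool}: {status}")
```

Running this once per laptop, or dropping it in the first lab notebook, gives students the same answer on every machine, which is the whole point of standardizing.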
Here’s what didn’t work as well:
Not getting standardized laptops into their hands sooner. Graduate students have a little more resilience to things like minor interface differences, and a little more experience in trying to solve errors on their own. With undergraduates, it’s more important to have that standardized environment. Confusion about absolute paths and copying data to and from a remote server continued to be an issue throughout the semester, particularly when students were collaborating. Standardizing the environment means everyone sees the same things when they fire up a terminal. When I set up the laptops, I set up a project bin that they’ll keep all the code, data and output in, so there’s no confusion when they look over and see their buddy working in /home/user/Documents/sciencestuff/myproject, but they’re in /home/user/projects.
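A project bin like this is easy to script so every machine gets the same layout. Here is a minimal sketch, assuming the bin lives in a `projects` folder under the home directory and that code, data and output each get their own subfolder; that three-way split and the function name are my illustration, not a record of the exact setup on the lab laptops. The demo runs against a temporary stand-in for the home directory so nothing real is touched.

```python
import tempfile
from pathlib import Path

# Assumed subfolders; the only requirement is that everyone has the same ones.
SUBDIRS = ("code", "data", "output")


def make_project(home, name):
    """Create <home>/projects/<name> with the standard subfolders."""
    root = Path(home) / "projects" / name
    for sub in SUBDIRS:
        # exist_ok makes the function safe to re-run on a machine already set up.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


# Demo against a temporary stand-in for /home/user.
with tempfile.TemporaryDirectory() as fake_home:
    root = make_project(fake_home, "myproject")
    created = sorted(p.name for p in root.iterdir())
    relative_root = root.relative_to(fake_home).as_posix()

print(created)
print(relative_root)
```

On a real laptop you would pass `Path.home()` instead of the temporary directory, so two students sitting side by side see identical paths in their terminals.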
As those of you reading are probably aware, I’m at a PUI, so the emphasis on funding is different than it would be at an R1. But I still do need money to spend on equipment and, more importantly, labor.
Focusing on the state level. This was a suggestion from my chair – focus on funds with smaller applicant pools. They’re smaller dollar amounts, but that’s fine – I’m mostly paying undergrads. These types of funds are less competitive, and they also allow you to build up a track record of getting lab funds and doing stuff with them. I got some funds this way, and they’re more-or-less exactly the money that I need to spend over the term they’re for. I’m probably going to write about maternity leave and work-life balance in the first year later this week, and I think a good grant strategy was a key piece of this.
Asking for guidance. Doing everything is not sustainable. Talk to your chair and your dean about where to focus. If you have questions, just ask them.
With planning my move, and the summer conference season last year, I sort of dropped the ball on seeing the deadlines for some of those state-level funds for particular purposes (like undergraduate fellowships or course development). This year, I’m more proactive. Two of my students have developed plans for applying for state-level funds for undergrads in the fall, and I’ll apply for some course-based research initiative funding in late summer, as well.
This is the hardest part because it’s not just a science problem, it’s a people problem. It’s a social problem when all you might want to do is shut your office door, put on your headphones and code. But your mentees depend on you.
Scheduled standing meetings. My undergraduates had one meeting a week with me, and one lab meeting. I’ve now had three undergraduates continue with me for a full year. The first semester is run like a class: you learn some computation, and some Bayesian stats for phylogenetics, then move on to specific software and analyses (RevBayes, in this case, for phylogeny and divergence time estimation). Second semester, we really dig into playing with the data.
Pair work. Having students start in pairs and work together is much, much better than having them not. I suspected this, but allowed some students to work alone, and this was a bad move.
I need to implement more structure next semester. Here’s what I’ll be doing:
I’m teaching a computational biology course that will focus on data management and good practices for reproducible science. I will teach this each fall, and students who want to work with me will register for this course. They will also register for a small number of research hours to start learning about phylogenetics.
Only pairs. Seriously, working in pairs is the best.
Office hours. Per the Carnegie classifications, each credit hour can be assigned 3-4 out-of-class preparation hours. I am going to insist that 50% of those hours be worked in the lab, during the work week. It’s not sustainable for students to be lumping research in with “homework time” in the evening or on weekends. Undergrads often look at a Saturday afternoon and say “Oh wow, I can do my 6 hours of research work in one long, focused session!” And then they go to pull the updated scripts from GitHub, hit a conflict, and tank the whole afternoon because they can’t get a hold of me on Slack to help them solve it. I’m not going to sit there and watch them for those hours, but getting work sorted out while I’m just down the hall or active on Slack will be helpful.
More goal orientation. Undergraduates can be tricky, because you don’t know they’ll stick around. But putting conferences on the calendar with research benchmarks won’t hurt anything.
Blocking off time. I make a to-do list every night. I block off time for specific tasks. For a few weeks, I also tracked how I felt during the day. What tasks do I want to do, and when? That way, I can schedule time for programming when I know I typically have the brainspace to do that.
Using space effectively. I close my door more often now. It’s a good move.
Saying no. There has been so much ink spilled on saying no that I can’t possibly add to it. You can even think something is important, and needs to get done, and say no to being the one to do it. For real!
Being new faculty. It turns out this is just hard. Everything takes longer because you are learning it. Tasks that will only take a few moments now that I know how they work took me forever the first time. Course prep took forever. Fighting Moodle took forever. This is a process, and it will get easier.
Not grouping meetings effectively enough. Even though my teaching schedule was very clustered this semester, I didn’t get that many “deep thinking” blocks because I didn’t group meetings effectively. I feel much more productive this semester, because I was able to guard my time better. I’m getting very close to finishing a paper I’m passionate about and have been working on for a long time. But those thinking blocks are important, and I need to really look carefully at ensuring those blocks occur. Especially since I don’t have PhD students and postdocs, and handle more of the research stuff personally rather than delegating.
Better workflow for results documentation. This summer, I’m piloting a more checklist-based model for research tasks for undergraduates, with results deposition. That way, even if we can’t meet to discuss a result in short order, I can take a look and be prepared for the next meeting with a new set of tasks.
Even though I feel like I didn’t accomplish what I wanted before the baby comes (which is any day now), I do feel like I accomplished a lot. In particular, learning the institution and developing mentoring pipelines that will allow me to effectively leverage undergraduate enthusiasm into research. This is a process that looks different at PUIs than R1s, since the structure of delegation is really different and the labor pool is different. I can’t necessarily say “What would $R1_MENTOR do.” While I went to a PUI for undergrad, I didn’t have research-active role models there, so the learning curve has been steep. But with all I’ve learned, and excellent departmental support and colleagues, I feel very confident about the road ahead.
Been a little while since I blogged. Well, being first year faculty is hard.
In this blogpost, I’m going to tackle what worked and what didn’t about courses I offered. Particularly, I’m going to focus on Spring, since that’s when I added courses to the books (Systematics lab and a bioinformatics component of a genomics course).
A brief note about the set-up of our institution. I’m at a primarily undergraduate institution (PUI), which means I’m heavily incentivized towards course development. Course sizes are pretty small at a school like this, and so I can get away with a lot of personal help and personal feedback to students. Some of the materials (particularly the RevBayes tutorials in the Systematics lab) have been used in classes of >50, but beyond that, scalability is not something I have thought much about.
Integration of good computational principles with the course content. Both courses started out going through the Carpentries UNIX shell and Git lessons. All the software we used this semester was command-line executable, and that is very much the norm in biological computation. It’s fine to not be a shell expert and do all this stuff, but things are much smoother if you do have some expertise in navigating directories and managing data.
Superb HPC support. Public institutions in Louisiana have access to LONI, a high-performance computer cluster run out of LSU in Baton Rouge. The facilities offered are varied and met our needs well, from having availability of interactive single nodes for live-typing demos to big mem nodes for running assemblies overnight. They’re really eager to see adoption of HPC at the state PUIs, and so we got really excellent customer support for getting things figured out on the fly.
Backending with GitHub. Using revision management for text documents is very standard practice in much of my training. For the plain text lab materials, data and scripts, it doesn’t really make sense to do anything but. Backending with GitHub also meant that every day, we would log into LONI, start an interactive session, and then do a git pull and get the day’s materials. No need to work out other storage bins or dissemination for materials. Each lab (example here) had its own directory in the course repo, and the learners would just copy that into their user directory.
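For anyone who wants to script that daily routine, here is a rough sketch of the pull-then-copy step. The function names and the idea of refusing to overwrite an existing copy are my own assumptions for illustration; in class we simply typed the commands by hand.

```python
import shutil
import subprocess
from pathlib import Path


def update_course_repo(course_repo: Path) -> None:
    """Fast-forward the local clone of the course repo (the daily `git pull`)."""
    subprocess.run(
        ["git", "-C", str(course_repo), "pull", "--ff-only"],
        check=True,  # raise if the pull fails instead of carrying on silently
    )


def copy_lab(course_repo: Path, lab_name: str, user_dir: Path) -> Path:
    """Copy one lab directory out of the clone into the learner's own space.

    Working on the copy keeps the clone clean, so the next pull cannot
    collide with a learner's edits.
    """
    source = course_repo / lab_name
    target = user_dir / lab_name
    if target.exists():
        # Assumed policy: never clobber work a learner has already started.
        raise FileExistsError(f"{target} already exists; not overwriting it")
    shutil.copytree(source, target)
    return target
```

The design choice worth noting is that learners edit the copy, not the clone. That alone avoids a good share of merge conflicts on the next pull.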
Less is more. For both classes, I did fairly minimal slide decks. We spent a lot of time discussing in both classes. I’d estimate that each lab period for systematics, we spent about 1 of 3 hours just talking. Asking questions, answering them. And that’s where real learning occurs. I’m always clear on something: you learn as much from your peers, their questions, and their research, as you learn from your instructors. So we need to take the time to encourage that sort of interaction.
Plugging into existing structures. The last month of systematics was basically RevBayes. I work on and with RevBayes and maintain one of their tutorials. I hack on these tutorials, contribute to them, and use them. They’re precise, they work well, and are really amenable to whatever lecture stuff you want to add. It’s a better use of my time not to re-invent the wheel. And it’s good for the development team of RevBayes to be able to show classroom penetrance of the software and instructional materials. This also meant that I had community: I felt very comfortable in front of my systematics class, not just because I’m an expert in that topic, but because I had a support community to bounce things off of.
Great subject matter. I love phylogenetics. A lot. I like showing people the first steps of computation, and helping them not be afraid to become independent and try new things.
Great co-instructors. The bioinformatics component I taught was in a genomics course led by Dr. Raul Diaz. His lectures were really excellent, and I learned a lot about genomics from him. And he’s a true biologist. I don’t find genomics and bioinformatics inherently interesting, but the biology is right in my wheelhouse, so having him ground the course so strongly in the actual messy stuff of biology was excellent.
Great students. This material is not easy. This was my first time being at the helm. This could have been much harder if our learners weren’t engaged, interested, and (crucially) willing to roll with the punches.
Backending with GitHub: I should have done several things:
Make sure I add more things to the .gitignore file so learners don’t develop conflicts.
Review Git and GitHub more actively throughout the semester. In particular, conflicts arise, and it’s hard to recall how to resolve them if you’re not using Git outside of class. And new situations arise – suddenly, you might have large files that you probably shouldn’t commit. So I need to do a mid-semester revisit of Shell (for file transfer) + Git.
Figure out how to edit my template better. WHY AM I SO BAD AT THIS.
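On the large-file point, one thing I could imagine handing learners is a small check to run before committing. This is only a sketch: the function name and the 50 MB default are arbitrary choices of mine, and the output is meant to suggest candidates for the .gitignore file, not to decide for you.

```python
from pathlib import Path


def find_large_files(repo: Path, limit_mb: float = 50.0) -> list[Path]:
    """Return files under repo that are larger than limit_mb.

    Paths are relative to repo; Git's own .git directory is skipped.
    """
    limit_bytes = limit_mb * 1024 * 1024
    large = []
    for path in repo.rglob("*"):
        relative = path.relative_to(repo)
        if ".git" in relative.parts:
            continue  # internal Git objects are not ours to manage
        if path.is_file() and path.stat().st_size > limit_bytes:
            large.append(relative)
    return sorted(large)
```

Anything this turns up (sequence reads, assemblies, long MCMC logs) is probably better moved with file transfer tools than committed.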
Software installs. This was fine for systematics – everything is some combo of git clone -> cd -> ./configure -> make. Most software builds in 5 minutes. And that’s a vital skill for people who will be doing cutting-edge research! But bioinformatics software is chaos! Everyone has some wild install thing, and it takes super long. I’m inclined to do more requesting of things to be installed as modules, but like I said, installing software is a crucial skill. I’m at a loss here, and would love to hear from all of you.
Less is more: I was better at living this in systematics than bioinformatics. Next time, I’ll start from a more reduced set of materials and add in more project time. I think 6/10 times I felt frustrated this semester, I was frustrated because I was putting too much pressure on myself to cover too much stuff. The other four I was installing software.
Debugging. I need to revamp how I discuss debugging and error-handling. There was a real patchwork in how independent students became. Some students were doing marvelous independent work, others were still struggling with figuring out how to deal with error messages. Kate Hertweck on Twitter posed a prompt about developing better materials for introducing googling and debugging. I think I have some funds to get people together for a short meeting on this; stay tuned. I think this will go a long way towards helping people be independent.
Learning from each other: Since homeworks were turned in via GitHub, they were public (feedback is not), and I should have reminded learners to read each other’s. Everyone comes from a different experiential and empirical background, and those contrasts are very informative.
Slides: I’m still really dependent on Keynote, and I’d like to break that habit. But it was just too much right now. Maybe next time.
Overall, this was a great semester. I did a lot of work I’m proud of in both these courses, and I have a solid foundation to improve on in the future. Because I’m TT faculty, I have many course iterations to make those improvements. That’s the ideal situation, and the one learners deserve: one in which instructors are engaged in material development and have job security to know that we will be here to make improvements for the future.
I really can’t say enough good things about the learners. I had a lot of proud moments when students emailed to ask about pieces of software we hadn’t used that they had compiled on their own, and needed some help to get started. I don’t know every piece of software, but I can read a paper, understand something, and give advice. And giving that help and advice is much easier when we have a collaborative relationship – the student has done some reading, has tried something, has maybe failed, and is prepared to ask pointed questions. Watching students go from being stuck in point-and-click interfaces to being able to read the latest research, say “Hey, that’s a good approach and I’m going to try it”, then compile software, move data around on a remote machine, and actually do it (maybe with a little help) remains one of the great joys in my job. I’m not always the most demonstrative with that joy, particularly being so pregnant and so tired, but it’s there, and I feel it, and it keeps me going.
I might update this when my student evals come in. Hopefully I can share some thoughts this week on setting up a computational research lab in a low-income state, where student access to personal computers and laptops is a challenge. Or I might go into for-real labor and never think about this blog post again. No way to know.