PUIScienceSlack

I had a bit of a rough women-in-science day on Thursday, so I turned to Twitter for some support.

It turns out quite a few people, and not just people who identify as women, do want to chat about being faculty at PUIs. So I created PUIScienceSlack.

Edit March 12: Thanks to our new members for noticing our invite link had expired. I’ve regenerated an invite link that does not expire.

Who is it for: Anyone teaching science at a primarily undergraduate institution (PUI). I’ve left science broadly defined: social scientists and applied scientists are welcome. If you would call yourself a scientist, please come aboard!

I’ve also left “PUI” somewhat loosely defined. Different organizations define it differently, often capping the ratio of graduate students to undergraduate students at something like 1:10. I’m not as strict about it: some R2s straddle that boundary, and some departments at more research-intensive institutions don’t have well-supported graduate programs and mostly work with undergrads. If you feel like you belong in this group, you probably do.

Edit: I think it would be fine if trainees who want PUI jobs in the sciences joined, too! The types of discussions we have might really benefit you!

What is it for: Discussions of teaching and research in the sciences. To start, I made several public channels: #women (since that’s what prompted this), #grants, #general, #research, and #quantitativeskills. These are public, meaning they can be viewed by anyone who clicks the invite link. There is also one private channel, #confidential, that members who want to discuss a sensitive issue can request to join. Direct messages are private between members. This is a free-tier plan, which means that once we pass 10,000 messages, older messages get purged in the order they were sent.

A note on “private”: private refers to the fact that you need a log-in to access this info. Like most digital communication, this information can be requisitioned in a court case.

Some of us are starting to work on CAREER proposals; perhaps this can be a jumping-off point for writing groups. Many of us are solving the same challenges with our curricula, so let’s solve them together. At many institutions, there is a particular kind of isolation: you might be the only person who works on any topic even remotely related to yours. But maybe we can all make some new friends! Or maybe this is all a big bust and goes nowhere. No big deal. Slack is pretty low-stakes ;)

Other info: My lab’s code of conduct applies to this space. I consider that code of conduct expansive – if you are afraid to participate in the space because of prior harassment from another member, please get in touch.

Addition, 2/10/2020 on rules: We took two votes last week. Here are the results:

  • Real names: Unless you get in touch and have an exceptional circumstance, please use your real name and sign up with an email that can be verified as belonging to you.
  • Confidential: Conversations in the Slack workspace should not be shared outside it without explicit permission from those involved.

Wright Lab @LBRN

This week, the Wright Lab is at LSU for the LBRN annual meeting. Here are the talks and posters for the lab:

My talk: https://wrightaprilmblog.files.wordpress.com/2020/01/lbrn.pdf
Christina’s talk: https://wrightaprilmblog.files.wordpress.com/2020/01/lbrna-2020-pdf.pdf

Christina’s poster: https://wrightaprilmblog.files.wordpress.com/2020/01/lbrna-2020-poster.pdf

Basanta and Courtney’s poster: https://wrightaprilmblog.files.wordpress.com/2020/01/poster.pdf

RevBayes & the universe

One of the interesting things about having a blog is that you can see what people are interested in, and when. This week, I can see a lot of traffic going to a couple of blog posts, Teaching Phylogenetics in the Cloud and Plan C. This is pretty common around the start of the semester – people are interested in trying new things in their teaching, and particularly using cloud technologies to improve access to compute resources.

I want to emphasize some new developments on this front. Jeremy Brown and I taught a workshop at the SSB2020 meetings on developing a hands-on classroom using RevBayes. Slides are here.

We covered three main things: graphical models, and why they’re cool; how to build a graphical model; and some of the graphical interfaces that you might use to deliver RevBayes to your students. That last point is important: systematics isn’t a science in isolation. To use phylogenetic methods, students need to understand statistics, where their data come from, and what the biases in those data may be. Common tools like R and RStudio, or Python and the Jupyter notebook, are often used for “data science.” Given the integrative nature of systematics as a discipline, doesn’t it make sense to make our tools interoperate more smoothly with a broader universe of tools for working with data?
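
The graphical-model idea can be sketched outside of Rev, too. The Python below is a hypothetical toy, not workshop material and not RevBayes code. It assumes exponential priors on a rate and on branch lengths (chosen only for illustration) and shows the three kinds of nodes such models are usually described with: a constant node (`prior_rate`), stochastic nodes (`branch_rate` and the branch lengths), and a deterministic node (`tree_length`). The point is that the joint density is the product of each stochastic node’s density given its parents.

```python
import math
from dataclasses import dataclass, field


def exponential_logpdf(x: float, rate: float) -> float:
    """Log density of an Exponential(rate) distribution at x."""
    if x < 0:
        return float("-inf")
    return math.log(rate) - rate * x


@dataclass
class BranchLengthModel:
    """Hypothetical toy graphical model.

    constant node:      prior_rate (fixed by the modeler)
    stochastic nodes:   branch_rate ~ Exponential(prior_rate)
                        each branch length ~ Exponential(branch_rate)
    deterministic node: tree_length = sum of the branch lengths
    """
    prior_rate: float
    branch_rate: float
    branch_lengths: list[float] = field(default_factory=list)

    @property
    def tree_length(self) -> float:
        # Deterministic node: computed from its parents, never sampled.
        return sum(self.branch_lengths)

    def log_joint(self) -> float:
        # The joint density factorizes along the graph:
        # log p(rate) + sum_i log p(b_i | rate)
        lp = exponential_logpdf(self.branch_rate, self.prior_rate)
        for b in self.branch_lengths:
            lp += exponential_logpdf(b, self.branch_rate)
        return lp


model = BranchLengthModel(prior_rate=10.0, branch_rate=1.0, branch_lengths=[1.0, 2.0])
print(model.tree_length)  # 3.0
print(model.log_joint())
```

Because each node only needs its parents, adding another parameter to the model means adding one more term to `log_joint`, which is part of what makes this way of building models approachable in a classroom.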

We have some tools to help RevBayes play nicer with tools like RStudio and Jupyter. Michael Landis and Sebastian Frost developed a Jupyter kernel for using Rev inside Jupyter notebooks. Lately, David Bapst and I have been working on an RStudio interface. If you’re interested in any of these tools, please see their install pages and give us feedback. A manuscript describing the classroom contexts for these tools is forthcoming; in the meantime, you might find related tidbits in the paper described here.

If you were at the workshop, you ought to have received an email inviting you to take a survey on it, and inviting you to comment on the issue tracker for the RevKnitr repository. I’d like to extend that invite more widely – if you are teaching with RevBayes and would like to join the conversation with other educators, do feel free to open an issue on the RevKnitr issue tracker. It would be wonderful to have an active discussion on how to teach systematics expansively and inclusively.

Nantucket DevelopR 2019

This past week, we (Drs. Liam Revell, Klaus Schliep, Josef Uyeda, Claudia Solis-Lemus, and myself) hosted the Nantucket DevelopR Phylogenetics workshop again.

This is a really interesting course because it’s aimed at intermediate learners. Intermediate is slippery to define. I often think of it as the point where questions stop having clear answers, i.e., when you google for an answer, you don’t just get back “How to initialize a list.” You have to start thinking about optimization, or about making your code easy for other contributors to read.

Basically, an intermediate learner is someone who might not have a clear path forward. And at many universities, they might not have someone more advanced to go to for help. For intermediates, we don’t just need skills/information transfer, we need network-building.

So we had a few goals for this workshop:

  • Give everyone a base of core intermediate skills: functional programming, efficient use of Git and GitHub, packaging R code, and phylogenetics in R. Materials here
  • Bring together motivated folks from different ends of the “intermediate” spectrum to work productively on R phylogenetics packages
  • Create a diverse network of people who now all know each other and are connected by work on packages. Build a community of R-phylo developers

What worked

Bear in mind, these are personal reflections; we’ve not yet done our surveys.

  • Diverse leadership. Diverse teams are known to produce better results, and diverse faculty can help establish and maintain diverse communities. I think that is reflected in the make-up of our course, which is more gender-balanced than the previous offering. We also took other steps, like a more detailed course advert, since the literature suggests minoritized students don’t apply for things unless they meet more of the criteria than white male students do. A clearer advert makes it easier for applicants to see how they fit.
  • More coordination on the front end. Last time we offered this, Klaus and I were really unsure what the students would already know and what they would need to learn. This time, we had more of a blueprint, so we decided on a few topics to cover in advance.
  • Larger leadership team. Last time we did this, it was Klaus and I doing everything (+ my husband doing the cooking). This time, there were four of us with a more distributed knowledge base. This meant better lectures, and a wider array of things to accomplish.
  • Balance of work & lecture time. We only had four real days on the island. Two were mostly spent lecturing, two mostly working. The students got a lot done on the various projects.

What we could improve

  • More organization on the lead end. We had some last-minute upsets to infrastructure, which meant some last-minute scrambling. This probably won’t happen next time, but we could do a bit more polling on interests for lecture topics, and organize food purchases somewhat better.
  • Scalability. This workshop was great, and there was far more interest than we could accommodate; there were many great applicants we just couldn’t make room for. And to sustain something like this, funders often want to see throughput. It would be great to keep the feel of everyone in one lodging, coding in the shared spaces, eating in the shared spaces, but we have little room to grow in the current location.
  • Something else will come up in evals, I’m sure.

In conclusion

What a wonderful week. I hope we can do it again. On a personal note, as much as I adore being PUI faculty, we do have fewer research-active faculty. It was really nice to go somewhere and be in the company of other researchers and new PIs for a week. I feel very much refreshed going into the last month of classes.

Our next steps are to put together a post-workshop survey + check-in schedule to keep people motivated to finish projects.

The why, when, and how of computing in biology classrooms

Along with Rachel Schwartz, Catherine Newman, Jamie Oaks, and Sarah Flanagan, I am an author on a preprint reviewing common technologies and practices for teaching computation to biologists. In it, we review some of the technologies that educators might choose to deliver a course in computational biology. We also review the evidence for various teaching strategies, including ways to incorporate live coding and active learning into class.

I’m immensely proud of this paper for a few reasons. Weirdly, the inspiration for it came from Twitter. In the linked thread, we saw that a lot of early-career folks are struggling to keep up with the glut of educational technologies on the market. What is RStudio Server? When is that what I want, compared to a JupyterHub? Do I need to pay for hosting to teach computational biology?

This year’s iEvoBio theme was “Enabling the next generation of computational biologists.” So I decided (as the current head of iEvoBio) to put a little money into getting speakers to have this discussion at the meeting. Organically, this discussion became a meeting, and the meeting became a paper.

Something that I think is really cool about this manuscript is that the authors are from different types of institutions (R1s, PUIs) that attract different types of students. So we decided to pay special attention to the challenges our real students have faced. What happens for students who can’t afford the latest and greatest laptop? Or who might go on deployment over the weekend and be without their personal computer? All of the challenges we discuss in this paper are real. The solutions we cite are solutions we use.

This manuscript is currently a preprint. If you see things that you think should change, you can make a difference! On the right-hand side of the screen, you should see a link to post comments. We welcome your feedback! This is an F1000 preprint, and the reviews will be visible to readers as they become available, which is pretty cool.

Writing as training as writing is training

It’s been a little while since I blogged! I wanted to highlight a new paper I authored called “A Systematist’s Guide to Estimating Bayesian Phylogenies From Morphological Data.”

This paper was a long time coming. It started as the foreword to my dissertation, in fact! In the time since, one issue I’ve persistently come across is needing to onboard young systematists into research. I work at a primarily undergraduate institution, which means my students are, well, undergraduates. And getting them involved in research can be tough! In statistical phylogenetics, there’s no real equivalent of washing test tubes or feeding fish while you read papers and develop an independent project. Getting involved in our work means getting to work, right away. It’s like drinking from a firehose.

Computation still isn’t heavily incorporated into curricula basically anywhere. I have students take my computational biology course before starting in the lab, so that I don’t need to teach every student, personally, how to use Python or R.

But I do still need to work with students fairly intensively on systematics and mathematical modeling. This manuscript came from a need to have something accessible I could hand to each student and say, “Here, this is what we do in the Wright lab.” It’s a labor of love for the science. But it’s also a labor of love for me. Writing all this down in one place allowed me to reduce my training burden, and an overview of these methods gives students a solid grounding in them and exposes them to some of the literature.

I’ve already had multiple lab members tell me that the paper was clarifying for them to read. As I get older, as I train more students, that’s the only thing I really want to hear: that a paper helped them learn to be systematists, and helped them think through problems better. I hope it will for you, too.

Semester Wrap Up

Last semester, I taught computational biology for the first time at Southeastern (schedule, course materials). This is a little bit of a different ‘flavor’ of computational biology than a lot of the courses we see, since I’m not really a genomics person, but an evolutionary biologist, working in a department of mostly population (ecology, evolution, behavior) biologists. The audience was upper division undergrads and MS students, and one faculty member.

This time around, I tried something different from what I’ve done in the past: I skipped installs at the start and had students run everything in a JupyterHub. My blogpost on setting all that up is here. As I covered in that post, teaching undergraduates is different from teaching graduate students. With graduate students, “I need to install this so I can analyze my data and get my MS/PhD” is a powerful motivator. They’re captive. My course is an elective. If the students feel super shitty and incapable after a day of installs, they can leave. And when they do, this is how they’ll feel:

Ye Olde Darwin Chestnut: “But I am very poorly today and very stupid and hate everybody and everything” Image via NPR

Undergrads require a reframing of how we teach computation. The goal might not be that they have a laptop full of software ready to go, but that they learn something about computation, feel confident in those skills, and get to interact with some MS students and research-active peers. So I didn’t start with installs. We did them at the end, for students who wanted to keep working in Python on their personal computers or the state HPC. This was very smooth.

I used a combination of Jupyter Notebooks and the Hub’s command line to teach. I’ve documented a lot of my thoughts on this choice here. Fundamentally, to me, the argument for notebooks boils down to this: our competitor isn’t C++ or MATLAB. It’s Excel, the retreat to the familiar. To get people working a little more reproducibly, and taking those first steps in computation, why take away all the nice interface bells and whistles they’re familiar with? Notebooks render well, they enable note taking, and data tables printed in a notebook look familiar.

Over the first month, we went through the Data Carpentry Python ecology materials. This by and large went great. I’m a maintainer on those materials, and using them in class led to new pull requests from me and has informed my own thinking on some of the issues and pull requests raised during the Spanish translation of the lesson.

One piece of feedback I got was that the first part of the course was very fast. I think next time, we’ll spend 6 weeks on the really basic Python stuff. I’ll also split the first assessment into two pieces: one on the basic slicing operations, and one on functions and scripts. I kind of thought 4 weeks would be enough time to cover material that’s supposed to be covered in a 6-hour workshop. Alas.

For the rest of the course, we query data from the web (Open Tree of Life, BLAST) and cover phylogenetic computing with DendroPy, project management, and Git & GitHub. We also talk about Louisiana-specific stuff, like using the state supercomputer.

The final assessment was building Python packages and doing teach-ins. Everyone did really well within the parameters of the assignment. The idea was that they would implement a couple of functions in a package, document them, and then teach their lab (for the MS students) or the class (for the undergraduates) how to use the package. I doubted myself a little too much and let them reuse functions from earlier in class; for some reason, I thought I hadn’t shown them enough to do something totally novel. But it’s pretty clear from conversations after the fact that I could have aimed higher with this. Next time around, I’m going to structure my assessments like so:

1. Indexing, slicing, filtering data (Python in Pandas)
2. Functions and scripts
3. Querying the web, visualization
4. Making a Python package and putting it on GitHub
5. Teach-In
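
As a rough illustration of what the first two assessments could cover, here is a minimal pandas sketch. It assumes pandas is installed; the table is invented, with column names borrowed from the Data Carpentry ecology `surveys` data, and `mean_weight` is a hypothetical helper rather than part of the course materials.

```python
import pandas as pd

# Tiny made-up table; column names follow the Data Carpentry "surveys" data.
surveys = pd.DataFrame(
    {
        "record_id": [1, 2, 3, 4, 5],
        "species_id": ["DM", "DM", "PE", "DO", "PE"],
        "weight": [40.0, 45.0, 22.0, 48.0, None],  # None becomes NaN
    }
)

# Indexing: one column, or a single cell by row label and column name
weights = surveys["weight"]
first_species = surveys.loc[0, "species_id"]

# Slicing: the first three rows by position, two columns by label
first_three = surveys.iloc[0:3]
subset = surveys.loc[:, ["species_id", "weight"]]

# Filtering: boolean masks combined with & (not Python's `and`)
heavy_dm = surveys[(surveys["species_id"] == "DM") & (surveys["weight"] > 42)]


# Functions: wrap a filter so it can be reused and tested
def mean_weight(df: pd.DataFrame, species: str) -> float:
    """Mean weight for one species; pandas skips missing values by default."""
    return df.loc[df["species_id"] == species, "weight"].mean()


print(first_species)                   # DM
print(len(first_three))                # 3
print(heavy_dm["record_id"].tolist())  # [2]
print(mean_weight(surveys, "PE"))      # 22.0
```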

I was really worried about overwhelming them too early with assessment, but paradoxically made those concerns worse by holding off on the first assessment until too late.

Overall, I’m really happy with how the course went, and student evaluations suggest the learners were, too. For a first pass, I’m immensely satisfied. My second round with this course next fall will probably involve coming up with more biological narrative for the package making and GitHub steps.
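
For the package step, a student module might look something like the hypothetical `seqtools` example below. The functions are illustrative, not the students’ actual projects; the idea is that docstrings with worked examples double as documentation for the teach-in, and Python’s standard-library `doctest` can check those examples automatically.

```python
"""seqtools: a hypothetical two-function module for a student package."""


def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Case is ignored. Raises ValueError for an empty sequence.

    >>> gc_content("ATGC")
    0.5
    >>> gc_content("ggcc")
    1.0
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only).

    >>> reverse_complement("ATGC")
    'GCAT'
    """
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(sequence.upper()))


if __name__ == "__main__":
    # Running the module directly checks the examples in the docstrings.
    import doctest

    doctest.testmod()
```

In a real package, this file would sit inside a package directory alongside packaging metadata (for example, a `pyproject.toml`); that scaffolding is left out here.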

Get Involved!

If you’re interested in any of this: I’m working on a SciPy proposal right now with Jeet Sukumaran. One of the things we would like to do is develop some Carpentries-style materials focused more towards phylogenetic data science – querying and cleaning data from the web, assembling phylogenetic datasets, processing MCMC output, visualization. We’d love collaborators!
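
As a taste of what “processing MCMC output” might mean in such materials, here is a minimal sketch using only the Python standard library. It assumes the samples for one parameter have already been read from a trace file into a list (file formats differ between programs, so parsing is skipped). The 25% burn-in default and the nearest-rank 95% interval are simple illustrative choices, not recommendations.

```python
import statistics


def summarize_trace(samples: list[float], burnin: float = 0.25) -> dict:
    """Summarize one parameter's MCMC samples after discarding burn-in.

    burnin is the fraction of initial samples to drop. The interval is a
    nearest-rank approximation to an equal-tailed 95% interval.
    """
    if not 0 <= burnin < 1:
        raise ValueError("burnin must be in [0, 1)")
    kept = sorted(samples[int(len(samples) * burnin):])
    if not kept:
        raise ValueError("no samples left after burn-in")
    n = len(kept)
    return {
        "n_kept": n,
        "mean": statistics.fmean(kept),
        "median": statistics.median(kept),
        "lower_95": kept[int(0.025 * n)],
        "upper_95": kept[min(n - 1, int(0.975 * n))],
    }


# Hypothetical trace: the chain starts far from where it settles.
trace = [100.0] * 10 + [float(i) for i in range(30)]
summary = summarize_trace(trace, burnin=0.25)
print(summary["mean"])  # 14.5
```

Without discarding burn-in, the mean of this made-up trace would be 35.875, pulled up by the early samples; that contrast is the kind of thing learners could discover for themselves.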

I’m one of the hosts of iEvoBio this year, and the afternoon session will be on teaching computational biology. We’ll start with lightning talks: 10 minutes on what you’re teaching, to whom, how you’re doing it, and what’s working about content delivery. Then, we’ll have a birds-of-a-feather session where we try to write some of that info down to demystify course content delivery tech for instructors. We’ll put out a call for lightning talks soon. Feel free to get in touch early if you’re really keen!

I’ve also been having some conversations with the state supercomputing complex about making JupyterHubs available to host courses for free on those servers. If you might be interested (and are at a Louisiana institution!), please get in touch.