Serving Up Reproducible, Shareable Research: Q&A With Will Struebing

Published on August 16, 2019 in Cornerstone Blog

The Arcus team at the Research Institute is solving a number of challenges at once: decreasing the time it takes for researchers to access data, increasing the reproducibility of research, ensuring data security, and speeding up the rate of breakthroughs. Will Struebing loves that his role as Supervisor of Scientific Computing for the Department of Biomedical and Health Informatics (DBHi) pulls him in many directions. In this fourth installment of a Cornerstone series about the convergence of talent and expertise to build Arcus, an internal program that provides findable, reusable, trustworthy research data, find out more about Struebing and how his DevOps team is enabling cloud-first development efforts.

As Supervisor of Scientific Computing, you have a unique skillset: part technical support for supercomputing, part data security specialist, and part consultant for researchers. Why do you enjoy wearing these many hats?

I get a ton of satisfaction knowing that I’ve helped someone solve a problem. So, I think the simplest answer is that it makes people happy. CHOP is a really great environment to grow your computing skills, if you’re willing to network, ask for help, and collaborate. It’s pretty hard to do those things without picking up a lot as you go along. It’s funny to hear it described as a unique skillset — we often joke about only knowing a little about a lot.

I cooked professionally before I got into DevOps, and the learning culture is very similar: you pick up a little from everyone you interact with. I was lucky when I switched careers, because as a chef I had pushed others to ask questions without being afraid to look inexperienced. When I started my new career, I had to practice that myself, and it was hard. That said, the Arcus team perfectly matches that culture and attitude, so I am pretty grateful to be able to wear different hats alongside such a cool group of people who are doing the same.

The construction of the Arcus project involves moving and organizing enormous amounts of scientific data. What best practices in high-performance computational science does the Research Institute have in place to accomplish this efficiently?

Arcus is a collective of expertise, so we draw on the Library Science Team and our privacy and security analyst to help drive the technical capabilities of our systems. We have the automation technology and the skillset to deliver the solutions that library science and privacy requirements define.

For our team, it’s about ensuring that data is versioned, that audit logs are in place, that we’re proactively communicating access anomalies, and that data can’t be shared or accessed outside of the organization. I’ll be the first to say that access control is extremely dry stuff; however, when you combine it with archivist techniques, librarian metadata, and privacy concerns, you cultivate best practices for data.

At the end of the day, this means that scientists and researchers can offload their organization, privacy, and security concerns to the Arcus experts. We’re in the business of helping the folks who can make meaning out of data focus on doing just that.

What new services and process improvements are under way as the Arcus project gathers momentum?

The most exciting thing our team is working on is on-demand computational environments. We work with researchers and informaticists to determine which tools, packages, and resources they want to use. Those become use cases for creating reproducible computational environments that let end users quickly interact with the data available in Arcus. When they’re done, they can dispose of the environment, knowing that they can reproduce it whenever they want. These environments reduce the burden of setting up a workspace and let researchers get down to business more quickly.

For example, if you use R, a language and environment for statistical computing and graphics, and need to work with a massive data set covering a large population, we provision RStudio virtual machines that run on our secure cloud platform. We can seamlessly attach your data to a machine that scales elastically as your compute needs grow or shrink. Your results can be stored securely in Arcus in accordance with the NIH initiative to make scientific data Findable, Accessible, Interoperable, and Reusable. These systems help the Arcus digital archivists and metadata librarians preserve and protect your valuable work. In this way, we’re able to provide ease of use and speed up reproducible, shareable research.

In what ways do you work closely with other team members in the development and tuning of Arcus?

Our group is lucky to have our hands in a little bit of everything. In the past two months, we have helped Bill Flynn from the Arcus Data Repository automate database deployments, worked with Joy Payton’s education group to implement security controls for the Arcus Education Portal, assisted Byron Ruth’s Arcus Developers team to deploy containers to Kubernetes (run by the Research Infrastructure Team’s John Daniels), and worked with Juan Giarrizzo to get an operational support platform going. Juan is the only program manager I have ever met who can deploy his own Jira server.

If you put all of these together, you can see some of the capabilities of Arcus. The Education team is teaching workshops to grow the institution’s population of skilled data scientists. Those education materials are housed on the website and are open to CHOP research teams who want to learn about R or statistical methods.

Arcus Cohort Discovery is a user interface that the developers built to help researchers discover new sources of data. This means that researchers can explore clinical data and quickly understand whether their research ideas are feasible. And the operational support platform means that we can support end users as usage grows.

Abi Srinivasan and Patrick di Bussolo are the other two members of the Arcus Infrastructure team (besides myself). They’re constantly lending their time, skills, and energy to the people around them (sometimes I have to remind them they have their own work). It’s a pretty awesome team to be a part of.

Why are you passionate about helping scientists at the Research Institute navigate the computational resources that will make the Arcus project successful?

When people hear that you work at CHOP, they automatically assume that you’re saving children. The first year I worked here, I was a little disappointed by the reaction I got when people asked what I did. Cloud Automation Engineer doesn’t conjure the coolest mental imagery. Enabling researchers to work more efficiently isn’t exactly saving kids in the same sense that an ER doctor is, but it is working to save future kids. So, as Arcus takes off, I feel like I am part of that.

We are also blazing the trail for the Research Institute’s computational landscape. A lot of engineers tend to see anything that isn’t writing code as “extra work,” but I like meeting with people and talking about what it takes to push the envelope in a way that is safe for the organization. There aren’t a ton of hospitals that have Kubernetes, multiple cloud providers, and the skilled staff to drive those changes. Making those things a reality here brings me a lot of job satisfaction. If deploying computational environments lets researchers do research more easily, then navigating the landscape of enterprise IT lets developers develop more easily.

Fast-forward five years. What professional goals do you anticipate accomplishing by being part of the Arcus project?

I’m very excited to see what the next few years bring in terms of growth, now that we’ve recently reached general availability. When I say grow, I mean growing the variety of our data, growing the population sizes of those data sets, growing our user count, and growing the types of computational environments that we offer. All of those things should combine to grow the number of actionable results we glean from the research being done here.

Professionally, I am sure that I will continue to be pulled in many directions and I love that. I enjoy both technical and organizational challenges, so I am just keeping an open mind on what the future brings for Arcus and for me.

Working here is a dream come true. I used to make sure steaks had the right grill marks on them, and now I am ensuring the confidentiality, integrity, and availability of pediatric research data to CHOP researchers. What an amazing gift.

(This post is the fourth in a series exploring how Arcus team members are using their expertise to find innovative ways to expedite the scientific process and uncover novel research opportunities at CHOP. Read more about Jeff Pennington, Dianna Reuter, JD, and Spencer Lamm, MLIS.)