Women in Data Science Week
April 15, 2024 - April 22, 2024
Register now!
Register now to block the dates in your calendar, and receive updates as soon as they become available.
You can sign up for the datathon, (lunch) lectures, talks, and data science drinks separately using the links below.
What is the Women in Data Science Week?
The goal of the data science week is to introduce interested students and staff to data science in a fun and cooperative way, and help create a community of data scientists at the University of Twente, the faculty of Behavioural and Management Sciences, and beyond. BDSi organizes various events during the week, including a datathon, lectures, workshops, and a networking drink.
The spring 2024 Women in Data Science Week is independently organized by BDSi and our partners as part of Women in Data Science Worldwide’s (WiDS) mission to increase the participation of women in data science and to feature outstanding women doing outstanding work. The datathon will use a curated dataset provided by WiDS, and women in data science will be in focus during the entire week.
Of course, the event isn’t limited to only women. People of all shapes and sizes are welcome to join - though we encourage you to bring your female friends and colleagues along!
Datathon
A datathon is an event in which teams collaborate and compete to create a solution to a shared problem. By learning from experts and peers and immediately applying your skills to a relevant and engaging real-world dataset, the BDSi datathons provide a great environment for students and staff, beginners and experts alike to further hone their skills. For the spring 2024 edition, we will join the Women in Data Science Worldwide initiative, and compete together in their datathon.
Women in Data Science datathon
The WiDS datathon is an opportunity for women worldwide to discover and hone their data science skills while solving an interesting and critical social impact challenge. WiDS provides a supportive environment for women to connect, share interests, learn from, and help each other… and have a lot of fun! First launched in 2019, this annual event drew over 4,000 participants from 100 countries in 2023. Our plan for the WiDS 2024 Datathon is to provide more flexibility and attract a broader community of data scientists.
WiDS datathons are based on well-curated, real-world datasets that are not readily available in the public domain. Each year, the datathon tackles interesting, relevant, critical, and topical questions. Participating in a WiDS Datathon is a fantastic opportunity for students to gain experience and see data science applied to a real and critical challenge.
2024 Challenge Theme: Equity in Healthcare
We (WiDS) are thrilled to partner with Gilead Sciences and HealthVerity to provide a set of datathon challenges utilizing a real-world oncology dataset, which contains information about the demographics, diagnosis and treatment options, and insurance of patients who were diagnosed with breast cancer between 2015 and 2018.
Speakers
Stay tuned for updates!
We’re still coordinating with more inspiring women to come and speak during the Women in Data Science Week.
Lectures & Practicals
Every lunch break (12:45 - 13:30, Tuesday - Friday) expert data scientists from BDSi and our partners will provide a lecture on the most important tools in a data scientist’s toolbox: data wrangling, modelling, and communicating results. These lectures will be structured to support the datathon materials, but can be attended without participating in the datathon itself.
After a short coffee break (13:30 - 13:45), the lecture will be followed by a hands-on practical session (13:45 - 15:30). During this session, the lecturer - supported by a team of motivated coaches - will help participants apply the lecture materials to their datathon submissions. While these sessions are meant to accompany the day’s lecture, they can be attended by any datathon participant. Coaches will be on hand to answer any questions about the day’s lecture, the datathon, or data science in general.
Drinks
On Thursday afternoon, we invite all data science week participants as well as anyone interested in data science at the University of Twente to join us for networking drinks. This is a great opportunity to mingle with the other teams, and create lasting connections with peers and data science experts!
Competition
The best solutions to the datathon challenge will compete on global, regional, and local leaderboards. The global and regional leaderboards are maintained by WiDS; you can sign up for the datathon on their website to join. The local leaderboard includes just those teams participating in the Women in Data Science Week at the University of Twente. Compete against your peers for the coveted BDSi Trophy!
Who can join?
Staff, students, family, and friends
Everyone related to the University of Twente and their friends and family can join. You can join with friends, colleagues or even family. The event is open to both novices and experts, and everyone in between. You can join the datathon as a team, alone, or skip it altogether and only participate in the workshops. If you do join alone, you can choose to be assigned to a team with other data science enthusiasts.
Men, women, and everyone in between are free to join - but only teams with 50% or more women are able to compete on the WiDS regional and global leaderboards.
A tutor giving advice to two participants at a previous Women in Data Science event.
Women and men
Both women and men are free to join. In order to compete on the regional or international WiDS datathon leaderboards, teams must be at least 50% women. This restriction does not apply to joining any of the data science week events. You’re welcome to join a lecture, practical, or the drinks on your own (regardless of whether you identify as male, female, or otherwise) or with your male friends and colleagues; you just won’t be able to compete on the WiDS leaderboards.
Some experience with R or Python
Some programming knowledge is required!
You'll need to have a basic idea of either R or Python in order to follow along with the lectures and practicals. Materials will be prepared for R by BDSi and WiDS, and for Python by WiDS.
While we will do our best to introduce data science topics in the various workshops without relying on code, a basic understanding of R and/or Python will make it much easier to follow along.
If you have some experience with other programming languages, you should be able to follow along with a little preparation. More information on installing and using R can be found in the What can I do to prepare? section.
If you're new to programming in general or would like a deeper understanding of R, and would rather learn from one of our colleagues, there are two options. The Cognition, Data and Education (CoDE) section provides courses and materials aimed at teaching staff, and Johannes Steinrücke teaches half-day workshops on introduction to R and on data visualization in R for PhDs (and EngDs).
If you’re confident you can participate in the datathon in another programming language, you’re more than welcome to do so (we challenge you to try in C, Fortran, Brainf***, or JavaScript). Just be aware that we probably can’t offer support if or when you get stuck.
What can I do to prepare?
Get a team
First off, get a team together. The datathon is meant to be a collaborative experience where you work alongside people with a variety of expertise. In order to compete on the regional and international leaderboards, your team should be at least 50% women.
Create a Kaggle.com account
The WiDS datathons are hosted on Kaggle.com. Kaggle.com is a platform hosting various competitions, datasets, courses, and other data science and machine learning related content. It promises that you can build “skills in our competitions, co-hosted by world-class research organizations & companies”, “learn cutting edge ML techniques and what worked and didn’t from the top Kaggle competitors”, and join a diverse community of “16 million data scientists, ML engineers & enthusiasts from around the world”.
Set up your coding environment
If you’re new to data science, you’ll want to set up a working environment. We recommend working in R or Python, depending on your experience.
Install R and RStudio, and prepare a working environment - Our colleague Johannes Steinrücke has written a good guide on how to set up R and RStudio for your projects, including some practical advice not covered in many other sources. The guide was written for students starting with coursework with R, but is equally applicable for other data science projects.
Install Python - The Women in Data Science team maintains a set of tutorials on installing Python (using Anaconda to manage packages and environments), Jupyter notebooks and the basics of Python data structures: https://github.com/keikokamei/WiDS_Datathon_Tutorials.
Further reading
If you’re looking for more information, a competitive edge, or just a good way to spend some time, we can recommend some more reading materials:
Sharada Kalanidhi has written an excellent deep dive into the 2023 WiDS datathon, including links to further resources for both R and Python: https://www.widsworldwide.org/get-inspired/blog/a-data-scientists-deep-dive-into-the-wids-datathon/.
An Introduction to Statistical Learning is a free-to-download book providing an excellent introduction to practical machine learning using both R and Python.
Kaggle.com provides resources to get started with Kaggle, as well as a long list of competitions that are approachable for beginners - with code and discussions available from hundreds of other participants. Trying your hand at a competition or two is a good way to spend a rainy weekend.
R for Data Science is a free online book compiled by Hadley Wickham and a long list of community contributors, covering the whole gamut of modern data science in R. It is well worth a look, and a good reference even for experienced data scientists.
Schedule
Stay tuned for updates!
There may still be some minor tweaks to the schedule as we coordinate with external lecturers and speakers.
The Women in Data Science Week starts on Monday the 15th and ends on Monday the 22nd of April, with a group session on each of those days. Lunch sessions and practicals will be organized on Tuesday the 16th through Friday the 19th. The deadline for submissions for the local leaderboard is Sunday the 21st at 23:59, and we’ll ask the team(s) with the best and most interesting submissions to present their work on Monday the 22nd.
Monday
April 15th
Opening and kickoff
12:45 - 13:00 - Location: Citadel T300
Lunch talk: Breast cancer epidemiology and the clinical use of prediction models
13:00 - 13:45 - Location: Citadel T300
Marissa will talk about breast cancer, its risk factors, incidence, survival and the disease trajectory. She will show examples of prediction models that are used in breast cancer care, and discuss their relevance for clinical practice.
Hands-on session
13:45 - 15:30 - Location: Citadel T300
Getting started: introduction to the datathon, finding a team, using Kaggle, installing Python/R, setting up an environment.
Tuesday
April 16th
Lunch Lecture: Data Wrangling 101
12:45 - 13:30 - Location: Citadel T300
Exploring a dataset: where to start, finding patterns, visualizing for clarity, creating informative features.
Hands-on session: Data Wrangling 101
13:45 - 15:30 - Location: Citadel T300
Hands on: getting an overview, inspecting descriptives, visualizing distributions and relations, cleaning up and reshaping data, creating new features.
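As a rough illustration of the steps listed above, the sketch below walks through a few of them on a tiny, made-up table. The column names (`age`, `region`, `days_to_treatment`) and all values are hypothetical and are not taken from the WiDS dataset. It uses only the Python standard library; the session itself may use R or other packages.

```python
from collections import Counter
from statistics import median

# Hypothetical toy records (not the WiDS data); None marks a missing value.
records = [
    {"age": 54, "region": "north", "days_to_treatment": 30},
    {"age": 61, "region": "south", "days_to_treatment": None},
    {"age": None, "region": "north", "days_to_treatment": 45},
    {"age": 47, "region": "south", "days_to_treatment": 90},
]

# Getting an overview: count missing values per column.
missing = {col: sum(r[col] is None for r in records) for col in records[0]}

# Inspecting descriptives, using the observed values only.
ages = [r["age"] for r in records if r["age"] is not None]
age_median = median(ages)

# Inspecting a distribution: frequencies of a categorical column.
region_counts = Counter(r["region"] for r in records)

# Cleaning up: fill missing ages with the median (one simple choice among many).
cleaned = [
    {**r, "age": r["age"] if r["age"] is not None else age_median}
    for r in records
]

# Creating a new feature: flag patients aged 60 or older.
for r in cleaned:
    r["is_60_plus"] = r["age"] >= 60

print(missing)
print(age_median, dict(region_counts))
print([r["is_60_plus"] for r in cleaned])
```

Median imputation is used here only because it is easy to read; whether it is appropriate depends on why the values are missing.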
Network Analysis Community
16:00 - 17:00 - Location: TBA
The Network Analysis community is one of several peer communities that bring together researchers across disciplines who use similar methods, if not the same topics.
In this meeting, Doina will present her work on complex networked systems. Many networks are hiding in plain sight: words are connected into discourse, books are connected via the readers they have in common, concepts are connected into knowledge graphs, biological species into food webs, and stars into constellations. All these networks are intangible, but measuring and analysing them provides insight about how the mind and societies work. This talk runs through Network Science (a creative and very cross-disciplinary field), and demonstrates recent research, with diverse data sources, methods (from various areas of artificial intelligence), and case studies. (This talk is based on a conference keynote.)
Wednesday
April 17th
Lunch lecture: Introduction to model building and evaluating
12:45 - 13:30 - Location: Citadel T300
Introduction to modelling in R using the tidymodels framework.
Hands-on session: Introduction to model building and evaluating
13:45 - 14:30 - Location: Citadel T300
Practical modelling, evaluating models, creating features, working with the tidymodels framework.
Afternoon Talk: (in)Equity in breast cancer care: two examples (direct reconstruction and Gene expression profiles)
14:45 - 15:30 - Location: Citadel T300
In the Netherlands, breast cancer care is reimbursed by health insurers. Still, we see inequity in the use of, for instance, direct reconstruction after a mastectomy and gene expression profiles. The first is a treatment option that is proven to improve quality of life. The latter are used in diagnostics to estimate the possible benefit of chemotherapy for the patient, and support the decision on whether to have chemotherapy.
Thursday
April 18th
Lunch Lecture: Advanced model building
12:45 - 13:30 - Location: Citadel T300
Hands-on session: Advanced model building
13:45 - 15:30 - Location: Citadel T300
Data Science Drinks
16:00 - 18:00 - Location: The Gallery Theatre
(Social) networking with other participants, and other University of Twente students and staff interested in data science.
Friday
April 19th
Lunch Talk & Lecture: Building Fair AI; Methods and Metrics for Reducing Bias in Machine Learning
12:45 - 13:30 - Location: Citadel T300
This talk focuses on the important issue of bias in AI systems, especially when they make decisions about people. We will discuss the need to find and reduce bias in machine learning, with a main focus on the methods for doing so.
First, we look at where bias in machine learning models comes from and what effects it has. Then, we will talk about the different ways to build machine learning models that take fairness into account.
We will go into detail about these methods, focusing on the new and effective techniques being used today. A big part of the talk will be about the methods and measurements we use to check how fair these models are.
We will look at the latest ways to measure bias and see how well we are doing at making things fairer. Finally, we will talk about the challenges and questions that we still face in this area, showing why we need to keep researching and developing better AI systems.
The goal of this talk is to give a clear understanding of how we can make machine learning decisions fairer and more responsible.
Practical: Measuring and modeling equity
13:45 - 14:30 - Location: Citadel T300
Putting into practice the theory and methods discussed in the lunch talk. Practical examples of measuring model biases, and how to ensure equity of the model by mitigating biases.
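To give an idea of what measuring model bias can look like, here is a minimal sketch of two commonly used group fairness measures: the demographic parity difference (the gap in positive prediction rates between groups) and per-group true positive rates (the basis of the "equal opportunity" criterion). The labels, predictions, and group names are made up for illustration, the helper functions are hypothetical, and the practical itself may use different metrics or tooling.

```python
from collections import defaultdict

def rate_by_group(values, groups):
    """Return the mean of 0/1 values within each group."""
    totals, counts = defaultdict(int), defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in counts}

def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive prediction rate between any two groups."""
    rates = rate_by_group(y_pred, groups)
    return max(rates.values()) - min(rates.values())

def true_positive_rates(y_true, y_pred, groups):
    """Share of actual positives predicted positive, per group."""
    positives = [(p, g) for t, p, g in zip(y_true, y_pred, groups) if t == 1]
    return rate_by_group([p for p, _ in positives], [g for _, g in positives])

# Hypothetical labels, predictions, and group membership.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_diff = demographic_parity_difference(y_pred, groups)
tpr = true_positive_rates(y_true, y_pred, groups)
print(dp_diff)
print(tpr)
```

A demographic parity difference of 0 means every group receives positive predictions at the same rate. Comparing true positive rates instead asks whether the model finds actual positives equally well in each group; the two criteria can disagree, so which one matters depends on the application.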
Hands-on session: Measuring and modeling equity
14:45 - 15:30 - Location: Citadel T300
Sunday
April 21st
Submission Deadline
23:59
Deadline for datathon submissions for the local leaderboard
Monday
April 22nd
Closing session
12:45 - 13:30 - Location: Citadel T300
Prize ceremony and presentations by winning team(s)