Data Science Week - Summer 2023

September 25, 2023 - October 2, 2023

Summer 2023 Data Science Week

Registration for the Summer 2023 Data Science Week, September 25th - October 2nd, is now open!

Register now!

What is the Data Science Week?

Over the course of the week, BDSi will organize a datathon and a series of workshops and seminars. In the datathon, you will compete with other teams of behavioural data scientists to solve a real data science case. BDSi staff will organize seminars and workshops throughout the week to introduce the various steps involved in data science. These activities are meant to support the datathon, but are open to all students and staff. Coaches will be available throughout to guide you when you run into problems.

The goal of the Data Science Week is to introduce interested students and staff to data science in a fun and cooperative way, and to help create a community of data scientists at the University of Twente, the faculty of Behavioural and Management Sciences, and beyond. After one week, the teams with the best solutions and most interesting approaches to the data science problem will present their work and receive a suitable prize.

Who can join?

Staff, students, family, and friends

Everyone related to the University of Twente and their friends and family can join. You can join with friends, colleagues or even family. The event is open to both novices and experts, and everyone in between. You can join the datathon as a team, alone, or skip it altogether and only participate in the workshops. If you do join alone, you can choose to be assigned to a team with other data science enthusiasts.

BDSi primarily supports Data Science for the faculty of Behavioural and Management Sciences (BMS). Other University of Twente students and staff are welcome to attend (with or without their family and friends), as long as spaces are available.

Some experience with R (or another programming language)

Some programming knowledge is required!

You'll need a basic idea of R to follow along with the workshops and seminars, as all of our examples will use R and various R packages.

While we will do our best to introduce data science topics in the various workshops without relying on code, a basic understanding of R will make it much easier to follow along.

If you have some experience with other programming languages, you should be able to follow along with a little preparation. More information on installing and using R can be found in the What can I do to prepare? section.

If you're new to programming in general or would like a deeper understanding of R, and would rather learn from one of our colleagues, there are two options. The Cognition, Data and Education (CoDE) section provides courses and materials aimed at teaching staff. Johannes Steinrücke teaches half-day workshops on an introduction to R and on data visualization in R for PhD (and EngD) candidates.

If you’re confident you can do the datathon in Python (or any other language - we challenge you to try in C, Fortran, Brainf***, or JavaScript), you’re more than welcome to do so. Just be aware that we probably can’t offer support if or when you get stuck.

What can I do to prepare?

Get a team

First off, get a team together. The datathon is meant to be a collaborative experience where you work alongside people with a variety of expertise.

Read the book (or at least skim through it)

The materials we will use are loosely based on the freely available book An Introduction to Statistical Learning. If you’re interested in data science, statistical learning or machine learning, this book is a great place to start. BDSi also organizes a yearly reading club around this book.

Install R, RStudio, and tidyverse

As a faculty, BMS has decided to use R for statistical education. We will follow this example, and use R and the tidyverse packages in the workshops and seminars. If you do not already have a preferred programming language, you may want to install R and RStudio. ModernDive has a good primer on installing R and RStudio, which also covers the basics of working in R. If you’d like to go further, we recommend the free R for Data Science book by Hadley Wickham - a name you’ll encounter often in the R community.
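
Once R and RStudio are installed, the tidyverse packages can be added from within R itself. The lines below are a minimal sketch of that step, assuming a working internet connection and the default CRAN mirror. They are not an official BDSi setup script, and the workshops may use additional packages.

```r
# One-time step: download and install the tidyverse collection from CRAN.
install.packages("tidyverse")

# At the start of each R session: load the core tidyverse packages
# (ggplot2, dplyr, tidyr, readr, and others).
library(tidyverse)

# Optional check: print the installed tidyverse version.
packageVersion("tidyverse")
```

If the last line prints a version number without an error, the installation worked.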

What will we do?

The Data Science Week workshops and datathon are designed to be related but stand-alone experiences. The workshops will cover basic data science concepts applicable to any data science project, using the datathon materials as a case study. During the afternoon practical sessions we will expand on the topics covered in the morning. Usually, this will involve a hands-on walkthrough of the datathon materials. You can follow along and use the scripts as provided, expand and improve upon them, or write your own, all while having BDSi staff on hand to answer your questions.

During the final session, we ask the groups that prepared the best and most interesting models to present their solutions, and the process by which they arrived at them. Finally, the winning team will have the honour of receiving the BDSi Data Science Trophy™!

Previous participants

Amin Asadi was kind enough to record a video presentation of his submission for the 2021 Data Science Week - have a look!

Amin chose to meet the challenge on his own, but you don't have to. In fact, the datathon is best performed as a group - so that you can learn from each other, and explore together. BDSi staff will be available to give you a helping hand if and when you get stuck.

Schedule

The Data Science Week will start and end with a group session. You will be free to work on the case on your own schedule, and coaches will be available for questions and feedback throughout. The workshops and seminars are scheduled throughout the week, gradually introducing new topics by building a baseline solution to the datathon.

Timeline with the schedule for the data science week

Kickoff

Monday September 25th, location TBA

12:45 - 13:15

After a quick introduction to BDSi, we will introduce the goal of the datathon, and how you can compete. We will also explain how to reach the coaches for help, and give a brief overview of the schedule. Finally, we will announce the teams for those who signed up alone and want to join a team.

13:15 - 13:45

Coffee, tea, and cookies while meeting your team and having the opportunity to ask questions to BDSi staff and coaches.

13:45 - 14:30

Quick introduction to resources you can use, followed by a hands-on exploration of the dataset for the datathon. Bring your laptop: you’re expected to get down and dirty with the data!

Workshop Data Wrangling

Tuesday September 26th, location TBA

12:45 - 13:30

A 45-minute guided introduction to data wrangling in R, using the ‘tidy’ data principles. Karel Kroeze will show how to prepare a ‘raw’ dataset for analysis, by cleaning, reshaping and mutating the data until it gives up all its secrets.

This workshop is also open to those who do not want to participate in the Data Science Week. You can find more information about the workshop here.

13:45 - 14:30

Hands-on data wrangling for the datathon dataset.

Workshop Modelling I

Wednesday September 27th, location TBA

12:45 - 13:30

A 45-minute guided overview of basic machine learning techniques. Anna Machens will take you through the basics of model fitting, parameter selection and hyperparameter tuning, ending up with a simple but effective predictive model.

This workshop is also open to those who do not want to participate in the Data Science Week. You can find more information about the workshop here.

13:45 - 14:30

Hands-on creation of a basic model for the datathon.

Workshop Modelling II

Thursday September 28th, location TBA

12:45 - 13:30

A 45-minute deeper dive into more advanced modelling techniques with Anna Machens.

This workshop is also open to those who do not want to participate in the Data Science Week. You can find more information about the workshop here.

13:45 - 14:30

Hands-on tuning and improvement of a categorization model for the datathon, and plenty of time to ask questions.

Workshop Data Visualization

Friday September 29th, location TBA

12:45 - 13:30

A 45-minute guided overview of data visualization using ggplot2 and the grammar of graphics. Karel Kroeze will explain the principles of creating and layering visualizations with ggplot2 in R, and give a quick introduction to interactive visualizations with plotly, shiny and beyond.

This workshop is also open to those who do not want to participate in the Data Science Week. You can find more information about the workshop here.

13:45 - 14:30

Hands-on visualization practical, with a focus on visualizing model results and parameter importance for the datathon.

Submission Deadline

Sunday October 1st

23:59

After spending all weekend with your team fine-tuning your solutions, you will have to submit them before midnight on Sunday. That gives us a bit of time to check your models and pick a winner. In the meantime, you can practice your victory speech - or suddenly have a brilliant idea that it’s too late to implement before submission.

Closing Session

Monday October 2nd, location TBA

12:45 - 13:15

Debriefing by the BDSi team, and announcement of the winning team(s).

13:15 - 13:45

Presentations by the winning team(s) of their solution(s) and approach. The teams that created the best and most creative solutions will give a short presentation about their approach, and there will be time to ask questions to the winning teams as well as BDSi staff and coaches.

13:45 - 14:30

Coffee, tea, cookies.

References

Header image adapted from upklyak.