IEEE BigData 2024 Cup
Contact Network from Observed Disease Infections
The Challenge
This challenge involves a modified version of the network inference problem, in which participants predict the contact network edges of a test county. We will supply three data sets: the basepop data set, which is a synthetic representation of Virginia's population; a training data set consisting of synthetic pandemic outbreaks; and a test data set with additional synthetic pandemic outbreaks. The synthetic population and the training data set will be released at the beginning of the challenge. The test data set will be released on June 17, 2024.
The basepop synthetic network for Virginia consists of approximately 7.7 million synthetic individuals, each with demographic and geographic attributes and labeled with their county of residence. Each synthetic outbreak is recorded as a set of confirmed infections, each with a person ID from the basepop and a confirmation date. Using an agent-based epidemic simulation, outbreaks are generated in a few selected training counties by seeding infections in randomly chosen individuals and allowing the disease to spread through a synthetic social contact network. The training data includes only a subset of the infections as confirmed cases, and there is a delay between the infection date and the confirmation date. The distributions characterizing the partial confirmation and the delay will be provided, along with the true synthetic contact network for each training county. All simulations use the same SEIR model with consistent parameterization. The same agent-based simulation generates the test outbreaks in a different county, the test county, with the same disease model and the same distributions for partial confirmation and delay. However, the true synthetic contact network for the test county is not fully provided.
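The relationship between true infections and the confirmed cases in the data can be illustrated with a small sketch. It is not the organizers' simulator: the function name observe_outbreak, the confirmation probability, and the delay distribution below are hypothetical placeholders, to be replaced by the distributions supplied with the challenge data.

```python
import random


def observe_outbreak(infection_day, confirm_prob, delay_weights, seed=0):
    """Turn true infections into partial, delayed confirmation records.

    infection_day: dict mapping person ID -> true day of infection.
    confirm_prob:  probability that an infection is confirmed at all
                   (hypothetical value, not taken from the challenge).
    delay_weights: dict mapping delay in days -> relative weight
                   (hypothetical distribution, not taken from the challenge).
    """
    rng = random.Random(seed)
    delays = list(delay_weights)
    weights = [delay_weights[d] for d in delays]
    confirmed = {}
    for pid, day in sorted(infection_day.items()):
        # Partial confirmation: some infections never appear in the data.
        if rng.random() < confirm_prob:
            # Delay: the recorded date is later than the infection date.
            delay = rng.choices(delays, weights=weights, k=1)[0]
            confirmed[pid] = day + delay
    return confirmed


# Toy ground truth: person ID -> day of infection.
true_infections = {101: 0, 102: 2, 103: 3, 104: 5, 105: 5}
observed = observe_outbreak(
    true_infections,
    confirm_prob=0.6,
    delay_weights={1: 0.2, 2: 0.5, 3: 0.3},
)
print(observed)
```

Only the observed dictionary corresponds to what a participant sees for an outbreak; the infection days and the unconfirmed cases remain hidden and have to be accounted for when inferring contacts.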
Submission
Solutions in this competition should be submitted to the online evaluation system as a Parquet file (.parquet or .parquet.gz). Each row of the submission should contain one predicted edge (pid1, pid2, probability), with pid1 < pid2.
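A minimal sketch of preparing a submission in this shape follows. The helper names normalize_edges and write_submission are hypothetical, and keeping the highest probability for a duplicated pair is an assumption, since the rules above do not say how duplicates are treated. Writing the file assumes pandas with a Parquet engine such as pyarrow is installed.

```python
def normalize_edges(raw_edges):
    """Return sorted (pid1, pid2, probability) rows with pid1 < pid2.

    raw_edges: iterable of (pid_a, pid_b, probability) in any order.
    Self-loops are dropped. For a pair listed more than once, the highest
    probability is kept (an assumption, not a stated competition rule).
    """
    best = {}
    for a, b, p in raw_edges:
        if a == b:
            continue
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for ({a}, {b}): {p}")
        key = (a, b) if a < b else (b, a)
        best[key] = max(p, best.get(key, 0.0))
    return [(u, v, best[(u, v)]) for (u, v) in sorted(best)]


def write_submission(rows, path):
    """Write the rows to a Parquet file with columns pid1, pid2, probability."""
    import pandas as pd  # imported here so normalize_edges works without pandas

    frame = pd.DataFrame(rows, columns=["pid1", "pid2", "probability"])
    frame.to_parquet(path, index=False)


rows = normalize_edges([(5, 2, 0.4), (2, 5, 0.7), (3, 3, 0.9), (1, 4, 0.1)])
print(rows)
# write_submission(rows, "submission.parquet")  # hypothetical output path
```

Normalizing before writing keeps the pid1 < pid2 constraint in one place, so a model can emit pairs in either order.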
Evaluation
Participants estimate, through predictive analysis, the probability of a connection between pairs of individuals in the test county population. Submitted results are scored against the true synthetic contact network of the test county using Precision-Recall AUC. In the initial submission stage, participants are required to submit their predictions for the test data set. The final submission should include a comprehensive report and the solution (due by November 17, 2024, as per the Cup chairs' tentative deadline), along with the source code and instructions for running it.
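For local validation on a training county, a score of this kind can be approximated as sketched below. The sketch uses average precision, a common step-wise estimate of the area under the precision-recall curve. The organizers' exact procedure (interpolation, tie handling, treatment of pairs absent from a submission) is not stated here, so the result is only an approximation of the leaderboard metric. The function name average_precision and the toy edges are hypothetical.

```python
def average_precision(predictions, true_edges):
    """Step-wise estimate of precision-recall AUC.

    predictions: dict mapping (pid1, pid2) -> predicted probability.
    true_edges:  set of (pid1, pid2) pairs in the true contact network.
    True edges missing from predictions are never retrieved, so they
    lower the score (an assumption about how unlisted pairs are handled).
    Ties in probability are broken by sort order in this sketch.
    """
    if not true_edges:
        raise ValueError("true_edges must not be empty")
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (edge, _) in enumerate(ranked, start=1):
        if edge in true_edges:
            hits += 1
            # Precision at the rank where this true edge is retrieved.
            precision_sum += hits / rank
    return precision_sum / len(true_edges)


# Toy example: two of the three true edges are predicted.
predicted = {(1, 2): 0.9, (1, 3): 0.8, (2, 3): 0.7}
truth = {(1, 2), (2, 3), (3, 4)}
score = average_precision(predicted, truth)
print(round(score, 4))
```

In the toy example the true edges are retrieved at ranks 1 and 3, with precisions 1 and 2/3, and the third true edge is never retrieved, giving (1 + 2/3) / 3 = 5/9.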
If you have any questions, please contact codi@virginia.edu