How to reproduce the tables and figures (Sections 3 and 4) in the paper

Note that, due to storage constraints, we cannot provide our longitudinal data here, but it is available upon request.

Reproducing the analytics from the datasets

This section describes how to verify the results produced by our Internet-wide measurement.

Datasets and scripts

(1) Datasets and prerequisites for the analysis.

Filename                                        Download
SPF snapshot 2021-10-13                         link
SPF snapshot 2023-03-19                         link
MX snapshot 2021-10-13 (in zipped format)       link
MX snapshot 2023-03-19 (in zipped format)       link
spf-include-centralization.txt                  link
spf-include-centralization-over-10.txt          link

(2) Scripts for the analysis

Filename                          Download   Description
generate-table1-table2-data.py    link       Reproduces the data for Table 1 and Table 2
generate-fig4-data.py             link       Reproduces the data for Figure 4
plotting-scripts.zip              link       Plotting scripts for reproducing Figures 3, 4, and 7

How to run the scripts?

  1. Requirements: we use PySpark for big data analysis, python3 for scripting, and gnuplot for plotting the figures from the generated data.
  2. Download the scripts into any directory, then create a subdirectory called temp inside the directory where the scripts reside.
  3. Download spf-include-centralization.txt and spf-include-centralization-over-10.txt into temp.
  4. To reproduce Tables 1 and 2, download the SPF and MX snapshots and run spark-submit generate-table1-table2-data.py. Compare the output with the data in Table 1 and Table 2.
  5. To produce the temporary data needed to plot Figure 4, run python3 generate-fig4-data.py.
  6. Unzip plotting-scripts.zip into a subdirectory called plots and use gnuplot to plot the figures.
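
Before running the steps above, it may help to confirm that the directory layout and tools are in place. The following is a minimal sketch and is not part of the provided scripts: the helper names missing_files and missing_tools are hypothetical, and the file names are taken from the tables above. It does not check the SPF and MX snapshots, because the location the scripts expect for them is not stated here.

```python
import shutil
from pathlib import Path

# File names come from the tables above; the layout follows steps 2 and 3.
REQUIRED_SCRIPTS = ["generate-table1-table2-data.py", "generate-fig4-data.py"]
REQUIRED_TEMP_FILES = [
    "spf-include-centralization.txt",
    "spf-include-centralization-over-10.txt",
]
# Command names from step 1 and the run commands in steps 4 to 6.
REQUIRED_TOOLS = ["spark-submit", "python3", "gnuplot"]


def missing_files(script_dir):
    """Return expected files (relative paths) that are absent under script_dir."""
    base = Path(script_dir)
    expected = [base / name for name in REQUIRED_SCRIPTS]
    expected += [base / "temp" / name for name in REQUIRED_TEMP_FILES]
    return [p.relative_to(base).as_posix() for p in expected if not p.is_file()]


def missing_tools():
    """Return the required command-line tools that are not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]


if __name__ == "__main__":
    # Run from the directory that holds the downloaded scripts.
    for item in missing_files("."):
        print("missing file:", item)
    for tool in missing_tools():
        print("not on PATH:", tool)
```

If the script prints nothing, the scripts, the two centralization files in temp, and the three tools were all found.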