How to reproduce the Tables and Figures in Sections 3 and 4 of the paper?
Note: due to storage constraints, we cannot host our longitudinal data, but it is available upon request.
Reproducing the analysis from the datasets
This section describes how to verify the results of our Internet-wide measurement using the datasets and scripts below.
Datasets and scripts
(1) Datasets and prerequisites for the analysis.
| Filename | Download |
|---|---|
| SPF snapshot 2021-10-13 | link |
| SPF snapshot 2023-03-19 | link |
| MX snapshot 2021-10-13 (zipped) | link |
| MX snapshot 2023-03-19 (zipped) | link |
| spf-include-centralization.txt | link |
| spf-include-centralization-over-10.txt | link |
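The analysis expects the two centralization lists in a `temp` subdirectory next to the scripts, so a quick check before a long Spark run can catch a misplaced or truncated download. This is a minimal sketch under that layout assumption: `missing_prerequisites` is a hypothetical helper, not part of the artifact, and it checks only these two files because the table does not give file names for the snapshots.

```python
from pathlib import Path

# File names from the datasets table; the temp/ location follows the run instructions.
REQUIRED_IN_TEMP = [
    "spf-include-centralization.txt",
    "spf-include-centralization-over-10.txt",
]


def missing_prerequisites(script_dir: Path) -> list[str]:
    """Hypothetical helper: list required files absent or empty in script_dir/temp."""
    temp = script_dir / "temp"
    missing = []
    for name in REQUIRED_IN_TEMP:
        path = temp / name
        # An empty file is treated as missing, since it usually means a failed download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run from the directory that holds the scripts.
    problems = missing_prerequisites(Path.cwd())
    print("missing or empty in temp/:", ", ".join(problems) if problems else "none")
```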
(2) Scripts for the analysis
| Filename | Download | Description |
|---|---|---|
| generate-table1-table2-data.py | link | Reproduces Table 1 and Table 2 |
| generate-fig4-data.py | link | Generates the data for Fig. 4 |
| plotting-scripts.zip | link | Plotting scripts for reproducing Fig. 3, 4, and 7 |
How to run the scripts?
- Requirements: we use PySpark for big-data analysis, python3 for scripting, and gnuplot to plot the figures from the generated data.
- Download the scripts into a directory of your choice and create a subdirectory named `temp` inside it.
- Download spf-include-centralization.txt and spf-include-centralization-over-10.txt into `temp`.
- To reproduce Tables 1 and 2, download the SPF and MX snapshots and run `spark-submit generate-table1-table2-data.py`. Compare the output against Tables 1 and 2 in the paper.
- To generate the intermediate data needed to plot Figure 4, run `python3 generate-fig4-data.py`.
- Unzip plotting-scripts.zip into a subdirectory named `plots` and use gnuplot to plot the figures.
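The run steps above can be chained in a small driver. This is our illustrative sketch, not one of the artifact's scripts: `missing_tools` and `reproduce` are hypothetical helpers, the driver assumes it is launched from the directory that holds the scripts, and it does not know where `generate-table1-table2-data.py` expects the SPF and MX snapshots, so place them as that script requires. It defaults to a dry run that only lists the planned actions, and it leaves the final gnuplot invocation manual because the script names inside plotting-scripts.zip are not listed here.

```python
import shlex
import shutil
import subprocess
import zipfile
from pathlib import Path

# Tools named in the requirements: Spark for the analysis, gnuplot for plotting.
REQUIRED_TOOLS = ["spark-submit", "python3", "gnuplot"]

# Analysis steps, in the order the instructions give them.
STEPS = [
    ["spark-submit", "generate-table1-table2-data.py"],  # data for Tables 1 and 2
    ["python3", "generate-fig4-data.py"],  # intermediate data for Figure 4
]


def missing_tools() -> list[str]:
    """Hypothetical helper: return required executables not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]


def reproduce(workdir: Path, dry_run: bool = True) -> list[str]:
    """Hypothetical helper: plan (and optionally run) the steps inside workdir.

    With dry_run=True nothing is created or executed; the planned actions are
    returned as shell-style strings so they can be reviewed first.
    """
    planned = ["mkdir -p temp"]
    planned += [shlex.join(cmd) for cmd in STEPS]
    planned.append("unzip plotting-scripts.zip -d plots")
    if dry_run:
        return planned

    (workdir / "temp").mkdir(exist_ok=True)
    for cmd in STEPS:
        # check=True raises CalledProcessError and stops at the first failing step.
        subprocess.run(cmd, cwd=workdir, check=True)
    # Standard-library equivalent of the "unzip ... -d plots" action planned above.
    with zipfile.ZipFile(workdir / "plotting-scripts.zip") as archive:
        archive.extractall(workdir / "plots")
    return planned


if __name__ == "__main__":
    print("missing tools:", ", ".join(missing_tools()) or "none")
    for action in reproduce(Path.cwd()):
        print(action)
```

Once the plan looks right, call `reproduce(Path.cwd(), dry_run=False)`. A failure in either the Spark or the Python step then surfaces as `subprocess.CalledProcessError` instead of being skipped silently, so the plotting scripts are never unpacked against incomplete data.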