How to reproduce the tables and figures (Sections 3 and 4) in the paper

Note that, due to storage constraints, we cannot host our longitudinal data here, but it is available upon request.

Reproducing the analytics from the datasets

This section describes how to verify the results produced by our Internet-wide measurements.

Filename	Download
`SPF snapshot 2021-10-13`	link
`SPF snapshot 2023-03-19`	link
`MX snapshot 2021-10-13 (in zipped format)`	link
`MX snapshot 2023-03-19 (in zipped format)`	link
`spf-include-centralization.txt`	link
`spf-include-centralization-over-10.txt`	link

Filename	Download	Description
`generate-table1-table2-data.py`	link	Reproduces the data for Table 1 and Table 2
`generate-fig4-data.py`	link	Reproduces the data for Fig. 4
`plotting-scripts.zip`	link	Plotting scripts for reproducing Fig. 3, 4, and 7

Requirements: We use PySpark for the large-scale data analysis, python3 for scripting, and gnuplot for plotting the figures from the generated data.
Download the scripts into any directory, then create a subdirectory called `temp` in the directory where the scripts reside.
Download `spf-include-centralization.txt` and `spf-include-centralization-over-10.txt` into `temp`.
To reproduce Tables 1 and 2, download the SPF and MX snapshots and run `spark-submit generate-table1-table2-data.py`. Compare the output with the data in Table 1 and Table 2.
To produce the intermediate data needed to plot Figure 4, run `python3 generate-fig4-data.py`.
Unzip `plotting-scripts.zip` into a subdirectory called `plots` and use gnuplot to plot the figures.
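
The steps above can be strung together with a small helper. The following is only a sketch, not part of the released scripts: the function names (`missing_tools`, `prepare_layout`, `build_commands`, `run_pipeline`) are hypothetical, and it assumes that both scripts are run from the directory where they reside and that `plotting-scripts.zip` has been downloaded into that same directory. It does not check the SPF and MX snapshots, and it leaves the gnuplot step manual because the names of the scripts inside the archive are not listed here.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path

# Tools named in the requirements above.
REQUIRED_TOOLS = ("spark-submit", "python3", "gnuplot")

# Files that the steps above place in temp/.
TEMP_FILES = (
    "spf-include-centralization.txt",
    "spf-include-centralization-over-10.txt",
)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def prepare_layout(script_dir):
    """Create temp/ and plots/ next to the scripts.

    Returns (temp_dir, plots_dir, missing), where `missing` lists the
    centralization files that have not been downloaded into temp/ yet.
    """
    script_dir = Path(script_dir)
    temp_dir = script_dir / "temp"
    plots_dir = script_dir / "plots"
    temp_dir.mkdir(parents=True, exist_ok=True)
    plots_dir.mkdir(parents=True, exist_ok=True)
    missing = [name for name in TEMP_FILES if not (temp_dir / name).is_file()]
    return temp_dir, plots_dir, missing


def build_commands():
    """Commands from the steps above, in the order they are listed."""
    return [
        ["spark-submit", "generate-table1-table2-data.py"],
        ["python3", "generate-fig4-data.py"],
    ]


def run_pipeline(script_dir):
    """Check prerequisites, run both scripts, then unpack the plotting scripts.

    Assumption: the scripts are executed with `script_dir` as the working
    directory. The snapshot files are not checked here.
    """
    script_dir = Path(script_dir)

    tools = missing_tools()
    if tools:
        raise SystemExit("Missing tools on PATH: " + ", ".join(tools))

    _, plots_dir, missing = prepare_layout(script_dir)
    if missing:
        raise SystemExit("Download into temp/ first: " + ", ".join(missing))

    for command in build_commands():
        # check=True stops the run as soon as one script fails.
        subprocess.run(command, cwd=script_dir, check=True)

    # Assumption: plotting-scripts.zip sits next to the other scripts.
    with zipfile.ZipFile(script_dir / "plotting-scripts.zip") as archive:
        archive.extractall(plots_dir)
    # The extracted scripts are then run with gnuplot by hand.
```

Calling `run_pipeline(".")` from the script directory would run the two scripts and unpack the plotting scripts into `plots`; the outputs still need to be compared with Table 1 and Table 2 by hand.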