Internet Archive Infrastructure
I just watched a presentation by Jonah Edwards of the Internet Archive in which he describes the infrastructure behind the Archive and answers questions.
If I understood correctly, the Internet Archive currently stores about 100 PB, and 5 to 6 PB are added every quarter. And because everything is stored in RAID arrays, the raw capacity is actually double that, so roughly 200 PB.
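To get a feel for these numbers, here is a small back-of-the-envelope sketch in Python. It only uses the approximate figures mentioned above; the function names are made up for illustration, and the projection assumes constant linear growth, which is my simplification and not something stated in the talk.

```python
# Approximate figures as reported in the post.
STORED_PB = 100                  # data currently stored
GROWTH_PB_PER_QUARTER = (5, 6)   # low and high estimate of quarterly growth
REDUNDANCY_FACTOR = 2            # raw capacity is about double the stored data


def raw_capacity_pb(stored_pb, factor=REDUNDANCY_FACTOR):
    """Raw disk capacity needed for a given amount of stored data."""
    return stored_pb * factor


def projected_stored_pb(quarters, growth_per_quarter, start_pb=STORED_PB):
    """Stored data after some quarters, assuming constant linear growth."""
    return start_pb + quarters * growth_per_quarter


# Raw capacity today.
print(raw_capacity_pb(STORED_PB))

# Stored data and raw capacity one year (four quarters) from now.
for growth in GROWTH_PB_PER_QUARTER:
    stored = projected_stored_pb(4, growth)
    print(growth, stored, raw_capacity_pb(stored))
```

Under these assumptions, a year of growth adds 20 to 24 PB of data, which means 40 to 48 PB of additional raw capacity.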
Why doesn’t the Internet Archive use AWS? First, it would cost much more. Second, it would mean a loss of control: Amazon could even track visitors, which the Archive does not want.
They are definitely doing great work!