11th Apr 2019

Check Out Australia’s Web Archive


Have you ever heard of the Wayback Machine? It is an online archive of over 300 billion web pages dating back as far as 1996, when the Internet was first dawning across the globe. Nearly every site is available for viewing on the Wayback Machine, so long as the site was crawlable and not password protected.

The Wayback Machine, created in 2001, is indeed an impressive collection of the Internet’s treasures. But now in Australia, we have our very own. In March of 2019, the National Library launched its Australian Web Archive, or AWA. There are 900 billion records stored in the AWA, making it one of the largest in the world. Or, in the words of the library’s director-general Dr. Marie-Louise Ayres, the collection is “enormous.”

Why Archive Websites?

So why bother archiving websites? Archiving the web is extremely important from a historical, sociological, and cultural standpoint. While traditional archives house the material documents of the past, a web archive performs a similar role but provides a place to store those artifacts that are solely digital for future generations. And truth be told, the online world “has increasingly become where Australia's history and culture is located and created.”

A web archive has value because it allows historians and researchers to look back at any given moment of our Internet-connected history, and see exactly what information was being readily spread. What topics were being talked about? What kinds of websites were taking off? How were people communicating? This can reveal so much about our country, from how commerce and the economy grew and transformed to the way the Internet, and eventually artificial intelligence, would shape our homes and workplaces. Being able to trace this evolution through the past few decades is fascinating, and important when it comes to knowing our own history.

In addition to being historically significant, it can be just plain enjoyable to take a step back in time and remember how the Internet appeared in its earliest days. Just take a glance at this archive of the Google homepage in 1999, and you’ll see immediately how far we’ve come.

In terms of design, having a web archive is extraordinarily valuable. It can seem that modern, sleek websites have always been around, but when we look into the archives, we see that clunky, neon-coloured sites once adorned the World Wide Web. It’s an amusing walk down a visual memory lane, giving us instant access to what one might call the “ugly websites of the 90s.”

All About Australia’s Web Archive

Much like the Wayback Machine, Australia’s sites began to be stored and archived in the 90s. Today’s completed web archive contains sites from as far back as 1996. The archive, which houses roughly 600 TB of data, includes billions of Australian .au domain web pages. Pages are included when curators at the National Library of Australia (NLA) deem them “culturally significant.” The recently released archive combines records from the PANDORA Archived websites and the Australian Government Web Archive, as well as other websites relating to Australia that have been collected throughout the years. You can head over to Australia’s Web Archive online to see for yourself what’s there.

Allison Dellit, the senior executive in charge of the National Library's digitisation program, Trove, tells ABC News that, from the start, it was evident that the web was something that demanded archival records. "There was such a recognition early on that the internet was going to change the way Australians communicated with each other, the way that we shared information, what we had access to and how we did it," she said.

Of course, not every site in Australian history is included in the Archive. It was the aim of the NLA to present to users what is relevant and culturally significant. In other words, a great deal of effort has been made to exclude spammy sites or those that could easily be labelled as “fake news.” Older sites of political, cultural, or social importance are generally included, no matter how ridiculous or outdated their design may seem to us now. Allison Dellit herself enjoys the brightly goofy website for The Wiggles circa 1997.

Perhaps what is most exciting about the Australian Web Archive is that it is fully searchable. This makes it more than a mere encyclopaedic resource: it takes advantage of its digital state by allowing users to browse and cross-reference with ease. This also means that, unlike the Wayback Machine, users of the AWA don’t need to know a specific site’s URL to find what they are looking for. Instead, they can search by topic.

Searchers benefit from a combination of techniques designed to return accurate results. Chief information officer David Wong explains that the tech team created their own complex algorithm using several factors. They adapted their own version of Google’s PageRank algorithm from 1998, which ranks a page by the number and quality of the links pointing to it, and which generally helps point towards useful, high-quality resources. Other technologies include a Bayesian filter (basically a spam filter), a Yahoo NSFW (Not Safe For Work) classifier, and machine learning. The machine learning used image recognition to identify and delete pages that display pornographic material.
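For the curious, here is a rough idea of how link-based ranking in the spirit of PageRank works. This is only a minimal sketch of the textbook algorithm, not the NLA's adapted version, which has not been published in detail. The function name `pagerank`, the damping value of 0.85, and the tiny made-up web of `.example` sites are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by the links pointing at them (textbook PageRank sketch).

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Pages with no outgoing links share their score with everyone.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1 - damping) / n + damping * dangling / n for page in pages
        }
        # Every other page splits its score evenly among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A hypothetical four-site web: three sites link to the library,
# and nobody links to the spam site.
toy_web = {
    "library.example": ["news.example", "wiggles.example"],
    "news.example": ["library.example"],
    "wiggles.example": ["library.example"],
    "spam.example": ["library.example"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy example the site that everyone links to rises to the top, while the site that nobody links to sinks to the bottom. That is the intuition behind using links, rather than page content alone, as a signal of quality.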

What Can You Find in the AWA?

What might you find when searching the Australian Web Archive? Lots of things!

Users may enjoy peering at the early websites of PMs and other government figures. Some of the earliest sites are reliably cheesy and graphics-heavy. The NLA team mentioned that capturing these types of sites is often a time-sensitive task, as a new election can make a site obsolete. So it has been important for them to stay on top of patterns and trends in a changing web.

There are definitely many familiar sites that you’ll want to take a look at. From your favourite news providers to blogs from the early millennium, the sites will bring back a certain nostalgia to those who remember the infancy of the Internet. For today’s teenagers and young people, these sites will seem surprisingly old-fashioned! It is also fascinating to see the earliest iterations of sites that have now become hugely popular, such as Facebook.

Finally, feel free to search for yourself! It’s very common for users to want to find out what exists about them on the Internet (both now and in the past), and the archive allows you to do this.

What’s Next?

The recent Federal Budget has allocated AU$10 million to the National Library of Australia, to be distributed over the next four years to set up a Digitisation Fund. This is a necessary step forward, and will no doubt continue the success of previous digitisation work, including the Australian Web Archive.

Budget documents stated that "The Digitisation Fund...will enable the continued digitisation of the NLA's significant collection and expand its availability to all Australians through its online database, Trove."

It will be fascinating to watch the growth of digital archiving going forward. Will we begin to look for ways to archive social media sites, profiles, and posts? How big would a library of the world’s tweets need to be? These questions remain unanswered for now, but we can explore the sheaves of Australia’s Internet past on the Australian Web Archive.
