Deatomizing the web: New project tackles the bottleneck of superfast cloud computing

"In order to save a picture, we don't need the entire file. We just need a sort of index of how the picture is built up from other bits of data. Like the instruction book for a Lego model," says Associate Professor Daniel Lucani Rötter from the Department of Engineering, Aarhus University. Photo: Peer Klercke.

22 March 2019 by Jesper Bruun

1,100,000,000. That’s around the number of new Internet of Things (IoT) devices connected to the internet in 2018.

These are not mobile phones, computers or tablets. The IoT comprises devices such as smart TVs, wearables, security systems and other items with sensors that are connected to the internet.

The figure corresponds to more than three million new devices connected to the internet every single day throughout 2018.

The IoT is growing so fast that the internet is rapidly approaching a bottleneck. The huge amount of data generated lowers reading speed throughout the internet.

"If you look at the way things are going, there's a massive amount of data that needs to be stored, and one of the challenges is how to gain access to it. You can have a huge amount of storage space in a single disk, but your access speed remains the same. That means that, if you're running a datacentre, you run into a bottleneck. Already now, some datacentres are aiming for smaller hard drives, simply because the access speed is a bottleneck," says Associate Professor Daniel Lucani Rötter from the Department of Engineering at Aarhus University.


Therefore, he has just kicked off a project aiming to reduce demand for storage space.

"This project is about limiting the amount of data needed for storage. Instead of simple compression, it's more about how to manage the data. How we can exploit the characteristics of different types of data to be able to compress it dramatically," he says.


The Scale-IoT project is not only limited to IoT and cloud data. With modifications, it can also be used for normal local data storage, says Daniel Lucani Rötter. Photo: Lars Kruse / AU Foto.


It's all about similarity: different pieces of data that resemble one another.

Take a JPEG image, for example. As soon as the picture is taken, it is compressed. Not every pixel is saved, because there is a lot of redundancy, so the picture is divided into parts that must be saved and parts that are redundant. Associate Professor Daniel Lucani Rötter is aiming to use the same technique in his project.

But rather than only compressing pictures, he wants to embrace all data.

"Normally, when people think of data compression, they might think about Winzip. What happens there is that you compress a bunch of files, but if you want to read them, you need to decompress all of them. The idea behind our concept is that we basically want to be able to compress everything and still be able to read every single file without having to decompress other files every time you want to access it," he says, and continues:

"In theory we take a file and split it into many different small chunks. The critical thing is that you fragment the data into smaller chunks and try to identify similarities between the chunks in the system."

Because compression works across all the data you have, there is ample opportunity to exploit similarities, for instance in cloud storage and datacentres. He goes on:

"In order to save a picture, we don’t need the entire file. We just need a sort of index of how the picture is built up. Like the instructions for a Lego kit. A detailed list of how to put the picture together with bits from other pictures."

Instead of searching for exact matches among the small chunks of split-up data, Associate Professor Rötter looks for something that's "close enough". A chunk may differ slightly from its match, but that difference (the error) is stored, and the rest is indexed.

What you end up with is a number of ID blocks with associated errors. That way, you can recover your original data without errors.
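A minimal Python sketch of this idea follows. It is an illustration under stated assumptions, not the project's implementation: the 4-byte chunks, the one-byte "close enough" threshold, the linear search over stored blocks and every function name are hypothetical. Each chunk is stored as the ID of a similar block plus a list of byte-level differences, so the original can be rebuilt exactly, and a single chunk can be read without rebuilding the others.

```python
# Illustrative sketch only: chunk size, threshold, search strategy and all
# names are assumptions, not details taken from the Scale-IoT project.

def hamming(a: bytes, b: bytes) -> int:
    """Count the byte positions at which two equal-length chunks differ."""
    return sum(x != y for x, y in zip(a, b))


def compress(data: bytes, size: int = 4, max_diff: int = 1):
    """Store each chunk as (block ID, differences from that block)."""
    bases: list[bytes] = []   # the distinct blocks kept in storage
    records = []              # one (base_id, diffs) entry per chunk
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        base_id = None
        for k, base in enumerate(bases):
            if len(base) == len(chunk) and hamming(base, chunk) <= max_diff:
                base_id = k   # found a block that is "close enough"
                break
        if base_id is None:   # nothing similar yet: keep the chunk as a new block
            bases.append(chunk)
            base_id = len(bases) - 1
        base = bases[base_id]
        diffs = [(p, chunk[p]) for p in range(len(chunk)) if chunk[p] != base[p]]
        records.append((base_id, diffs))
    return bases, records


def read_chunk(bases, records, index: int) -> bytes:
    """Rebuild one chunk from its block and differences, touching no other chunk."""
    base_id, diffs = records[index]
    chunk = bytearray(bases[base_id])
    for position, value in diffs:
        chunk[position] = value
    return bytes(chunk)


def decompress(bases, records) -> bytes:
    """Rebuild the full original data without loss."""
    return b"".join(read_chunk(bases, records, i) for i in range(len(records)))


data = b"ABCDABCEXYZWABCD"
bases, records = compress(data)
print(len(bases), "blocks stored for", len(records), "chunks")
print(decompress(bases, records) == data)
```

In this toy input, four chunks are represented by two stored blocks plus one recorded difference, and the round trip returns the original bytes unchanged.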

"The project is not only limited to IoT and cloud data. With modifications, it can also be used for normal local data storage. What if suddenly you could get a lot more space out of your 256 GB hard disk drive, simply because data didn’t take up so much space? There's a huge potential in compressing data in this way," says Daniel Lucani Rötter.

Short fact

Access speed is a growing concern for many major IT companies around the world. Amazon actually measured the projected cost of slower speed. A slowdown of just 100 milliseconds could cost the company billions in revenue, simply because customers become frustrated by slower speeds and cancel their subscriptions.

Contact

Daniel Lucani Rötter
Associate Professor
Mail: daniel.lucani@eng.au.dk
Phone: +45 93508763