Introduction to streaming algorithms

An introductory talk presented at the summer academy 2014 of the Studienstiftung des dt. Volkes.

I attended a summer academy in Italy that discussed the topic of big data. In our group we managed to cover a variety of interesting fields and algorithms that deal with data that either is too large in size, too fast in transport or too variable in its structure.

The topic of my talk was "Streaming I: An introduction to streaming algorithms". I also implemented an example using the most basic version of the Flajolet-Martin algorithm and a more advanced version of the Bloom filter algorithm.

The last one is very interesting. It usually provides a huge speedup with a controlled error rate. It is definitely worth a look for some problems and is heavily used in applications such as Cassandra, BigTable or Hadoop. Within these applications the usage of a Bloom filter saves a lot of IO and communication costs.

Created . Last updated .

References

Sharing is caring!