The participants will after the course have practical experience with experimental evaluation of the performance of algorithm implementations and the comparison of experimental performance with theoretical asymptotic performance and insight into the influence of computer architecture on the design of efficient algorithms. The working method of the course will also train the participants to read and understand research papers. The participants must at the end of the course be able to:

**Construct**and**analyze**performance experiments for algorithm implementations.**Construct**and**implement**algorithms optimized towards efficient usage of the underlying computer architecture.

Algorithm engineering combines the theoretical area of algorithm design with practice. Traditional algorithm design assumes the unit cost random access machine model (RAM) as the model of computation - a reasonable adequate model for studying the worst-case performance of algorithms. For more accurate estimations of an algorithm implementations running time of actual hardware, more refined models of current computers are needed, eg. to model cache hierarchies, translation lookaside buffers (TLB), branch prediction strategies, conditional operations, pipelining, trade-offs between cache-faults and branch mispredictions. The core of algorithm engineering is the cycle of algorithmic design, analysis, implementation, and experiment. The goal of this course is to facilitate this cycle of algorithmic development among the students. The initial case studies of the course will be very basic algorithmic problems like scanning, searching sorted lists, merging, sorting, matrix multiplication, and shortest path computations.

January 27-March 10: Tuesdays 12.15-15.00, Auditorium D4 (Building 1531, room 219).

Date | Content | Literature | Slides |
---|---|---|---|

27/1 |
Introduction and discussion of Project 1 |
[J02,S09],[BFJ02,BM06] | intro | cache |

3/2 | Memory hierarchies; Memory layout of data structures Code: matrix.cpp,array-summing.cpp |
[FLPR12,BFJ02,JM13] | memory |

10/2 | Branch mispredictions; Trade-offs between cache-misses and branch mispredictions | [KS06,BM05,BFM05,BM06] | mispredictions |

17/2 | Certifying algorithms | [MMNS11, Sect. 1-7,12-15] | certifying |

24/2 | Special instructions: Predicated instructions, SIMD Code: min.cpp, binary-search.cpp |
[SW04],[SGL09],[K+10] | special |

3/3 | Parallel Algorithm Engineering (lectures by Kenneth Sejdenfaden Bøgh) |
[FM10,CPU14,MHSM09,AKN12,CRSY10,YRV11] | parallel |

10/3 | Route planning | [ADGW11,BFMSS07,GSSD08] | route |

- Project description.
- General comments about content of algorithm project reports.
- Group report due on
**March 17, 23:59**. One PDF file, max 20 pages (+ appendix). - Handin the project report online using the course menu.

*General papers about algorithm engineering*

[J02] | David S. Johnson. A Theoretician's Guide to the Experimental Analysis of Algorithms, in Data Structures, Near Neighbor Searches, and Methodology: Fifth and Sixth DIMACS Implementation Challenges. Editors: M. H. Goldwasser, D. S. Johnson, and C. C. McGeoch. American Mathematical Society, 215-250, 2002. |

[S09] | Peter Sanders. Algorithm Engineering - An Attempt at a Definition, in Efficient Algorithms, Essays Dedicated to Kurt Mehlhorn on the Occasion of His 60th Birthday. Susanne Albers, Helmut Alt, Stefan Näher (Eds.). Lecture Notes in Computer Science, Volume 5760, Springer, 2009, ISBN 978-3-642-03455-8. |

[M07] | Catherine C. McGeoch. Experimental algorithmics, Communications of the ACM, 50(11), 27-31, November 2007. |

[MM99] | Catherine C. McGeoch and Bernard M. E. Moret. How to present a paper on experimental work with algorithms, ACM SIGACT News, 30(4), 85-90, December 1999. |

[P88] | Ian Parberry. How to Present a Paper in Theoretical Computer Science: A Speaker's Guide for Students. SIGACT News, 19(2), 42-47, 1988. |

Books

[B00] | Jon Bentley. Programming Pearls, Second Edition. Addison-Wesley, Inc., 2000. 239+xi pages. ISBN 0-201-65788-0. |

[M92] | Catherine C. McGeoch. A Guide to Experimental Algorithmics, 272 pages, 2012, Cambridge University Press. ISBN: 9781107001732. |

Java

[O14] | Scott Oaks. Java Performance: The Definitive Guide - Getting The Most Out of Your Code , 426 pages, 2014, O'Reilly Media. |

Memory layout

[FLPR12] | Matteo Frigo, Charles E. Leiserson, Harald Prokop, Sridhar Ramachandran. Cache-Oblivious Algorithms. ACM Transactions on Algorithms, 8(1), Article No. 4, 2012. |

[BFJ02] | Gerth Stølting Brodal, Rolf Fagerberg, Riko Jacob. Cache-Oblivious Search Trees via Binary Trees of Small Height. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 39-48, 2002. |

[JM13] | Tomasz Jurkiewicz, Kurt Mehlhorn. The cost of address translation, In Proc. 15th Annual Meeting on Algorithm Engineering & Experiments (ALENEX), 148-162, 2013. |

Branch mispredictions

[KS06] | Kanela Kaligosi, Peter Sanders. How Branch Mispredictions Affect Quicksort. Proc. 14th Annual European Symposium on Algorithms (ESA), Lecture Notes in Computer Science, Volume 4168, 780-791, Springer, 2006. |

[BM05] | Gerth Stølting Brodal, Gabriel Moruz. Tradeoffs Between Branch Mispredictions and Comparisons for Sorting Algorithms. Proc. 9th International Workshop on Algorithms and Data Structures (WADS), Lecture Notes in Computer Science, Volume 3608, 385-395, Springer, 2005. |

[BFM05] | Gerth Stølting Brodal, Rolf Fagerberg, Gabriel Moruz. On the Adaptiveness of Quicksort. In Proc. 7th Workshop on Algorithm Engineering and Experiments (ALENEX), pages 130-140, 2005. |

[BM06] | Gerth Stølting Brodal, Gabriel Moruz. Skewed Binary Search Trees. In Proc. 14th Annual European Symposium on Algorithms (ESA), Lecture Notes in Computer Science, Volume 4168, 708-719, Springer, 2006. |

Correctness

[MMNS11] | R.M. McConnell, K. Mehlhorn, S. Näher, P. Schweitzer. Certifying algorithms. Computer Science Review, 5(2), 119-161, 2011. |

Special instructions

[SW04] | P. Sanders and S. Winkel. Super Scalar Sample Sort. 12th Annual European Symposium (ESA), Lecture Notes in Computer Science, Volume 3221, 784-796, 2004. |

[SGL09] | Benjamin Schlegel, Rainer Gemulla, Wolfgang Lehner. k-ary search on modern processors. Proceedings of the Fifth International Workshop on Data Management on New Hardware (DaMoN), Pages 52-60, 2009. |

[K+10] | C. Kim, J. Chhugani, N. Satish, E. Sedlar, A.D. Nguyen, T. Kaldewey, V.W. Lee, S.A. Brandt, and P. Dubey. FAST: fast architecture sensitive tree search on modern CPUs and GPUs. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data (SIGMOD), Pages 339-350, 2010. |

Multicores

[FM10] | Samuel H. Fuller and Lynette I. Millett. The Future of Computing Performance: Game Over or Next Level? The National Academies Press, 2010. |

[CPU14] | CPU Overclocking World Records. 2014. |

[MHSM09] | Daniel Molka, Daniel Hackenberg, Robert Schone, and Mathias S. Muller. Memory performance and cache coherency effects on an intel nehalem multiprocessor system. 18th International Conference on Parallel Architectures and Compilation Techniques (PACT), 251-270, IEEE 2009. |

[AKN12] | Martina-Cezara Albutiu, Alfons Kemper, Thomas Neumann. Massively parallel sort-merge joins in main memory multi-core database systems. Proceedings of the VLDB Endowment, 5(10), 1064-1075, ACM 2012. |

[CRSY10] | John Cieslewicz, Kenneth A. Ross, Kyoho Satsumi, and Yang Ye. Automatic contention detection and amelioration for data-intensive operations. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, 483-494, ACM2010. |

[YRV11] | Yang Ye, Kenneth A. Ross, and Norases Vesdapunt. Scalable aggregation on multicore processors. In Proceedings of the Seventh International Workshop on Data Management on New Hardware (DaMoN), 1-9, ACM 2011. |

[MSHM11] | Daniel Molka, Robert Schöne, Daniel Hackenberg, & Mathias S. Müller. Memory performance and SPEC OpenMP scalability on quad-socket x86_64 systems. In Proceedings 11th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP), Lecture Notes in Computer Science Volume 7016, 2011, pp 170-181. |

Route planning

[ADGW11] | Ittai Abraham, Daniel Delling, Andrew V. Goldberg, Renato Fonseca F. Werneck. A Hub-Based Labeling Algorithm for Shortest Paths in Road Networks. Processing 10th International Symposium on Experimental Algorithms (SEA), Lecture Notes in Computer Science Volume 6630, 2011, pp 230-241. |

[BFMSS07] | Holger Bast, Stefan Funke, Domagoj Matijevic, Peter Sanders, and Dominik Schultes. In Transit to Constant Time Shortest-Path Queries in Road Networks. Proceedings of the Ninth Workshop on Algorithm Engineering and Experiments (ALENEX), 2007. |

[GSSD08] | Robert Geisberger, Peter Sanders, Dominik Schultes, and Daniel Delling. Contraction Hierarchies: Faster and Simpler Hierarchical Routing in Road Networks. Proceedings 7th International Workshop on Experimental Algorithms(WEA), Lecture Notes in Computer Science, Volume 5038, 2008, pp 319-333. |

3 group projects (2-3 persons per group) and individual oral exam, 7-point grading scale, internal examiner.

The oral exam takes 20 minutes, including evaluation. The exam will be a discussion of the reports on the projects and the material covered in class (see "course plan" above). The final grade will be based on the project reports and the oral examination.