High-Performance and Low-Latency C++ with Herb Sutter
Efficiency, Concurrency, Parallelism, Modern Hardware, and Modern C++11/14/17
Herb Sutter is back in Europe! Herb is the chair of the ISO C++ committee and the best-selling author of four books and hundreds of technical papers and articles, including the world-famous essay “The Free Lunch Is Over”. Register to attend his three-day training in Stockholm, 25-27 April.
Performance and efficiency are C++’s bread and butter, and they matter more than ever on modern hardware: In processors, single-threaded performance improvements are slowing down (unless your code is parallel); in Internet of Things, we are often asked to do more work with less hardware; and in cloud computing, processor/hardware time is often the major component of cost and so making code twice as efficient often means saving close to half the processing cost. Today, getting the highest performance and the lowest latency on modern hardware often means being aware of the hardware in ways that most other programming languages can’t – from hardware caches where simply arranging our data in the right order can give 50x speedups with otherwise identical code, to hardware parallelism where using parallel algorithms turns on high-performance parallel and vector processor hardware that otherwise sits idle.
Additionally, low latency increasingly matters at all scales: In user interfaces, low latency means responsive apps and productive users without the dreaded “wait…” donut; in financial trading, low latency regularly saves large amounts of cash; in embedded real-time systems, low latency is crucial to meeting deadlines and can even save lives. Today, this makes concurrency more important than ever, because it delivers two things: It hides latencies we have to deal with and cannot remove, from disk I/O latency to speed-of-light network latency; and it makes our code responsive by not introducing needless latencies of our own even when we’re not hiding someone else’s latency.
This intensive three-day course will provide developers with the knowledge and skills required to write high-performance and low-latency code on today’s modern systems using modern C++11/14/17.
The courseware is in English and the trainer speaks English.
- The changing hardware landscape in a nutshell; the three pillars; how many cores are you coding for
Fundamentals: Common Tools and Considerations
- Overview of primitives: memory access order, threads/pools, lambdas, locks, atomics
- Think in transactions: exclusive code blocks, safety, robustness
- Prefer structured/bounded lifetimes: tasks; locks
- Avoid overheads: context/domain switching; cost of unrealized concurrency
- Understand the nature of deadlock: not just mutexes; any waiting/blocking cycle
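As an illustration of the "structured/bounded lifetimes" item above, here is a minimal sketch in which both the tasks and the lock live strictly inside one function's scope. The function name `sum_in_two_tasks` is hypothetical and not taken from the course material; the sketch only assumes the standard `std::async`, `std::future`, and `std::lock_guard` facilities.

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <vector>

// Hypothetical helper: sums the two halves of `data` in two tasks whose
// lifetimes are bounded by this function's scope.
long long sum_in_two_tasks(const std::vector<int>& data) {
    const auto mid = data.begin() + static_cast<std::ptrdiff_t>(data.size() / 2);
    long long total = 0;
    std::mutex total_mutex;  // protects `total`

    auto add_range = [&](auto first, auto last) {
        long long local = 0;  // private to the task, so no lock is needed here
        for (auto it = first; it != last; ++it) local += *it;
        std::lock_guard<std::mutex> hold(total_mutex);  // lock released at end of scope
        total += local;
    };

    // Each task is joined through its future before `data`, `total`, and
    // `total_mutex` go out of scope.
    auto left  = std::async(std::launch::async, add_range, data.begin(), mid);
    auto right = std::async(std::launch::async, add_range, mid, data.end());
    left.get();
    right.get();
    return total;
}
```

Because neither task can outlive the function, the references captured by the lambda never dangle, and the lock is only ever held for the single update of `total`.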
Low Latency: Responsiveness Through Concurrency and Isolation
- Basic tools: threads; messages
- Review history of threading: single; cooperative; preemptive; truly concurrent
- Common problematic constructs: the ____ thread; the GUI thread
- Prefer responsive objects: increase abstractions; avoid using raw threads; helper classes to automate
- Using agents for performance: pipelining
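The "responsive objects" and "agents" items above can be sketched as an object that owns a private worker thread and a message queue, so callers never block on the work itself. This is only one possible shape, under the assumption that messages are plain `std::function<void()>` callables; the class name `Active` and its members are illustrative and are not presented as the course's own helper classes.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Illustrative agent: callers enqueue messages and return immediately;
// a private thread runs the messages one at a time, in order.
class Active {
public:
    using Message = std::function<void()>;

    Active() : worker_([this] { run(); }) {}

    ~Active() {
        send([this] { done_ = true; });  // final message: ask the worker to stop
        worker_.join();                  // earlier messages are finished first
    }

    Active(const Active&) = delete;
    Active& operator=(const Active&) = delete;

    void send(Message message) {
        assert(message && "a message must be callable");
        {
            std::lock_guard<std::mutex> hold(mutex_);
            queue_.push(std::move(message));
        }
        ready_.notify_one();
    }

private:
    void run() {
        while (!done_) {
            Message message;
            {
                std::unique_lock<std::mutex> hold(mutex_);
                ready_.wait(hold, [this] { return !queue_.empty(); });
                message = std::move(queue_.front());
                queue_.pop();
            }
            message();  // executed outside the lock
        }
    }

    bool done_ = false;  // touched only on the worker thread
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Message> queue_;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

A caller would write something like `Active a; a.send([] { /* slow work */ });` and continue immediately. Chaining several such objects, each sending its result to the next, is one way to build a pipeline.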
Machine Architecture: Dealing with the Reality of Modern Hardware in Our Software
- Bandwidth vs. latency; memory vs. CPU; Little’s Law; pipelining; cache; analyzing CPU profiles of cache misses
- Effects on code correctness: memory model; sequential consistency; legal and illegal transformations; object layout considerations; a general pattern to avoid, conditional locking / speculative lock elision
- Effects on code performance: locality matters; access patterns matter even more; hardware considerations
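To make "access patterns matter" concrete, the sketch below computes the same sum over the same data in two traversal orders. The `Matrix` type and both function names are assumptions made for this example. The first loop walks memory contiguously; the second jumps by a full row on every step, which typically costs more cache misses on large matrices. How large the difference is depends on the hardware and the matrix size, so no figure is claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical matrix: rows * cols values stored contiguously, row by row.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<int> cells;

    Matrix(std::size_t r, std::size_t c, int value)
        : rows(r), cols(c), cells(r * c, value) {}

    int at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return cells[r * cols + c];
    }
};

// Visits elements in the order they are laid out in memory.
long long sum_row_major(const Matrix& m) {
    long long total = 0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            total += m.at(r, c);
    return total;
}

// Same result, but consecutive accesses are `cols` elements apart.
long long sum_column_major(const Matrix& m) {
    long long total = 0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            total += m.at(r, c);
    return total;
}
```

Timing both functions on a matrix much larger than the last-level cache, in an optimized build, is a simple way to observe the effect on a given machine.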
Migrating to the New Free Lunch: High Bandwidth Computation Through Parallelism
- Understanding scalability: 1-core, K-core, N-core
- Choosing data structures: ones that are concurrency-friendly vs. concurrency-hostile
- Coping with different semantics: parallelizing algorithms can change their meaning; how to gain stronger semantics when needed
- Superlinear scalability
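As a small, hedged example of turning a sequential algorithm into a parallel one, the sketch below splits a sum into chunks that run as separate tasks. The function `parallel_sum` and its `task_count` parameter are invented for this illustration. With integers the regrouped additions give exactly the sequential result; with floating-point values the same regrouping could change the rounding, which is one simple way that parallelizing an algorithm can change its meaning.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sums `data` using up to `task_count` concurrent tasks.
long long parallel_sum(const std::vector<int>& data, std::size_t task_count) {
    assert(task_count > 0);
    // Ceiling division so that every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + task_count - 1) / task_count;

    std::vector<std::future<long long>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last  = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);  // sequential within a chunk
        }));
    }

    long long total = 0;
    for (auto& part : parts) total += part.get();  // combine the partial sums
    return total;
}
```

Whether this scales depends on the chunk size relative to the cost of starting a task; for small inputs the sequential `std::accumulate` is likely to be faster.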
Performance Without Races (Corruption), Contention (Latency-Killing), or Deadlocks (Composition-Killing)
- Think in transactions: nested transactions; lock reacquisition (safe vs. unsafe); when invariants must hold
- How to avoid race conditions: associate data with locks; helper classes to automate
- How to avoid deadlocks: apply lock hierarchies, other lock ordering; helper classes to automate
- How to avoid composability problems: combining modules/plugins/extensions safely
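The "associate data with locks" and "helper classes to automate" items could look roughly like the wrapper below, which makes the protected value reachable only while its mutex is held. The names `Guarded`, `with_lock`, and `with_both` are hypothetical. The two-object operation relies on C++17 `std::scoped_lock`, which acquires several mutexes with a deadlock-avoidance algorithm; that is a substitute for, not an implementation of, an explicit lock hierarchy.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Illustrative helper: the value can only be reached while its mutex is held.
template <class T>
class Guarded {
public:
    explicit Guarded(T initial) : value_(std::move(initial)) {}

    // Runs `f` on the value as one exclusive "transaction".
    template <class F>
    auto with_lock(F&& f) {
        std::lock_guard<std::mutex> hold(mutex_);
        return std::forward<F>(f)(value_);
    }

    // Runs `f` on two values with both mutexes held. std::scoped_lock acquires
    // them without deadlocking, whatever order callers pass them in.
    template <class F>
    friend auto with_both(Guarded& a, Guarded& b, F&& f) {
        assert(&a != &b && "locking the same mutex twice is undefined behavior");
        std::scoped_lock hold(a.mutex_, b.mutex_);
        return std::forward<F>(f)(a.value_, b.value_);
    }

private:
    std::mutex mutex_;
    T value_;
};
```

For example, with `Guarded<int> a(100), b(0);`, a call such as `with_both(a, b, [](int& from, int& to) { from -= 30; to += 30; });` moves 30 from one to the other without any thread observing a half-finished transfer. Callbacks should avoid calling back into code that takes other locks, since that is where composability problems tend to start.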
Correct Use of Atomics: The C++ Memory Model, std::atomic<>, and Modern Hardware
- The C++11 memory model and what it requires you to do to make sure your code is correct and stays correct: how the compiler and hardware cooperate to remember how to respect these rules; what is a race condition; how a race condition and a debugger see the same pink elephants
- The tools: the deep interrelationships and fundamental tradeoffs among mutexes, atomics, and fences/barriers; why standalone memory barriers are bad; why barriers should always be associated with a specific load or store.
- The rapidly-changing hardware reality: how locks and atomics map to hardware instructions on ARM and x86/x64 (and throw in POWER and Itanium for good measure); how and why those answers are different from a couple of years ago, and how they will likely be different again a few years from now; how the latest CPU and GPU hardware memory models are rapidly evolving, and how this directly affects C++ programmers
- Using explicit std::memory_order_*: never by default; specific low-level patterns and how to code them correctly
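A minimal sketch of publishing data through `std::atomic` with the default ordering, in line with the "never by default" advice on explicit `std::memory_order_*` above. The function name `publish_and_read` is made up for this example. Both the store and the loads on `ready` use the default `std::memory_order_seq_cst`, so the ordinary write to `payload` is guaranteed to be visible once the reader sees `true`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: one thread publishes `value`, the caller waits for it.
int publish_and_read(int value) {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes `payload`

    std::thread producer([&] {
        payload = value;  // 1. write the data
        ready = true;     // 2. publish it (default seq_cst store)
    });

    while (!ready) {      // default seq_cst load; spin until published
        std::this_thread::yield();
    }
    const int seen = payload;  // no data race: ordered after the store to `ready`

    producer.join();
    assert(seen == value);     // guaranteed by the C++11 memory model
    return seen;
}
```

If `ready` were a plain `bool`, the same code would contain a data race and its behavior would be undefined.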
Writing Robust Lock-Free Code: When the Lowest Latency Matters, Consider Using the Highest Concurrency
- The tools: atomic variables, transactional thinking
- Double-Checked Locking pattern: example and analysis; why no longer broken
- Producer/Consumer pattern variations: using locks; locks + lock-free for different phases; fully lock-free using mailboxes
- Implementing a singly linked list: implementing find/insert_at_front/pop; problems and how to avoid them
- config_map: locks + atomics together to satisfy a specific lazy initialization requirement
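The Double-Checked Locking item above can be sketched with a `std::atomic` pointer and a mutex, which is well-defined under the C++11 memory model. The types `Config` and `LazyConfig` are placeholders invented for this example. In everyday code a function-local static or `std::call_once` is usually the simpler choice; the explicit version is shown only to make the two checks visible.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Config {
    int verbosity = 1;  // hypothetical lazily created settings object
};

class LazyConfig {
public:
    ~LazyConfig() { delete instance_.load(); }

    Config& get() {
        Config* p = instance_.load();  // first check, without the lock
        if (p == nullptr) {
            std::lock_guard<std::mutex> hold(mutex_);
            p = instance_.load();      // second check, under the lock
            if (p == nullptr) {
                p = new Config();      // construct fully first...
                instance_.store(p);    // ...then publish the pointer
            }
        }
        assert(p != nullptr);
        return *p;
    }

private:
    std::atomic<Config*> instance_{nullptr};
    std::mutex mutex_;
};
```

The atomic loads and stores use the default sequentially consistent ordering, so a thread that sees a non-null pointer also sees the fully constructed `Config` it points to.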
The Near Future of Hardware: Switching the Horse We’re Riding, Off Moore’s Law and Onto the Cloud
- In the twilight of Moore’s Law, the transitions to multicore processors, GPU computing, and HaaS cloud computing are not separate trends, but aspects of a single trend – mainstream computers from desktops to ‘smartphones’ are being permanently transformed into heterogeneous supercomputer clusters. Henceforth, a single compute-intensive application will need to harness different kinds of cores, in immense numbers, to get its job done. – The free lunch is over. Now welcome to the hardware jungle.
During the training you’ll learn how to get the highest performance and the lowest latency on modern hardware in ways that are unique to C++, including how to arrange data to use hardware caches effectively, and how to use standard and custom-written parallel algorithms to harness high-performance parallel and vector processor hardware to compute results faster. You’ll also learn how to manage latency for responsive apps and for real-time systems, techniques to hide the underlying latencies we have to deal with and cannot remove, such as disk and network latency, and how to keep your own code responsive by not introducing needless latencies of your own.
C++ programmers, programmers working in performance- and efficiency-demanding application domains, programmers working in latency-sensitive domains, system designers, HPC experts, data specialists
Intermediate to advanced C++ programming experience required: at a minimum, participants should be comfortable with the C++ standard language (data structures, control flow, functions, modules, packages, file I/O, and so on).
Some experience with concurrency, parallelism, and/or multiprocessing in a language such as Java, C, or C++ is recommended, but not required.
|Event date|25 Apr 2017, 9:00|
|Event end date|27 Apr 2017, 16:30|
|Individual price|22950 SEK / 21950 NOK / 2395 €|