VSnake notes: C10K

2014-06-24

C10K

I stumbled upon a wonderful article that surveys and evaluates the different approaches to one question: how do we handle 10,000 simultaneous connections (network sessions) on a single machine?

http://www.kegel.com/c10k.html

I couldn't put it down; it reads like a detective novel. Tons of information.

I came across the article while looking through these slides:

https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores

The title is provocative, yes. In fact, the presentation is capable of causing quite a bit of butthurt among immature individuals. It starts with truly asynchronous I/O as opposed to the kind emulated with an event loop. Then it makes a smooth transition from the operating system's pool of truly asynchronous threads (MS Windows) to network protocol handlers written in Python (Python/C API). Elegant.
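For contrast, here is what the "emulated" flavor looks like in plain Python. This is only my own minimal sketch, not code from the slides: the function name run_echo_once and the uppercase "protocol" are made up for illustration, and it assumes the standard selectors module and a local socket pair. Everything runs on one thread, and a callback fires only when the selector reports a socket as readable.

```python
import selectors
import socket


def run_echo_once(payload: bytes) -> bytes:
    """Push one message through a single-threaded readiness loop.

    Hypothetical helper for illustration: the "server" end uppercases
    whatever it receives and sends it back to the "client" end.
    """
    sel = selectors.DefaultSelector()
    client, server = socket.socketpair()
    client.setblocking(False)
    server.setblocking(False)

    received = []

    def on_server_readable(sock):
        # Called only after the selector says the socket is readable.
        sock.send(sock.recv(4096).upper())

    def on_client_readable(sock):
        received.append(sock.recv(4096))

    # The callback is stored as the registration's data attribute.
    sel.register(server, selectors.EVENT_READ, on_server_readable)
    sel.register(client, selectors.EVENT_READ, on_client_readable)

    client.send(payload)

    # The event loop itself: wait for readiness, dispatch callbacks.
    # Bounded so the sketch cannot spin forever if nothing arrives.
    for _ in range(100):
        if received:
            break
        for key, _events in sel.select(timeout=0.1):
            key.data(key.fileobj)

    sel.close()
    client.close()
    server.close()
    return received[0] if received else b""


if __name__ == "__main__":
    print(run_echo_once(b"ping"))
```

However many connections you register, the callbacks still take turns on a single core, which is exactly the limitation the presentation goes after.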

To ensure thread safety without running into the GIL, PyParallel pulls a clever trick: it checks whether the main thread is the one executing, and if not, it works around the GIL by intercepting the thread-sensitive calls of the Python core.
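The shape of that check can be mimicked in pure Python, with a big caveat: this is just my illustration of the idea "am I on the main thread or not", using the standard threading module. The real thing lives inside the modified CPython interpreter, and ordinary Python threads like the one below still run under the GIL. The names in_parallel_context and choose_path are hypothetical.

```python
import threading


def in_parallel_context() -> bool:
    """Hypothetical stand-in for the interpreter-level check:
    True when the caller is NOT running on the main thread."""
    return threading.current_thread() is not threading.main_thread()


def choose_path() -> str:
    # Illustration only: branch on the thread check. In stock CPython
    # both branches are still subject to the GIL.
    if in_parallel_context():
        return "parallel"
    return "main"


def choose_path_in_worker() -> str:
    """Run choose_path() on a separate thread and return its answer."""
    result = []
    worker = threading.Thread(target=lambda: result.append(choose_path()))
    worker.start()
    worker.join()
    return result[0]


if __name__ == "__main__":
    print(choose_path())            # called from the main thread
    print(choose_path_in_worker())  # called from a worker thread
```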

An unpleasant side effect of this elegant solution is that memory is not freed in parallel contexts (code executed in parallel threads). So such contexts have to be short and frugal with memory.

There is also a video in which the author presents the slides:

During the fall of 2012, a heated technical discussion regarding asynchronous programming occurred on python-ideas. One of the outcomes of this discussion was Tulip, an asynchronous programming API for Python 3.3, spearheaded by Guido van Rossum. A lesser known outcome was PyParallel: a set of modifications to the CPython interpreter that allows Python code to execute concurrently across multiple cores.

Twisted, Tulip, Gevent, Stackless/greenlets and even node.js are all variations on the same pattern for achieving "asynchronous I/O": non-blocking I/O performed on a single thread. Each framework provides extensive mechanisms for encapsulating computational work via deferreds, coroutines, generators and yield from clauses that can be executed in the future when a file descriptor is ready for reading or writing.

What I found troubling with all these solutions is that so much effort was being invested to encapsulate future computation (to be executed when a file descriptor is ready for reading or writing), without consideration of the fact that execution is still limited to a single core.

PyParallel approaches the problem in a fundamentally different way. Developers will still write code in such a way that they're encapsulating future computation via the provided APIs, however, thanks to some novel CPython interpreter modifications, such code can be run concurrently across all available cores.

http://vimeo.com/79539317