Reliability Engineer

Posted by Two Sigma on Tue, 25 Oct 2016
Contract type: permanent. Location: London, England

Two Sigma

We’re not your typical investment manager. We follow principles of technology and innovation as much as principles of investment management. Fields like machine learning and distributed computing guide us. Since 2001, we’ve searched for ways that these kinds of technologies can make us better at what we do. We never stop researching and developing.

In the process, we work to help real people. Through our investors, we support the retirements of millions around the world. And we help fund breakthrough research, education and a wide range of charities and foundations.

Reliability Engineer

As a member of this versatile group of full-stack engineers, you will be on the front line of maintaining and expanding the capabilities of Two Sigma’s many and varied systems. The team works in the space between traditional systems administration and development, and seeks to merge the capabilities of both disciplines. Our remit includes:

  • Acting as a conduit between infrastructure and development teams, being sympathetic to the concerns and priorities of both;
  • Providing primary operational support for multiple large distributed software applications;
  • Improving all aspects of software reliability, including better monitoring, alerting and documentation;
  • Engaging with our software engineering teams on support issues and improvements to our tools, processes, and software;
  • Gathering and analyzing metrics from both operating systems and applications to assist in performance tuning and fault finding.

Requirements include:

  • A bachelor’s degree in computer science or another highly technical, scientific discipline.
  • Experience with Python (or a high degree of competence in another language and a willingness to learn).
  • In-depth knowledge of and experience in at least one of the following: host-based networking, Linux/Unix administration, systems programming, distributed systems, or databases; plus a desire to learn more.
  • The ability to use off-the-shelf and open-source systems and utilities to rapidly provision production systems in a variety of domains, especially for multi-tenant use.
  • A proven track record of automation and an algorithmic approach to solving problems.
  • A proactive approach to spotting problems, areas for improvement, performance bottlenecks, etc.
  • An understanding of the operational concerns in a demanding environment; ideally, but not necessarily, in finance.

Additional skills preferred:

  • Familiarity with relational database concepts and the ability to construct at least moderately complex SQL queries.
  • Experience with authentication and encryption technologies like SSL, Kerberos and GSSAPI.
  • Networking experience, such as analyzing packet dumps, multicast routing on hosts, and packet filtering.
  • OS/kernel experience, such as familiarity with OS tunables and log analysis.
  • Experience with automated configuration management tools.