Job Overview
A Site Reliability Engineer position requires strategic engineering, holistic troubleshooting of the customer’s systems, and deep technical work. The Site Reliability Engineer uses technical analysis to assess the availability, latency, scalability, and efficiency of a product or infrastructure, and engineers reliability into its software and systems.
For this position, exceptional critical thinking, problem-solving, and in-depth technical skills are necessary. A strong balance of process-oriented thinking and experience in managing customer expectations is a must. The successful candidate must be able to perform at a high level in critical situations.
Responsibilities
- Coordinates with Customer Care, Consulting Services, and customers to install, configure, and maintain Adobe Campaign servers.
- Implements policies, procedures, and technologies to ensure the security of Adobe Campaign hosted systems through secure system access, monitoring, control, and routine security evaluations.
- Monitors, tests, and tunes system performance; preserves and provides verification of success when requested.
- Recommends and executes modifications to managed services systems to improve efficiency, reliability, and performance.
- Interacts with internal stakeholders in a professional and collaborative manner.
- Develops and maintains knowledge of each customer's specific business environment.
- Resolves customer requests escalated through various channels.
- Participates in on-call rotations and takes ownership of complex operations tasks.
Requirements in a Nutshell
- 24x5 operations (including night shifts)
- One weekend a month
- Noida location
- In-office
- Time-sensitive work
- SLA-driven requests (4 to 24 hours)
- Customer-originated tasks
- Repetitive work
- Fast learner
- Production engineer mindset
- Multitasking
- Excellent communication skills
Skills Required
- Good verbal and written communication skills
- Demonstrated effective production management skills
- Strong working knowledge of networking and packet tracing, with a solid understanding of latency and throughput
- Strong working knowledge of Linux operating systems and their underlying components, system statistics, performance tuning, filesystems, and I/O
- Java or C/C++ development experience, plus solid scripting skills in Ruby, Perl, or Python
- Experience with production deployment, monitoring, and operational support for enterprise-class applications (Amazon Web Services and Microsoft Azure preferred)
- Strong troubleshooting skills
- Experience in performance diagnostics, capacity planning, performance architecture design, performance tuning, and performance monitoring
- Experience working with high-traffic solutions and services
- Hands-on experience with Apache, Java, Python, Tomcat, databases, load balancers, and firewalls
You Should Have
- A can-do attitude: no problem is too big or too small.
- A systematic problem solver, with the ability to think outside the box.
- Good data analysis skills to spot trends before they become major problems.
- Passion for great customer service.
- Ability to work on multiple priorities and/or projects simultaneously.
- Proven ability to work with little or no supervision.
- High-quality, detail-oriented approach to work.
- Ability to quickly learn new technologies.
Technical Expertise
1. Linux Administration
   - RHCE/RHCA
   - User Management
   - File System & Package Management
   - Cloud
      - AWS (at minimum)
      - Azure
2. Production Experience / Strong Troubleshooting
   - DNS
   - ISO OSI stack
   - Troubleshooting
   - Web-based applications
   - Performance
   - Network
   - SSH/SSL/SFTP, etc.
   - Security
3. Development/Scripting (good to have)
   - Languages (at least one)
      - Python
      - Bash
      - Java
      - JS
   - Config Management
      - Ansible
      - Salt