
Data Analytics Engineer

Prophaze Technologies (P) Ltd

Trivandrum

About the Role

You will architect, build and maintain end-to-end data pipelines that ingest 100 GB+ of NGINX/web-server logs from Elasticsearch, transform them into high-quality features, and surface actionable insights and visualisations for security analysts and ML models. Acting as both a Data Engineer and a Behavioural Data Analyst, you will collaborate with security, AI and frontend teams to ensure low-latency data delivery, rich feature sets and compelling dashboards that spot anomalies in real time.

Key Responsibilities
(Illustrative Python sketches of several of these workflows follow at the end of this posting.)

ETL & Pipeline Engineering:
• Design and orchestrate scalable batch / near-real-time ETL workflows to extract raw logs from Elasticsearch.
• Clean, normalise and partition logs for long-term storage and fast retrieval.
• Optimise Elasticsearch indices, queries and retention policies for performance and cost.

Feature Engineering & Feature Store:
• Assist in the development of robust feature-engineering code in Python and/or PySpark.
• Define schemas and loaders for a feature store (Feast or similar).
• Manage historical back-fills and real-time feature look-ups, ensuring versioning and reproducibility.

Behaviour & Anomaly Analysis:
• Perform exploratory data analysis (EDA) to uncover traffic patterns, bursts, outliers and security events across IPs, headers, user agents and geo data.
• Translate findings into new or refined ML features and anomaly indicators.

Visualisation & Dashboards:
• Create time-series, geo-distribution and behaviour-pattern visualisations for internal dashboards.
• Partner with frontend engineers to test UI requirements.

Monitoring & Scaling:
• Implement health and latency monitoring for pipelines; automate alerts and failure recovery.
• Scale infrastructure to support rapidly growing log volumes.

Collaboration & Documentation:
• Work closely with ML, security and product teams to align data strategy with platform goals.
• Document data lineage, data dictionaries, transformation logic and behavioural assumptions.

Minimum Qualifications
• Education – Bachelor’s or Master’s in Computer Science, Data Engineering, Analytics, Cybersecurity or a related field.
• Experience – 3+ years building data pipelines and/or performing data analysis on large log datasets.
• Core Skills
  o Python (pandas, NumPy, elasticsearch-py, Matplotlib, Plotly, Seaborn; PySpark desirable)
  o Elasticsearch & ELK stack query optimisation
  o SQL for ad-hoc analysis
  o Workflow orchestration (Apache Airflow, Prefect or similar)
  o Data modelling, versioning and time-series handling
  o Familiarity with visualisation tools (Kibana, Grafana)
• DevOps – Docker, Git, CI/CD best practices.

Nice-to-Have
• Kafka, Fluentd or Logstash experience for high-throughput log streaming.
• Web-server log expertise (NGINX / Apache, HTTP semantics).
• Cloud data platform deployment on AWS / GCP / Azure.
• Hands-on exposure to feature stores (Feast, Tecton) and MLOps.
• Prior work on anomaly-detection or cybersecurity analytics systems.

Why Join Us?

You’ll sit at the nexus of data engineering and behavioural analytics, turning raw traffic logs into the lifeblood of a cutting-edge AI security product. If you thrive on building resilient pipelines and diving into the data to uncover hidden patterns, we’d love to meet you.

Preferred Skills
Data analyst, ETL pipelines, Feature engineering, Data engineer, Visualisations, Matplotlib, Seaborn, Plotly, Kafka, Elasticsearch, PySpark
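
Illustrative Sketches (for context; not part of the requirements)

First, a minimal sketch of the batch extraction and partitioning step described under ETL & Pipeline Engineering, using elasticsearch-py and pandas from the core-skills list. The "nginx-logs-*" index pattern, the clientip field, the local endpoint and the logs/ output path are illustrative assumptions; real mappings and retention policies will differ.

```python
from pathlib import Path

import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Pull the last hour of access logs; field names depend on the actual
# Logstash/Fluentd mapping and are illustrative here.
query = {"query": {"range": {"@timestamp": {"gte": "now-1h"}}}}

# scan() wraps the scroll API so large result sets stream in pages.
rows = (hit["_source"] for hit in scan(es, index="nginx-logs-*", query=query, size=5000))
df = pd.DataFrame(rows)

# Normalise: parse timestamps and drop rows missing the fields we key on.
df["@timestamp"] = pd.to_datetime(df["@timestamp"], errors="coerce")
df = df.dropna(subset=["@timestamp", "clientip"])

# Partition by hour for long-term storage and fast retrieval.
Path("logs").mkdir(exist_ok=True)
df["hour"] = df["@timestamp"].dt.floor("h")
for hour, part in df.groupby("hour"):
    part.drop(columns="hour").to_parquet(f"logs/hour={hour:%Y-%m-%dT%H}.parquet")
```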
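Next, a sketch of how a feature-store schema might be declared with Feast, as mentioned under Feature Engineering & Feature Store. The entity, feature names, TTL and Parquet path are hypothetical placeholders for aggregates the ETL job would produce.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# One entity per client IP; join_keys ties online look-ups to this column.
client_ip = Entity(name="client_ip", join_keys=["client_ip"])

traffic_source = FileSource(
    path="features/ip_traffic_stats.parquet",  # hypothetical ETL output
    timestamp_field="event_timestamp",
)

ip_traffic_stats = FeatureView(
    name="ip_traffic_stats",
    entities=[client_ip],
    ttl=timedelta(hours=24),  # bound how far back online look-ups may read
    schema=[
        Field(name="requests_per_min", dtype=Int64),
        Field(name="error_rate", dtype=Float32),
        Field(name="distinct_user_agents", dtype=Int64),
    ],
    source=traffic_source,
)
```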
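For Behaviour & Anomaly Analysis and Visualisation & Dashboards, a sketch of a simple EDA pass: per-minute request counts scored against a trailing one-hour baseline, with bursts plotted via Plotly. The 3-sigma threshold and window size are illustrative starting points, not tuned values.

```python
import pandas as pd
import plotly.graph_objects as go

df = pd.read_parquet("logs/")  # the hourly partitions written by the ETL sketch

# Per-minute request counts, compared against a rolling one-hour baseline.
counts = df.set_index("@timestamp").sort_index().resample("1min").size().rename("requests")
z = (counts - counts.rolling("60min").mean()) / counts.rolling("60min").std()
bursts = counts[z > 3]  # minutes that look anomalous vs. the trailing hour

# Time-series view with burst markers, as might feed an internal dashboard.
fig = go.Figure()
fig.add_trace(go.Scatter(x=counts.index, y=counts.values, name="requests/min"))
fig.add_trace(go.Scatter(x=bursts.index, y=bursts.values, mode="markers", name="bursts"))
fig.update_layout(title="NGINX request volume", xaxis_title="time", yaxis_title="requests per minute")
fig.show()
```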
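Finally, for Monitoring & Scaling, a sketch of wiring the extraction job into Apache Airflow with retries and a failure callback. The DAG id, schedule and callback body are placeholders; a real deployment would notify Slack, PagerDuty or email rather than print.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def alert_on_failure(context):
    # Stand-in for a real notifier (Slack, PagerDuty, email).
    print(f"Task {context['task_instance'].task_id} failed")

def extract_logs():
    # Placeholder for the Elasticsearch extraction job sketched above.
    ...

with DAG(
    dag_id="nginx_log_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": alert_on_failure,
    },
):
    PythonOperator(task_id="extract_logs", python_callable=extract_logs)
```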