
We are seeking a Data Engineer I to support a GenAI-powered insights assistant initiative by building and scaling ingestion and embedding pipelines for unstructured WW FBA knowledge bases. In this role, you will ensure that the retrieval-augmented generation (RAG) system has access to fresh, relevant document embeddings, improving the quality of AI-driven insights and the answers users receive to their queries.
Key job responsibilities
- Build batch and streaming data pipelines using Spark and AWS streaming services.
- Implement automated checks to ensure data consistency across different data types.
- Define and maintain data contracts with source teams to keep schemas consistent.
- Develop cross-domain metadata services linking structured and unstructured data catalogs.
- Create APIs and event-driven workflows integrating AI insights with business tools.
- Monitor pipeline health, costs, and SLA adherence.
Basic qualifications
- 1+ years of data engineering experience
- Experience with data modeling, warehousing and building ETL pipelines
- Experience with one or more query languages (e.g., SQL, PL/SQL, DDL, MDX, HiveQL, SparkSQL, Scala)
- Experience with one or more scripting languages (e.g., Python, KornShell)
Preferred qualifications
- Experience with big data technologies such as Hadoop, Hive, Spark, and EMR
- Experience with an ETL tool such as Informatica, ODI, SSIS, BODI, or Datastage
- Familiarity with RAG (Retrieval-Augmented Generation) principles.
- AWS experience: Lambda, S3, SageMaker, Bedrock Knowledge Bases.