Yirifi.ai

Software Industry

Remote

We at Yirifi.ai offer a cutting-edge AI-powered Risk Analysis & Partner Aggregator Platform to navigate the complexities of Web3. We also provide consulting and integration services to ensure our clients get the most out of our solutions. ...

Data Engineer

Apply Before: 2025-02-06 (11 Days Left)

Job summary

  • No. of Vacancy: 1
  • Job Type: Full Time
  • Offered Salary: Negotiable
  • Gender: Both
  • Career Level: Mid Level
  • Experience: 3 Years
  • Apply Before: 2025-02-06 (11 Days Left)
  • Skills: Python, AWS
    AWS

Job Description:

The Data Engineer will play a pivotal role in transforming unstructured data into structured, actionable formats, enabling efficient and reliable data processing across Yirifi’s operations. This includes designing and maintaining data architecture, building robust pipelines, and collaborating with cross-functional teams to deliver scalable, high-quality data solutions. Success in this role will be measured by the ability to convert unstructured data into structured formats that meet business and compliance requirements, enhance decision-making, and drive innovation.

Responsibilities and Deliverables

Design and Build Data Pipelines

  • Develop scalable ETL pipelines to process and transform unstructured data into structured formats using Airflow, AWS Glue, and Python.
  • Ensure pipelines handle diverse data sources (e.g., JSON, XML, text, and raw logs) and support structured outputs such as relational databases or Parquet files.

Collaborate with Stakeholders

  • Translate business and compliance requirements into technical specifications for data ingestion and transformation.
  • Partner with data scientists and analysts to create tailored data structures for machine learning models and analytics.

Data Quality and Validation

  • Implement automated validation checks for data consistency, completeness, and accuracy using AWS Glue Data Quality or custom Python scripts.
  • Build a reporting dashboard to monitor data quality metrics and pipeline health.

Metadata Management

  • Enhance and maintain metadata in the AWS Glue Data Catalog, ensuring all datasets have clear descriptions, schema definitions, and lineage tracking.
  • Create a centralized metadata repository for easy data discovery and governance.

Pipeline Reliability and Performance

  • Establish monitoring mechanisms (e.g., AWS CloudWatch) to track pipeline performance and detect bottlenecks or failures proactively.
  • Optimize data workflows for speed and cost efficiency, particularly for high-volume unstructured data processing.

Emerging Technologies and Best Practices

  • Identify and pilot emerging tools and frameworks for handling unstructured data (e.g., text parsing with NLP libraries or distributed processing with Spark).
  • Regularly update the team on innovations in data engineering, focusing on areas like schema evolution and real-time data processing.

Key Performance Indicators (KPIs)

  • Pipeline Efficiency: Reduce end-to-end processing time for converting unstructured data into structured formats by 20%.
  • Data Coverage: Achieve 99% structured data generation across all unstructured input datasets.
  • Error Rates: Maintain data validation error rates below 1% across pipelines.
  • Stakeholder Satisfaction: Ensure all data structures meet defined use cases for analytics, machine learning, or compliance reporting.
  • Metadata Utilization: Ensure 100% of structured datasets are cataloged with complete metadata in the AWS Glue Data Catalog.

Required Knowledge, Skills, and Abilities:

Programming:

  • Strong proficiency in Python, including libraries such as Pandas, NumPy, and data parsing frameworks for unstructured data (e.g., JSON, XML).

AWS Expertise:

  • In-depth experience with AWS services like S3, Glue, Lambda, and CloudWatch for data storage, transformation, and monitoring.
  • Familiarity with the AWS Glue Data Catalog for metadata management and schema evolution.

Workflow Orchestration:

  • Hands-on experience with Apache Airflow to schedule, monitor, and manage complex data pipelines.

Data Transformation and Processing:

  • Proficiency in designing and implementing ETL/ELT workflows to transform unstructured data into structured formats.
  • Experience with SQL and relational databases to build structured data repositories.
  • Knowledge of tools for handling large-scale unstructured data, such as Spark (preferred but not required).

Data Quality and Governance:

  • Familiarity with data validation tools and frameworks (e.g., Great Expectations) to ensure data quality and integrity.
  • Strong understanding of data governance best practices and compliance requirements.

Soft Skills

  • Analytical Thinking: Strong analytical and problem-solving skills, particularly in working with unstructured data sources.
  • Communication: Excellent communication and collaboration skills, with the ability to translate technical solutions into business outcomes.
  • Adaptability: Ability to work effectively in a fast-paced, dynamic environment with competing priorities.
  • Attention to Detail: A detail-oriented mindset is needed to ensure the accuracy and reliability of data pipelines.

Education + Experience:

  • 3+ years of experience in Python, including libraries such as Pandas, NumPy, and data parsing frameworks for unstructured data (e.g., JSON, XML).

Benefits:

Be part of a rapidly growing company revolutionizing crypto risk and compliance. As a Data Engineer at Yirifi, you will have the opportunity to build innovative data solutions that directly impact our mission. Work with cutting-edge technologies, solve complex challenges, and drive meaningful change in the digital assets space. Join us to accelerate your career and make a significant impact.

Apply Instruction:

Interested candidates who meet the criteria above are encouraged to apply using the Easy Apply button below. Registered candidates may also apply using the Apply Now button.