Job Title: Data Scientist, GenAI Models and Agentic AI Systems
Location: Remote (remote candidates are fine)
Employment Type: Contract
Seniority: Senior
Position Overview:
We are seeking a Data Scientist with strong expertise in evaluating open-source Generative AI models, creating
compelling data visualizations, and designing agentic AI workflows. This is a unique opportunity to work on the
frontier of Generative AI, validating LLMs and agents that power critical enterprise Legal use cases. This position
will work within the Legal department. While there is some flexibility in location, work hours for this position are
expected to align largely with standard US business hours.
The ideal candidate is fluent in the nuances of GenAI model performance, understands both open-source models
(e.g., LLaMA 3, Mistral, Falcon, Gemma) and enterprise-grade models like Granite, and is comfortable
developing multi-agent orchestration pipelines using frameworks such as LangGraph, CrewAI, AutoGen, or
LangChain Agents. You will also collaborate with MLOps teams to deploy and evaluate models using platforms
like Hugging Face, MLflow, and Weights & Biases.
Key Responsibilities:
Model Validation & Evaluation
Design rigorous evaluation pipelines for GenAI models, including IBM Granite models and open-source
alternatives hosted on Hugging Face Hub.
Define metrics and build test suites to assess model behavior across factuality, coherence, bias, safety,
hallucination, and robustness.
Leverage tools such as Ragas, TruLens, Giskard, or LM Eval Harness to compare performance across
prompts, tasks, and domains.
Data Visualization & Insight Communication
Create compelling dashboards and visual narratives to track model performance, error distributions, and
drift over time.
Use libraries like Plotly, Dash, Seaborn, or Matplotlib to build visuals that support real-time and
retrospective evaluations.
Summarize and present findings to cross-functional teams, including product owners, compliance
stakeholders, and LLM developers.
Agentic Frameworks & AI Workflows
Develop and validate multi-agent systems using frameworks like LangGraph, CrewAI, LangChain
Agents, or AutoGen.
Optimize agent roles, memory, and tool selection for task-oriented pipelines in legal, enterprise, or research
domains.
Evaluate how different orchestration strategies (e.g., sequential vs. parallel, static vs. dynamic role
assignment) affect task quality and efficiency.
Required Qualifications:
3+ years of experience in data science, NLP, or ML research roles, or equivalent academic credentials.
Practical experience evaluating open-source LLMs (e.g., LLaMA, Mistral, Falcon, Gemma) and
familiarity with IBM Granite model capabilities.
Fluency in Python and data science tools: Pandas, NumPy, Scikit-learn, Jupyter, and at least one
visualization library.
Familiarity with GenAI evaluation frameworks and benchmarks (e.g., TruthfulQA, MMLU, BBQ,
HellaSwag).
Experience with versioned model repositories and libraries via Hugging Face Transformers and Datasets.
Master's degree or higher in Computer Science, Data Science, Mathematics, or a related field.
Preferred Qualifications:
Demonstrated experience with agentic frameworks (e.g., LangGraph, CrewAI, AutoGen, LangChain
Agents, or Langflow).
Knowledge of Retrieval-Augmented Generation (RAG), vector databases (e.g., FAISS, Weaviate,
Chroma), and hybrid retrieval strategies.
Exposure to prompt tuning, chain-of-thought prompting, function/tool calling, and memory-aware agents.
Prior work in regulated environments with a focus on explainability, auditing, or trustworthy AI.
Experience deploying and monitoring models on cloud platforms (e.g., AWS SageMaker, Azure ML, IBM
watsonx.ai).