Your intake queue is already loud. A report is due. A partner wants answers. Then a generative AI vendor promises to serve as your strategic technology partner and “save time” with summaries, triage, or a chatbot. That tool might also touch intake notes, safety plans, immigration status, or donor records. The risk isn’t abstract. It’s trust, and trust breaks fast.
An AI Vendor Due Diligence Checklist is the set of questions (and the evidence behind the answers) you collect before you buy, renew, or expand an AI tool. It is the core of vendor risk management: done well, it protects clients from privacy harm, protects staff from unsafe workflows, and protects your organization’s reputation when boards, funders, or courts ask, “How do you know this is safe?”
If your systems already feel fragile, start by naming the real constraints (see Common technology challenges faced by legal nonprofits) and then run tighter vendor checks in your procurement process.

Key takeaways
- Privacy: Run a risk assessment that asks what data the tool uses, where it goes, and what the vendor is allowed to do with it, and get the answers in writing.
- Bias: Require proof of testing, ongoing monitoring, and a clear complaint path.
- Explainability: You need reasons, sources, and logs you can audit later.
- Next step: Pick one use case, assign one owner, and run a time-boxed pilot with pass/fail criteria that include model performance.
AI vendor due diligence checklist for privacy and data safety

In legal aid and court help settings, the “data” a third-party AI tool touches isn’t just contact info. It can be domestic violence safety planning, medical details tied to housing, detention history, or notes that reveal where someone sleeps tonight. Treat AI vendor claims like you’d treat a grant budget. Show the work.
A helpful starting point is to align your internal questions with established audit thinking, like the EDPB AI auditing checklist, then narrow it to what your programs actually do.
What data will the AI touch, and what is the vendor allowed to do with it?
Ask these questions about data handling, and get the answers in writing (a minimal inventory sketch follows below):
- What data types are in scope? PII, PHI, case notes, documents, audio, chat logs, donor data.
- What’s the minimum access needed? Least access by default, role-based permissions, separate admin roles.
- What is the purpose limit? “Only to provide the service” should be plain language, not marketing.
- Will our data be used to train or improve models? Include affiliates, subcontractors, and “optional” settings. Require an explicit yes/no.
- What are retention and deletion timelines? Ask for default retention, backup retention, and deletion attestations, and make sure each is spelled out in the vendor’s data handling terms.
- Can we export our data in standard formats? CSV, JSON, PDF, or your case system’s supported formats.
- How do legal holds work? If you must preserve records, how does the vendor support it without keeping extra data forever?
Simple rule leaders can repeat: if it’s not in the contract, it doesn’t exist.
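To keep those answers usable, it can help to track them in one structured place per vendor. Below is a minimal sketch of a data-scope inventory in Python; the vendor name, field names, and values are illustrative assumptions, not a standard, so adapt them to your programs and case system.

```python
# Minimal, illustrative data-scope inventory for one AI vendor.
# All names and values here are assumptions for the example, not real vendor terms.

vendor_data_scope = {
    "vendor": "ExampleAI",                          # hypothetical vendor name
    "data_types_in_scope": ["case notes", "chat logs", "uploaded documents"],
    "data_types_excluded": ["donor records", "immigration status fields"],
    "purpose_limit": "only to provide the contracted service",
    "used_for_model_training": False,               # require an explicit yes/no in writing
    "retention_days": 30,                           # vendor's stated default retention
    "backup_retention_days": 90,
    "export_formats": ["CSV", "JSON", "PDF"],
    "deletion_attestation_in_contract": False,      # not yet confirmed
    "legal_hold_process_documented": False,         # not yet confirmed
}

# Quick gut check before contract review: which written commitments are still missing?
must_be_true = ["deletion_attestation_in_contract", "legal_hold_process_documented"]
open_items = [key for key in must_be_true if not vendor_data_scope[key]]
print("Still need in writing:", open_items or "none")
```

Even a simple record like this makes renewal conversations faster, because the next reviewer can see exactly what was promised and what was never confirmed.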
Can they prove security basics, and will they tell you fast if something goes wrong?
Ask for evidence of data privacy and security controls, not assurances:
- Do you have SOC 2 Type II or ISO 27001? Request the report (or at least the bridge letter and scope), plus evidence of GDPR compliance if it applies to your data.
- Do you run penetration tests and fix findings on a schedule? Ask for a summary, remediation process, and recent security audit results.
- Is data encrypted in transit and at rest? Confirm data encryption key management approach at a high level.
- Do you require MFA for all accounts? Include admins, support staff, and contractors.
- Do we get audit trails? Logging of access, exports, and admin actions (and how long logs are retained).
- What’s your breach notification timeline? Put hours and days in the contract, not “promptly.”
- Who pays for forensics and notification costs if you cause the incident? Don’t leave this vague.
- Who are your sub-processors, and where is data stored? Ask about cross-border transfers, regulatory compliance, and how they isolate your tenant.
A quick “evidence map” helps you keep this board-ready:
| What you need to verify | Evidence to request | Why it matters |
| --- | --- | --- |
| Security controls exist and operate | SOC 2 Type II report or ISO 27001 certificate scope | Reduces blind trust |
| Real incident readiness | Incident response plan, notification terms | Time matters in harm prevention |
| Data handling limits | Data processing terms, retention schedule | Stops quiet reuse of client data |
If you want a practical way to tighten incident expectations with vendors, pair these questions with a simple template like CTO Input’s vendor incident response plan maker: https://ctoinput.com/vendor-incident-response-plan-maker.
Questions that reduce bias risk and protect clients from unfair outcomes
Bias in AI isn’t a debate club topic. In justice work, algorithmic bias can show up as harm: people routed to the wrong program, language access gaps, “fraud” flags that waste staff time, or eligibility screens that quietly push clients away. Strong bias mitigation protects clients from unfair outcomes.
Courts and justice partners are actively wrestling with this in 2025, and the NCSC AI Readiness for the State Courts (2025) is a useful reference for governance and vendor engagement expectations, even if you’re not a court.
How do you test for bias before go-live, and what do you monitor after launch?
Ask the vendor to show, in plain terms, how they measure, manage, and monitor bias risk as part of their ethical AI practices (a small pilot-scoring sketch follows this list):
- What fairness tests do you run, on which groups, and on what training data sources? Language, disability status, geography, race or ethnicity (when lawful and available), income proxies.
- What metrics do you use? Ask them to define the metric and the tradeoff, not just name it.
- How often do you re-test after launch? Quarterly is a reasonable starting point for higher-risk uses.
- Can we run a pilot using our own realistic cases? Even a small sample can reveal routing and tone problems.
- Do you support independent review? Third-party audits, red teaming, or an external evaluation.
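To make the pilot question concrete, here is a minimal Python sketch that compares error rates across groups in a small staff-reviewed sample. The groups, results, and the 15-point gap threshold are illustrative assumptions, not vendor metrics or legal standards.

```python
from collections import defaultdict

# Each record: (group, did staff judge the AI output correct?) from a small pilot review.
# The data below is made up for illustration.
pilot_results = [
    ("English", True), ("English", True), ("English", False),
    ("Spanish", True), ("Spanish", False), ("Spanish", False),
]

totals, errors = defaultdict(int), defaultdict(int)
for group, correct in pilot_results:
    totals[group] += 1
    if not correct:
        errors[group] += 1

error_rates = {group: errors[group] / totals[group] for group in totals}
print("Error rate by group:", {g: round(r, 2) for g, r in error_rates.items()})

# Flag any group whose error rate sits more than 15 points above the best group.
# The 0.15 threshold is a judgment call to set with program staff, not a standard.
best = min(error_rates.values())
flagged = [group for group, rate in error_rates.items() if rate - best > 0.15]
print("Groups to review with the vendor:", flagged or "none")
```

Even at this size, a visible gap gives you something specific to raise with the vendor before go-live, and a baseline to re-check after launch.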
For a social-sector oriented vendor review tool, the MERL Tech assessment for AI-enabled service providers offers questions that translate well to legal nonprofits.
What is the human override plan when the AI gets it wrong?
Require a simple, staff-friendly plan (a minimal override-log sketch follows this list):
- Challenge and override: Staff can correct outputs, document why, and move on.
- Escalation path: Who at the vendor responds, and by when.
- Feedback loop: Overrides become fixes that improve model performance, not just buried exceptions.
- Safety controls: The system shouldn’t give unsafe advice (especially in DV, immigration, or urgent housing).
- Decision boundaries: Keep AI as decision support unless leadership explicitly approves automation. Any automated denial or other high-impact action should require sign-off.
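One way to keep the feedback loop honest is to log every override in a consistent shape. The sketch below is a minimal, assumed structure in Python; the fields are illustrative, and your case management system may already capture some of them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class OverrideRecord:
    """One staff correction of an AI output, kept as data rather than a buried exception."""
    case_id: str                  # internal reference, never a client name
    ai_output_summary: str        # what the tool said or did
    staff_correction: str         # what staff decided instead
    reason: str                   # why, in plain language
    escalated_to_vendor: bool = False
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Hypothetical example entry.
record = OverrideRecord(
    case_id="2025-0417",
    ai_output_summary="Routed caller to general intake",
    staff_correction="Escalated to DV safety planning",
    reason="Caller disclosed an active protective order",
)
print(record)
```

A month of these records tells you whether the vendor’s fixes are landing, and gives the board something better than anecdotes.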
Explainability and accountability: can you defend the AI’s output to a board, funder, or client?
Explainability sounds technical until you picture the meeting: “Why was this case prioritized?” “Why did the chatbot say that?” “Why did the document get flagged?” If you can’t answer in plain language, you’re carrying hidden risk.
A solid AI Vendor Due Diligence Checklist treats explainability as part of accountability, not a nice-to-have. For an industry view of vendor assessment structure and model governance, the Data & Trusted AI Alliance AI Vendor Assessment Framework and the NIST AI RMF are helpful models for organizing these requirements.
What explanations do we get, and are they understandable to non-technical staff?
Ask for demonstrations using your scenarios (a minimal log-entry sketch follows this list):
- Reasons and key factors: What inputs influenced the output, in human terms.
- Sources and citations: For summaries or Q&A, require references (and a way to open them).
- Uncertainty handling: What happens when the model isn’t confident. Does it say so, or guess?
- Hallucination controls: Guardrails, retrieval, and “don’t answer” behavior for risky topics.
- Audit-ready logs: Explanations should be captured in logs, not only shown on screen.
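To show what “audit-ready” can mean in practice, here is a minimal sketch of one explanation log entry as JSON. The keys, tool name, and values are assumptions for illustration; match them to whatever the vendor can actually log and export.

```python
import json

# One illustrative log entry pairing an output with its explanation.
explanation_log_entry = {
    "timestamp": "2025-06-03T14:22:05Z",
    "tool": "intake-summary-assistant",     # hypothetical tool name
    "output_id": "sum-00123",
    "key_factors": ["eviction notice date", "household size", "stated income"],
    "sources_cited": ["uploaded notice (doc-88)", "intake form responses"],
    "confidence": "low",                    # the tool should say so, not guess
    "fallback_behavior": "flagged for human review instead of answering",
    "reviewed_by": "intake staff",          # a role, not a person's name
}

# Stored as one JSON object per event, entries like this can be searched and exported
# later when a board, funder, or court asks why a case was handled a certain way.
print(json.dumps(explanation_log_entry, indent=2))
```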
What documentation and rights do we need in the contract?
If you can’t verify it, you can’t govern it. Start with an AI vendor questionnaire to gather the evidence, then put the contract through legal review with these terms in mind:
- Change notices when the model, prompts, or safety settings change.
- Transparency reports for higher-risk uses (what’s collected, what’s monitored, what’s improved).
- Audit rights (or at least the right to receive independent audit results) to support regulatory compliance.
- SLA terms for uptime, support response, and security incident handling.
- Exit plan: data return, deletion attestation, and migration help.
A simple line that holds up in board minutes: you need the right to test, the right to audit, and the right to leave. Have legal review confirm all three.
Conclusion: make due diligence a safety practice, not a paperwork sprint
Strong due diligence, including risk assessment and reputational checks, protects people first. It also reduces surprises, calms staff anxiety, and gives you a story you can defend when questions come.
Integrate strong due diligence into your procurement process. Start this week:
- Pick one use case (one workflow, one team, one outcome) and confirm it fits your existing systems’ integration capabilities.
- Run the questions above and require artifacts, not promises.
- Run a time-boxed pilot with a clear implementation timeline, pass/fail criteria, integration testing, and an owner who can say “stop” if needed (a minimal pass/fail sketch follows).
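It also helps to write the pilot’s pass/fail criteria down as data before it starts. The sketch below is a minimal Python example; the thresholds and measures are illustrative assumptions, not benchmarks, and should come from your own program staff and leadership.

```python
# Illustrative pass/fail criteria agreed before the pilot begins.
criteria = {
    "min_usable_output_rate": 0.80,   # share of sampled outputs staff rate usable
    "max_group_error_gap": 0.15,      # largest error-rate gap between groups served
    "max_unsafe_responses": 0,        # zero tolerance in DV, immigration, or urgent housing scenarios
}

# Made-up results from the pilot review.
observed = {
    "usable_output_rate": 0.74,
    "group_error_gap": 0.10,
    "unsafe_responses": 1,
}

passed = (
    observed["usable_output_rate"] >= criteria["min_usable_output_rate"]
    and observed["group_error_gap"] <= criteria["max_group_error_gap"]
    and observed["unsafe_responses"] <= criteria["max_unsafe_responses"]
)
print("Pilot result:", "pass" if passed else "stop, fix, or walk away")
```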
FAQ
What is an AI vendor due diligence checklist?
It’s a core piece of vendor risk management: a set of questions and evidence you collect before using an AI tool, so you know what it does, what data it touches, and what risks you’re accepting.
Do we need SOC 2?
Not always, but you need some proof of security controls. SOC 2 Type II or ISO 27001 are common forms of proof.
Can we use AI without sharing client data?
Sometimes. You can pilot with fake data, de-identified samples, or tools that don’t retain prompts. Confirm this in writing.
How do we test bias with small datasets?
Use a pilot with real scenarios, compare outputs across groups you serve, and track errors. Small tests still reveal patterns.
What counts as explainable?
A non-technical staff member can say why the AI gave an answer, what it used as sources, and what to do when it’s unsure.
If you want board-ready help with ROI calculation, shaping requirements (including an AI vendor questionnaire), and running a safe pilot, schedule a practical clarity call: https://ctoinput.com/schedule-a-call. Which single chokepoint, if fixed, would unlock the most capacity, scalability, and trust in the next quarter?