class: theme
layout: true

---

class: center, middle

# Talk 13 - NeurIPS 2020

## Interactive Proofs for Verifying Machine Learning

![:scale 80%](intro.jpg)

06/08/2021 - Talk 13 - Saurabh B

---

class: middle

## Information

* Github Repository - https://github.com/bakkenbaeck/ml-seminar-information-theory
* Original talk - "Robustness, Verification, Privacy: Addressing Machine Learning Adversaries"
* Focus on "Verification"

???

- Practical information about updates to the index page and talks on the GitHub repo
- The original talk had three parts; explain each part
- Cryptography-inspired models and results to address three challenges that emerge when worst-case adversaries enter the machine learning landscape:
  - verification of machine learning models given limited access to good data
  - training at scale on private training data
  - robustness against adversarial examples controlled by worst-case adversaries
- Robustness: accurate predictions even when the test distribution adversarially deviates from the training distribution

---

class: img-left

## Author Bio
1
.left-column[
![:scale 100%](author.jpeg)
Shafrira Goldwasser
]
.right-column[
* Professor of Electrical Engineering and Computer Science at MIT
* Professor of Mathematical Sciences at the Weizmann Institute of Science, Israel
* Winner of the Turing Award in 2012 and the Gödel Prize (1993, 2001)
* Areas of interest include: theoretical computer science, cryptography
]
.footnote[.red.bold[1:] https://simons.berkeley.edu/people/shafi-goldwasser]

???

- Prestigious Turing Award (the Nobel Prize of computing!) and many more awards
- One of the inventors of zero-knowledge proofs
- Presentation and paper co-authored with other researchers

---

class: middle

## Addressing Machine Learning Adversaries: Verification

* Based on the paper "Interactive Proofs for Verifying Machine Learning" by Goldwasser et al.
* Verifying a hypothesis given a distribution

???

- Given a trained model, how to verify the correctness of the model and other relevant properties

---

class: middle

## Verifiability: Why?

* True fact, fake fact
* Доверяй, но проверяй ("Trust, but verify")

???

- Increasing lack of trust in companies; how to check that the models they build do what they say they do
- Doveryay, no proveryay - can technology help?
- Even if a model can be verified, can it be backed legally?

---

class: middle

## Example: Cash bail risk assessment

* Broad agreement that cash bail is classist, unfair, and needs to end
* Replace it with a system based on statistical data
* Rejected by a large margin
* Cash bail opponents would like to end cash bail but don't trust algorithms to make a fair decision
* Another example - [ShotSpotter](https://www.eff.org/deeplinks/2021/07/its-time-police-stop-using-shotspotter)

???
- Example of models used in a practical setting and why verification is important
- California Proposition 25: replace cash bail with a risk assessment tool
- The risk assessment system would automatically decide whether a suspect stays in jail, based on past statistical data
- Problem: rich vs. poor suspects with the same legal issues; belonging to the "right" group
- The bail bond industry exploits these problems
- Cash bail opponents don't trust the algorithms or the people who will build them

---

class: middle

## Ungoverned Algorithms
2
* Data-driven algorithms can further complicate existing issues
* Algorithms can be accurate and efficient but might not uphold the underlying values & concepts that are more important
* Varying governance with laws, markets, ethics

.footnote[.red.bold[2:] https://simons.berkeley.edu/workshops/fairness-workshop-1]

???

- Algorithms might undermine democratic values, institutional standards, principles, accountability
- Who & what determines fitness and inclusion in the new technological landscape
- Governance: participatory, responsive, inclusive, law-abiding, accurate/efficient
- A new system might lead to more stigma and group discrimination

---

class: middle

## Technical concerns

* Who builds the algorithms?
* Who does the model verification part?

???

- As developers or researchers, we have two main concerns
- What past data do the algorithm developers have, and where is the training done
- Whether randomness is used (for sampling), noisy data cleaning, etc. - access to "good" data
- Even if the problem above is solved and we assume that the prover has good data and randomness:
- Do verifiers have access to the code? Is it open source?
- Do verifiers have access to historical data?
- Can underlying concepts be verified? Is functional verification possible?
- How to ensure the algorithm verified is the same one as deployed, or patched buggy code (we won't talk about it)

---

class: middle

## Verification w.r.t software systems

* A county in California accidentally discovered that the tabulation software it used had dropped 197 paper ballots!
* A (voting) system is software independent if an undetected change/error cannot cause an undetected change/error in the outcome
3
* Need for verification of ML systems

.footnote[.red.bold[3:] Rivest, Wack: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.310.3729&rep=rep1&type=pdf]

???

- The accident forced the county to drop the technology and revert to older voting tech for the following election in 2008
- So we can NOT trust physical machines blindly; they might be buggy, poorly designed, etc. - we need a paper trail (i.e. verification!)
- Even in industries like aviation and banking where we do trust machines: an undetected change/error results in a detectable change/error
- Similarly, in the context of ML we need verification!

---

class: middle

## Tools for verification

* Model checking / formal verification
* Logic, probabilistic model checking - CPAchecker, UPPAAL, PRISM (not the NSA one!)
* Zero knowledge proofs

???

- Formal verification: a rigorous, mathematics-based technique to check the correctness of a system
- Probabilistic model checking is a formal verification technique for modelling and analysing systems that exhibit probabilistic behavior
- Model checking is a field of its own and depends on the system; examples: rockets, cars, airplane hardware verification
- Checking whether a system meets a given specification (in automata theory, a finite state machine)
- Model the system as a finite state machine, write properties in propositional temporal logic, do an exhaustive search of the state space
- The verification systems above still have a straightforward goal, but in an ML system we don't know what to verify exactly
- Probabilistic model checking uses stochastic processes like Markov chains and probabilistic temporal logic specifications to check the model
- The above is used in planes and sensitive systems for detecting hardware failure (Airbus?)
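---

class: middle

## Sketch: Probabilistic model checking by hand

A minimal sketch of the idea behind probabilistic model checking (the toy chain, state names, and transition probabilities are made up for illustration; real tools like PRISM check much richer temporal-logic properties): compute the probability of eventually reaching a failure state in a small Markov chain.

```python
# Toy Markov chain: 0 = ok, 1 = degraded, 2 = failure, 3 = recovered.
# States 2 and 3 are absorbing. Transition probabilities are illustrative.
P = {
    0: {0: 0.90, 1: 0.10},
    1: {0: 0.50, 2: 0.30, 3: 0.20},
    2: {2: 1.0},
    3: {3: 1.0},
}

def prob_reach(P, target, iters=10_000):
    """P(eventually reach `target`) from each state, by value iteration."""
    prob = {s: (1.0 if s == target else 0.0) for s in P}
    for _ in range(iters):
        for s in P:
            if s != target:
                prob[s] = sum(p * prob[t] for t, p in P[s].items())
    return prob

print(prob_reach(P, target=2))
```

A property like "the probability of failure from the initial state is below 0.5" would then be checked against the computed value; in this toy chain the failure probability from state 0 works out to 0.6, so the property would be reported as violated.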
---

class: middle

## Short detour: (Interactive) Zero Knowledge Proofs

* Method through which one party can prove possession of a value/piece of information to another party without revealing anything about the value/information
* The proof statement consists only of the fact that the prover possesses the secret information
* Notable practical applications: avoiding WWIII, authentication, blockchain (Zcash)
4
.footnote[.red.bold[4:] https://en.wikipedia.org/wiki/Zero-knowledge_proof#Practical_examples]

???

- Explain with an example on the next slide

---

class: middle

## The Ali Baba cave

![:scale 30%](zero_knowledge_proof.png)

* Peggy has uncovered the secret word used to open a magic door in a cave
* Victor wants to know whether Peggy knows the secret word

???

- But Peggy, being a very private person, does not want to reveal her knowledge to Victor or to reveal the fact of her knowledge to the world in general.

--

* Peggy takes either path A or B; Victor is not allowed to see which path she takes

???

- Then, Victor enters the cave and shouts the name of the path he wants her to use to return, either A or B, chosen at random.
- Provided she really does know the magic word, this is easy: she opens the door, if necessary, and returns along the desired path.

--

* What happens if she did not know the word?

???

- Then, she would only be able to return by the named path if Victor were to give the name of the same path by which she had entered.
- Since Victor would choose A or B at random, she would have a 50% chance of guessing correctly.
- If they were to repeat this trick many times, say 20 times in a row, her chance of successfully anticipating all of Victor's requests would become vanishingly small (about one in a million).

--

* If Peggy repeatedly appears at the exit Victor names, he can conclude that it is extremely probable that Peggy does, in fact, know the secret word.

---

## Hypothesis verification setting

.left-column[
![:scale 100%](hypothesis.png)
]
.right-column[
* Both prover and verifier have (unequal) access to a distribution
* The prover suggests a hypothesis and the verifier either accepts or rejects this suggestion
* We assume that an honest prover has high-quality data and trained a good model
* The verifier can easily test accuracy
* When does \\(V\\) accept or reject the hypothesis?
]

???
- Prover and verifier can have different access to the data, both in quantity and quality
- They exchange messages with each other
- We obviously want the verifier to do less work than the prover
- Go through the infimum and PAC learning to build the basis for verification theory

---

class: middle

## Infimum

* Let \\(S\\) be a subset of an ordered set \\(F\\)
* The infimum of \\(S\\) is the greatest member of \\(F\\) that is less than or equal to all members of \\(S\\)
* Example: \\(\mathbb{N} = \\{ 1, 2, 3, ... \\} \\)

???

- In this example the natural numbers can be considered a subset of the ordered set of real numbers

--

* 0 seems to be a good candidate for inf(\\(\mathbb{N}\\)) as \\( 0 \leq n \\) \\( \forall n \in \mathbb{N} \\)

--

* but inf(\\(\mathbb{N}\\)) is not 0, since \\( 1 \leq n \\) \\( \forall n \in \mathbb{N} \\) and \\(1 > 0\\), so 0 is not the *greatest* lower bound

--

$$inf(\mathbb{N}) = 1$$

---

class: middle

## PAC Learning

* Probably Approximately Correct Learning Theory
* Theoretical framework for analysing machine learning algorithms

???

- Probably approximately correct learning
- PAC learning theory manifests results of machine learning algorithms the same way children learn the differences between cats, dogs, apples and chairs.
- Young children can sort apples from balls even when both are round and red. — Leslie Valiant, creator of PAC

--

* Let's say we want to learn a concept \\(C\\) from some unknown distribution \\(D\\) given labelled training data \\(\\{x,f(x)\\}\\)
* Define the loss function: \\(L(h)=Pr_{x \in D} [h(x) \neq f(x)]\\)
* Optimum \\(o = \inf_{h \in H} L(h) \\)

--

* \\( \forall \epsilon, \delta > 0\\), PAC learning efficiently generates a hypothesis \\(h \in H\\) such that

$$Pr \left[ Pr_{x \in D} [h(x) \neq f(x)] > \epsilon \right] \leq \delta$$

???
- \\(h(x) \neq f(x)\\) is the generalization error
- Efficiently: the sample size will be a function of \\(\epsilon, \delta, |c|, n\\)
- A concept is PAC learnable if there exists an algorithm such that for \\(\epsilon, \delta\\), an unknown distribution, and sample size m, we have Prob(error > epsilon) < delta

--

$$Pr [ L(h) - o > \epsilon] \leq \delta$$

--

* Probably: \\(\delta\\)
* Approximately: \\(\epsilon\\)

---

class: middle

## Half line Learning

![:scale 100%](pac_example_1.png)

Concept class \\(c\\)

???

- Could be temperature or any number of practical learning classes
- We would like to learn this concept class

--

![:scale 100%](pac_example_2.png)

Hypothesis \\(h\\)

???

- We simply pick some hypothesis h that is consistent with the data from sampling
- We can do this by scanning the test data in ascending sorted order until we find the greatest negatively labeled point and the smallest positively labeled point.
- We then set the threshold of our half-line anywhere in this interval.

---

class: middle

## Half line Learning

![:scale 100%](pac_example_3.png)

Error gap \\(\epsilon\\)

???

- The generalization error of h is the probability mass between h and c on both sides
- There are two cases where \\(err_D(h) > \epsilon\\).
- Let B+ be the event that h is more than \\(\epsilon\\) probability mass to the right of c,
- and let B− be the event that h is more than \\(\epsilon\\) probability mass to the left of c.

--

* The model will not be very accurate if no training data falls in these error gaps

--

* Say the probability of drawing a point from the error gap is \\(\epsilon\\)
* So the probability of NOT getting any of the \\(m\\) points from one of the error gaps (\\(\bigtriangleup\\)) is

$$\prod_{i=1}^{m} Pr[x_i \notin \bigtriangleup] = (1 - \epsilon)^m $$

???
- The equation takes one error gap at a time; apply symmetry later
- Sampling is independent, so the product rule can be used

---

class: middle

## Half line Learning

$$\prod_{i=1}^{m} Pr[x_i \notin \bigtriangleup] = (1 - \epsilon)^m $$

--

* Using \\(1 - x \leq e^{-x}\\)

$$\prod_{i=1}^{m} Pr[x_i \notin \bigtriangleup] = (1 - \epsilon)^m \leq e^{-\epsilon m}$$

--

* Considering both of the gaps

$$Pr [ L(h) - o > \epsilon] \leq 2e^{-\epsilon m}$$

--

* We want \\(Pr [ L(h) - o > \epsilon] \leq \delta\\), so

$$2e^{-\epsilon m} \leq \delta$$

--

$$e^{\epsilon m} \geq {2 \over \delta}$$

--

$$m \geq {1 \over \epsilon} \ln{2 \over \delta}$$

---

class: middle

## PAC in General

* For a finite hypothesis space, i.e. \\(|H| < \infty\\)
* If there is an algorithm A that finds a hypothesis \\( h \in H\\) consistent with \\(m\\) examples, where

$$m \geq {1 \over \epsilon} \left( \ln |H| + \ln{1 \over \delta} \right)$$

* Then

$$Pr [ L(h_{A}) - o > \epsilon] \leq \delta$$

--

* To PAC learn a concept class \\(C\\) with VC dimension \\(d\\), one needs a labeled sample of size
5
$$\Theta \left( {d \over \epsilon} \ln{1 \over \epsilon} \right)$$

.footnote[.red.bold[5:] https://www.sciencedirect.com/science/article/pii/0890540189900023]

???

- Using techniques similar to those in the last example, along with conditional probability on independent events
- The Vapnik-Chervonenkis dimension measures the complexity of the model
- Model complexity determines the performance/cost of training/testing
- VC dimension is measured by the ability of the hypothesis class to shatter n points (classify points into all possible labelings)
- Big Theta is the same as big O with an additional asymptotic lower bound

---

class: middle

## PAC Verification

* A class of hypotheses \\(H\\) is PAC verifiable if there is a verifier \\(V\\) such that
* \\(\forall \epsilon, \delta > 0 \\) and \\(\forall\\) distributions D:

$$L(h)=Pr_{x \in D} [h(x) \neq f(x)]$$

$$o = \inf_{h \in H} L(h)$$

* \\(L_D\\): loss function w.r.t. the distribution \\(D\\)

--

* Completeness (with an honest prover):

$$Pr [ L(h) \leq o + \epsilon] \geq 1- \delta$$

* Soundness (against any prover):

$$Pr [ L(h) > o + \epsilon] \leq \delta$$

???

- Introduced the new concept of PAC verification, based on PAC learning
- Completeness: if the statement is true, the honest verifier (that is, one following the protocol properly) will be convinced of this fact by an honest prover.
- Soundness: if the statement is false, no cheating prover can convince the honest verifier that it is true, except with some small probability.
- Completeness and soundness mirror (or are taken from) completeness and soundness in interactive proof systems
- Difference from an interactive proof system: there we accept when the proof is correct and reject when it is wrong;
- here we accept when the hypothesis is close to the best that can be done in the class and reject when it is farther than \\(\epsilon\\)

---

class: middle

### PAC verification

* Double efficiency

???

- The verifier should be more efficient than the learner
- We want verifying to be cheaper than learning

--

* Can verifying be a lot cheaper than learning?
* Qualitative and quantitative

???

- Fewer samples for the verifier compared to the learner/prover
- The type of access to the data might be different compared to the learner/prover

---

class: middle

## Easy verification (\\(opt = 0\\))

* The verifier can take \\(O({1 \over \epsilon})\\) random samples and accept h if and only if h classifies all samples correctly
* If \\(L(h) \geq \epsilon\\), reject with constant probability
* PAC learning takes \\(\Theta \left( d \over \epsilon \right)\\) where \\(d = VC(H)\\) (can be arbitrarily large)

???

- Can go back to the half line example
- There is no loss

---

class: middle

## Non trivial parts

$$L(h) \leq o + \epsilon$$

* While it's easy to estimate the loss, it's quite difficult to find the optimum
* Need to prove that the optimum is large: lower and upper bounds, qualitative and quantitative

???

- The case when there is no hypothesis such that all samples are classified correctly
- Lower bound: both prover and verifier require the same number of samples
- Quantitative upper bound: both are efficient
- Qualitative upper bound: verifying is possible with random samples but proving is not

---

class: middle

## Access & Legality

* Access to the algorithm in a black-box manner
* Use of zero knowledge proofs to verify encrypted models?
* Can courts (and other institutions) properly understand and evaluate these systems?

???

- Institutions might hire companies to build these algorithms, but the companies might not give open-source access
- Zero knowledge proofs from earlier
- A case from 1868 where the product rule for probabilities was used incorrectly in a forgery trial

---

class: middle

## Future work

* Sample complexity gap
* Applications!
* Include properties such as fairness, safety

???

- The sample complexity of a machine learning algorithm represents the number of training samples it needs in order to successfully learn a target function
- Real-world hypothesis classes like neural networks etc.
- Learning more of the underlying concepts

---

# Thank You!

# Thank You!

# Thank You!
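---

class: middle

## Backup: Simulating the Ali Baba cave

A minimal sketch of the soundness argument from the zero-knowledge detour (function names and the round count are illustrative, not from the paper): an honest Peggy always convinces Victor, while a cheating Peggy survives \\(n\\) rounds only with probability \\(2^{-n}\\).

```python
import random

def cave_round(peggy_knows_word: bool) -> bool:
    """One round: Peggy enters via a random path; Victor demands a random exit."""
    entered = random.choice("AB")
    demanded = random.choice("AB")
    # With the magic word Peggy can always exit where Victor asks;
    # without it she succeeds only if he happens to name her entry path.
    return peggy_knows_word or entered == demanded

def victor_accepts(peggy_knows_word: bool, rounds: int = 20) -> bool:
    """Victor accepts only if Peggy passes every round."""
    return all(cave_round(peggy_knows_word) for _ in range(rounds))
```

With 20 rounds, a cheating Peggy is accepted with probability \\(2^{-20}\\) - about one in a million, matching the estimate on the Ali Baba cave slide.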