class: theme
layout: true

---

class: center, middle

# Talk 13 - NeurIPS 2020

## Interactive Proofs for Verifying Machine Learning

![:scale 80%](intro.jpg)

06/08/2021 - Talk 13 - Saurabh B

---

class: middle

## Information

* Github Repository - https://github.com/bakkenbaeck/ml-seminar-information-theory
* Original talk - "Robustness, Verification, Privacy: Addressing Machine Learning Adversaries"
* Focus on "Verification"

???

- Practical information about updates to the index page and talks on the GitHub repo
- The original talk had three parts; explain each part
- Cryptography-inspired models and results to address three challenges that emerge when worst-case adversaries enter the machine learning landscape:
  - verification of machine learning models given limited access to good data
  - training at scale on private training data
  - robustness against adversarial examples controlled by worst-case adversaries
- Robustness: accurate predictions even when the test distribution adversarially deviates from the training distribution

---

class: img-left

## Author Bio
1
.left-column[
![:scale 100%](author.jpeg)
Shafrira Goldwasser
]
.right-column[
* Professor of Electrical Engineering and Computer Science at MIT
* Professor of Mathematical Sciences at the Weizmann Institute of Science, Israel
* Winner of the Turing Award in 2012 and the Gödel Prize (1993, 2001)
* Areas of interest include: theoretical computer science, cryptography
]
.footnote[.red.bold[1:] https://simons.berkeley.edu/people/shafi-goldwasser]

???

- Prestigious Turing Award (the Nobel Prize of computing!) and many more awards
- One of the inventors of zero-knowledge proofs
- Presentation and paper co-authored with other researchers

---

class: middle

## Addressing Machine Learning Adversaries: Verification

* Based on the paper "Interactive Proofs for Verifying Machine Learning" by Goldwasser et al.
* Verifying a hypothesis given a distribution

???

- Given a trained model, how to verify the correctness of the model and other relevant properties

---

class: middle

## Verifiability: Why?

* True fact, fake fact
* Доверяй, но проверяй ("Trust, but verify")

???

- Increasing lack of trust in companies; how to check that the models they build do what they say they do
- Doveryay, no proveryay - can technology help?
- Even if a model can be verified, can it be backed legally?

---

class: middle

## Example: Cash bail risk assessment

* Broad agreement that cash bail is classist, unfair, and needs to end
* Replace it with a system based on statistical data
* Rejected by a large margin
* Cash bail opponents would like to end cash bail but don't trust algorithms to make a fair decision
* Another example - [ShotSpotter](https://www.eff.org/deeplinks/2021/07/its-time-police-stop-using-shotspotter)

???
- Example of models used in a practical setting and why verification is important
- California Proposition 25: replace cash bail with a risk assessment tool
- The risk assessment system would automatically decide whether a suspect stays in jail, based on past statistical data
- Problem: rich vs. poor suspects with the same legal issues; belonging to the "right" group
- The bail bond industry exploits these problems
- Cash bail opponents don't trust the algorithms or the people who will build them

---

class: middle

## Ungoverned Algorithms
2
* Data-driven algorithms can further complicate existing issues
* Algorithms can be accurate and efficient but might not uphold the underlying values & concepts that are more important
* Varying governance with laws, markets, ethics

.footnote[.red.bold[2:] https://simons.berkeley.edu/workshops/fairness-workshop-1]

???

- Algorithms might undermine democratic values, institutional standards, principles, accountability
- Who & what determines fitness and inclusion in the new technological landscape
- Governance: participatory, responsive, inclusive, law-abiding, accurate/efficient
- A new system might lead to more stigma and group discrimination

---

class: middle

## Technical concerns

* Who builds the algorithms?
* Who does the model verification part?

???

- As developers or researchers, we have two main concerns
- What past data do the algorithm developers have, and where is the training done
- Whether randomness is used (for sampling), noisy data cleaning, etc. - access to "good" data
- Even if the problem above is solved and we assume that the prover has good data and randomness:
- Do verifiers have access to the code? Is it open source?
- Do verifiers have access to historical data?
- Can underlying concepts be verified? Is functional verification possible?
- How to ensure the algorithm verified is the same one as deployed, or patched buggy code (we won't talk about it)

---

class: middle

## Verification w.r.t software systems

* A county in California accidentally discovered that the tabulation software it used had dropped 197 paper ballots!
* A (voting) system is software independent if an undetected change/error cannot cause an undetected change/error in the outcome
3
* Need for verification of ML systems

.footnote[.red.bold[3:] Rivest, Wack: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.310.3729&rep=rep1&type=pdf]

???

- The accident forced the county to drop the technology and revert to older voting tech for the following election in 2008
- So we can NOT trust physical machines blindly; they might be buggy, poorly designed, etc. - we need a paper trail (i.e. verification!)
- Even in industries like aviation and banking where we do trust machines: an undetected change/error results in a detectable change/error
- Similarly, in the context of ML we need verification!

---

class: middle

## Tools for verification

* Model checking / formal verification
* Logic, probabilistic model checking - CPAchecker, UPPAAL, PRISM (not the NSA one!)
* Zero knowledge proofs

???

- Formal verification: a rigorous, mathematics-based technique to check the correctness of a system
- Probabilistic model checking is a formal verification technique for modelling and analysing systems that exhibit probabilistic behavior
- Model checking is a field of its own and depends on the system; examples: rockets, cars, airplane hardware verification
- Checking whether a system meets a given specification (in automata theory, a finite state machine)
- Model the system as a finite state machine, write properties in propositional temporal logic, do an exhaustive search of the state space
- The verification systems above still have a straightforward goal, but in an ML system we don't know what to verify exactly
- Probabilistic model checking uses stochastic processes like Markov chains and probabilistic temporal logic specifications to check the model
- The above is used in planes and sensitive systems for detecting hardware failure (Airbus?)
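---

class: middle

## Sketch: Probabilistic model checking by hand

A minimal sketch of the idea behind probabilistic model checking (the toy chain, state names, and transition probabilities are made up for illustration; real tools like PRISM check much richer temporal-logic properties): compute the probability of eventually reaching a failure state in a small Markov chain.

```python
# Toy Markov chain: 0 = ok, 1 = degraded, 2 = failure, 3 = recovered.
# States 2 and 3 are absorbing. Transition probabilities are illustrative.
P = {
    0: {0: 0.90, 1: 0.10},
    1: {0: 0.50, 2: 0.30, 3: 0.20},
    2: {2: 1.0},
    3: {3: 1.0},
}

def prob_reach(P, target, iters=10_000):
    """P(eventually reach `target`) from each state, by value iteration."""
    prob = {s: (1.0 if s == target else 0.0) for s in P}
    for _ in range(iters):
        for s in P:
            if s != target:
                prob[s] = sum(p * prob[t] for t, p in P[s].items())
    return prob

print(prob_reach(P, target=2))
```

A property like "the probability of failure from the initial state is below 0.5" would then be checked against the computed value; in this toy chain the failure probability from state 0 works out to 0.6, so the property would be reported as violated.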
---

class: middle

## Short detour: (Interactive) Zero Knowledge Proofs

* Method through which one party can prove possession of a value/piece of information to another party without revealing anything about the value/information
* The proof statement consists only of the fact that the prover possesses the secret information
* Notable practical applications: avoiding WWIII, authentication, blockchain (Zcash)
4
.footnote[.red.bold[4:] https://en.wikipedia.org/wiki/Zero-knowledge_proof#Practical_examples]

???

- Explain with an example on the next slide

---

class: middle

## The Ali Baba cave

![:scale 30%](zero_knowledge_proof.png)

* Peggy has uncovered the secret word used to open a magic door in a cave
* Victor wants to know whether Peggy knows the secret word

???

- But Peggy, being a very private person, does not want to reveal her knowledge to Victor or to reveal the fact of her knowledge to the world in general.

--

* Peggy takes either path A or B; Victor is not allowed to see which path she takes

???

- Then, Victor enters the cave and shouts the name of the path he wants her to use to return, either A or B, chosen at random.
- Provided she really does know the magic word, this is easy: she opens the door, if necessary, and returns along the desired path.

--

* What happens if she did not know the word?

???

- Then, she would only be able to return by the named path if Victor were to give the name of the same path by which she had entered.
- Since Victor would choose A or B at random, she would have a 50% chance of guessing correctly.
- If they were to repeat this trick many times, say 20 times in a row, her chance of successfully anticipating all of Victor's requests would become vanishingly small (about one in a million).

--

* If Peggy repeatedly appears at the exit Victor names, he can conclude that it is extremely probable that Peggy does, in fact, know the secret word.

---

## Hypothesis verification setting

.left-column[
![:scale 100%](hypothesis.png)
]
.right-column[
* Both prover and verifier have (unequal) access to a distribution
* The prover suggests a hypothesis and the verifier either accepts or rejects this suggestion
* We assume that an honest prover has high-quality data and trained a good model
* The verifier can easily test accuracy
* When does \\(V\\) accept or reject the hypothesis?
]

???
- Prover and verifier can have different access to the data, both in quantity and quality
- They exchange messages with each other
- We obviously want the verifier to do less work than the prover
- Go through the infimum and PAC learning to build the basis for verification theory

---

class: middle

## Infimum

* Let \\(S\\) be a subset of an ordered set \\(F\\)
* The infimum of \\(S\\) is the greatest member of \\(F\\) that is less than or equal to all members of \\(S\\)
* Example: \\(\mathbb{N} = \\{ 1, 2, 3, ... \\} \\)

???

- In this example the natural numbers can be considered a subset of the ordered set of real numbers

--

* 0 seems to be a good candidate for inf(\\(\mathbb{N}\\)) as \\( 0 \leq n \\) \\( \forall n \in \mathbb{N} \\)

--

* but inf(\\(\mathbb{N}\\)) is not 0, since \\( 1 \leq n \\) \\( \forall n \in \mathbb{N} \\) and \\(1 > 0\\), so 0 is not the *greatest* lower bound

--

$$inf(\mathbb{N}) = 1$$

---

class: middle

## PAC Learning

* Probably Approximately Correct Learning Theory
* Theoretical framework for analysing machine learning algorithms

???

- Probably approximately correct learning
- PAC learning theory manifests results of machine learning algorithms the same way children learn the differences between cats, dogs, apples and chairs.
- Young children can sort apples from balls even when both are round and red. — Leslie Valiant, creator of PAC

--

* Let's say we want to learn a concept \\(C\\) from some unknown distribution \\(D\\) given labelled training data \\(\\{x,f(x)\\}\\)
* Define the loss function: \\(L(h)=Pr_{x \in D} [h(x) \neq f(x)]\\)
* Optimum \\(o = \inf_{h \in H} L(h) \\)

--

* \\( \forall \epsilon, \delta > 0\\), PAC learning efficiently generates a hypothesis \\(h \in H\\) such that

$$Pr \left[ Pr_{x \in D} [h(x) \neq f(x)] > \epsilon \right] \leq \delta$$

???
- \\(h(x) \neq f(x)\\) is the generalization error
- Efficiently: the sample size will be a function of \\(\epsilon, \delta, |c|, n\\)
- A concept is PAC learnable if there exists an algorithm such that for \\(\epsilon, \delta\\), an unknown distribution, and sample size m, we have Prob(error > epsilon) < delta

--

$$Pr [ L(h) - o > \epsilon] \leq \delta$$

--

* Probably: \\(\delta\\)
* Approximately: \\(\epsilon\\)

---

class: middle

## Half line Learning

![:scale 100%](pac_example_1.png)

Concept class \\(c\\)

???

- Could be temperature or any number of practical learning classes
- We would like to learn this concept class

--

![:scale 100%](pac_example_2.png)

Hypothesis \\(h\\)

???

- We simply pick some hypothesis h that is consistent with the data from sampling
- We can do this by scanning the test data in ascending sorted order until we find the greatest negatively labeled point and the smallest positively labeled point.
- We then set the threshold of our half-line anywhere in this interval.

---

class: middle

## Half line Learning

![:scale 100%](pac_example_3.png)

Error gap \\(\epsilon\\)

???

- The generalization error of h is the probability mass between h and c on both sides
- There are two cases where \\(err_D(h) > \epsilon\\).
- Let B+ be the event that h is more than \\(\epsilon\\) probability mass to the right of c,
- and let B− be the event that h is more than \\(\epsilon\\) probability mass to the left of c.

--

* The model will not be very accurate if no training data falls in these error gaps

--

* Say the probability of drawing a point from the error gap is \\(\epsilon\\)
* So the probability of NOT getting any of the \\(m\\) points from one of the error gaps (\\(\bigtriangleup\\)) is

$$\prod_{i=1}^{m} Pr[x_i \notin \bigtriangleup] = (1 - \epsilon)^m $$

???
- The equation takes one error gap at a time; apply symmetry later
- Sampling is independent, so the product rule can be used

---

class: middle

## Half line Learning

$$\prod_{i=1}^{m} Pr[x_i \notin \bigtriangleup] = (1 - \epsilon)^m $$

--

* Using \\(1 - x \leq e^{-x}\\)

$$\prod_{i=1}^{m} Pr[x_i \notin \bigtriangleup] = (1 - \epsilon)^m \leq e^{-\epsilon m}$$

--

* Considering both of the gaps

$$Pr [ L(h) - o > \epsilon] \leq 2e^{-\epsilon m}$$

--

* We want \\(Pr [ L(h) - o > \epsilon] \leq \delta\\), so

$$2e^{-\epsilon m} \leq \delta$$

--

$$e^{\epsilon m} \geq {2 \over \delta}$$

--

$$m \geq {1 \over \epsilon} \ln{2 \over \delta}$$

---

class: middle

## PAC in General

* For a finite hypothesis space, i.e. \\(|H| < \infty\\)
* If there is an algorithm A that finds a hypothesis \\( h \in H\\) consistent with \\(m\\) examples, where

$$m \geq {1 \over \epsilon} \left( \ln |H| + \ln{1 \over \delta} \right)$$

* Then

$$Pr [ L(h_{A}) - o > \epsilon] \leq \delta$$

--

* To PAC learn a concept class \\(C\\) with VC dimension \\(d\\), one needs a labeled sample of size
5
$$\Theta \left( {d \over \epsilon} \ln{1 \over \epsilon} \right)$$

.footnote[.red.bold[5:] https://www.sciencedirect.com/science/article/pii/0890540189900023]

???

- Using techniques similar to those in the last example, along with conditional probability on independent events
- The Vapnik-Chervonenkis dimension measures the complexity of the model
- Model complexity determines the performance/cost of training/testing
- VC dimension is measured by the ability of the hypothesis class to shatter n points (classify points into all possible labelings)
- Big Theta is the same as big O with an additional asymptotic lower bound

---

class: middle

## PAC Verification

* A class of hypotheses \\(H\\) is PAC verifiable if there is a verifier \\(V\\) such that
* \\(\forall \epsilon, \delta > 0 \\) and \\(\forall\\) distributions D:

$$L(h)=Pr_{x \in D} [h(x) \neq f(x)]$$

$$o = \inf_{h \in H} L(h)$$

* \\(L_D\\): loss function w.r.t. the distribution \\(D\\)

--

* Completeness (with an honest prover):

$$Pr [ L(h) \leq o + \epsilon] \geq 1- \delta$$

* Soundness (against any prover):

$$Pr [ L(h) > o + \epsilon] \leq \delta$$

???

- Introduced the new concept of PAC verification, based on PAC learning
- Completeness: if the statement is true, the honest verifier (that is, one following the protocol properly) will be convinced of this fact by an honest prover.
- Soundness: if the statement is false, no cheating prover can convince the honest verifier that it is true, except with some small probability.
- Completeness and soundness mirror (or are taken from) completeness and soundness in interactive proof systems
- Difference from an interactive proof system: there we accept when the proof is correct and reject when it is wrong;
- here we accept when the hypothesis is close to the best that can be done in the class and reject when it is farther than \\(\epsilon\\)

---

class: middle

### PAC verification

* Double efficiency

???

- The verifier should be more efficient than the learner
- We want verifying to be cheaper than learning

--

* Can verifying be a lot cheaper than learning?
* Qualitative and quantitative

???

- Fewer samples for the verifier compared to the learner/prover
- The type of access to the data might be different compared to the learner/prover

---

class: middle

## Easy verification (\\(opt = 0\\))

* The verifier can take \\(O({1 \over \epsilon})\\) random samples and accept h if and only if h classifies all samples correctly
* If \\(L(h) \geq \epsilon\\), reject with constant probability
* PAC learning takes \\(\Theta \left( d \over \epsilon \right)\\) where \\(d = VC(H)\\) (can be arbitrarily large)

???

- Can go back to the half line example
- There is no loss

---

class: middle

## Non trivial parts

$$L(h) \leq o + \epsilon$$

* While it's easy to estimate the loss, it's quite difficult to find the optimum
* Need to prove that the optimum is large: lower and upper bounds, qualitative and quantitative

???

- The case when there is no hypothesis such that all samples are classified correctly
- Lower bound: both prover and verifier require the same number of samples
- Quantitative upper bound: both are efficient
- Qualitative upper bound: verifying is possible with random samples but proving is not

---

class: middle

## Access & Legality

* Access to the algorithm in a black-box manner
* Use of zero knowledge proofs to verify encrypted models?
* Can courts (and other institutions) properly understand and evaluate these systems?

???

- Institutions might hire companies to build these algorithms, but the companies might not give open-source access
- Zero knowledge proofs from earlier
- A case from 1868 where the product rule for probabilities was used incorrectly in a forgery trial

---

class: middle

## Future work

* Sample complexity gap
* Applications!
* Include properties such as fairness, safety

???

- The sample complexity of a machine learning algorithm represents the number of training samples it needs in order to successfully learn a target function
- Real-world hypothesis classes like neural networks etc.
- Learning more of the underlying concepts

---

# Thank You!

# Thank You!

# Thank You!
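---

class: middle

## Backup: Simulating the Ali Baba cave

A minimal sketch of the soundness argument from the zero-knowledge detour (function names and the round count are illustrative, not from the paper): an honest Peggy always convinces Victor, while a cheating Peggy survives \\(n\\) rounds only with probability \\(2^{-n}\\).

```python
import random

def cave_round(peggy_knows_word: bool) -> bool:
    """One round: Peggy enters via a random path; Victor demands a random exit."""
    entered = random.choice("AB")
    demanded = random.choice("AB")
    # With the magic word Peggy can always exit where Victor asks;
    # without it she succeeds only if he happens to name her entry path.
    return peggy_knows_word or entered == demanded

def victor_accepts(peggy_knows_word: bool, rounds: int = 20) -> bool:
    """Victor accepts only if Peggy passes every round."""
    return all(cave_round(peggy_knows_word) for _ in range(rounds))
```

With 20 rounds, a cheating Peggy is accepted with probability \\(2^{-20}\\) - about one in a million, matching the estimate on the Ali Baba cave slide.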