Name : Ninghui Li

Institution : Purdue University


Ninghui Li is an Associate Professor of Computer Science at Purdue University, which he joined in 2003.  He received a Bachelor's degree from the University of Science and Technology of China in 1993 and a Ph.D. in Computer Science from New York University in 2000. Before joining Purdue, he was a Research Associate in the Computer Science Department at Stanford University for three years.  Prof. Li's research interests are in computer and information security and privacy. He has published over 80 refereed papers in journals and conference proceedings. He has served on the Program Committees of more than 50 international conferences and workshops, and was Program Chair of the 2008 ACM Symposium on Access Control Models and Technologies and of the 2009 IFIP Conference on Trust Management.  He is on the editorial board of the VLDB Journal. His research is supported by projects funded by the US National Science Foundation, the US Army Research Office, and companies including IBM and Google.  In 2005, he received an NSF CAREER award.


Publications : (2 maximum)

· Tiancheng Li and Ninghui Li.  “On the Tradeoff between Privacy and Utility in Data Publishing”.  To appear in ACM KDD-09, June 2009.

· Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian.  “t-Closeness: Privacy beyond k-Anonymity and l-Diversity”.  In ICDE, June 2007.



Title of Project : A Framework for Privacy Preserving Microdata Publishing


We aim to develop a framework for privacy-preserving microdata publishing that considers the interactions among four critical aspects: data properties, privacy threats, publishing methods, and utility measures.  For data properties, we consider data with high dimensionality as well as data without a clear separation between quasi-identifier and sensitive attributes.  For privacy, we address three threats: (1) presence threats, in which an adversary learns that an individual's record is in the published data; (2) identity disclosure threats, in which an individual is linked to a particular record in the released data; and (3) attribute disclosure threats, in which new information about some attribute of an individual is revealed.  For publishing methods, we consider both existing methods, such as generalization and bucketization, and a method we introduce: slicing.  For utility, we consider measures for a number of data mining tasks on anonymized data.

For privacy, we focus on developing a privacy notion that formalizes the same intuition as differential privacy, but is practically achievable for microdata.  While differential privacy has been studied intensively in recent years, all existing results concern publishing statistical information rather than microdata.  Differential privacy aims to capture the following intuition of privacy: “Any disclosure will be, within a small multiplicative factor, just as likely whether or not the individual participates in the database.”  To formalize this, we need to define the two cases: (1) the individual participates, and (2) the individual does not participate.  In the existing literature, the two cases are modeled by datasets D and D’ such that D\D’={t}, where t is the individual's record.  This modeling, however, results in a requirement that is too strong for microdata publishing.  We need a more suitable formulation.
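
For reference, the standard formulation of differential privacy from the literature can be written as below. The symbols are notation introduced here for illustration and do not appear in the text above: A denotes a randomized publishing algorithm, S a set of possible outputs, and ε the privacy parameter that bounds the "small multiplicative factor".

```latex
% epsilon-differential privacy, stated for the two cases D (t participates)
% and D' (t does not participate), with D \setminus D' = \{t\}.
\[
  e^{-\epsilon} \;\le\;
  \frac{\Pr[\mathcal{A}(D) \in S]}{\Pr[\mathcal{A}(D') \in S]}
  \;\le\; e^{\epsilon}
  \qquad \text{for all } S \subseteq \mathrm{Range}(\mathcal{A})
  \text{ with } \Pr[\mathcal{A}(D') \in S] > 0 .
\]
```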

For publishing methods, we introduce a new technique called slicing. Slicing partitions the dataset both vertically and horizontally. Vertical partitioning groups attributes into columns based on the correlations among the attributes, so that each column contains a subset of attributes that are highly correlated. Horizontal partitioning groups tuples into buckets. Finally, within each bucket, the values in each column are randomly permuted (or sorted) to break the linking between different columns.  Slicing breaks the associations across columns but preserves the associations within each column. This reduces the dimensionality of the data and preserves utility better.
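
The steps described above can be illustrated with a minimal Python sketch. It is not the project's implementation and rests on several assumptions: the function name `slice_table` and its parameters are hypothetical, the attribute grouping is supplied by the caller rather than computed from correlations, and buckets are formed from consecutive fixed-size chunks of tuples, whereas a real method would choose buckets to satisfy a privacy requirement.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def slice_table(
    rows: Sequence[Row],
    column_groups: Sequence[Sequence[str]],
    bucket_size: int,
    seed: Optional[int] = None,
) -> List[List[List[Tuple[object, ...]]]]:
    """Return a sliced table as buckets -> columns -> permuted value tuples.

    column_groups is the vertical partition (assumed given here);
    bucket_size controls the horizontal partition (an illustrative choice).
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    rng = random.Random(seed)
    buckets = []
    # Horizontal partitioning: consecutive chunks of tuples form buckets.
    for start in range(0, len(rows), bucket_size):
        bucket_rows = rows[start:start + bucket_size]
        sliced_bucket = []
        # Vertical partitioning: project the bucket onto each attribute group.
        for group in column_groups:
            column_values = [
                tuple(row[attr] for attr in group) for row in bucket_rows
            ]
            # Permute within the bucket to break links between columns,
            # while each tuple keeps its within-column association.
            rng.shuffle(column_values)
            sliced_bucket.append(column_values)
        buckets.append(sliced_bucket)
    return buckets


if __name__ == "__main__":
    # Made-up toy records, for illustration only.
    table = [
        {"age": 22, "sex": "M", "zip": "47906", "disease": "flu"},
        {"age": 22, "sex": "F", "zip": "47906", "disease": "cancer"},
        {"age": 52, "sex": "F", "zip": "47905", "disease": "flu"},
        {"age": 60, "sex": "M", "zip": "47302", "disease": "gastritis"},
    ]
    groups = [("age", "sex"), ("zip", "disease")]
    for bucket in slice_table(table, groups, bucket_size=2, seed=0):
        print(bucket)
```

In this sketch each (age, sex) pair and each (zip, disease) pair stays intact, but within a bucket a reader can no longer tell which (age, sex) pair originally belonged with which (zip, disease) pair.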