Personalized Anonymity Privacy Protection Algorithm Based on Clustering
First published: 2023-05-12
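Abstract: Anonymization is among the most widely used data privacy protection techniques. It works by generalizing or suppressing the quasi-identifier attributes in the original data table so that the published data remain semantically consistent. However, most existing anonymity models ignore the semantic similarity between sensitive attribute values, which leaves them vulnerable to similarity attacks, and they fail to strike a reasonable balance between data security and utility. This paper therefore proposes a clustering-based personalized (a,k,d)-anonymity privacy protection algorithm. The algorithm defines the notion of a semantic similarity group for sensitive attributes and requires every equivalence class to contain at least d semantic similarity groups, which defends against similarity attacks. To meet the personalization requirement of the anonymity model, it sets different frequency constraints on the distinct sensitive attribute values in an equivalence class, limiting how often each may occur. The anonymization is implemented with maximum-dissimilarity clustering, which improves the utility of the anonymized data while preserving privacy. Experimental results show that, at a lower time cost than other clustering-based k-anonymity models, the algorithm reduces information loss by more than 50%, resists similarity attacks, and provides personalized privacy protection.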
Personalized anonymity privacy protection algorithm based on clustering
Abstract: Anonymous privacy protection is the most widely used data privacy protection technology. Its principle is to publish semantically consistent data by generalizing or suppressing the quasi-identifier attributes in the original data table. Most existing anonymity models do not consider the semantic similarity of sensitive attribute values, so they are vulnerable to similarity attacks, and they do not achieve a reasonable balance between data privacy and availability. This paper proposes a personalized (a,k,d)-anonymity privacy-preserving algorithm based on clustering. The algorithm defines semantic similarity groups for sensitive attributes and requires that the number of semantic similarity groups in each equivalence class be no less than d, which prevents similarity attacks. In addition, it meets the personalization needs of the anonymity model by setting different frequency constraints on the distinct sensitive attribute values in an equivalence class, limiting how often each occurs. The anonymity algorithm is implemented with maximum dissimilarity clustering, which improves the availability of the anonymized data while preserving privacy. The experimental results show that, with less time cost than other clustering-based k-anonymity models, the algorithm reduces information loss by more than 50%, resists similarity attacks, and provides personalized privacy protection.
Keywords: privacy protection; similarity attack; clustering; personalization
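The three constraints named in the abstract (a minimum class size k, at least d semantic similarity groups per equivalence class, and per-value frequency limits) can be illustrated with a small check on a single equivalence class. This is only a sketch under stated assumptions, not the paper's algorithm: the abstract does not give formal definitions, so the reading of "a" as a per-value maximum frequency, the function name `satisfies_akd`, the `group_of` mapping, and the example disease values are all hypothetical. The clustering step that builds the equivalence classes is not shown.

```python
from collections import Counter

def satisfies_akd(sensitive_values, k, d, alpha, group_of, default_alpha=1.0):
    """Check one equivalence class against an assumed (a,k,d) constraint.

    sensitive_values: sensitive attribute value of each record in the class
    k: minimum number of records in the class
    d: minimum number of distinct semantic similarity groups
    alpha: assumed per-value maximum frequency (the personalized constraint)
    group_of: assumed mapping from sensitive value to semantic similarity group
    """
    n = len(sensitive_values)
    if n < k:
        return False  # too few records: the k-anonymity part fails

    # Count semantic groups, not raw values, to guard against similarity attacks.
    groups = {group_of[v] for v in sensitive_values}
    if len(groups) < d:
        return False

    # Each sensitive value must stay within its own frequency limit.
    counts = Counter(sensitive_values)
    return all(c / n <= alpha.get(v, default_alpha) for v, c in counts.items())

# Hypothetical example data (not from the paper).
group_of = {
    "gastritis": "digestive",
    "gastric ulcer": "digestive",
    "flu": "respiratory",
    "HIV": "immune",
}
alpha = {"HIV": 0.25}  # the most sensitive value gets the tightest limit

similar_class = ["gastritis", "gastric ulcer", "gastritis", "gastric ulcer"]
mixed_class = ["gastritis", "flu", "HIV", "flu"]
skewed_class = ["HIV", "HIV", "flu", "gastritis"]

print(satisfies_akd(similar_class, k=4, d=2, alpha=alpha, group_of=group_of))
print(satisfies_akd(mixed_class, k=4, d=2, alpha=alpha, group_of=group_of))
print(satisfies_akd(skewed_class, k=4, d=2, alpha=alpha, group_of=group_of))
```

In this sketch, `similar_class` has two distinct values but only one semantic group, so it fails the d check even though a plain diversity count on raw values would accept it. `skewed_class` fails because the assumed limit on "HIV" is exceeded, and `mixed_class` passes all three checks.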