Tài liệu Ome improvements about a unified hierarchy for functional, conditional functional dependencies and association rules - Vuc Quốc Tuấn: Research
Journal of Military Science and Technology, Special Issue, No.60A, 05 - 2019 87
SOME IMPROVEMENTS ABOUT A UNIFIED HIERARCHY FOR
FUNCTIONAL, CONDITIONAL FUNCTIONAL DEPENDENCIES
AND ASSOCIATION RULES
Vu Quoc Tuan*
Abstract: In [1], the authors have shown a hierarchy between Functional
Dependencies (FDs), Conditional Functional Dependencies (CFDs) and
Association Rules (ARs): FDs are the union of CFDs while CFDs are the union of
ARs. The hierarchy between FDs, CFDs, and ARs has many benefits: algorithms for
discovering ARs can be adapted to discover many other types of data dependencies
and further generate a reduced set of dependencies. In this paper, we give some
remarks and improvements related to the unified hierarchy for FDs, CFDs and ARs.
Keywords: Database; Functional Dependency; Conditional Functional Dependency; Association Rule.
1. INTRODUCTION
Data dependencies play important roles in database design, data quality
management and knowledge ...
8 trang |
Chia sẻ: quangot475 | Lượt xem: 434 | Lượt tải: 0
Bạn đang xem nội dung tài liệu Ome improvements about a unified hierarchy for functional, conditional functional dependencies and association rules - Vuc Quốc Tuấn, để tải tài liệu về máy bạn click vào nút DOWNLOAD ở trên
Research
Journal of Military Science and Technology, Special Issue, No.60A, 05 - 2019 87
SOME IMPROVEMENTS ABOUT A UNIFIED HIERARCHY FOR
FUNCTIONAL, CONDITIONAL FUNCTIONAL DEPENDENCIES
AND ASSOCIATION RULES
Vu Quoc Tuan*
Abstract: In [1], the authors have shown a hierarchy between Functional
Dependencies (FDs), Conditional Functional Dependencies (CFDs) and
Association Rules (ARs): FDs are the union of CFDs while CFDs are the union of
ARs. The hierarchy between FDs, CFDs, and ARs has many benefits: algorithms for
discovering ARs can be adapted to discover many other types of data dependencies
and further generate a reduced set of dependencies. In this paper, we give some
remarks and improvements related to the unified hierarchy for FDs, CFDs and ARs.
Keywords: Database; Functional Dependency; Conditional Functional Dependency; Association Rule.
1. INTRODUCTION
Data dependencies play important roles in database design, data quality
management and knowledge representation. Nowadays, along with the
development of the society and digital devices, especially social networks and
smartphone applications, the amount of data in the applications increases very
quickly. These arise data problems such as storage, management and knowledge
discovery from large data sets. Discovering data dependencies in a database is one
of important problems of knowledge discovery. Three essential types of data
dependencies are FDs (Functional Dependencies), CFDs (Conditional Functional
Dependencies), and ARs (Association Rules).
The authors in [1] show a hierarchy between FDs, CFDs and ARs: FDs are the
union of CFDs while CFDs are the union of ARs. This hierarchy has some
benefits: existing algorithms which discover ARs could be adapted to discover any
kind of dependencies and, moreover, generate a reduced set of dependencies. In
this paper, we give some remarks and improvements related to the hierarchy
between FDs, CFDs and ARs.
This paper is organized as: the second section recalls some concepts and
definitions. The third section gives some remarks and improvements. The
conclusions are presented in the fourth section.
2. BACKGROUND AND DEFINITIONS
2.1. Relation
A relation on the set of attributes Ω = {A1, A2,,An} is a subset of the Cartesian
product Dom(A1) Dom(A2) Dom(An), where Dom(Ai) is the domain of Ai, i
= 1, 2,, n. The elements of the relation are called tuples.
Let r be a relation on the set of attributes Ω = {A1, A2,,An}. Then
r {(a1, a2,,an) | ai Dom(Ai), i = 1, 2,, n}
2.2. Relation schema
A relation schema R is a ordered pair R = , where is a finite set of
attributes of the relation, F is a set of constraints involving the attributes. In this
paper, we limit F to consist only of FDs. Let us denote:
Information technology & Applied mathematics
Vu Quoc Tuan, “Some improvements about dependencies and association rules.” 88
R() is instead of R = in some cases,
r(R) refers to a relation r of R,
t[X] is the projection of the tuple t on X (t r).
2.3. Functional dependency (FD)
A functional dependency over a relation schema R() is an expression X Y,
where X and Y are subsets of . The dependency holds or is valid in a given
relation r(R) if
t1, t2 r, t1[X] = t2[X] t1[Y] = t2[Y]
2.4. Armstrong's axioms
For each X, Y, Z ,
A1. (Reflexivity): If Y X then X Y.
A2. (Augmentation): If X Y then XZ YZ.
A3. (Transitivity): If X Y and Y Z then X Z.
Let F be a given set of FDs, the closure F+ of F is the set of FDs which can be
derived from F through Armstrong's reference rules.
2.5. The closure of a set of attributes
Let F be a given set of FDs over the set of attributes and X . The closure
of X under F, denoted by FX
, is a set defined as:
FX
= {A (X A) F+}
From the definition of FX
and Armstrong's reference rules, it is easily seen that
X Y holds if and only if Y FX
.
2.6. Conditional Functional Dependency (CFD)
The concept of CFD [2, 3, 4, 7, 8] is an extension of the FD concept for data
cleaning. Let R() be a relation schema and X, Y . A CFD on R is a pair (X
Y, Tp), where (1) X Y is a standard FD referred to as the FD embedded in
and (2) Tp is a tableau with all attributes in X Y, referred to as the pattern tableau
of , where for each A X Y and each tuple t Tp, t[A] is either a constant "a"
in the domain Dom(A), or an unnamed variable "". Intuitively, the pattern tableau
Tp of refines the standard FD embedded in by enforcing the binding of
semantically related data values.
2.7. Association Rule (AR)
Let R() be a relation schema and X1, ..., Xk, A . An AR ([1]) is a
dependency (X1 = b1) ... (Xk = bk) (A = a), meaning that for any tuple t
r(R), if t[X1] = b1 and ... and t[Xk] = bk then t[A] = a.
2.8. Definitions
Let R be a relation schema defined over a set of attributes = Attr(R). Authors
in [1] have given definitions below.
Research
Journal of Military Science and Technology, Special Issue, No.60A, 05 - 2019 89
Definition 2.8.1 (CFDs definition). A CFD defined on R is a pair (X Y, Tp),
where (1) X Y is a standard FD, referred to as the FD embedded in ; and (2) Tp
is a tableau with attributes in R, referred to as the pattern tableau of , where for
each A Attr(R) and each pattern tuple tp Tp, tp[A] is either:
- a constant "a" in Dom(A),
- an unnamed variable that draws values from Dom(A),
- or an empty variable which indicates that the attribute does not contribute
to the pattern (i.e. A X Y).
Definition 2.8.2 ( definition). For any constant a of an attribute, we have a .
We define the pattern intersection operator of two tuples as:
t1 t2 = tp such that A Attr(R),
1 1 2
2 1 2
[ ] [ ] [ ] [ ]
[ ] [ ] [ ] [ ]
[ ] ,
p
p
p
t A t A t A t A
t A t A t A t A
t A
if
if
otherwise
Definition 2.8.3 (Pattern restriction definition). We define the pattern restriction to
attributes X of a tuple t, denoted by t(X) as:
t(X) = tp such that A Attr(R),
[ ] [ ]
[ ]
p
p
t A t A
t A
if A X
if A X
Definition 2.8.4 ( definition). A subsumption operator over pattern tuples t1
and t2 defined as:
t1 t2 if and only if A Attr(R), t1[A] t2[A]
In other words, t1 t2 if and only if t1 t2 = t1.
Definition 2.8.5 (r definition). An instance r of R satisfies the CFD , denoted
by r , if for each tuple tp in the pattern tableau Tp of , and for each pair of
tuples t1, t2 in r, if
t1[X] = t2[X] tp[X] then t1[Y] = t2[Y] tp[Y]
In other words, a CFD is an FD satisfied by a fragment relation. A pattern
tableau can thus be seen as a selection query on a relation returning a fragment of
the relation.
Example 2.1. Consider the relation Q in Table 1:
Table 1. Relation Q.
Q A B C D
t1 001 124 12 34
t2 001 124 21 43
t3 002 157 34 69
t4 002 157 34 61
Information technology & Applied mathematics
Vu Quoc Tuan, “Some improvements about dependencies and association rules.” 90
t5 002 158 89 62
t6 002 158 89 90
We have: CFD = (AB C, {002, , , }) is satisfied by the relation Q,
t1 t2 = (001, 124, 12, 34), t1 t2 and t1(BC) = (, 124, 12, ).
Definition 2.8.6. A pattern tuple tp defines a fragment relation of r:
pt
r = {t r | tp t}
The authors of [1] have successfully exploited the properties of X-complete
relations, introduced in [6], and established the link between FDs, CFDs and ARs
on that foundation. In section 2.2 of [1], there are the following definitions:
Definition 2.8.7 (X-complete property). The relation r is said to be X-complete if
and only if t1, t2 r we have t1[X] = t2[X].
Example 2.2. The relation in Table 2 is A-complete.
Definition 2.8.8 (X-complete pattern). We call X-complete pattern of an X-
complete relation r, denoted by (X, r), the pattern tuple on which tuples of r agree.
More formally:
(X, r) = {t r}
Definition 2.8.9 (X-complete horizontal decomposition). We denote by RX(r) the
set of all X-complete fragment relations of r. More formally:
RX(r) = {r' r | r' is X-complete}
Definition 2.8.10 (Set of X-patterns). We denote by (X, r) the set of all X-
complete patterns of an X-complete decomposition. More formally:
(X, r) = {(X, r') | r' RX(r)}
Definition 2.8.11 (Closure operator). We call closure of X in r, denoted by (X, r),
the set of all attributes defined in all X-complete patterns of the relation. More
formally:
(X, r) = {A Attr(R) | tp (X, r), tp[A] }
3. SOME REMARKS AND IMPROVEMENTS
Remark 3.1. The definition of a CFD in [1] is different from the standard one, given
for instance in [2, 3] by the extension for all tp Tp which are defined in Attr(R),
instead only in (X Y) Attr(R). This is done by assigning tp[A] = for all A (X
Y). Such extension makes some easy for the matching of two tuples but with the
price by using the pattern restriction to attributes X (resp. Y) of a tuple t.
Remark 3.2. The matching of a tuple t r with a tuple tp Tp (tp is now defined
on Attr(R)) is in fact the matching of t(X Y) and tp(X Y) to know whether there
are assignments appropriate constant a to (in tp) such that they will be identical.
More formally, t(X) and tp(X) (respectively t(Y) and tp(Y)) are matching if
Research
Journal of Military Science and Technology, Special Issue, No.60A, 05 - 2019 91
A X: t(X)[A] = tp(X)[A] = a Dom(A) or t(X)[A] = a and tp(X)[A] =
Remark 3.3. In Definition 3.6, there is the following statement: A pattern tuple tp
defines a fragment relation of r:
pt
r = {t r | tp t} (*)
Due to the order relation a for any constant a of an attribute, it is clear
that the formula (*) is incorrect. The reason is that in almost cases, (*) returns
empty set. In fact, in case tp contains at least one component , then there exists
not t r such that tp t. In the opposite case (tp does not contain the component
) and X Y Attr(R), we have tp[A] = and t[A] = a for A X Y. Therefore,
there does not exist t r such that tp t.
So,
pt
r , defined by (*) returns the non-empty result only when X Y = Attr(R) and
tp coincides with some tuple t in r. Hence, the expression (*) must be changed to
pt
r = {t r | t(X Y) tp(X Y)} (**)
Example 3.1. Consider the relation Q in Table 1 and the pattern tuple tp = (002, ,
, ). Then
pt
r is the fragment relation in Table 2.
Table 2. A fragment relation
pt
r .
pt
r A B C D
t3 002 157 34 69
t4 002 157 34 61
t5 002 158 89 62
t6 002 158 89 90
Remark 3.4. Let r is an X-complete relation and r' r. Then
(X, r') = {t r'}
Returning to the Definition 3.2 of on two tuples t1, t2 r. Here, using the order
relation a for any constant a of an attribute to compute t1 t2, we can
have unnecessary difficulties which are time consuming. Essentially, we have to
compare the respective components of two tuples t1 and t2 to know whether they
are equal or not. Therefore, instead of the operator , it is better to use the simple
operator , defined as follows:
With t1, t2 r,
Information technology & Applied mathematics
Vu Quoc Tuan, “Some improvements about dependencies and association rules.” 92
t1 t2 = t such that A Attr(R),
1 1 2
1 2
[ ] [ ] [ ] [ ]
[ ] [ ] [ ]
t A t A A t A
t A A t A
if t
if t
Example 3.2. To illustrate, we consider the relation r and relations r1, r2, r3, r4 as:
Table 3. The relation r.
r A B C D E G
t1 a1 b1 c1 d1 e1 g1
t2 a2 b1 c1 d1 e2 g2
t3 a2 b4 c4 d1 e6 g2
t4 a2 b2 c2 d1 e3 g2
t5 a3 b2 c2 d2 e4 g3
t6 a3 b3 c3 d2 e5 g3
t7 a4 b3 c3 d3 e6 g4
t8 a1 b1 c1 d1 e3 g1
The relation r is an instance of the relation schema R({A, B, C, D, E, G}). It is
easily seen that r satisfies the set of FDs F = {B C, C B, A DG}.
The following relations r1, r2, r3, r4 are fragment relations of r and A-complete.
Table 4. The relation r2.
r1 A B C D E G
t1 a1 b1 c1 d1 e1 g1
t8 a1 b1 c1 d1 e3 g1
(A, r1) = {t r1} = (a1, b1, c1, d1, , g1)
Table 5. The relation r2.
r2 A B C D E G
t2 a2 b1 c1 d1 e2 g2
t3 a2 b4 c4 d1 e6 g2
t4 a2 b2 c2 d1 e3 g2
(A, r2) = {t r2} = (a2, , , d1, , g2)
Table 6. The relation r3.
r3 A B C D E G
t5 a3 b2 c2 d2 e4 g3
t6 a3 b3 c3 d2 e5 g3
(A, r3) = {t r3} = (a3, , , d2, , g3)
Table 7. The relation r4.
r4
A B C D E G
t7 a4 b3 c3 d3 e6 g4
(A, r4) = {t r4} = (a4, b3, c3, d3, e6, g4)
The set of A-complete pattern tuples of r as:
(A, r) = {(a1, b1, c1, d1, , g1),
(a2, , , d1, , g2),
Research
Journal of Military Science and Technology, Special Issue, No.60A, 05 - 2019 93
(a3, , , d2, , g3),
(a4, b3, c3, d3, e6, g4)}
By the elements of (A, r), we have (A, r) = {A, D, G}.
Remark 3.5. In practice, to compute the X-complete-pattern ( , )X r of an X-
complete relation r, i.e. to compute t r , we can use the following method:
( )A Attr R ,
( , )[ ]X r A =
,
,
the column A of r contains an unique value a
otherwise
a if
More formally
( )A Attr R , ( , )[ ]X r A =
, ( ) ( )
,
otherwise
Aa if r a
Remark 3.6. Basing on definitions and properties of (X, r) and FX
, we have the
proposition as: Let r be a relation of R defined on Attr(R), X Attr(R), and r
satisfies a set of FDs F. Then
(X, r) = {A Attr(R) | tp (X, r), tp[A] }
= FX
= {A Attr(R) | (X A) F+},
where (X, r) = {(X, r') | r' RX(r)} and (X A) F
+.
Proof. A (X, r) tp (X, r), we have tp[A] . Therefore, t, t' r
such that t[X] = t'[X] t[A] = t'[A]. So X A F+ or A FX
. Conversely,
suppose that A FX
X A F+. Then, r' RX(r) and t, t' r', we have
t[X] = t'[X]. Since X A F+ then t[A] = t'[A] (X, r')[A] A (X, r).
The above result is a tool for discovering FDs from (X, r), which means that X
Y if and only if Y FX
= (X, r).
Example 3.3. From (A, r) = {A, D, G} in Example 3.1, we have A D and A G.
4. CONCLUSIONS
In this paper, we give some remarks and preliminary results concerning a work
about a unified hierarchy for FDs, CFDs and ARs: adjusting the expression
pt
r
which defines a fragment relation; proposing an improvement for the operator to
which is more rigorous semantically and computed easily; proving that
( , ) FX r X
.
Studying and solving problems related to FDs, CFDs and ARs is a new and very
interesting direction.
REFERENCES
[1]. R. Medina, and L. Nourine, "A Unified Hierarchy for Functional
Dependencies, Conditional Functional Dependencies and Association Rules",
ICFCA 2009, LNAI 5548, pp. 98–113, (2009).
Information technology & Applied mathematics
Vu Quoc Tuan, “Some improvements about dependencies and association rules.” 94
[2]. P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis, "Conditional
functional dependencies for data cleaning", IEEE 23rd International
Conference on Data Engineering, pages 746–755, (2007).
[3]. W. Fan, F. Geerts, J. Li, and M. Xiong, "Discovering conditional functional
dependencies", IEEE Transactions on Knowledge and Data Engineering,
Volume 23, Issue 5, pp. 683-698, (2011).
[4]. P. Z. Yeh, and C. A. Puri, "Discovering conditional functional dependencies to
detect data inconsistencies", ACM Transactions on Database Systems
(TODS), Volume 33, Issue 2, Article No. 6, (2008).
[5]. W. Fan, and F. Geerts, "Foundations of Data Quality Management", Morgan
& Claypool Publisher, (2012).
[6]. P. D. Bra, and J. Paredaens, "An algorithm for horizontal decompositions",
Information Processing Letters, 17(2), pp. 91-95, (1983).
[7]. R. Salem and A. Abdo, "Fixing rules for data cleaning based on conditional
functional dependency". Future Computing and Informatics Journal, 1(1-2),
10–26.doi:10.1016/j.fcij.2017.03.002.
[8]. J. Rammelaere and F. Geerts, "Revisiting Conditional Functional Dependency
Discovery: Splitting the “C” from the “FD”", Machine Learning and
Knowledge Discovery in Databases pp 552-568, (2018).
[9]. T. Diallo, N. Novelli and J.M. Petit, "Discovering (frequent) constant
conditional functional dependencies". IJDMMM 4(3), 205–223, (2012).
TÓM TẮT
MỘT SỐ CẢI TIẾN VỀ SỰ PHÂN CẤP GIỮA PHỤ THUỘC HÀM, PHỤ
THUỘC HÀM ĐIỀU KIỆN VÀ LUẬT KẾT HỢP
Trong [1], các tác giả đã chỉ ra một thứ tự phân cấp giữa các phụ thuộc hàm
(FD), phụ thuộc hàm điều kiện (CFD) và luật kết hợp (AR): FD là hợp của các
CFD trong khi CFD là hợp của các AR. Thứ tự phân cấp giữa các FD, CFD và AR
mang lại nhiều lợi ích: các thuật toán hiện có để phát hiện AR có thể được thích
nghi để phát hiện nhiều loại phụ thuộc dữ liệu khác và hơn nữa còn sinh một tập
được rút gọn các phụ thuộc. Trong bài báo này, chúng tôi đưa ra một số nhận xét
và cải tiến có liên quan tới thứ tự phân cấp giữa FD, CFD và AR.
Từ khóa: Cơ sở dữ liệu; Phụ thuộc hàm; Phụ thuộc hàm điều kiện; Luật kết hợp.
Received 17th January 2019
Revised 16th April 2019
Accepted 15th May 2019
Author affiliations:
Graduate University of Science and Technology, Vietnam Academy of Science and Technology;
*Corresponding author: vqtuanhd@gmail.com.
Các file đính kèm theo tài liệu này:
- 10_tuan_9489_2150387.pdf