Keynote Speakers




Title: Novel Algorithms for Linking Records

Abstract: Given multiple data sets, the problem of record linkage is to cluster their records such that each cluster contains all the information pertaining to a single entity and no information about any other entity. In this presentation we summarize some of the novel algorithms that we have recently created in the context of record linkage.
Blocking is a technique that is typically used to speed up record linkage algorithms. Recently, we have introduced a novel algorithm for blocking called SuperBlocking. We have created novel record linkage algorithms that employ SuperBlocking. Experimental comparisons reveal that our algorithms outperform state-of-the-art algorithms for record linkage. We have also developed parallel versions of our record linkage algorithms and they obtain close to linear speedups. We will provide details on these algorithms in this presentation.
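To illustrate why blocking speeds up record linkage, here is a minimal sketch of conventional key-based blocking. It is not SuperBlocking, whose details are not given in this abstract. The function names and the choice of key (the first k characters of the record) are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record, k=3):
    """Hypothetical blocking key: the first k characters, lowercased."""
    return record.lower()[:k]


def candidate_pairs(records, k=3):
    """Return the set of index pairs (i, j), i < j, that share a blocking key.

    Only these pairs need a full (expensive) comparison; pairs of records
    that fall in different blocks are never compared.
    """
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec, k)].append(idx)

    pairs = set()
    for members in blocks.values():
        # members is in increasing index order, so each pair has i < j.
        pairs.update(combinations(members, 2))
    return pairs


records = ["smith john", "smith jon", "smyth john", "jones mary"]
pairs = candidate_pairs(records)
```

With four records there are six possible pairs, but only the two records sharing the key "smi" are compared here. The sketch also shows the usual trade-off: "smyth john" lands in a different block, so a true match can be missed when the key is too strict.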
We can think of each record as a string of characters. Numerous distance metrics for strings can be found in the literature, and the performance of a record linkage algorithm might depend on the metric used. Popular metrics include edit distance (also known as the Levenshtein distance), q-gram distance, and Hausdorff distance. Jaro is another popular distance metric that is widely used in applications such as record linkage. The best-known prior algorithms for computing the Jaro distance between two strings took quadratic time. Recently, we have presented a linear time algorithm for Jaro distance computation. We will also summarize this algorithm in this presentation.
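For readers unfamiliar with Jaro, the following is a sketch of the standard textbook formulation, which takes quadratic time in the worst case. It is not the linear time algorithm mentioned above. It assumes the common convention that the Jaro distance is one minus the Jaro similarity; the function names are illustrative.

```python
def jaro_similarity(s, t):
    """Textbook Jaro similarity in [0, 1]; worst-case quadratic time."""
    if s == t:
        return 1.0
    len_s, len_t = len(s), len(t)
    if len_s == 0 or len_t == 0:
        return 0.0

    # Characters match only if they are equal and at most `window` apart.
    window = max(max(len_s, len_t) // 2 - 1, 0)
    s_matched = [False] * len_s
    t_matched = [False] * len_t
    matches = 0

    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len_t, i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = True
                t_matched[j] = True
                matches += 1
                break

    if matches == 0:
        return 0.0

    # Transpositions: half the number of matched characters that appear
    # in a different order in the two strings.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2

    return (
        matches / len_s
        + matches / len_t
        + (matches - transpositions) / matches
    ) / 3.0


def jaro_distance(s, t):
    """Jaro distance, taken here as 1 - similarity (an assumed convention)."""
    return 1.0 - jaro_similarity(s, t)
```

For the classic pair "MARTHA" and "MARHTA", all six characters match and one transposition is counted, giving a similarity of 17/18 and a distance of 1/18. The nested scan over the match window is the source of the quadratic worst-case cost.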

Biography: Sanguthevar Rajasekaran received his M.E. degree in Automation from the Indian Institute of Science (Bangalore) in 1983, and his Ph.D. degree in Computer Science from Harvard University in 1988. He is currently the Director of the School of Computing, Board of Trustees Distinguished Professor, and Pratt & Whitney Chair Professor of CSE at the University of Connecticut. Before joining UConn, he served as a faculty member in the CISE Department of the University of Florida and in the CIS Department of the University of Pennsylvania. During 2000-2002 he was the Chief Scientist for Arcot Systems. His research interests include Big Data, AI and Machine Learning, Bioinformatics, Algorithms, Data Mining, Randomized Computing, and HPC. He has published over 350 research articles in journals and conferences. He has co-authored two texts on algorithms and co-edited six books on algorithms and related topics. He has been awarded numerous research grants (totaling more than $22M) from agencies such as NSF, NIH, the US Census Bureau, CIA, DARPA, and DHS, as well as from industry. He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), the American Association for the Advancement of Science (AAAS), the American Institute for Medical and Biological Engineering (AIMBE), and the Asia-Pacific Artificial Intelligence Association (AAIA). He is also an elected member of the Connecticut Academy of Science and Engineering.