PODS Invited Talks
Datalog Redux: Experience and Conjecture
Joseph M. Hellerstein, UC Berkeley
Datalog was a foundational topic in the early years of PODS, despite skepticism from practitioners about its relevance. This has been changing in recent years, with unlikely champions exploring and promoting its use as a practical basis for programming in a wide variety of application domains. We reflect on our use of Datalog to build systems of significant complexity for both networking and cloud computing infrastructure. Based on that experience, we present conjectures regarding next-generation programming languages, and the role that database theory could play in their development.
Joseph M. Hellerstein is a Professor of Computer Science at the University of California, Berkeley, whose research focuses on data management and distributed systems. He is an ACM Fellow, and his work has been recognized by multiple awards including an Alfred P. Sloan Research Fellowship, and two ACM-SIGMOD "Test of Time" awards. Key ideas from his research have been incorporated into commercial and open-source database software released by IBM, Oracle, and PostgreSQL. He has also held industrial posts including Director of Intel Research Berkeley and Chief Scientist of Cohera Corporation, and currently serves as an advisor to a number of technology companies.
Invited Tutorial 1
From Information to Knowledge: Harvesting Entities and Relationships from Web Sources
Gerhard Weikum and Martin Theobald, Max Planck Institute for Informatics, Saarbruecken, Germany
Major trends are advancing the functionality of search engines toward a more expressive semantic level. This is enabled by the advent of knowledge-sharing communities such as Wikipedia and by progress in automatically extracting entities and relationships from semistructured as well as natural-language Web sources. Recent endeavors of this kind include DBpedia, EntityCube, KnowItAll, ReadTheWeb, and our own YAGO-NAGA project, among others. The goal is to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall. This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting. It also addresses issues of querying the knowledge base and ranking answers.
Gerhard Weikum is a Scientific Director at the Max Planck Institute for Informatics, where he leads the research group on databases and information systems. Earlier he held positions at Saarland University in Germany, ETH Zurich in Switzerland, and MCC in Austin, and he was a visiting senior researcher at Microsoft Research in Redmond. His recent working areas include peer-to-peer information systems, the integration of database-systems and information-retrieval methods, and information extraction for building and maintaining large-scale knowledge bases. Gerhard has co-authored more than 150 publications, including a comprehensive textbook on transactional concurrency control and recovery. He received the VLDB 2002 ten-year award for his work on self-tuning databases, and he is an ACM Fellow. He is a member of the German Academy of Science and Engineering and a member of the German Council of Science and Humanities. Gerhard has served on the editorial boards of various journals, including ACM TODS and the new CACM, and as program committee chair for conferences such as ICDE 2000, SIGMOD 2004, CIDR 2007, and ICDE 2010. From 2004 to 2009 he was president of the VLDB Endowment.
Martin Theobald is a Senior Researcher at the Max Planck Institute for Informatics. He obtained a doctoral degree in computer science from Saarland University, and spent two years as a post-doc at Stanford University where he worked on the Trio probabilistic database system. Martin received an ACM SIGMOD dissertation award honorable mention in 2006 for his work on the TopX search engine for efficient ranked retrieval of semistructured XML data.
Invited Tutorial 2
T.S. Jayram, IBM Almaden Research Center
Recent years have witnessed the overwhelming success of algorithms that operate on massive data. Several computing paradigms have been proposed for massive data set algorithms, such as data streams, sketching, and sampling, and understanding their limitations is a fundamental theoretical challenge. In this survey, we describe the information complexity paradigm, which has proved successful in obtaining tight lower bounds for several well-known problems. Information complexity quantifies the amount of information about the inputs that must necessarily be propagated by any algorithm in solving a problem. We describe the key ideas of this paradigm and highlight the beautiful interplay of techniques arising from diverse areas such as information theory, statistics, and geometry.
T.S. Jayram is the manager of the Algorithms and Computation group at IBM Almaden Research Center. His research interests are in massive data sets, probabilistic databases, and computational complexity. He obtained his Ph.D. from the University of Washington in 1998 and has been with IBM Research since that time. He has received the IBM Outstanding Innovation Award for his contributions to massive data sets. Previously, he received the Machtey award for best student paper at FOCS 1995.