Dr Eamonn Keogh
Computer Science & Engineering Department,
University of California - Riverside
Dr Keogh is an associate professor at the University of California, Riverside. His research interests are in data mining, particularly of time series, as well as machine learning and information retrieval generally. He has published more than 90 papers, including eleven papers in SIGKDD, ten papers in IEEE ICDM, and papers in SIGIR, SIGMOD, SIGGRAPH, VLDB, ICML, EDBT, PKDD, PAKDD, IEEE ICDE, SIAM SDM, IDEAL, FQAS, SSDM, AI and INTERFACE conferences and in the TODS, DMKD, VLDB, KAIS and IJTAI journals. Several of his papers have won “best paper” awards. In addition, he has won several teaching awards. He is the recipient of a 5-year NSF Career Award for “Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases” and a grant from Aerospace Corp to develop a time series visualization tool for monitoring space launch telemetry.
Dr Keogh has given well-received tutorials on time series, machine learning and data mining all over the world, and his papers have been referenced well over 3,000 times.
Why the Lack of Reproducibility is Crippling Research in Data Mining/Machine Learning and What You Can Do about It.
In this talk I will make a strong and potentially controversial claim: the majority of papers published in the best data mining/machine learning conferences make no contribution.
The reason for this is that in most cases no one, including the original authors, can reproduce the findings in the papers. As I shall argue, non-reproducible results are the same as no results at all. The irreproducibility of results may be explicit (the refusal to share data or to give parameter settings) or implicit (the effort to reproduce may be so great that the authors ensure that no one will ever try).
I will argue that this lack of reproducibility is crippling research progress and allowing a large number of false research findings to go unchallenged and enter the popular consciousness as true. I will demonstrate my claims with the deconstruction of several influential papers and (reproducible!) experiments.
Dr Mehran Sahami
Senior Research Scientist
Mehran Sahami is a Senior Research Scientist at Google. His research interests include information retrieval on the Web, as well as data mining and machine learning. Mehran was previously a Lecturer in the Computer Science Department at Stanford University (where he received his PhD), and prior to Google he was involved in a number of commercial and research machine learning projects at Epiphany, Xerox PARC and Microsoft Research. He has published dozens of refereed technical papers, served on numerous conference program/organizing committees and has several patents pending.
Harnessing the Web to Improve Artificial Intelligence
While research in artificial intelligence has made important progress in the past 50 years, the availability of enormous amounts of data on the Web has created an unprecedented opportunity to significantly advance the state of the art in many areas of AI.
In this talk we explore some particular AI problems related to information finding and understanding, and show that by leveraging Web data it is possible to address these problems in more effective ways. Specifically, we examine issues in topical inference for text, machine translation, and other related problems in text understanding, discussing both the theoretical and practical implications of these problems in real systems. We then generalize from these examples to discuss broader trends in making use of large Web datasets and the potential they offer for further advancing research and development in AI.