Speakers

Sihem Amer-Yahia: Task Assignment Optimization in Crowdsourcing

Sihem Amer-Yahia is DR1 CNRS at LIG in Grenoble, where she leads the SLIDE team. Her interests lie at the intersection of large-scale data management and data analytics. Before joining CNRS, she was Principal Scientist at the Qatar Computing Research Institute, Senior Scientist at Yahoo! Research, and Senior Scientist at AT&T Labs. Sihem has served on the SIGMOD (Special Interest Group on Management of Data) Executive Board and is a member of the VLDB (Very Large Data Bases) and EDBT (Extending Database Technology) Endowments. She is the Editor-in-Chief of the VLDB Journal for Europe and Africa and is on the editorial boards of TODS (Transactions on Database Systems) and the Information Systems Journal. She is currently serving as PC chair of BDA 2015 and of SIGMOD Industrial 2015. Sihem received her Ph.D. in Computer Science from Paris-Orsay and INRIA in 1999, and her Diplôme d’Ingénieur from INI, Algeria.

A crowdsourcing process can be viewed as a combination of three components: worker skill estimation, worker-to-task assignment, and task accuracy evaluation. Crowdsourcing is popular today because tasks are small, independent, homogeneous, and do not require long engagement from workers. The crowd is typically volatile, its arrivals and departures asynchronous, and its levels of attention and accuracy variable. As a result, popular crowdsourcing platforms are not well adapted to emerging team-based tasks, such as collaborative editing, multi-player games, or fan-subbing, that require forming a team of experts to accomplish a task together. In particular, I will argue that optimizing worker-to-task assignment is central to the effectiveness of team-based crowdsourcing. I will present a framework that formulates worker-to-task assignment as optimization problems with different goals, and summarize some of our results in this area.
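To give a flavour of the kind of optimization problem involved, here is a deliberately minimal sketch of one-to-one worker-to-task assignment that maximizes a total estimated skill match. The skill matrix, the scoring function, and the brute-force search are illustrative assumptions for this page, not the framework presented in the talk.

```python
from itertools import permutations

def best_assignment(skill):
    """Exhaustively search one-to-one worker-to-task assignments,
    maximizing the total estimated skill match.

    skill[w][t] is worker w's estimated skill for task t
    (a square matrix: one task per worker). Brute force is
    exponential; real systems would use e.g. the Hungarian
    algorithm or problem-specific heuristics instead.
    """
    n = len(skill)
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(n)):  # perm[w] = task assigned to worker w
        score = sum(skill[w][perm[w]] for w in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_perm, best_score

# Toy example: 3 workers x 3 tasks, skill estimates in [0, 1] (made up).
skill = [
    [0.9, 0.1, 0.4],
    [0.2, 0.8, 0.5],
    [0.6, 0.3, 0.7],
]
assignment, total = best_assignment(skill)
```

Different goals from the abstract (e.g. team-based tasks) would change the objective, for instance scoring whole teams rather than summing individual worker-task scores.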

Ange Aniesa: Legal Deposit in the Face of Big Data: Web Archiving at the Bibliothèque nationale de France

Ange Aniesa is a curator at the French National Library (Bibliothèque nationale de France). He is in charge of digital collections in the legal deposit department of the BnF.

During his presentation, Ange Aniesa will present the legal deposit mission of the BnF, its extension to the digital realm, the integrated collection model used, and its technical dimension. He will also offer an international perspective on the issue. He will then focus on the BnF’s current project of archiving digital data from the French electoral campaigns that have taken place since 2002. He will cover the goals of this project, the chosen documentary typology, and the technical organisation of the work involved, and will also reflect on future prospects for such projects. He will conclude with the challenges of accessing, promoting, and using the collections generated through the legal deposit mission of the BnF.

Khalid Belhajjame: The State of the Nation in Data Science Reproducibility

Khalid is a lecturer (Maître de Conférences) at Université Paris-Dauphine, where he is a member of the LAMSADE research lab. Before moving to Paris, he was a researcher for several years at the University of Manchester, and prior to that a PhD student at the University of Grenoble. His research interests lie in the areas of information and knowledge management. In particular, he has made key contributions to the areas of pay-as-you-go data integration, e-Science, scientific workflow management, provenance tracking and exploitation, and semantic web services. He has published over 50 papers on these topics. Most of his research proposals were validated against real-world applications from the fields of astronomy, biodiversity, and the life sciences. He has participated in multiple European-, French- and UK-funded projects, and has been an active member of the W3C Provenance Working Group, the NSF-funded DataONE working group on scientific workflows and provenance, and, more recently, the Research Object for Scholarly Communication Community Group. He is also co-leading the provenance benchmarking activity ProvBench, which seeks to produce a family of benchmarks for testing provenance proposals.

Reproducibility is increasingly recognized as a fundamental prerequisite for establishing trust in, and the reliability of, scientific results and findings. In this talk, I will introduce the key concepts for understanding reproducibility in the context of data science experiments and analyses. I will present examples of platforms and tools that have been proposed to enable or facilitate reproducibility. I will then focus on the reproducibility of a particular kind of data science artifact, viz. scientific workflows, underlining current issues in ensuring their preservation and reproducibility, and discussing problems that have yet to be solved.

Josh Cowls: The Big Data Revolution for Social and Political Science

Josh Cowls is a research assistant at the Oxford Internet Institute, which he joined in 2013 to work on the Sloan Foundation-funded project on Accessing and Using Big Data to Advance Social Science Knowledge. He also has experience in front-line politics, having worked in the Policy Unit of a UK political party and on US presidential and Senate campaigns. His research interests include the impact of big data sets on democracy and government, and the new forms and functions of the public sphere facilitated by online social networks. He is currently part of the “Big UK Domain Data for the Arts and Humanities” project, which works with data derived from the UK domain crawl from 1996 to 2013 in order to develop a framework for the study of web archive data and produce a major history of the UK web space.

The project entitled “Accessing and Using Big Data to Advance Social Science Knowledge”, carried out by a team of researchers at the Oxford Internet Institute between 2012 and 2014, aimed to follow ‘big data’ from its public and private origins through open and closed pathways into the social sciences, and to document and shape the ways it is being accessed and used to create new knowledge about the social world. In short, what are the social and scientific implications of large-scale ‘big data’ as it becomes more widely available to social scientists in academia, public institutions, and the private sector? The project relied on in-depth studies of exemplar cases to understand how social scientists in academia, industry, and government are accessing and using big data to answer old questions at larger scales, as well as to ask and answer new questions about society and human behavior. This paper will address the methodological, technical, and ethical issues that emerged from the project.