Cybersecurity news about "ACM SIGIR"

:::info This paper is available on arXiv under CC 4.0 license. Authors: (1) Jinrui Yang, School of Computing & Information Systems, The University of Melbourne (Email: jinruiy@student.unimelb.edu.au); (2) Timothy Baldwin, School of Computing & Information Systems, The University of Melbourne and Mohamed bin Zayed University of Artificial Intelligence, UAE (Email: (tbaldwin,trevor.cohn)@unimelb.edu.au); (3) Trevor Cohn, School of Computing & Information Systems, The University of Melbourne. :::

6 Conclusion

In this paper, we introduce Multi-EuP, a novel dataset for multilingual information retrieval across 24 languages, collected from European Parliament debates. The demographic information provided by the Multi-EuP dataset serves a dual purpose: not only does it contribute to multilingual retrieval tasks, but it also holds significant potential for advancing research on fairness and bias. This dataset can play a pivotal role in investigating equitable representation and the mitigation of biases within document ranking settings.

Multi-EuP facilitates diverse information retrieval (IR) scenarios, encompassing one-vs-one, one-vs-many, and many-vs-many settings. We demonstrated the utility of Multi-EuP as a benchmark for evaluating both monolingual and multilingual IR. Our study reveals the presence of language bias in multilingual IR when employing BM25. We further validate that using whitespace as the tokenizer mitigates this bias.

We propose to conduct future work in three main areas.
First, we intend to expand our investigation of language bias to encompass a broader range of ranking methods, including neural methods such as mDPR (Zhang et al., 2021), mColBERT (Lawrie et al., 2023) and PLAID-X (Santhanam et al., 2022). Second, we will expand the dataset by developing an automated API to retrieve data published by the European Parliament (EP), thereby ensuring real-time synchronization of our dataset. Lastly, our current experiments have explored language bias only, but we plan to further investigate gender bias, age bias, and nationality bias.

Limitations

The limitations of the Multi-EuP dataset are notable but navigable. Primarily, the temporal coverage of the dataset is confined to the past three years. This constraint arises because, before 2020, documents released by the EU were predominantly available in monolingual versions only. However, a potential remedy lies in merging in the Europarl (Koehn, 2005) collection, enabling a more comprehensive Multi-EuP dataset. Furthermore, it is worth noting the domain skew of the dataset, in that Multi-EuP inevitably centers on political matters. While this presents challenges, particularly in terms of the intricate nuances of political language, it serves as a foundational stepping stone for studying multilingual retrieval. We believe, moreover, that this dataset can serve as a launching pad for broader explorations encompassing cross-domain and open-domain transfer learning scenarios, thus contributing to the broader landscape of language understanding and retrieval.

Ethics Statement

The dataset contains publicly-available EP data that does not include personal or sensitive information, with the exception of information relating to public officeholders, e.g., the names of the active members of the European Parliament, European Council, or other official administration bodies.
The collected data is licensed under the Creative Commons Attribution 4.0 International licence.[8]

Acknowledgements

This research was funded by Melbourne Research Scholarship and undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This facility was established with the assistance of LIEF Grant LE170100200. We would like to thank George Buchanan for providing valuable feedback.

References

Luiz Henrique Bonifacio, Israel Campiotti, Roberto de Alencar Lotufo, and Rodrigo Frassetto Nogueira. 2021. mMARCO: A multilingual version of MS MARCO passage ranking dataset. CoRR, abs/2108.13897. Ilias Chalkidis, Manos Fergadiotis, and Ion Androutsopoulos. 2021. MultiEURLEX - a multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6974–6996, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics. Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, and Jennimaria Palomaki. 2020. TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Transactions of the Association for Computational Linguistics, 8:454–470. Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6769–6781, Online. Association for Computational Linguistics. Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. CoRR, abs/2004.12832. Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation.
In Proceedings of Machine Translation Summit X: Papers, pages 79–86, Phuket, Thailand. Dawn Lawrie, Eugene Yang, Douglas W. Oard, and James Mayfield. 2023. Neural approaches to multilingual information retrieval. arXiv cs.IR 2209.01335. Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, and Rodrigo Nogueira. 2021. Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations. https://github.com/castorini/pyserini. Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A human generated machine reading comprehension dataset. CoRR, abs/1611.09268. Ella Rabinovich, Raj Nath Patel, Shachar Mirkin, Lucia Specia, and Shuly Wintner. 2017. Personalized machine translation: Preserving original author traits. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, pages 1074–1084, Valencia, Spain. Association for Computational Linguistics. Razieh Rahimi, Azadeh Shakery, and Irwin King. 2015. Multilingual information retrieval in the language modeling framework. Information Retrieval Journal, 18:246–281. Keshav Santhanam, Omar Khattab, Christopher Potts, and Matei Zaharia. 2022. PLAID: An efficient engine for late interaction retrieval. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, CIKM '22, pages 1747–1756, New York, NY, USA. Association for Computing Machinery. Jörg Tiedemann and Santhosh Thottingal. 2020. OPUS-MT – building open translation services for the world. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 479–480, Lisboa, Portugal. European Association for Machine Translation. Eva Vanmassenhove and Christian Hardmeier. 2018. Europarl datasets with demographic speaker information.
In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, page 391, Alicante, Spain. Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: A free collaborative knowledge base. Communications of the ACM, 57:78–85. Konrad Wojtasik, Vadim Shishkin, Kacper Wołowiec, Arkadiusz Janz, and Maciej Piasecki. 2023. BEIR-PL: Zero shot information retrieval benchmark for the Polish language. arXiv cs.IR 2305.19840. Peilin Yang, Hui Fang, and Jimmy Lin. 2017. Anserini: Enabling the use of Lucene for information retrieval research. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 1253–1256. Xinyu Zhang, Xueguang Ma, Peng Shi, and Jimmy Lin. 2021. Mr. TyDi: A multi-lingual benchmark for dense retrieval. arXiv cs.CL 2108.08787.

A. Appendix

[8] https://eur-lex.europa.eu/content/legal-notice/legal-notice.html
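The conclusion above reports that BM25 shows language bias in multilingual retrieval and that whitespace tokenization helps mitigate it. The following is a minimal, self-contained sketch of BM25 scoring over whitespace tokens, not the authors' implementation (their reference list points to Pyserini and Anserini). The class name `WhitespaceBM25`, the lowercasing step, the toy documents, and the parameter values `k1=0.9` and `b=0.4` are all assumptions made for illustration.

```python
import math
from collections import Counter


def whitespace_tokenize(text):
    """Lowercase and split on whitespace only: no stemming, no stopword
    removal, no language-specific analysis (lowercasing is an assumption)."""
    return text.lower().split()


class WhitespaceBM25:
    """Toy BM25 ranker that treats every language identically by
    tokenizing on whitespace. Hypothetical helper, for illustration only."""

    def __init__(self, docs, k1=0.9, b=0.4):
        # k1 and b are assumed defaults, not values taken from the paper.
        self.k1, self.b = k1, b
        self.doc_tokens = [whitespace_tokenize(d) for d in docs]
        self.doc_tfs = [Counter(tokens) for tokens in self.doc_tokens]
        self.n_docs = len(docs)
        total_len = sum(len(tokens) for tokens in self.doc_tokens)
        # Fall back to 1.0 so an empty collection cannot divide by zero.
        self.avg_len = (total_len / self.n_docs if self.n_docs else 0.0) or 1.0
        # Document frequency: number of documents containing each term.
        self.df = Counter()
        for tf in self.doc_tfs:
            self.df.update(tf.keys())

    def idf(self, term):
        n = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query, idx):
        """BM25 score of document `idx` for `query`."""
        tf = self.doc_tfs[idx]
        length = len(self.doc_tokens[idx])
        norm = self.k1 * (1 - self.b + self.b * length / self.avg_len)
        total = 0.0
        for term in whitespace_tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query):
        """Document indices sorted by descending score (ties by index)."""
        scored = [(self.score(query, i), i) for i in range(self.n_docs)]
        return [i for _, i in sorted(scored, key=lambda x: (-x[0], x[1]))]


# Toy mixed-language collection (invented sentences, not Multi-EuP data).
docs = [
    "the parliament debates energy policy",
    "das parlament debattiert energiepolitik",
    "energy policy energy markets",
]
bm25 = WhitespaceBM25(docs)
ranking = bm25.rank("energy policy")
```

Because the same tokenizer is applied to every document regardless of language, no language receives extra normalization such as stemming or compound splitting, which is one plausible reading of why a shared whitespace tokenizer could reduce score differences between languages.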

Computational Linguistics dataset Language Bias machine translation

:::info This paper is available on arXiv under CC 4.0 license. Authors: (1) Ryen W. White, Microsoft Research, Redmond, WA, USA. :::

5 THE UNDISCOVERED COUNTRY

AI copilots will transform how we search. Tasks are central to people's lives and more support is needed for complex tasks in search settings. Some limited support for these tasks already exists in search engines, but copilots will expand the task frontier to make more tasks actionable and address the “last mile” in search interaction: task completion [58]. Moving forward, search providers should invest in “better together” experiences that utilize copilots plus traditional search, make these joint experiences more seamless for searchers, and add more support for their use in practice, e.g., help people to quickly understand copilot capabilities and potential and/or recommend the best modality for the current task or task stage. This includes experiences where both modalities are offered separately and can be selected by searchers and those where there is unification and the selection happens automatically based on the query and the conversation context.

The foundation models that power copilots have other search-related applications, e.g., for generating and applying intent taxonomies [43] or for evaluation [19]. We must retain a continued focus on human-AI cooperation, where searchers stay in control while the degree of system support increases as needed [44], and on AI safety. Searchers need to be able to trust copilots in general but also be able to verify their answers with minimal effort.

Overall, the future is bright for IR, and AI research in general, with the advent of generative AI and the copilots that build upon it. Copilots will help augment and empower searchers in their information seeking journeys.
Computer science researchers and practitioners should embrace this new era of assistive agents and engage across the full spectrum of exciting practical and scientific opportunities, both within information seeking as we focused on here, and onwards into other important domains such as personal productivity [5] and scientific discovery [22].

REFERENCES

[1] Eugene Agichtein, Ryen W White, Susan T Dumais, and Paul N Bennett. 2012. Search, interrupted: understanding and predicting search task continuation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. 315–324. [2] Marcia J Bates. 1990. Where should the person stop and the information search interface start? Information Processing & Management 26, 5 (1990), 575–591. [3] Nicholas J Belkin. 1980. Anomalous states of knowledge as a basis for information retrieval. Canadian journal of information science 5, 1 (1980), 133–143. [4] Paul N Bennett, Ryen W White, Wei Chu, Susan T Dumais, Peter Bailey, Fedor Borisyuk, and Xiaoyuan Cui. 2012. Modeling the impact of short-and long-term behavior on search personalization. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. 185–194. [5] Christian Bird, Denae Ford, Thomas Zimmermann, Nicole Forsgren, Eirini Kalliamvakou, Travis Lowdermilk, and Idan Gazit. 2022. Taking Flight with Copilot: Early insights and opportunities of AI-powered pair-programming tools. Queue 20, 6 (2022), 35–57. [6] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021). [7] Lucas Bourtoule, Varun Chandrasekaran, Christopher A Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot. 2021. Machine unlearning.
In 2021 IEEE Symposium on Security and Privacy (SP). IEEE, 141–159. [8] Andrei Broder. 2002. A taxonomy of web search. In ACM SIGIR Forum, Vol. 36. ACM New York, NY, USA, 3–10. [9] Andrei Z Broder and Preston McAfee. 2023. Delphic Costs and Benefits in Web Search: A utilitarian and historical analysis. arXiv preprint arXiv:2308.07525 (2023). [10] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 (2023). [11] Katriina Byström and Kalervo Järvelin. 1995. Task complexity affects information seeking and use. Information processing & management 31, 2 (1995), 191–213. [12] Robert Capra and Jaime Arguello. 2023. How does AI chat change search behaviors? arXiv preprint arXiv:2307.03826 (2023). [13] Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E Gonzalez, et al. 2023. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality. See https://vicuna.lmsys.org (accessed 14 April 2023) (2023). [14] Antonia Creswell and Murray Shanahan. 2022. Faithful reasoning using large language models. arXiv preprint arXiv:2208.14271 (2022). [15] Brenda Dervin. 1998. Sense-making theory and practice: An overview of user interests in knowledge seeking and use. Journal of knowledge management 2, 2 (1998), 36–46. [16] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). [17] Karl Duncker and Lynne S Lees. 1945. On problem-solving. Psychological monographs 58, 5 (1945), i. [18] Brad Everman, Trevor Villwock, Dayuan Chen, Noe Soto, Oliver Zhang, and Ziliang Zong. 2023. Evaluating the Carbon Impact of Large Language Models at the Inference Stage.
In 2023 IEEE International Performance, Computing, and Communications Conference (IPCCC). IEEE, 150–157. [19] Guglielmo Faggioli, Laura Dietz, Charles LA Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, et al. 2023. Perspectives on Large Language Models for Relevance Judgment. In Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval. 39–50. [20] Jianfeng Gao, Chenyan Xiong, Paul Bennett, and Nick Craswell. 2023. Neural Approaches to Conversational Information Retrieval. Vol. 44. Springer Nature. [21] Ahmed Hassan Awadallah, Ryen W White, Patrick Pantel, Susan T Dumais, and Yi-Min Wang. 2014. Supporting complex search tasks. In Proceedings of the 23rd ACM international conference on conference on information and knowledge management. 829–838. [22] Tom Hope, Doug Downey, Daniel S Weld, Oren Etzioni, and Eric Horvitz. 2023. A computational inflection for scientific discovery. Commun. ACM 66, 8 (2023), 62–73. [23] Peter Ingwersen and Kalervo Järvelin. 2005. The turn: Integration of information seeking and retrieval in context. Vol. 18. Springer Science & Business Media. [24] Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Ye Jin Bang, Andrea Madotto, and Pascale Fung. 2023. Survey of hallucination in natural language generation. Comput. Surveys 55, 12 (2023), 1–38. [25] Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. 133–142. [26] Jeonghyun Kim. 2006. Task difficulty as a predictor and indicator of web searching interaction. In CHI'06 extended abstracts on human factors in computing systems. 959–964. [27] David R Krathwohl. 2002. A revision of Bloom's taxonomy: An overview. Theory into practice 41, 4 (2002), 212–218. 
[28] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems 33 (2020), 9459–9474. [29] Yuelin Li and Nicholas J Belkin. 2008. A faceted approach to conceptualizing tasks in information seeking. Information processing & management 44, 6 (2008), 1822–1837. [30] Yuanchun Li and Oriana Riva. 2021. Glider: A reinforcement learning approach to extract UI scripts from websites. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1420–1430. [31] Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, and Ruslan Salakhutdinov. 2021. Towards understanding and mitigating social biases in language models. In International Conference on Machine Learning. PMLR, 6565–6576. [32] Gary Marchionini. 2006. Exploratory search: from finding to understanding. Commun. ACM 49, 4 (2006), 41–46. [33] James Mayfield, Eugene Yang, Dawn Lawrie, Samuel Barham, Orion Weller, Marc Mason, Suraj Nair, and Scott Miller. 2023. Synthetic Cross-language Information Retrieval Training Data. arXiv preprint arXiv:2305.00331 (2023). [34] Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, and Ahmed Awadallah. 2023. Orca: Progressive learning from complex explanation traces of gpt-4. arXiv preprint arXiv:2306.02707 (2023). [35] Marc Najork. 2023. Generative Information Retrieval. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1–1. [36] Alexandra Olteanu, Jean Garcia-Gathright, Maarten de Rijke, Michael D Ekstrand, Adam Roegiest, Aldo Lipani, Alex Beutel, Alexandra Olteanu, Ana Lucic, Ana-Andreea Stoica, et al. 2021. FACTS-IR: fairness, accountability, confidentiality, transparency, and safety in information retrieval.
In ACM SIGIR Forum, Vol. 53. ACM New York, NY, USA, 20–43. [37] Soo Young Rieh, Kevyn Collins-Thompson, Preben Hansen, and Hye-Jung Lee. 2016. Towards searching as a learning process: A review of current perspectives and future directions. Journal of Information Science 42, 1 (2016), 19–34. [38] Shawon Sarkar and Chirag Shah. 2021. An integrated model of task, information needs, sources and uncertainty to design task-aware search systems. In Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval. 83–92. [39] Reijo Savolainen. 2012. Expectancy-value beliefs and information needs as motivators for task-based information seeking. Journal of Documentation 68, 4 (2012), 492–511. [40] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761 (2023). [41] Chirag Shah. 2023. Generative AI and the Future of Information Access. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (Birmingham, United Kingdom) (CIKM '23). Association for Computing Machinery, New York, NY, USA, 3. https://doi.org/10.1145/3583780.3615317 [42] Chirag Shah, Ryen White, Paul Thomas, Bhaskar Mitra, Shawon Sarkar, and Nicholas Belkin. 2023. Taking search to task. In Proceedings of the 2023 Conference on Human Information Interaction and Retrieval. 1–13. [43] Chirag Shah, Ryen W White, Reid Andersen, Georg Buscher, Scott Counts, Sarkar Snigdha Sarathi Das, Ali Montazer, Sathish Manivannan, Jennifer Neville, Xiaochuan Ni, et al. 2023. Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies. arXiv preprint arXiv:2309.13063 (2023). [44] Ben Shneiderman. 2022. Human-centered AI. Oxford University Press. [45] Adish Singla, Ryen White, and Jeff Huang. 2010. Studying trailfinding algorithms for enhanced web search. 
In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. 443–450. [46] Jaime Teevan, Kevyn Collins-Thompson, Ryen W White, and Susan Dumais. 2014. Slow search. Commun. ACM 57, 8 (2014), 36–38. [47] Jaime Teevan, Susan T Dumais, and Eric Horvitz. 2005. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. 449–456. [48] Jaime Teevan, Meredith Ringel Morris, and Steve Bush. 2009. Discovering and using groups to improve personalized search. In Proceedings of the second acm international conference on web search and data mining. 15–24. [49] Maartje ter Hoeve, Robert Sim, Elnaz Nouri, Adam Fourney, Maarten de Rijke, and Ryen W White. 2020. Conversations with documents: An exploration of document-centered assistance. In Proceedings of the 2020 Conference on Human Information Interaction and Retrieval. 43–52. [50] Paul Thomas, Seth Spielman, Nick Craswell, and Bhaskar Mitra. 2023. Large language models can accurately predict searcher preferences. arXiv preprint arXiv:2309.10621 (2023). [51] Randall H Trigg. 1988. Guided tours and tabletops: Tools for communicating in a hypertext environment. ACM Transactions on Information Systems (TOIS) 6, 4 (1988), 398–414. [52] Sarah K Tyler and Jaime Teevan. 2010. Large scale query log analysis of re-finding. In Proceedings of the third ACM international conference on Web search and data mining. 191–200. [53] Pertti Vakkari. 2001. A theory of the task-based information retrieval process: A summary and generalisation of a longitudinal study. Journal of documentation 57, 1 (2001), 44–60. [54] Pertti Vakkari. 2016. Searching as learning: A systematization based on literature. Journal of Information Science 42, 1 (2016), 7–18. [55] Nicholas Vincent. 2022. 
The Paradox of Reuse, Language Models Edition. https://nmvg.mataroa.blog/blog/the-paradox-of-reuse-language-models-edition/. Accessed: 2023-09-12. [56] Yu Wang, Xiao Huang, and Ryen W White. 2013. Characterizing and supporting cross-device search tasks. In Proceedings of the sixth ACM international conference on Web search and data mining. 707–716. [57] Ryen W White. 2016. Interactions with search systems. Cambridge University Press. [58] Ryen W White. 2018. Opportunities and challenges in search interaction. Commun. ACM 61, 12 (2018), 36–38. [59] Ryen W White. 2018. Skill discovery in virtual assistants. Commun. ACM 61, 11 (2018), 106–113. [60] Ryen W White. 2022. Intelligent futures in task assistance. Commun. ACM 65, 11 (2022), 35–39. [61] Ryen W. White. 2023. Tasks, Copilots, and the Future of Search. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (Taipei, Taiwan) (SIGIR '23). Association for Computing Machinery, New York, NY, USA, 5–6. https://doi.org/10.1145/3539618.3593069 [62] Ryen W White, Mikhail Bilenko, and Silviu Cucerzan. 2007. Studying the use of popular destinations to enhance web search interaction. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval. 159–166. [63] Ryen W White, Wei Chu, Ahmed Hassan, Xiaodong He, Yang Song, and Hongning Wang. 2013. Enhancing personalized search by mining and modeling task behavior. In Proceedings of the 22nd international conference on World Wide Web. 1411–1420. [64] Ryen W White, Adam Fourney, Allen Herring, Paul N Bennett, Nirupama Chandrasekaran, Robert Sim, Elnaz Nouri, and Mark J Encarnación. 2019. Multi-device digital assistance. Commun. ACM 62, 10 (2019), 28–31. [65] Ryen W White, Ian Ruthven, and Joemon M Jose. 2005. A study of factors affecting the utility of implicit relevance feedback.
In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. 35–42. [66] Ryen W White, Ian Ruthven, Joemon M Jose, and CJ Van Rijsbergen. 2005. Evaluating implicit feedback models using searcher simulations. ACM Transactions on Information Systems (TOIS) 23, 3 (2005), 325–361. [67] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. 2023. AutoGen: Enabling next-gen LLM applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155 (2023). [68] Iris Xie. 2008. Interactive information retrieval in digital environments. IGI Global. [69] Jinyun Yan, Wei Chu, and Ryen W White. 2014. Cohort modeling for enhanced personalized search. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval. 505–514. [70] Da Yu, Saurabh Naik, Arturs Backurs, Sivakanth Gopi, Huseyin A Inan, Gautam Kamath, Janardhan Kulkarni, Yin Tat Lee, Andre Manoel, Lukas Wutschitz, et al. 2021. Differentially private fine-tuning of language models. arXiv preprint arXiv:2110.06500 (2021). [71] Hamed Zamani, Susan Dumais, Nick Craswell, Paul Bennett, and Gord Lueck. 2020. Generating clarifying questions for information retrieval. In Proceedings of the web conference 2020. 418–428. [72] Jieyu Zhang, Ranjay Krishna, Ahmed H Awadallah, and Chi Wang. 2023. EcoAssistant: Using LLM Assistant More Affordably and Accurately. arXiv preprint arXiv:2310.03046 (2023). [73] Yi Zhang, Sujay Kumar Jauhar, Julia Kiseleva, Ryen White, and Dan Roth. 2021. Learning to decompose and organize complex tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2726–2735. [74] Wenhao Zhu, Hongyi Liu, Qingxiu Dong, Jingjing Xu, Lingpeng Kong, Jiajun Chen, Lei Li, and Shujian Huang. 2023.
Multilingual machine translation with large language models: Empirical results and analysis. arXiv preprint arXiv:2304.04675 (2023). [75] Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593 (2019).

ACM ACM SIGIR international ACM International Conference SIGIR conference

:::info This paper is available on arXiv under CC 4.0 license. Authors: (1) Ryen W. White, Microsoft Research, Redmond, WA, USA. :::

ABSTRACT

As many of us in the information retrieval (IR) research community know and appreciate, search is far from being a solved problem. Millions of people struggle with tasks on search engines every day. Often, their struggles relate to the intrinsic complexity of their task and the failure of search systems to fully understand the task and serve relevant results [58]. The task motivates the search, creating the gap/problematic situation that searchers attempt to bridge/resolve and drives search behavior as they work through different task facets. Complex search tasks require more than support for rudimentary fact finding or re-finding. Research on methods to support complex tasks includes work on generating query and website suggestions [21, 62], personalizing and contextualizing search [4], and developing new search experiences, including those that span time and space [1, 64]. The recent emergence of generative artificial intelligence (AI) and the arrival of assistive agents, or copilots, based on this technology, has the potential to offer further assistance to searchers, especially those engaged in complex tasks [41, 61]. There are profound implications from these advances for the design of intelligent systems and for the future of search itself. This article, based on a keynote by the author at the 2023 ACM SIGIR Conference, explores these issues and charts a course toward new horizons in information access guided by AI copilots.

ACM Reference Format: Ryen W. White. 2023. Navigating Complex Search Tasks with AI Copilots. Under review at REDACTED. October, 2023

1 TAKING SEARCH TO TASK

Tasks are a critical part of people's daily lives.
The market for dedicated task applications that help people with their “to do” tasks is likely to grow significantly (effectively triple in size) over the next few years.[1] There are many examples of such applications that can help both individuals (e.g., Microsoft To Do, Google Tasks, Todoist) and teams (e.g., Asana, Trello, Monday.com) tackle their tasks more effectively. Over time, these systems will increasingly integrate AI to better help their users capture, manage, and complete their tasks [60]. In information access scenarios such as search, tasks play an important role in motivating searching via gaps in knowledge and problematic situations [3, 15]. AI can be central in these search scenarios, too, especially in assisting with complex search tasks.

1.1 Tasks in Search

Tasks drive the search process. The IR and information science communities have long studied tasks in search [42] and many information seeking models consider the role of task directly [3, 15]. Prior research has explored the different stages of task execution (e.g., pre-focus, focus formation, post-focus) [53], task levels [39], task facets [29], tasks defined on intents (e.g., informational, transactional, and navigational [8]; well-defined or ill-defined [23]; lookup, learn, or investigate [32]), the hierarchical structure of tasks [68], the characteristics of tasks, and the attributes of task-searcher interaction, e.g., task difficulty and, of course, a focus in this article, task complexity [11, 26].

As a useful framing device to help conceptualize tasks and develop system support for them, tasks can be represented as trees comprising macrotasks (high level goals), subtasks (specific components of those goals), and actions (specific steps taken by searchers toward the completion of those components) [42]. Figure 1 presents an example of a “task tree” for a task involving an upcoming vacation to Paris, France. Examples of macrotasks, subtasks, and actions are included.
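The task tree framing above (macrotasks, subtasks, actions) maps naturally onto a small tree data structure. The sketch below is illustrative only: the `TaskNode` class, the helper names `recognize`, `decompose`, and `predict`, and the Paris vacation node labels are assumptions and are not taken from Figure 1. The helpers correspond to the three moves the article associates with the tree: task recognition (up), task decomposition (down), and task prediction (across).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursive comparisons
class TaskNode:
    label: str
    level: str  # "macrotask", "subtask", or "action"
    parent: Optional["TaskNode"] = field(default=None, repr=False)
    children: List["TaskNode"] = field(default_factory=list)

    def add(self, label: str, level: str) -> "TaskNode":
        """Attach a child node and return it."""
        child = TaskNode(label, level, parent=self)
        self.children.append(child)
        return child


def recognize(node: TaskNode) -> Optional[TaskNode]:
    """Move up: the higher-level task this node contributes to."""
    return node.parent


def decompose(node: TaskNode) -> List[TaskNode]:
    """Move down: the components that make up this node."""
    return list(node.children)


def predict(node: TaskNode) -> List[TaskNode]:
    """Move across: sibling tasks that come after this node.
    Treating 'later siblings' as the prediction is a simplifying assumption."""
    if node.parent is None:
        return []
    siblings = node.parent.children
    position = next(i for i, s in enumerate(siblings) if s is node)
    return siblings[position + 1:]


# Hypothetical tree for the vacation example (labels are invented).
root = TaskNode("Plan a vacation to Paris", "macrotask")
flights = root.add("Book flights", "subtask")
hotel = root.add("Find a hotel", "subtask")
query = flights.add('query: "flights to Paris"', "action")

parent_of_query = recognize(query)      # the "Book flights" subtask
parts_of_trip = decompose(root)         # [flights, hotel]
next_after_flights = predict(flights)   # [hotel]
```

In this sketch only the leaf `action` nodes correspond to what a traditional search engine can observe directly. Everything above them would have to be inferred, which is where the article suggests copilots can help.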
Moves around this tree correspond to different task applications such as task recognition (up), task decomposition (down), and task prediction (across). Only actions (e.g., queries, clicks, and so on) are directly observable to traditional search engines. However, with recent advances in search copilots (more fully supporting natural language interactions via language understanding and language generation), more aspects of macrotasks and subtasks are becoming visible to search systems and more fully understood by those systems. Challenges in working with tasks include how to represent them within search systems, how to observe more task-relevant activity and content to develop richer task models, and how to develop task-oriented interfaces that place tasks and their completion at the forefront of user engagement. Task complexity deserves a special focus in this article given the challenges that searchers can still face with complex tasks and the significant potential of AI to help searchers tackle complex tasks.

1.2 Complex Search Tasks

Recent estimates suggest that half of all Web searches are not answered.[2] Many of those searches are connected to complex search tasks. These tasks are ill-defined and/or multi-step, span multiple queries, sessions, and/or devices, and require deep engagement with search engines (many queries, backtracking, branching, etc.) to complete them [21]. Complex tasks also often have many facets and cognitive dimensions, and are closely connected to searcher characteristics such as domain expertise and task familiarity [38, 58]. To date, there have been significant attempts to support complex search tasks via humans (e.g., librarians, subject matter experts) and search systems (both general Web search engines and those tailored to specific industry verticals or domains).
The main technological progress so far has been in areas such as query suggestion and contextual search, with new experiences also being developed that utilize multiple devices, provide cross-session support, and enable conversational search. We are now also seeing emerging search-related technologies in the area of generative AI [35]. Before proceeding, let us dive into these different types of existing and emerging search support for complex tasks in more detail.

• Suggestions, personalization, and contextualization: Researchers and practitioners have long developed and deployed support such as query suggestion and trail suggestion, e.g., [21, 45], including providing guided tours [51] and suggesting popular trail destinations [62] as ways to find relevant resources. This coincides with work on contextual search and personalized search, e.g., [4, 47, 63], where search systems can use data from the current searcher such as session activity, location, reading level, and so on, and the searcher's long-term activity history, to provide more relevant results. Search engines may also use cohort activities to help with cold-start problems for new users and augment personal profiles for more established searchers [48, 69].

• Multi-device, cross-device, and cross-session: Devices have different capabilities and can be used in different settings. Multi-device experiences, e.g., [64], utilize multiple devices simultaneously to better support complex tasks, such as recipe preparation, auto repair, and home improvement, that have been decomposed into steps manually or automatically [73]. Cross-device and cross-session support [1, 56] can help with ongoing/background searches for complex tasks that persist over space and time. For example, being able to predict task continuation can help with “slow search” applications that focus more on result quality than the near instantaneous retrieval of search results [46].
• Conversational experiences and generative AI: Natural language is an expressive and powerful means of communicating intentions and preferences with search systems. The introduction of clarification questions on search engine result pages (SERPs) [71], progress on conversational search [20], and even “conversations” with documents (where searchers can inquire about document content via natural language dialog) [49], enable these systems to engage more fully with searchers to better understand their tasks and goals. There are now many emerging opportunities to better understand and support more tasks via large-scale foundation models such as GPT-4[3] and DALL·E 3,[4] including offering conversational task assistance via chatbots such as ChatGPT.[5]

All of these advances, and others, have paved the way for the emergence of AI copilots, assistive agents that can help people tackle complex search tasks.

[1] https://www.verifiedmarketresearch.com/product/task-management-software-market/
[2] https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/
[3] https://openai.com/gpt-4
[4] https://openai.com/dall-e-3
[5] https://openai.com/chatgpt

Complex Search
