Inverse Preference Learning: Preference-based RL without a Reward Function

  • Authors:
  • Joey Hejna, Stanford University
  • Dorsa Sadigh, Stanford University

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems, December 2023, Article No. 825, Pages 18806–18827

Published: 30 May 2024

ABSTRACT

Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward functions from human feedback. However, the majority of preference-based RL methods naïvely combine supervised reward models with off-the-shelf RL algorithms. Contemporary approaches have sought to improve performance and query complexity by using larger and more complex reward architectures such as transformers. Instead of using highly complex architectures, we develop a new and parameter-efficient algorithm, Inverse Preference Learning (IPL), specifically designed for learning from offline preference data. Our key insight is that for a fixed policy, the Q-function encodes all information about the reward function, effectively making them interchangeable. Using this insight, we completely eliminate the need for a learned reward function. Our resulting algorithm is simpler and more parameter-efficient. Across a suite of continuous control and robotics benchmarks, IPL attains competitive performance compared to more complex approaches that leverage transformer-based and non-Markovian reward functions while having fewer algorithmic hyperparameters and learned network parameters. Our code is publicly released at https://github.com/jhejna/inverse-preference-learning.
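
To make the key insight concrete: under the Bradley-Terry preference model, a segment's score is the sum of its per-step rewards, and the observation above means that, for a fixed policy, each reward can be recovered from the Q-function through the Bellman relation r(s, a) = Q(s, a) - γV(s'), so the preference loss can be written directly in terms of Q with no separate reward network. The snippet below is a minimal sketch of that substitution, assuming PyTorch-style q_net and v_net callables, batched segments of shape (batch, horizon, ...), and binary preference labels; it omits any additional regularization terms from the full method and is illustrative rather than the authors' exact implementation.

```python
import torch.nn.functional as F

def implied_rewards(q_net, v_net, obs, act, next_obs, gamma=0.99):
    """Rewards implied by a Q-function for a fixed policy via the
    inverse Bellman relation: r(s, a) = Q(s, a) - gamma * V(s')."""
    q = q_net(obs, act)        # assumed shape: (batch, horizon)
    next_v = v_net(next_obs)   # assumed shape: (batch, horizon)
    return q - gamma * next_v

def ipl_preference_loss(q_net, v_net, seg_a, seg_b, prefs, gamma=0.99):
    """Bradley-Terry preference loss written directly in terms of Q,
    with no separately learned reward model.

    seg_a, seg_b: dicts with 'obs', 'act', 'next_obs' tensors of shape
    (batch, horizon, ...); prefs: 1.0 where segment A is preferred, else 0.0.
    """
    ret_a = implied_rewards(q_net, v_net, **seg_a, gamma=gamma).sum(dim=1)
    ret_b = implied_rewards(q_net, v_net, **seg_b, gamma=gamma).sum(dim=1)
    # P(A preferred over B) = sigmoid(return_A - return_B) under Bradley-Terry.
    return F.binary_cross_entropy_with_logits(ret_a - ret_b, prefs)
```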

Published in

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems

December 2023, 80772 pages

  • Editors: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine

Copyright © 2023 Neural Information Processing Systems Foundation, Inc.

Publisher: Curran Associates Inc., Red Hook, NY, United States
