Title, authors, venue | Cited by | Year |
Training language models to follow instructions with human feedback L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ... Advances in Neural Information Processing Systems 35, 27730-27744, 2022 | 11193 | 2022 |
Training verifiers to solve math word problems K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ... arXiv preprint arXiv:2110.14168, 2021 | 2475 | 2021 |
TruthfulQA: Measuring How Models Mimic Human Falsehoods S Lin, J Hilton, O Evans Association for Computational Linguistics, 3214-3252, 2022 | 1381 | 2022 |
WebGPT: Browser-assisted question-answering with human feedback R Nakano, J Hilton, S Balaji, J Wu, L Ouyang, C Kim, C Hesse, S Jain, ... arXiv preprint arXiv:2112.09332, 2021 | 1094 | 2021 |
Leveraging procedural generation to benchmark reinforcement learning K Cobbe, C Hesse, J Hilton, J Schulman International conference on machine learning, 2048-2056, 2020 | 632 | 2020 |
Scaling laws for reward model overoptimization L Gao, J Schulman, J Hilton International Conference on Machine Learning, 10835-10866, 2023 | 359 | 2023 |
ChatGPT: Optimizing language models for dialogue J Schulman, B Zoph, C Kim, J Hilton, J Menick, J Weng, JFC Uribe, ... OpenAI blog, 2022 | 307 | 2022 |
Teaching Models to Express Their Uncertainty in Words S Lin, J Hilton, O Evans Transactions on Machine Learning Research, 2022 | 254 | 2022 |
Phasic policy gradient KW Cobbe, J Hilton, O Klimov, J Schulman International Conference on Machine Learning, 2020-2027, 2021 | 191 | 2021 |
Understanding RL Vision J Hilton, N Cammarata, S Carter, G Goh, C Olah Distill 5 (11), e29, 2020 | 25 | 2020 |
Scaling laws for single-agent reinforcement learning J Hilton, J Tang, J Schulman arXiv preprint arXiv:2301.13442, 2023 | 15 | 2023 |
Batch size-invariance for policy optimization J Hilton, K Cobbe, J Schulman Advances in Neural Information Processing Systems 35, 17086-17098, 2022 | 14 | 2022 |
Topological Ramsey numbers and countable ordinals AE Caicedo, J Hilton Foundations of mathematics 690, 85-118, 2017 | 12 | 2017 |
The topological pigeonhole principle for ordinals J Hilton The Journal of Symbolic Logic 81 (2), 662-686, 2016 | 12 | 2016 |
Combinatorics of countable ordinal topologies JH Hilton University of Leeds, 2016 | 3 | 2016 |
Backdoor defense, learnability and obfuscation P Christiano, J Hilton, V Lecomte, M Xu arXiv preprint arXiv:2409.03077, 2024 | 1 | 2024 |
Obfuscated Activations Bypass LLM Latent-Space Defenses L Bailey, A Serrano, A Sheshadri, M Seleznyov, J Taylor, E Jenner, ... arXiv preprint arXiv:2412.09565, 2024 | | 2024 |
Estimating the Probabilities of Rare Outputs in Language Models G Wu, J Hilton arXiv preprint arXiv:2410.13211, 2024 | | 2024 |
Towards a Law of Iterated Expectations for Heuristic Estimators P Christiano, J Hilton, A Lincoln, E Neyman, M Xu arXiv preprint arXiv:2410.01290, 2024 | | 2024 |
Any modification of Müller's Markov process is transient J Hilton, J Kramár | | |