Jacob Hilton
Alignment Research Center
Verified email at alignment.org
Title
Cited by
Year
Training language models to follow instructions with human feedback
L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ...
Advances in Neural Information Processing Systems 35, 27730-27744, 2022
Cited by: 11193 | Year: 2022
Training verifiers to solve math word problems
K Cobbe, V Kosaraju, M Bavarian, M Chen, H Jun, L Kaiser, M Plappert, ...
arXiv preprint arXiv:2110.14168, 2021
Cited by: 2475 | Year: 2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
S Lin, J Hilton, O Evans
Association for Computational Linguistics, 3214-3252, 2022
Cited by: 1381 | Year: 2022
WebGPT: Browser-assisted question-answering with human feedback
R Nakano, J Hilton, S Balaji, J Wu, L Ouyang, C Kim, C Hesse, S Jain, ...
arXiv preprint arXiv:2112.09332, 2021
Cited by: 1094 | Year: 2021
Leveraging procedural generation to benchmark reinforcement learning
K Cobbe, C Hesse, J Hilton, J Schulman
International Conference on Machine Learning, 2048-2056, 2020
Cited by: 632 | Year: 2020
Scaling laws for reward model overoptimization
L Gao, J Schulman, J Hilton
International Conference on Machine Learning, 10835-10866, 2023
Cited by: 359 | Year: 2023
ChatGPT: Optimizing language models for dialogue
J Schulman, B Zoph, C Kim, J Hilton, J Menick, J Weng, JFC Uribe, ...
OpenAI blog, 2022
Cited by: 307 | Year: 2022
Teaching Models to Express Their Uncertainty in Words
S Lin, J Hilton, O Evans
Transactions on Machine Learning Research, 2022
Cited by: 254 | Year: 2022
Phasic policy gradient
KW Cobbe, J Hilton, O Klimov, J Schulman
International Conference on Machine Learning, 2020-2027, 2021
Cited by: 191 | Year: 2021
Understanding RL Vision
J Hilton, N Cammarata, S Carter, G Goh, C Olah
Distill 5 (11), e29, 2020
Cited by: 25 | Year: 2020
Scaling laws for single-agent reinforcement learning
J Hilton, J Tang, J Schulman
arXiv preprint arXiv:2301.13442, 2023
Cited by: 15 | Year: 2023
Batch size-invariance for policy optimization
J Hilton, K Cobbe, J Schulman
Advances in Neural Information Processing Systems 35, 17086-17098, 2022
Cited by: 14 | Year: 2022
Topological Ramsey numbers and countable ordinals
AE Caicedo, J Hilton
Foundations of mathematics 690, 85-118, 2017
Cited by: 12 | Year: 2017
The topological pigeonhole principle for ordinals
J Hilton
The Journal of Symbolic Logic 81 (2), 662-686, 2016
Cited by: 12 | Year: 2016
Combinatorics of countable ordinal topologies
JH Hilton
University of Leeds, 2016
Cited by: 3 | Year: 2016
Backdoor defense, learnability and obfuscation
P Christiano, J Hilton, V Lecomte, M Xu
arXiv preprint arXiv:2409.03077, 2024
Cited by: 1 | Year: 2024
Obfuscated Activations Bypass LLM Latent-Space Defenses
L Bailey, A Serrano, A Sheshadri, M Seleznyov, J Taylor, E Jenner, ...
arXiv preprint arXiv:2412.09565, 2024
Year: 2024
Estimating the Probabilities of Rare Outputs in Language Models
G Wu, J Hilton
arXiv preprint arXiv:2410.13211, 2024
Year: 2024
Towards a Law of Iterated Expectations for Heuristic Estimators
P Christiano, J Hilton, A Lincoln, E Neyman, M Xu
arXiv preprint arXiv:2410.01290, 2024
Year: 2024
Any modification of Müller's Markov process is transient
J Hilton, J Kramár