NVIDIA Corporation
Santa Clara, CA
NVIDIA is driving a vision for AI factories that convert tokens to intelligence at scale to power the AI demands of tomorrow. Maintaining AI infrastructure at scale takes more than human involvement; it demands smart automation. The orchestration engine for AI factory break-fix runs live in production at DGX Cloud.

As the Product Manager leading all aspects of resilient automation at AI Factory, you will manage break-fix automation. You will develop the product strategy, improve the operator experience, and guide the roadmap for professionals. You will build a scalable, reliable product, on a strong engineering foundation, that NVIDIA Cloud Partners depend on to uphold their SLAs. This is your chance to shape how AI factories self-heal!

What You'll Be Doing:

Take full responsibility for the strategic direction and roadmap of the break-fix automation system spanning multiple vendors, technologies, and CSPs.

Define automation confidence thresholds, blocking issue criteria, and...