K* Mastery

K* Mastery is a memory-efficient training algorithm for transformer models.


Key Features:

  • Adaptive Memory Redistribution: Dynamically allocates memory to different layers of the transformer model during training, based on their current memory requirements.
  • Lazy Adagrad: A modified version of the Adagrad optimizer that reduces memory overhead by storing only the gradients of the most recent layers.
  • Gradient Partitioning: Divides the gradients of large layers into smaller chunks, reducing the memory needed to store them.
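
The Gradient Partitioning idea above can be illustrated with a minimal Python sketch. It is not the K* Mastery implementation, whose details are not given here. It assumes a flat list of parameters and a standard (not Lazy) Adagrad update, and the helper names `partition` and `adagrad_step_chunked` are hypothetical. The update walks the gradient in fixed-size chunks, so in a tensor-based setting only one chunk of intermediate values would need to be held at a time.

```python
from typing import Iterator, List, Tuple


def partition(length: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover range(length) in chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, length, chunk_size):
        yield start, min(start + chunk_size, length)


def adagrad_step_chunked(
    params: List[float],
    grads: List[float],
    accum: List[float],
    lr: float = 0.1,
    eps: float = 1e-8,
    chunk_size: int = 4,
) -> None:
    """Apply one standard Adagrad update in place, one chunk at a time.

    `accum` is the usual Adagrad state: the running sum of squared
    gradients for each parameter. All three lists must have equal length.
    """
    if not (len(params) == len(grads) == len(accum)):
        raise ValueError("params, grads and accum must have the same length")
    for start, stop in partition(len(params), chunk_size):
        # Only indices in [start, stop) are touched in this pass.
        for i in range(start, stop):
            accum[i] += grads[i] * grads[i]
            params[i] -= lr * grads[i] / (accum[i] ** 0.5 + eps)


if __name__ == "__main__":
    # Illustrative values only; chunk_size=2 splits five entries into 3 chunks.
    params = [1.0, 2.0, 3.0, 4.0, 5.0]
    grads = [0.5, -1.0, 0.0, 2.0, -0.25]
    accum = [0.0] * len(params)
    adagrad_step_chunked(params, grads, accum, chunk_size=2)
    print(params)
```

Because each parameter's update depends only on its own gradient and accumulator, the chunked update gives the same result as an unchunked one; `chunk_size` changes only how much is processed at once.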


Benefits:

  • Scalability: Makes it feasible to train very large transformer models that would otherwise exceed available memory.


K* Mastery has been used to train a range of transformer-based models, including:

  • Natural Language Processing (NLP): BERT, GPT-3, T5
  • Computer Vision: Vision Transformer (ViT), Swin Transformer
  • Audio Processing: Wav2Vec 2.0, HuBERT

Comparison to Other Methods:

Compared with other memory-efficient transformer training algorithms, K* Mastery is reported to use less memory and to train faster. The advantage is most pronounced for large models with billions or trillions of parameters.


K Mastery


K Mastery is a proficiency-based learning model in which students demonstrate their understanding of academic content through a series of interconnected knowledge checkpoints.

Key Elements:

  • Feedback and Remediation: Students receive ongoing feedback on their progress and have opportunities to retake checkpoints if they do not initially achieve mastery.
  • Mastery Portfolio: Students document their evidence of mastery and reflect on their learning journey.
  • Personalized Learning: Students can learn at their own pace and access differentiated instruction as needed.


Benefits:

  • Improved Academic Performance: Students who master concepts thoroughly develop stronger foundations and perform better on standardized assessments.
  • Increased Student Motivation: Students are engaged in the learning process and motivated to achieve mastery.
  • Tailored Instruction: Teachers can provide targeted support and address individual student needs.
  • Increased Student Ownership: Students take responsibility for their learning and develop self-direction skills.
  • Assessment for Learning: Checkpoints provide opportunities for formative assessment and guide instruction.


Implementation:

  • Content Standards: Identify specific learning objectives and align checkpoints with grade-level standards.
  • Checkpoint Development: Create clear and measurable checkpoints that assess student understanding.
  • Instructional Planning: Design lessons that prepare students for each checkpoint.
  • Monitoring Student Progress: Track student performance on checkpoints and provide timely feedback.
  • Differentiated Instruction: Offer support and extensions as needed to meet the needs of all learners.


Examples:

  • Math: Students master concepts such as multiplication and division through a series of checkpoints that assess fluency, problem-solving, and application.
  • Science: Students investigate ecosystems and cells through hands-on experiments and written reports that demonstrate their understanding of key concepts.


K Mastery is a learning model that helps students reach deep understanding and mastery of academic content. By combining incremental checkpoints, targeted feedback, and personalized instruction, it aims to raise student engagement, improve academic achievement, and build lifelong learning skills.

