Finetune
Author: p | 2025-04-24
Finetuned TorToiSe Models: the ./finetunes/ folder contains a collection of my finetuned models. Each model folder contains the pickled finetuned model for tortoise-tts and the LJSpeech
GitHub - homgorn/unsloth_reasoning-finetune: Finetune
Support prompts that require multiple input lines.

More information and additional resources: tutorials/download_model_weights (a more comprehensive download tutorial, tips for GPU memory limitations, and more).

Finetune LLMs

LitGPT supports several methods of supervised instruction finetuning, which allows you to finetune models to follow instructions.

Datasets for instruction finetuning are usually formatted as instruction-response pairs, i.e., records with an 'instruction' and an 'output' field. Alternatively, datasets for instruction finetuning can also contain an additional 'input' field that provides context for the instruction.

In an instruction-finetuning context, "full" finetuning means updating all model parameters, as opposed to only a subset. Adapter and LoRA (short for low-rank adaptation) are methods for parameter-efficient finetuning that only require updating a small fraction of the model weights. Parameter-efficient finetuning is much more resource-efficient and cheaper than full finetuning, and it often results in the same good performance on downstream tasks.

In the following example, we will use LoRA for finetuning, which is one of the most popular LLM finetuning methods. (For more information on how LoRA works, please see Code LoRA from Scratch.)

Before we start, we have to download a model as explained in the previous "Download pretrained model" section above:

litgpt download microsoft/phi-2

The LitGPT interface can be used via command line arguments and configuration files. We recommend starting with the configuration files from the config_hub and either modifying them directly or overriding specific settings via the command line. For example, we can use the following settings to train the downloaded 2.7B-parameter microsoft/phi-2 model, where we set --train.max_steps 5 for a quick test run.

If you have downloaded or cloned the LitGPT repository, you can provide the config file via a relative path:

litgpt finetune_lora microsoft/phi-2 \
  --config config_hub/finetune/phi-2/lora.yaml \
  --train.max_steps 5

Alternatively, you can provide a URL:

litgpt finetune_lora microsoft/phi-2 \
  --config <URL> \
  --train.max_steps 5

Tip: Note that the config file above will finetune the model on the Alpaca2k dataset on 1 GPU and save the resulting files in an out/finetune/lora-phi-2 directory. All of these settings can be changed via the respective command line argument or by changing the config file. To see more options, execute litgpt finetune_lora --help.

Running the previous finetuning command will initiate the finetuning process, which should take only about a minute on a GPU due to the --train.max_steps 5 setting.
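The run finetunes on the Alpaca2k instruction dataset, whose records follow the instruction/input/output format described above. As an illustration, such records might look like the following in Python; these are hypothetical example rows I made up for demonstration, not actual entries from Alpaca2k.

# Hypothetical Alpaca-style instruction records (illustrative only).
records = [
    {
        "instruction": "Summarize the following paragraph in one sentence.",
        "input": "LitGPT supports full finetuning as well as parameter-efficient methods such as LoRA.",
        "output": "LitGPT offers both full and parameter-efficient finetuning options.",
    },
    {
        # Records without extra context simply leave the 'input' field empty.
        "instruction": "Recommend a movie for me to watch during the weekend and explain the reason.",
        "input": "",
        "output": "I recommend 'Parasite' because it is a critically acclaimed, suspenseful film.",
    },
]

for record in records:
    print(record["instruction"][:40], "->", record["output"][:40])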
At the start of the run, the script prints the hyperparameters used for training:

{'checkpoint_dir': PosixPath('checkpoints/microsoft/phi-2'),
 'data': Alpaca2k(mask_prompt=False, val_split_fraction=0.03847, prompt_style=, ignore_index=-100, seed=42, num_workers=4, download_dir=PosixPath('data/alpaca2k')),
 'devices': 1,
 'eval': EvalArgs(interval=100, max_new_tokens=100, max_iters=100),
 'logger_name': 'csv',
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_head': True,
 'lora_key': True,
 'lora_mlp': True,
 'lora_projection': True,
 'lora_query': True,
 'lora_r': 8,
 'lora_value': True,
 'num_nodes': 1,
 'out_dir': PosixPath('out/finetune/lora-phi-2'),
 'precision': 'bf16-true',
 'quantize': None,
 'seed': 1337,
 'train': TrainArgs(save_interval=800, log_interval=1, global_batch_size=8, micro_batch_size=4, lr_warmup_steps=10, epochs=1, max_tokens=None, max_steps=5, max_seq_length=512, tie_embeddings=None, learning_rate=0.0002, weight_decay=0.0, beta1=0.9, beta2=0.95, max_norm=None, min_lr=6e-05)}

Seed set to 1337
Number of trainable parameters: 12,226,560
Number of non-trainable parameters: 2,779,683,840
The longest sequence length in the train data is 512, the model's maximum sequence length is 512 and context length is 2048
Validating ...
Recommend a movie for me to watch during the weekend and explain the reason.
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Recommend a movie for me to watch during the weekend and explain the reason.

### Response:
I recommend you watch "Parasite" because it's a critically acclaimed movie that won multiple awards, including the Academy Award for Best Picture. It's a thought-provoking and suspenseful film that will keep you on the edge of your seat. The movie also tackles social and economic inequalities, making it a must-watch for anyone interested in meaningful storytelling.

/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:43: UserWarning: The ``compute`` method of metric MeanMetric was called before the ``update`` method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)  # noqa: B028
Missing logger folder: out/finetune/lora-phi-2/logs/csv
Epoch 1 | iter 1 step 0 | loss train: 1.646, val: n/a | iter time: 820.31 ms
Epoch 1 | iter 2 step 1 | loss train: 1.660, val: n/a | iter time: 548.72 ms (step)
Epoch 1 | iter 3 step 1 | loss train: 1.687, val: n/a | iter time: 300.07 ms
Epoch 1 | iter 4 step 2 | loss train: 1.597, val: n/a | iter time: 595.27 ms (step)
Epoch 1 | iter 5 step 2 | loss train: 1.640, val: n/a | iter time: 260.75 ms
Epoch 1 | iter 6 step 3 | loss train: 1.703, val: n/a | iter time: 568.22 ms (step)
Epoch 1 | iter 7 step 3 | loss train: 1.678, val: n/a | iter time: 511.70 ms
Epoch 1 | iter 8 step 4 | loss train: 1.741, val: n/a | iter time: 514.14 ms (step)
Epoch 1 | iter 9 step 4 | loss train: 1.689, val: n/a | iter time: 423.59 ms
Epoch 1 | iter 10 step 5 | loss train: 1.524, val: n/a | iter time: 603.03 ms (step)
Training time: 11.20s
Memory used: 13.90 GB
Saving LoRA weights to 'out/finetune/lora-phi-2/final/lit_model.pth.lora'
Saved merged weights to 'out/finetune/lora-phi-2/final/lit_model.pth'

Notice that the LoRA script saves both the LoRA weights ('out/finetune/lora-phi-2/final/lit_model.pth.lora') and the LoRA weights merged back into the original model ('out/finetune/lora-phi-2/final/lit_model.pth') for convenience. This allows us to load and use the finetuned model directly, without a separate merging step.
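As a quick sanity check, you can load and inspect the two saved checkpoints with plain PyTorch. This is only a sketch: the exact layout of the saved files (a plain state dict or a dictionary wrapping one, e.g. under a "model" key) can differ between LitGPT versions, so the helper below handles both cases.

import torch

def load_state_dict(path):
    # Some versions save a plain state dict, others wrap it, e.g. under a "model" key.
    ckpt = torch.load(path, map_location="cpu")
    return ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

lora_only = load_state_dict("out/finetune/lora-phi-2/final/lit_model.pth.lora")
merged = load_state_dict("out/finetune/lora-phi-2/final/lit_model.pth")

# The LoRA-only file should contain far fewer tensors and parameters than the merged model.
print("LoRA-only params:", sum(t.numel() for t in lora_only.values()))
print("Merged params:   ", sum(t.numel() for t in merged.values()))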
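For intuition about what the LoRA-only checkpoint contains, here is a minimal, self-contained sketch of a LoRA-style linear layer. It is an illustrative toy implementation, not LitGPT's actual code: the pretrained weight stays frozen and only two small low-rank matrices are trained, which is why only 12,226,560 of the roughly 2.8 billion parameters were trainable in the run above.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA layer: frozen base weight plus a trainable low-rank update."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad = False                     # pretrained weight stays frozen
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))   # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        # Base projection plus the scaled low-rank update B @ A applied to x.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(in_features=16, out_features=32, r=8, alpha=16)
x = torch.randn(4, 16)
print(layer(x).shape)                                                   # torch.Size([4, 32])
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))    # 384 trainable LoRA parameters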
Finetuning adapts a pretrained LLM to target domain data. This process involves adjusting the model parameters using a smaller dataset relevant to the desired domain, which enables the model to learn domain-specific knowledge and vocabulary. However, as LLMs are "large," updating multiple layers in a transformer model can be very expensive, so researchers started developing parameter-efficient alternatives.

In this article, we discussed several parameter-efficient alternatives to the conventional LLM finetuning mechanism. In particular, we discussed how to insert and finetune additional adapter layers to improve the predictive performance of an LLM compared to training the original model parameters.

Below are additional experiments where I implemented the adapter method and ran a comparison to finetune a DistilBERT model for sentiment classification: finetuning only the last two layers as a performance baseline; inserting and finetuning adapter layers; finetuning all layers of the original model; and inserting adapter layers and finetuning all layers as a control experiment.

All code examples are available here on GitHub. As a thanks to those who supported the newsletter in the previous months, I included a bonus section below discussing the code examples. Thanks again for your support!

First, let's establish a performance baseline by only finetuning the last layers of a DistilBERT model on a movie review dataset. Here, we will only look at the relevant lines of code, omitting the non-finetuning-specific code for brevity. However, as mentioned above, the full code examples are available here.

First, after loading the pretrained DistilBERT model, let's look at the architecture.

For this performance baseline, we only finetune the last two layers, which comprise 592,130 parameters. The simplest way to do that is to freeze all parameters and then unfreeze the last two layers via the code below:

# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the two output layers
for param in model.pre_classifier.parameters():
    param.requires_grad = True
for param in model.classifier.parameters():
    param.requires_grad = True

Then, after training this model for 3 epochs, we get the following results:

Training time: 2.89 min
Training accuracy: 86.7%
Validation accuracy: 87.2%
Test accuracy: 86.4%

Next, let's add the adapter layers to the model. Notice that DistilBERT has 6 transformer blocks. As discussed earlier, the adapter method inserts 2 adapter modules into each of the 6 transformer blocks.
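To make the adapter idea concrete, here is a minimal sketch of a bottleneck adapter module in PyTorch. It is an illustrative toy example following the usual down-project, nonlinearity, up-project, residual design, not the exact module used in the experiments above; the hidden size of 768 is assumed from DistilBERT.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, plus a residual connection."""
    def __init__(self, hidden_dim, bottleneck_dim=16):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x):
        # The residual keeps the original representation; the adapter learns a small correction.
        return x + self.up(self.act(self.down(x)))

# Example with DistilBERT-sized hidden states (hidden_dim=768).
adapter = Adapter(hidden_dim=768)
hidden_states = torch.randn(8, 128, 768)              # (batch, sequence length, hidden dim)
print(adapter(hidden_states).shape)                   # torch.Size([8, 128, 768])
print(sum(p.numel() for p in adapter.parameters()))   # 768*16 + 16 + 16*768 + 768 = 25,360 parameters

In the experiments above, two such modules are inserted into each of DistilBERT's 6 transformer blocks, so the number of added parameters stays tiny compared to the full model.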