Rapid Training of Cat and Dog Sound Classification Model

This article introduces how to quickly perform sound classification training and inference using PyTorch and the macls library. First, create a Python 3.11 virtual environment via Anaconda and install the GPU build of PyTorch 2.5.1 along with the macls library. Next, prepare the dataset; download links are provided, and custom dataset formats are also supported. Training itself takes just three lines of code covering model training, optimization, and saving. For inference, the trained model is loaded to make predictions. The framework supports multiple sound classification models, making it easy to meet the needs of different scenarios.
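As a rough illustration of what "three lines of code" could look like, here is a minimal sketch; the trainer class, module path, and config file name are assumptions about the macls API and should be checked against its documentation.

```python
# Minimal training sketch (assumed macls API; class, module, and config names may differ).
from macls.trainer import MAClsTrainer  # assumed import path

trainer = MAClsTrainer(configs='configs/ecapa_tdnn.yml', use_gpu=True)  # assumed config file
trainer.train()  # trains, optimizes, and saves the model
```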

Read More
Quick Deployment of Speech Recognition Framework Using MASR V3

This framework appears to be very comprehensive and user-friendly, covering multiple stages from data preparation to model training and inference. To help readers better understand and utilize this framework, I will provide detailed explanations for each part along with some sample code.

### 1. Environment Setup

First, you need to install the necessary dependency packages. Assuming you have already created and activated a virtual environment:

```sh
pip install paddlepaddle==2.4.0 -i https://mirror.baidu.com/pypi/
```

Read More
Quick Deployment of Speech Recognition Framework Using PPASR V3

This detailed introduction demonstrates the process of developing and deploying speech recognition tasks using the PaddleSpeech framework. Below are some supplements and suggestions to the information you provided:

1. **Installation Environment**: Ensure your environment has the necessary dependencies installed, including libraries such as PaddlePaddle and PaddleSpeech. These libraries can be installed via the pip command.
2. **Data Preprocessing**:
   - You may need to perform preprocessing steps on the raw audio, such as sample-rate adjustment and noise removal (a generic resampling sketch follows below).
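As a generic illustration of the resampling step (this is not PPASR's own API, just a common preprocessing approach), the snippet below converts a recording to 16 kHz mono WAV with librosa and soundfile; the file paths are placeholders.

```python
# Generic preprocessing sketch: resample an audio file to 16 kHz mono WAV.
# Requires: pip install librosa soundfile. File paths are placeholders.
import librosa
import soundfile as sf

audio, sr = librosa.load('raw_recording.mp3', sr=16000, mono=True)  # decode and resample
sf.write('dataset/audio/sample_0001.wav', audio, sr)                # write 16 kHz mono WAV
```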

Read More
Text Endpoint Detection Based on Large Language Models

This article introduces a method for detecting text endpoints with large language models (LLMs) to improve Voice Activity Detection (VAD) in voice conversations. By fine-tuning a model to predict whether a sentence is complete, the user's intent can be judged more accurately. The specific steps include:

1. **Principle and Data Preparation**: Leverage the text generation capabilities of large language models and fine-tune on a predefined dataset in a specific format (see the data-format sketch after this list).
2. **Fine-tuning the Model**: Use the LLaMA-Factory tool for training, selecting an appropriate prompt template and an optimized data format.
3. ...
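As an illustration of what such a fine-tuning dataset can look like, the sketch below writes a few records in the Alpaca-style JSON format that LLaMA-Factory accepts; the instruction wording, labels, and file name are assumptions, not the article's exact format.

```python
# Sketch of an Alpaca-style fine-tuning dataset for sentence-completeness prediction.
# Instruction text, labels, and file name are illustrative assumptions.
import json

samples = [
    {"instruction": "Judge whether the following sentence is complete. Answer 'complete' or 'incomplete'.",
     "input": "Please turn on the living room",
     "output": "incomplete"},
    {"instruction": "Judge whether the following sentence is complete. Answer 'complete' or 'incomplete'.",
     "input": "Please turn on the living room light.",
     "output": "complete"},
]

with open("endpoint_detection.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```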

Read More
Speaker Diarization Implementation Based on PyTorch (Speaker Separation)

This article introduces the speaker diarization feature of the VoiceprintRecognition_Pytorch framework implemented based on PyTorch, which supports various advanced models and data preprocessing methods. By executing the `infer_speaker_diarization.py` script or using the GUI program, audio can be separated by speaker and the results displayed. The output includes the start and end times of each speaker segment and the speaker's identity (registration is required first). Additionally, the article provides solutions for handling Chinese names on the Ubuntu system...

Read More
Introduction and Usage of YeAudio Audio Tool

These classes define various audio data augmentation techniques. Each class is responsible for a specific data augmentation operation and can control the degree and type of augmentation by setting different parameters. The following is a detailed description of each class:

### 1. **SpecAugmentor**
- **Function**: Frequency-domain masking and time-domain masking
- **Main Parameters**:
  - `prob`: Probability of applying the augmentation.
  - `freq_mask_ratio`: Ratio of frequency-domain masking (e.g., 0.15 means randomly selecting...

Read More
Installing Docker on Ubuntu with GPU Support
2024-08-29 482 views Backend Ubuntu Docker eureka

This article introduces the installation and configuration of Docker using the Alibaba Cloud mirror source, with support for NVIDIA GPUs. First, add the Alibaba Cloud GPG key and set up the repository, then update the apt sources and install Docker. Next, add a domestic registry mirror address in `/etc/docker/daemon.json` and restart the Docker service for the configuration to take effect. Then, download and install nvidia-container-toolkit via curl, configure it as the Docker runtime, and finally test GPU support. Key steps...
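The mirror and GPU runtime configuration might look like the sketch below; the registry mirror URL is a placeholder, and the CUDA image tag is just one example for the final test.

```sh
# Point Docker at a registry mirror (the URL is a placeholder; use your own mirror address).
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://<your-mirror-address>"]
}
EOF
sudo systemctl restart docker

# Register the NVIDIA runtime (installed with nvidia-container-toolkit) and verify GPU access.
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```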

Read More
Starting Programs with /etc/rc.local on Ubuntu 22.04
2024-07-02 464 views Backend Ubuntu

This article shows how to start programs at boot using `/etc/rc.local` on Ubuntu 20.04 or 22.04 systems. It requires editing the `/lib/systemd/system/rc-local.service` file to add configuration, creating `/etc/rc.local` and granting it execute permission, creating a soft link for the service, and enabling it. After these steps, reboot the device to check whether the boot-time startup works. If a log file containing "Test Successful" is generated at the specified path, it indicates that the setup...
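A sketch of the usual procedure on Ubuntu 22.04 is shown below; the log path and test command are placeholders used only to verify that the script runs at boot.

```sh
# Allow rc-local.service to be enabled by appending an [Install] section.
sudo tee -a /lib/systemd/system/rc-local.service <<'EOF'

[Install]
WantedBy=multi-user.target
EOF

# Create /etc/rc.local with a test command (log path is a placeholder) and make it executable.
sudo tee /etc/rc.local <<'EOF'
#!/bin/bash
echo "Test Successful $(date)" >> /var/log/rc-local-test.log
exit 0
EOF
sudo chmod +x /etc/rc.local

# Link the unit into systemd's search path, enable it, then reboot to test.
sudo ln -s /lib/systemd/system/rc-local.service /etc/systemd/system/rc-local.service
sudo systemctl enable rc-local
sudo reboot
```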

Read More
Night Rain Drifting · Qianwen: Every Question Answered

Night Rain Drifting · Qianwen Launcher is an efficient and convenient LLM (Large Language Model) launching tool. It supports Windows and requires an NVIDIA graphics card with a driver version above 516.01. The launcher comes pre-installed with multiple model specifications, suitable for different scenario requirements, with a minimum requirement of only 1 GB of video memory. The interface is divided into three parts: the Launch Page, the Chat Page, and the Log Page. The Launch Page is used to select and load model files (they are downloaded automatically if not available locally); after clicking "Load", it switches seamlessly to the Chat Page for interaction. The Chat Page supports asking questions at any time, and the model responds instantly for an intelligent dialogue experience. The Log Page records the usage...

Read More
HarmonyOS Application Development - Recording, Saving, and Playing Audio

Your code example demonstrates how to implement audio recording and playback functions in HarmonyOS. Below is a summary of the code and some improvement suggestions:

### Summary
1. **Permission Application**:
   - User authorization is required before starting audio recording.
   - The `requestPermissionsFromUser` method is used to obtain the user's permission.
2. **Recording Function**:
   - Use `startRecord` to begin audio recording and save the file to the specified path.

Read More
HarmonyOS Application Development - Recording Audio and Implementing Real-time Speech Recognition with WebSocket

Your code implements a complete example of real-time speech recognition using WebSocket. The following are some supplements and optimization suggestions for the whole project to ensure robustness and maintainability.

### 1. Permission Check and Prompt
When requesting permissions, more detailed prompt information can be provided, and reasonable suggestions can be given after the user refuses authorization, or the user can be guided to the settings page for manual authorization.

```javascript
reqPermissionsAndRecord(permissions: Ar
```

Read More
HarmonyOS Application Development - Custom List Popup with Deletable Items

This application implements a custom list popup window, supporting task addition, deletion, and confirmation. The specific implementation is as follows:

1. **Entity Class**: The `Intention` class is used to define task items.
2. **Data Source Class** (`IntentionDataSource`): Manages data operations for the task list, including CRUD operations and notifying listeners of updates.
3. **Custom Popup Component** (`AddIntentionDialog`): Displays the current task list and provides delete and confirm buttons.

Read More
HarmonyOS Application Development - Imitating WeChat Chat Message List

This example demonstrates how to create a chat application interface similar to WeChat using ArkTS. The page structure includes a scrollable message list and a button to dynamically add new messages. The core code is as follows:

1. The `Msg` class defines the message type (sent or received).
2. The `MsgDataSource` class implements the data source interface, manages the message list, and provides add/delete operations.
3. The page uses the `List` component to display the message list, with `LazyForEach` to dynamically load new messages as the user scrolls.

Read More
HarmonyOS Application Development - Sending POST Request and Obtaining Result

This code sends data to the server via a POST request and parses the JSON response. The core functionality includes:

1. Using the `http.createHttp().request()` method to send an asynchronous POST request.
2. Setting the request headers and the data to be sent.
3. Obtaining the response result and parsing it as JSON.
4. Extracting the useful fields from the parsed JSON to update the interface text.

The code structure clearly demonstrates how to make HTTP requests in a HarmonyOS application and update the UI through state variables.

Read More
HarmonyOS Application Development - Playing Local Audio Files

This document introduces the implementation of audio playback functionality on HarmonyOS using the AVPlayer audio and video player. The main steps include: 1. Creating an `AVPlayer` instance and registering callback functions to handle state changes and errors; 2. Obtaining the local audio file path, opening the audio file through file system operations to get the file descriptor, and setting it to `AVPlayer` to trigger resource initialization; 3. Implementing state machine transition logic, from resource initialization to playback completion. This code snippet demonstrates how to implement audio playback using the ArkTS language under the Stage model.

Read More
HarmonyOS Application Development - Requesting Voice Synthesis Service to Obtain Audio File

This document describes a text-to-speech service implemented with HarmonyOS, which uploads text data and requests the server to return audio data. Key steps include creating the HTTP request, setting the request headers and data body, processing the response data, and saving it to a local file. The code example demonstrates how to integrate this functionality in an Ability, specifically downloading and saving a .wav voice file after the user inputs text. Note that the service response type must be `application/octet-stream` to obtain the audio stream correctly, and this service is only applicable to...

Read More
Easily Recognize Hours-Long Audio and Video Files

This article introduces how to build a long-speech recognition service capable of processing audio or video files that last tens of minutes or even several hours. First, upload the project folder to the server, then run the commands for compilation, permission modification, and starting the Docker container to deploy the service. After confirming the service is available, you can interact with it through the WebSocket interface or the HTTP service. The HTTP service provides a web interface that supports uploading or recording audio and video in multiple formats for recognition, and returns text results containing the start and end timestamps of each sentence. This service simplifies long-audio recognition and improves user...
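As an illustration only, a client call to the HTTP service could look like the following; the host, port, route, and form-field name are hypothetical placeholders rather than the service's documented interface.

```python
# Hypothetical client sketch: host, port, route, and form-field name are placeholders.
import requests

with open("meeting_recording.mp4", "rb") as f:
    resp = requests.post("http://<server-ip>:<port>/recognition",  # placeholder URL
                         files={"audio": f}, timeout=3600)
print(resp.json())  # expected: per-sentence text with start/end timestamps
```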

Read More
Real-time Command Wake-up

This article introduces the development and usage of a real-time instruction wake-up program, covering environment installation, instruction wake-up, and model fine-tuning. The project runs on Anaconda 3 and Python 3.11, with dependencies on PyTorch 2.1.0 and CUDA 12.1. Users can customize the recording time and length by adjusting the `sec_time` and `last_len` parameters, and add instructions in `instruct.txt` for personalized settings. The program can be executed via `infer_pytorch.py` or `infer_on`...
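A minimal run might look like this; the instruction text is only an example, and command-line arguments are omitted since `sec_time` and `last_len` are adjusted inside the script.

```sh
# Add a custom instruction (example text), then start real-time wake-up inference.
echo "turn on the air conditioner" >> instruct.txt
python infer_pytorch.py
```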

Read More
Tank Battle Controlled by Voice Commands

This article introduces the development process of a program that controls the Tank Battle game through voice commands, covering environment setup, game startup, and instruction-model fine-tuning. First, the project is developed with Anaconda 3, Windows 11, Python 3.11, and the corresponding libraries. Users can adjust parameters in `main.py` such as recording time and data length, add new commands in `instruct.txt`, and write handler functions before starting the game. Next, run `record_data.py` to record command audio and generate training...
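The recording and startup steps might be run as follows; only the script and file names come from the article, the command text is an example, and any extra arguments are omitted.

```sh
# Add a new voice command, record audio samples for it, then launch the voice-controlled game.
echo "move forward" >> instruct.txt
python record_data.py
python main.py
```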

Read More
Run Large Language Model Service with One Click and Build a Chat Application

This article introduces how to build a local large language model chat service based on the Qwen-7B-Int4 model. First, install the GPU version of PyTorch and the other dependency libraries. Then, run `server.py` in the terminal to start the service. The service supports Windows and Linux and runs smoothly with modest VRAM (an 8 GB graphics card). In addition, the source code of an Android application is provided; after modifying the service address, open the `AndroidClient` file with Android Studio...
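Once `server.py` is running, a client call might look like the sketch below; the port, route, and JSON field names are hypothetical placeholders, since the real interface is defined in the project's source code.

```python
# Hypothetical client sketch: port, route, and field names are placeholders.
import requests

resp = requests.post("http://127.0.0.1:5000/chat",  # placeholder address and route
                     json={"prompt": "Introduce yourself in one sentence."},
                     timeout=120)
print(resp.text)
```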

Read More
Easily and Quickly Set Up a Local Speech Synthesis Service

This article introduces a method to quickly set up a local speech synthesis service using the VITS model architecture. First, you need to install the PyTorch environment and related dependency libraries. To start the service, simply run the `server.py` program. Additionally, the source code for an Android application is provided, which requires modifying the server address to connect to your local service. At the end of the article, a QR code is provided to join a knowledge planet and obtain the complete source code. The entire process is simple and efficient, and the service can run without an internet connection.
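A client request could look like the sketch below; the port, route, and field names are hypothetical placeholders, and it simply assumes the service returns raw WAV bytes.

```python
# Hypothetical client sketch: port, route, field names, and response format are assumptions.
import requests

resp = requests.post("http://127.0.0.1:5000/tts",  # placeholder address and route
                     json={"text": "Hello, this is a local speech synthesis test."},
                     timeout=60)
with open("output.wav", "wb") as f:
    f.write(resp.content)  # assumes the response body is the synthesized WAV audio
```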

Read More
Real-time Speech Recognition Service with Remarkably High Recognition Accuracy

This article introduces the installation, configuration, and application deployment of the FunASR speech recognition framework. First, PyTorch and the related dependency libraries need to be installed. The CPU version can be installed with `conda install pytorch torchvision torchaudio cpuonly -c pytorch`; for the GPU version, use `conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia`.
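After installation, basic usage might look like the sketch below, assuming a recent FunASR release that exposes the `AutoModel` interface; the model name is the common Chinese Paraformer model and may differ from the one used in the article.

```python
# Minimal recognition sketch with FunASR's AutoModel interface (model name is an example).
from funasr import AutoModel

model = AutoModel(model="paraformer-zh")          # downloads the model on first use
result = model.generate(input="test_audio.wav")   # recognize a local audio file
print(result)
```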

Read More
FunASR Speech Recognition GUI Application

This article introduces a speech recognition GUI application developed based on FunASR, which supports recognition of local audio and video files as well as microphone recording. The application includes short-audio recognition, long-audio recognition (with and without timestamps), and audio file playback. The environment requires dependencies such as PyTorch (CPU or GPU), FFmpeg, and PyAudio. To use the application, run `main.py`. The interface provides four options: short speech recognition, long speech recognition, recording recognition, and playback. Long speech recognition is divided into two models: one for concatenated output and another for explicit...

Read More
Voiceprint Recognition System Implemented Based on PyTorch

This project provides an implementation of voiceprint recognition based on PaddlePaddle, mainly using the EcapaTDNN model, and integrates speech recognition and voiceprint recognition functions. Below, I will summarize the project structure, its functions, and how to use them.

## Project Structure

### Directory Structure

```
VoiceprintRecognition-PaddlePaddle/
├── docs/                 # Documentation
│   └── README.md         # Project description document
```

Read More
Voiceprint Recognition System Based on PaddlePaddle

This project demonstrates how to use PaddlePaddle for speaker recognition (voiceprint recognition), covering the complete workflow from data preparation and model training to practical application. The project has a clear structure and detailed code comments, making it suitable for learning and reference. Below are supplementary explanations for some key points:

### 1. Environment Configuration
Ensure you have installed the necessary dependency libraries. If using the TensorFlow or PyTorch version, please configure the environment according to the corresponding tutorial.

### 2. Data Preparation
The `data`...

Read More