Last Updated: Sep '22
Transcription Language Skill
- The Transcription Skill converts audio to text, which can then be analyzed and transformed by other Language Skills
- Labels extracted from transcriptions hold timestamps of the labeled words in the audio input
- Automated diarization: different speakers are identified and their exchanges are separated into distinct sections
- The pipeline API supports .wav & .mp3 file inputs. The Transcription Skill is required to process audio files.
- Use the async endpoint (details below) to process large audio files asynchronously
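```
const pipeline = new oneai.Pipeline(
  // Assumed steps, inferred from the outputs read below; verify the names against the SDK
  oneai.skills.transcribe(),
  oneai.skills.emotions(),
  oneai.skills.summarize(),
);
const output = await pipeline.runFile('my-conversation.mp3');
console.log(output.transcription.text); // transcribed text
console.log(output.transcription.emotions); // emotions that appear in the conversation
console.log(output.transcription.summary.text); // summary of the conversation
```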
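Because labels extracted from transcriptions carry timestamps, they can be turned into a readable index of where things were said in the audio. The sketch below is illustrative only: the label shape (`name`, `value`, `timestampMs`) and both helper functions are hypothetical and are not taken from the SDK, so the actual output fields should be checked before relying on it.

```javascript
// Hypothetical label shape: { name, value, timestampMs }.
// These field names are assumptions for illustration, not the SDK's schema.

// Format a millisecond offset as mm:ss.
function formatTimestamp(ms) {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = String(Math.floor(totalSeconds / 60)).padStart(2, '0');
  const seconds = String(totalSeconds % 60).padStart(2, '0');
  return `${minutes}:${seconds}`;
}

// Build one line per label, ordered by position in the audio.
function labelIndex(labels) {
  return [...labels]
    .sort((a, b) => a.timestampMs - b.timestampMs)
    .map((l) => `[${formatTimestamp(l.timestampMs)}] ${l.name}: ${l.value}`);
}

const sampleLabels = [
  { name: 'emotion', value: 'anger', timestampMs: 83500 },
  { name: 'emotion', value: 'happiness', timestampMs: 4200 },
];
console.log(labelIndex(sampleLabels).join('\n'));
```

Sorting a copy of the array keeps the caller's label order untouched.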
New & Improved Language Skills
- Generate an appropriate headline for your input text.
- Generate a subheading for your input text.
- The Summary Skill now generates `origin` labels, mapping words in the summary to their position in the source input (string indices for text inputs, timestamps for audio inputs).
- The splitting Skill is now configurable, providing control over how many sections the input is split into.
- Combine with the Headline / Subheadings Skills to generate a headline per section
- Numbers and quantities that appear in text form are interpreted & provided as numeric data
- Dates, times & time-durations in various formats are provided as a standardized date string
- Added a `datetime_baseline` parameter that sets the reference time against which relative dates & times are resolved
- The Names Skill detects when names that appear in text reference real-world entities (people, companies, locations, etc.) based on context.
- `name` labels are enriched with entity data when known entities are recognized.
- Automatically identify the dominant language used in the input, which helps with further processing and translation.
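To illustrate why a baseline matters when interpreting relative dates, the sketch below resolves a few relative expressions against an explicit baseline. It is not the Skill's implementation: `resolveRelativeDate` and its tiny vocabulary are hypothetical, and only the idea of a `datetime_baseline` comes from the notes above.

```javascript
// Illustration only: a toy resolver, not how the Skill works internally.
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical vocabulary of relative expressions, expressed as day offsets.
const OFFSETS = new Map([
  ['yesterday', -1],
  ['today', 0],
  ['tomorrow', 1],
  ['next week', 7],
]);

// Resolve a relative expression against a baseline (ISO 8601 string).
// Returns a standardized ISO date string, or null if the phrase is unknown.
function resolveRelativeDate(phrase, datetimeBaseline) {
  const key = phrase.trim().toLowerCase();
  if (!OFFSETS.has(key)) return null;
  const base = new Date(datetimeBaseline);
  if (Number.isNaN(base.getTime())) throw new Error('invalid baseline');
  return new Date(base.getTime() + OFFSETS.get(key) * DAY_MS).toISOString();
}

// The same phrase yields different dates under different baselines.
console.log(resolveRelativeDate('tomorrow', '2022-09-01T09:00:00Z'));
console.log(resolveRelativeDate('tomorrow', '2022-12-31T09:00:00Z'));
```

Passing the baseline explicitly keeps results reproducible, for example when processing text that was written some time before it is analyzed.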
Analytics API (beta)
- The analytics engine makes large amounts of text data digestible by clustering together texts with similar meaning and accumulating metadata generated by Language Skills (such as Sentiment & Topics).
- The API accepts text items and organizes them in hierarchical clusters by meaning.
- Generated clusters can be fetched from the API or reviewed in the Analytics section of the Studio (the UI is open-source)
- Cluster collections can be queried with specific text items, fetching the cluster with the most similar meaning. This makes the analytics API an effective tool for intent-based classification and search.
- Text items can be enriched with metadata generated by Language Skills. Clusters then aggregate item metadata to derive insights at scale.
- Quickstart guide - docs.oneai.com/docs/quick-start-analytics
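As a rough illustration of querying a cluster collection by meaning, the sketch below picks the cluster whose centroid is closest to a query vector by cosine similarity. This is a generic nearest-centroid lookup, not the Analytics API or its algorithm; the vectors, cluster names, and the `nearestCluster` helper are made up for the example.

```javascript
// Generic nearest-centroid lookup; NOT the Analytics API's actual method.

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the name and score of the cluster most similar to the query,
// or null when there are no clusters.
function nearestCluster(queryVector, clusters) {
  let best = null;
  for (const cluster of clusters) {
    const score = cosineSimilarity(queryVector, cluster.centroid);
    if (best === null || score > best.score) {
      best = { name: cluster.name, score };
    }
  }
  return best;
}

// Toy data: in practice the vectors would come from a text-embedding step.
const clusters = [
  { name: 'billing issues', centroid: [0.9, 0.1, 0.0] },
  { name: 'login problems', centroid: [0.1, 0.9, 0.1] },
];
console.log(nearestCluster([0.2, 0.8, 0.0], clusters).name);
```

Treating each cluster as an intent and the query as a new text item turns this lookup into a simple intent classifier.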
Async API
- The async API endpoints introduce asynchronous processing of large text inputs and audio files.
- Uploading the input and retrieving the output are handled by separate endpoints, so a client does not have to keep a request open for the entire processing time.
- Use the `/async` endpoint to process raw text
- Use the `/async/files` endpoint to process binary encoded files
- Successful async requests return the `task_id` that was assigned to the input.
- Task status and outputs can be fetched via polling from the `/async/tasks` endpoint
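The upload-then-poll flow described above could look roughly like the sketch below. Only the `/async` and `/async/tasks` endpoints and the `task_id` field come from these notes; the base URL, the `api-key` header, the request body shape, the `/async/tasks/<task_id>` path, the `status` values, and the `result` field are all assumptions to be checked against the API reference. `fetchFn` is injectable so the flow can be exercised without network access.

```javascript
// Sketch of the async flow. Assumed (not from the release notes): header name,
// request body shape, task URL shape, status values and the `result` field.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runAsyncTask(text, options) {
  const {
    baseUrl, // supplied by the caller; no URL is assumed here
    apiKey,
    fetchFn = globalThis.fetch, // built into Node 18+ and browsers
    intervalMs = 2000,
    maxAttempts = 30,
  } = options;
  const headers = { 'api-key': apiKey, 'Content-Type': 'application/json' };

  // 1. Upload the input; the response carries the assigned task_id.
  const upload = await fetchFn(`${baseUrl}/async`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ input: text }),
  });
  const { task_id: taskId } = await upload.json();

  // 2. Poll the task until it finishes, fails, or the attempts run out.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetchFn(`${baseUrl}/async/tasks/${taskId}`, { headers });
    const task = await response.json();
    if (task.status === 'COMPLETED') return task.result;
    if (task.status === 'FAILED') throw new Error(`task ${taskId} failed`);
    await sleep(intervalMs);
  }
  throw new Error(`task ${taskId} did not finish after ${maxAttempts} polls`);
}
```

Capping the number of polls keeps a stuck task from blocking the caller indefinitely; `intervalMs` and `maxAttempts` can be tuned to the expected processing time.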