
Transcribing audio with One AI

Olga Miroshnyk · Oct 18, 2022

Looking for a high-quality automatic audio transcription service? Try One AI's Smart Transcription, with powerful speech-to-text, proofreading, summarization, and much more.

Whether your application involves customer support calls, interviews, meetings, brainstorming sessions, or podcasts, you can have them transcribed with high accuracy in just a few minutes. And by combining transcription with Language Skills like proofreading or summarization, you can go even further: your recordings can become customer feedback reports, automatic subtitles, doctors' prescriptions, and so on.

How can I try it?

Here is an example of how you can easily extract all the necessary information from a video. Let's use this video from a TED talk. I used this website to download it; you can choose to download either the video or just the audio. Let's go with the audio.

Now open the One AI Language Studio. At the bottom right of the screen, you'll find our Language Skills, shown as a set of rectangular tiles. Each Skill can be dragged and dropped onto the Skills pipeline box above the library.

Upload your audio file in .mp3 or .wav format. You'll notice that the “Transcribing Audio” Language Skill is already selected. Now just press “Run the Pipeline”, and after a few moments you'll see results you can work with.

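Once you have a `transcription` object (the Python code that produces it is shown in the next section), you can print the text and check what we got: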

```python
print(transcription.text)
```

```
[00:00:16.430] speaker 1: First, a video.
[00:00:25.010] speaker 1: Yes, it is a scrambled egg.
[00:00:29.570] speaker 1: But as you look at it, I hope you'll begin to feel just slightly uneasy.
[00:00:37.070] speaker 1: Because you may notice that what's actually happening is that the egg is unscrambling itself.
[00:00:42.350] speaker 1: And you will now see the yolk and the white of separated, and now they're going to be poured back into the egg.
[00:00:48.350] speaker 1: And we all know in our heart of hearts that this is not the way the universe works.
[00:00:54.890] speaker 1: A scrambled egg is mush, tasty mush, but it's marsh.
[00:00:58.010] speaker 1: An egg is a beautiful, sophisticated thing that can create even more sophisticated things such as chickens.
[00:01:04.490] speaker 1: And we know in our heart of hearts that the universe does not travel from mush to complexity.
[00:01:10.910] speaker 1: In fact, this gut instinct is reflected in one of the most fundamental laws of physics, the second law of thermodynamics, or the law of entropy.
[00:01:19.430] speaker 1: What that says basically is that the general tendency of the universe is to move from order and structure to lack of order, lack of structure, in fact, to mush.
[00:01:32.090] speaker 1: And that's why that video feels a bit strange.
[00:01:35.930] speaker 1: And yet, look around us.
[00:01:39.710] speaker 1: What we see around us is staggering complexity. York City alone, there are some 10 billion SKUs or distinct commodities being traded. ....
```
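The printed text follows a regular `[HH:MM:SS.mmm] speaker N: utterance` pattern, so you can split it into segments locally, for example to build subtitles or to get plain text without timestamps. Below is a minimal sketch; the pattern is inferred from the sample output above rather than from any documented format, and `parse_transcript` and `plain_text` are hypothetical helper names, not part of the One AI SDK.

```python
import re
from datetime import timedelta

# Pattern inferred from the sample output: "[HH:MM:SS.mmm] speaker N: utterance".
# Each utterance runs until the next timestamp or the end of the string.
SEGMENT_RE = re.compile(
    r"\[(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\s+(speaker \d+):\s*"
    r"(.*?)(?=\s*\[\d{2}:\d{2}:\d{2}\.\d{3}\]|\Z)",
    re.DOTALL,
)


def parse_transcript(text):
    """Split a timestamped transcript into (offset, speaker, utterance) tuples."""
    segments = []
    for h, m, s, ms, speaker, utterance in SEGMENT_RE.findall(text):
        offset = timedelta(hours=int(h), minutes=int(m),
                           seconds=int(s), milliseconds=int(ms))
        segments.append((offset, speaker, utterance.strip()))
    return segments


def plain_text(segments):
    """Join the utterances without timestamps or speaker labels."""
    return " ".join(utterance for _, _, utterance in segments)


sample = (
    "[00:00:16.430] speaker 1: First, a video. "
    "[00:00:25.010] speaker 1: Yes, it is a scrambled egg."
)
for offset, speaker, utterance in parse_transcript(sample):
    print(offset, speaker, utterance, sep=" | ")
print(plain_text(parse_transcript(sample)))
```

In your own code you would pass `transcription.text` instead of `sample`.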

How can I get it to work with my code?

At this point, you are just two steps away from adding the generated code to your project.

Step 1: Copy the generated code snippet for the language you need into your code editor (click here to generate your own API key).

Step 2: Run `pip install oneai` for the Python SDK or `npm install oneai` for the Node.js SDK to get the library. Then import the required packages and set your API key:

```python
import oneai

oneai.api_key = "[YOUR ONEAI API KEY]"
```
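Hard-coding the key is fine for a quick test, but in a real project you may prefer to keep it out of your source files. A minimal sketch, assuming you store the key in an environment variable; the name `ONEAI_API_KEY` and the helper `load_api_key` are hypothetical choices for this example, not something the SDK requires.

```python
import os


def load_api_key(var_name="ONEAI_API_KEY"):
    """Read the One AI API key from an environment variable.

    ONEAI_API_KEY is a name chosen for this example only (hypothetical).
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your One AI API key")
    return key


# Usage with the snippet above:
# oneai.api_key = load_api_key()
```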

That’s it, you’re good to go! Run your code and see the results.

```python
with open("AudioFile.mp3", "rb") as f:
    pipeline = oneai.Pipeline(
        steps=[
            oneai.skills.Transcribe(),
        ]
    )
    transcription = pipeline.run(f)
```
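If you have several recordings, you can loop over a folder and run the same pipeline on each file. A minimal sketch, assuming the files are .mp3 or .wav (the formats the Studio accepts; the SDK may accept others). `transcribe_folder` is a hypothetical helper, and `transcribe` is a placeholder for any callable that takes an open binary file, such as `pipeline.run` from the snippet above.

```python
from pathlib import Path

# Formats accepted by the Studio upload; an assumption for SDK usage.
AUDIO_SUFFIXES = {".mp3", ".wav"}


def transcribe_folder(folder, transcribe):
    """Run `transcribe` on every .mp3/.wav file in `folder`.

    Returns a dict mapping file name to whatever `transcribe` returned.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            with open(path, "rb") as f:
                results[path.name] = transcribe(f)
    return results


# Usage (with the pipeline defined above):
# results = transcribe_folder("recordings", pipeline.run)
```

Passing the pipeline's `run` method as a parameter keeps the file handling independent of the SDK, which also makes the loop easy to test with a stand-in function.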

Using additional Language Skills on the transcription

Now that we have the transcription, we can apply any other Skill to it. Let's say we want to find the topics discussed in the audio file. In the Studio, just drag and drop the “Topics” Skill and press “Run Pipeline”; in code, it looks like this:

```python
pipeline = oneai.Pipeline(
    steps=[
        oneai.skills.Topics(),
    ]
)

Topics = pipeline.run(transcription.text)

print(Topics.data[0].values)
```

The topics we got:

['Diversity', 'Complexity', 'Earth', 'Cloud', 'DNA', 'Energy', 'Universe']

What if we want to generate a new title for the video? Just drop in the “Headline” Skill:

```python
text = transcription.text

pipeline = oneai.Pipeline(
    steps=[
        oneai.skills.Headline(),
    ]
)

Headline = pipeline.run(text)

print(Headline.data[0].values)
```

The new headline we got was: “The universe is not mush, but it's marsh”. Well, that's an interesting thought 😊

Since this talk covers a lot of historical context, we could also extract the names and dates mentioned, to include in a report, using the “Names” and “Numbers & Time” Skills. Of course, you don't need to run each Skill individually: you can combine them all in a single pipeline, like so:

```python
with open("AudioFile.mp3", "rb") as f:
    pipeline = oneai.Pipeline(
        steps=[
            oneai.skills.Transcribe(),
            oneai.skills.Topics(),
            oneai.skills.Headline(),
            oneai.skills.Names(),
            oneai.skills.Numbers(),
        ]
    )
    output = pipeline.run(f)
```

And the outputs will be: 

Topics:

`print(Topics.data[0].values)`

`['Energy', 'Earth', 'Complexity', 'Cloud', 'Diversity', 'Universe', 'Dna']`
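The combined run returned the same topics as the separate run, but in a different order and with different casing ('DNA' vs 'Dna'). If you compare or deduplicate topics across runs, it helps to normalize them first. A minimal sketch using the two lists shown in this article; `same_topics` is a hypothetical helper name.

```python
# Topic lists copied from the two runs shown in this article.
separate_run = ['Diversity', 'Complexity', 'Earth', 'Cloud', 'DNA', 'Energy', 'Universe']
combined_run = ['Energy', 'Earth', 'Complexity', 'Cloud', 'Diversity', 'Universe', 'Dna']


def same_topics(a, b):
    """Compare topic labels ignoring order and letter case."""
    return {t.casefold() for t in a} == {t.casefold() for t in b}


print(separate_run == combined_run)             # False: order and casing differ
print(same_topics(separate_run, combined_run))  # True
```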

Headline:

`print(Headline.data[0].values)`

`["The universe is not mush, but it's marsh"]`

Names:

`[[x.name, x.value, x.timestamp] for x in output.transcription.names]`

[['PERSON', 'Eric Bayern hotter', datetime.timedelta(seconds=103, microseconds=910000)], ['LOCATION', 'New York City', datetime.timedelta(seconds=105, microseconds=950000)], ['PERSON', 'Fred', datetime.timedelta(seconds=157, microseconds=10000)], ['PRODUCT', 'Wilkinson Microwave Anisotropy Probe', datetime.timedelta(seconds=349, microseconds=610000)], ['EVENT', 'Big Bang', datetime.timedelta(seconds=411, microseconds=530000)], ['LOCATION', 'Yucatán Peninsula', datetime.timedelta(seconds=724, microseconds=850000)], ['LOCATION', 'Savanna', datetime.timedelta(seconds=840, microseconds=230000)], ['LOCATION', 'Africa', datetime.timedelta(seconds=841, microseconds=70000)], ['LOCATION', 'Siberia', datetime.timedelta(seconds=849, microseconds=410000)], ['LOCATION', 'Americas', datetime.timedelta(seconds=852, microseconds=50000)], ['LOCATION', 'Australia (continent)', datetime.timedelta(seconds=852, microseconds=710000)], ['LOCATION', 'England', datetime.timedelta(seconds=950, microseconds=510000)], ['GROUPS', 'Cuba', datetime.timedelta(seconds=951, microseconds=470000)], ['LOCATION', 'Goldilocks', datetime.timedelta(seconds=973, microseconds=970000)], ['PERSON', 'Daniel', datetime.timedelta(seconds=1002, microseconds=170000)], ['PERSON', 'Daniel', datetime.timedelta(seconds=1033, microseconds=610000)]]
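Each name comes with a timestamp, so for a report you might group the mentions by entity type and show where they occur in the recording. A minimal sketch, assuming rows shaped like the `[name, value, timestamp]` lists printed above (a few rows are copied from that output); `group_by_type` and `mmss` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import timedelta

# A few rows copied from the Names output above: [type, value, timestamp].
names = [
    ['PERSON', 'Fred', timedelta(seconds=157, microseconds=10000)],
    ['EVENT', 'Big Bang', timedelta(seconds=411, microseconds=530000)],
    ['LOCATION', 'Africa', timedelta(seconds=841, microseconds=70000)],
    ['PERSON', 'Daniel', timedelta(seconds=1002, microseconds=170000)],
    ['PERSON', 'Daniel', timedelta(seconds=1033, microseconds=610000)],
]


def mmss(offset):
    """Format a timedelta as MM:SS (fractional seconds are dropped)."""
    minutes, seconds = divmod(int(offset.total_seconds()), 60)
    return f"{minutes:02d}:{seconds:02d}"


def group_by_type(rows):
    """Map each entity type to its (value, MM:SS) mentions, in order of appearance."""
    grouped = defaultdict(list)
    for entity_type, value, offset in rows:
        grouped[entity_type].append((value, mmss(offset)))
    return dict(grouped)


for entity_type, mentions in group_by_type(names).items():
    print(entity_type, mentions)
# PERSON [('Fred', '02:37'), ('Daniel', '16:42'), ('Daniel', '17:13')]
# EVENT [('Big Bang', '06:51')]
# LOCATION [('Africa', '14:01')]
```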

Numbers and Time:

`[[x.name, x.value] for x in output.transcription.numbers]`

[['ORDINAL', '1'], ['DATE', '2023-05'], ['DATE', '2022-10-12'], ['DATE', '2022-10-12'], ['NUMBER', '1'], ['ORDINAL', '2'], ['QUANTITY', '10,000,000,000'], ['NUMBER', '100'], ['NUMBER', '7,000,000,000'], ['ORDINAL', '2'], ['DATE', '2022-10-12'], ['ORDINAL', '2'], ['DATE', '2023-05-01'], ['DURATION', '1000000000 Years'], ['ORDINAL', '1'], ['ORDINAL', '1'], ['ORDINAL', '1'], ['ORDINAL', '2'], ['ORDINAL', '1'], ['ORDINAL', '2'], ['DATE', '2022-10-12'], ['DURATION', '380000 Years'], ['DATE', '2022-10-12'], ['DATE', '2022-10-12'], ['NUMBER', '1,000,000,000'], ['QUANTITY', '10,000,000'], ['ORDINAL', '1'], ['NUMBER', '1,000,000,000'], ['DATE', '2022-10-12'], ['NUMBER', '2'], ['DATE', '2022-10-12'], ['ORDINAL', '4'], ['DATE', '2022-10-12'], ['DATE', '2022-10-12'], ['ORDINAL', '1'], ['DATE', '2022-10-12'], ['NUMBER', '1,000,000,000'], ['DATE', '2022-10-12'], ['DURATION', '1000000000 Years'], ['DATE', '1522-10-12'], ['DATE', '2022-10-12'], ['NUMBER', '7,000,000,000'], ['DATE', '1822'], ['DURATION', '1000000000 Years'], ['DURATION', '1 Day'], ['DATE', '2022-10-12']]
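The Numbers & Time output repeats many values (for example, '2022-10-12' appears many times), so before adding it to a report you may want to count entities by type and keep only unique values. A minimal sketch, assuming `[type, value]` rows like the ones printed above (a subset is copied here); `unique_values` is a hypothetical helper.

```python
from collections import Counter

# A subset of the Numbers & Time output above: [type, value].
numbers = [
    ['ORDINAL', '1'], ['DATE', '2023-05'], ['DATE', '2022-10-12'], ['DATE', '2022-10-12'],
    ['NUMBER', '1'], ['ORDINAL', '2'], ['QUANTITY', '10,000,000,000'], ['NUMBER', '100'],
    ['NUMBER', '7,000,000,000'], ['DURATION', '1000000000 Years'], ['DURATION', '380000 Years'],
]


def unique_values(rows, wanted_type):
    """Unique values of one entity type, in first-seen order (dicts keep insertion order)."""
    return list(dict.fromkeys(value for entity_type, value in rows if entity_type == wanted_type))


# How many entities of each type were found.
type_counts = Counter(entity_type for entity_type, _ in numbers)

print(type_counts.most_common())
print(unique_values(numbers, 'DATE'))      # ['2023-05', '2022-10-12']
print(unique_values(numbers, 'DURATION'))  # ['1000000000 Years', '380000 Years']
```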

Conclusion

Whether you work for a large company or a small startup, you can now extract the information you need from audio in a few minutes.

To get started, visit our Language Studio. Share your own ideas with the community.