SPHINX2 VICIDIAL NA CLASSIFICATION README                            2005-01-17

This file outlines how to use Sphinx 2 with ring-time recordings of VICIDIAL NA (No-Answer) calls. No-Answers are the combined Busy, Ring-no-answer, Fast_busy, Invalid, Disconnected and Privacy Manager calls that do not send an Answer signal to Asterisk.

Instead of doing signal processing during the ring-time of a call to tell what kind of NA that call is, we record the call in GSM format and place it in the /var/spool/asterisk/monitor/ORIG/VDAD directory to be processed after-hours.

These scripts will analyze and categorize about 95% or more of the NA calls placed to the UK or USA on a VICIDIAL server in one day.


Install sphinx2 AND sphinx3 if you do not have them installed already:
wget http://internap.dl.sourceforge.net/sourceforge/cmusphinx/sphinx2-0.6.tar.gz
gunzip sphinx2-0.6.tar.gz
tar xvf sphinx2-0.6.tar
cd sphinx2-0.6
./configure
make
make install
cd ..
wget http://internap.dl.sourceforge.net/sourceforge/cmusphinx/sphinx3-0.5.tar.gz
gunzip sphinx3-0.5.tar.gz
tar xvf sphinx3-0.5.tar
cd sphinx3-0.5
./configure
make
make install


Upload the 4426 folder into the models directory for sphinx (adjust the path if your sphinx2 install keeps its models elsewhere):
/usr/local/sphinx2-0.5/model/lm/


To set all of this up you need to create a few directories on your Asterisk/VICIDIAL server:
mkdir /home/cron/RESULTS
mkdir /home/cron/RESULTS/DONE
mkdir /home/cron/LOGS
mkdir /var/spool/asterisk/monitor/ORIG/VDAD
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/B
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/CN
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/DC
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/FB
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/FB2
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/INV
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/N
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/N2
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/O
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/S
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/USA
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/USA/B
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/USA/CN
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/USA/DC
mkdir /var/spool/asterisk/monitor/ORIG/VDAD/GO/USA/N
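
The same directory tree can be created in one pass with mkdir -p, which also creates any missing parent directories. This is only a sketch: make_na_dirs is a hypothetical helper name, and its optional root argument exists only so the layout can be tried in a scratch directory before touching the real server. It also creates /home/cron/LOGS, which the crontab entries in step two write their output to.

```shell
#!/bin/sh
# Sketch: create the NA classification directory tree in one pass.
# make_na_dirs is a hypothetical helper, not part of VICIDIAL.
# $1 = optional root prefix (empty means the real filesystem root)
make_na_dirs() {
    root="${1:-}"
    vdad="$root/var/spool/asterisk/monitor/ORIG/VDAD"

    # result and log directories used by the cron scripts
    mkdir -p "$root/home/cron/RESULTS/DONE" "$root/home/cron/LOGS" || return 1

    # UK disposition folders
    for d in B CN DC FB FB2 INV N N2 O S; do
        mkdir -p "$vdad/GO/UK/$d" || return 1
    done

    # USA disposition folders
    for d in B CN DC N; do
        mkdir -p "$vdad/GO/USA/$d" || return 1
    done
}

# Try it in a scratch directory first and list what was created
scratch=$(mktemp -d)
make_na_dirs "$scratch"
find "$scratch" -type d | sort
```

Calling make_na_dirs with no argument would create the directories under the real / path, so run it as a user with write access to /home/cron and /var/spool/asterisk.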


You also need to place all sphinxNA Perl files in the /home/cron directory and chmod them to 0755.




Here's an explanation of how the whole process works and what else you need to set up:


STEP ONE:
To enable recording of the ring-time the first time a lead is called in VICIDIAL, edit these settings in the AST_VDauto-dial.pl script:
$RECcount=''; ### leave blank for no REC count
$RECprefix='7'; ### leave blank for no REC prefix

Change $RECcount to 1 (a call is recorded if the called_count of the lead is less than $RECcount).
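
The recording rule can be illustrated with a small shell sketch. It is not the code in AST_VDauto-dial.pl: build_dial_string is a hypothetical helper, the phone number is made up, and the placement of the prefix between the leading 9 and the 1 is an assumption inferred from the dialplan patterns used in this step.

```shell
#!/bin/sh
# Settings mirroring the two variables in AST_VDauto-dial.pl
RECcount=1      # record while called_count is less than this value
RECprefix=7     # digit that routes the call to the recording dialplan entry

# build_dial_string is a hypothetical helper, not part of VICIDIAL.
# $1 = called_count of the lead, $2 = 10-digit phone number
build_dial_string() {
    if [ -n "$RECcount" ] && [ "$1" -lt "$RECcount" ]; then
        # recorded call: assumed layout is 9 + prefix + 1 + number
        printf '9%s1%s\n' "$RECprefix" "$2"
    else
        # non-recorded call: 9 + 1 + number
        printf '91%s\n' "$2"
    fi
}

build_dial_string 0 7275551212   # first attempt: prints 9717275551212
build_dial_string 1 7275551212   # later attempt: prints 917275551212
```

With $RECcount set to 1, only a lead whose called_count is 0 gets the prefix, which is why only the first call attempt is recorded.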

Then, in your extensions.conf, add new entries that include the 7 prefix to allow recording of ring-time:
; dial a long distance outbound number
; record call
exten => _971NXXNXXXXXX,1,AGI(call_log.agi,${EXTEN})
exten => _971NXXNXXXXXX,2,Monitor(gsm|/var/spool/asterisk/monitor/ORIG/VDAD/${CALLERIDNAME})
exten => _971NXXNXXXXXX,3,Dial(${TRUNK}/${EXTEN:2},50,o)
exten => _971NXXNXXXXXX,4,StopMonitor
exten => _971NXXNXXXXXX,5,Hangup
; do not record call
exten => _91NXXNXXXXXX,1,AGI(call_log.agi,${EXTEN})
exten => _91NXXNXXXXXX,2,Dial(${TRUNK}/${EXTEN:1},50,o)
exten => _91NXXNXXXXXX,3,Hangup

This allows both ring-recorded and non-ring-recorded calls to go through. Each ring-time recording is named after the callerid or uniqueid of the call, which contains the lead_id as its last 9 digits.
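
As an illustration of the naming rule, the lead_id can be recovered from a recording's file name in shell. This is a sketch under assumptions: lead_id_from_recording is a hypothetical helper, the example file name is invented, and the -in/-out suffixes are the ones Monitor normally appends to the two legs of a recording.

```shell
#!/bin/sh
# lead_id_from_recording is a hypothetical helper, not part of VICIDIAL.
# $1 = path or file name of a ring-time recording
# Prints the last 9 characters of the name stem, returns 1 if they are
# not exactly 9 digits.
lead_id_from_recording() {
    name=$(basename "$1")
    name=${name%.*}        # drop the extension (.gsm, .wav)
    name=${name%-in}       # drop the Monitor leg suffix, if present
    name=${name%-out}
    id=$(printf '%s' "$name" | tail -c 9)
    case "$id" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "${#id}" -eq 9 ] || return 1
    printf '%s\n' "$id"
}

# Invented example name: V + 10-digit stamp + 9-digit zero-padded lead_id
lead_id_from_recording "V0117221530000012345-in.gsm"   # prints 000012345
```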



STEP TWO:
After hours, a script (VDAD_file_clean.pl) must be run to delete the recordings that are not of NA status. You will need a crontab entry for this script (all crontab entries needed for sphinx2 analysis are listed here):

### clean out the ORIG/VDAD recordings for NA dispos - 10:01 PM
1 22 * * * /home/cron/VDAD_file_clean.pl -q

### disposition the UK calls and convert the USA calls to wav files for sphinx processing - 10:09 PM
9 22 * * * /home/cron/VDAD_dispo_NAs.pl --debug > /home/cron/LOGS/dispo_NAs.txt

### run the sphinx2 batch processing on level 1 for all files that are NA - 12:21 AM
21 0 * * * /home/cron/sphinx2_pltest.pl 4426 /var/spool/asterisk/monitor/ORIG/VDAD/GO/USA/ 1 100000 > /home/cron/LOGS/sphinx_1st_run.txt

### run the analysis of the first sphinx2 batch processing results - 05:12 AM
12 5 * * * /home/cron/VDAD_IN_response_sphinx.pl

### run the sphinx2 batch processing on level 4 for the files that didn't get recognized in the first pass - 05:15 AM
15 5 * * * /home/cron/sphinx2_pltest.pl 4426 /var/spool/asterisk/monitor/ORIG/VDAD/GO/USA/ 4 100000 > /home/cron/LOGS/sphinx_2nd_run.txt

### run the analysis of the second sphinx2 batch processing results - 08:55 AM
55 8 * * * /home/cron/VDAD_IN_response_sphinx.pl

### remove all recordings - 08:56 AM
56 8 * * * /usr/bin/find /var/spool/asterisk/monitor/ORIG/VDAD/GO/USA/ -maxdepth 2 -name "V*" -print | xargs rm -f
56 8 * * * /usr/bin/find /var/spool/asterisk/monitor/ORIG/VDAD/GO/UK/ -maxdepth 2 -name "V*" -print | xargs rm -f
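
The cleanup entries can also be written so that file names containing spaces or quotes are handled safely. The sketch below is an alternative, not the original method: purge_recordings is a hypothetical helper, and it relies on the -print0, -0 and -r options of GNU find and xargs. The demonstration runs against a scratch directory so nothing real is deleted.

```shell
#!/bin/sh
# purge_recordings is a hypothetical helper, not part of VICIDIAL.
# $1 = directory to clean, e.g. .../ORIG/VDAD/GO/USA/
# Removes regular files named V* up to two levels below $1.
purge_recordings() {
    dir="$1"
    [ -d "$dir" ] || return 0
    # -print0 / -0 pass names NUL-separated; -r skips rm when nothing matches
    find "$dir" -maxdepth 2 -type f -name 'V*' -print0 | xargs -0 -r rm -f
}

# Demonstration in a scratch directory
scratch=$(mktemp -d)
mkdir -p "$scratch/B"
touch "$scratch/V0001-in.gsm" "$scratch/B/V 0002.wav" "$scratch/B/notes.txt"
purge_recordings "$scratch"
ls -R "$scratch"    # only B/notes.txt should remain
```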


For the VDAD_dispo_NAs.pl script you will need the Audio::Data Perl module installed on your system. It is OK to force install this module, since it does not need hardware compatibility for our purposes. SoX must also be installed, but it is a required package for VICIDIAL so you should already have it on the system.



STEP THREE:
After hours, a script (VDAD_dispo_NAs.pl) must be run to determine whether each recording is of a UK or USA call, use Audio::Data to categorize the UK calls, and convert the USA calls to the proper WAV file format for processing by Sphinx2. You will need a crontab entry for this script (see step two above).



STEP FOUR:
A batch process is run through sphinx to process the audio in the ring-time recordings. The result is a text file with all processed calls in it, one call per line. After sphinx processing, the response file is read by a Perl script (VDAD_IN_response_sphinx.pl) that grades the results and dispositions them accordingly. All calls that are not dispositioned are run through a second round of more intensive sphinx processing, and the resulting response file is again graded and dispositioned with the same Perl script.



STEP FIVE:
Voice processing ends when all audio files are erased, before the morning shift starts.

