Semi Primes

A composite is a number containing at least two prime factors (not necessarily distinct). For example, 15 = 3 × 5; 9 = 3 × 3; 12 = 2 × 2 × 3.

There are ten composites below thirty containing precisely two, not necessarily distinct, prime factors: 4, 6, 9, 10, 14, 15, 21, 22, 25, 26.

How many composite integers, n < 10^8, have precisely two, not necessarily distinct, prime factors?

import math

subprime = []

def primefactor(num):
    """Return the prime factors of num, with multiplicity."""
    primefactors = []
    # strip out all factors of 2 first
    while num % 2 == 0:
        primefactors.append(2)
        num //= 2
    # then try odd divisors up to sqrt(num)
    for n in range(3, math.isqrt(num) + 1, 2):
        while num % n == 0:
            primefactors.append(n)
            num //= n
    # whatever remains (> 1) is itself prime
    if num > 1:
        primefactors.append(num)
    return primefactors

# brute force: factor every n below 10^8 and keep those with exactly
# two prime factors (correct, but very slow in pure Python)
for num in range(4, 10**8):
    fact = primefactor(num)
    if len(fact) == 2:
        subprime.append(num)

print(len(subprime))

Output: 17427258
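
The brute-force loop above gives the right answer but is extremely slow in pure Python for n < 10^8. A faster sketch (my own alternative, not part of the original post; the name count_semiprimes is just illustrative): sieve the primes once, then for every prime p with p² ≤ N count the primes q with p ≤ q ≤ N/p, so each semiprime p·q is counted exactly once.

import bisect

def count_semiprimes(limit):
    """Count n <= limit having exactly two prime factors (with multiplicity)."""
    # the largest possible co-factor is limit // 2 (when the smaller prime is 2)
    max_q = limit // 2
    sieve = bytearray([1]) * (max_q + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(max_q ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = bytearray(len(sieve[i * i::i]))
    primes = [i for i, is_p in enumerate(sieve) if is_p]

    total = 0
    for p in primes:
        if p * p > limit:
            break
        # count primes q with p <= q <= limit // p, so each p*q is counted once
        hi = bisect.bisect_right(primes, limit // p)
        lo = bisect.bisect_left(primes, p)
        total += hi - lo
    return total

print(count_semiprimes(10**8))  # should print 17427258, matching the output above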

Cosine Similarity

In this article, using a small example, we will see why and where we use cosine similarity and how it works.

Going by the definition, cosine similarity is a measure of similarity between two non-zero vectors (i.e. it tells us how similar the two vectors are). We use cosine similarity in many applications; I used it in one of my recommendation systems and in a spell-checking application.

Why do we use cosine for checking similarity, and not sine or some other function?

It comes down to how the function relates to the angle between the vectors. A cosine similarity of 1 means the vectors point in the same direction, i.e. the angle between them is 0, and cosine is the function that equals 1 at an angle of 0. Similarly, a cosine similarity of 0 means there is no similarity (the vectors are orthogonal). The smaller the angle, the higher the cosine similarity.

As per Wikipedia, the cosine similarity is defined as:

\[
\text{similarity} = \cos(\theta) = \frac{\mathbf{A}\cdot\mathbf{B}}{\|\mathbf{A}\|\,\|\mathbf{B}\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\ \sqrt{\sum_{i=1}^{n} B_i^{2}}}
\]

Let's take

A = [1, 1, 0]

B = [1, 0, 1]

cos θ = (A · B) / (‖A‖ ‖B‖) = (1·1 + 1·0 + 0·1) / (√2 · √2) = 1/2

cos θ = 1/2, which means θ = 60°.
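
A quick way to verify this numerically (a minimal sketch of my own using NumPy; the arrays are just the A and B above):

import numpy as np

A = np.array([1, 1, 0])
B = np.array([1, 0, 1])

# cosine similarity = dot product divided by the product of the vector norms
cos_sim = A.dot(B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(cos_sim)                         # 0.5
print(np.degrees(np.arccos(cos_sim)))  # 60.0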


Converting GeoJSON to zip codes using Python

For one of our projects we got a MultiPolygon GeoJSON file, where we have multiple latitudes/longitudes for a single location. We needed to extract those latitudes and longitudes and convert them to zip codes, and we used Python to do this.

GeoJSON, as the name suggests, represents geo locations in JSON format. These locations can be represented by different geometry types:

  1. Points: addresses and locations,
  2. LineStrings: streets, highways and boundaries,
  3. Polygons: countries, provinces, tracts of land,
  4. Multi geometries: multi-part collections of these types.

The file for multi geometries looks as below for a single location.

{
  "type": "FeatureCollection",
  "properties": {
    "zone_name": "Postmates",
    "market_name": "Postmates"
  },
  "features": [
    {
      "type": "FeatureCollection",
      "properties": {
        "zone_name": "Akron OH",
        "market_name": "Akron OH"
      },
      "features": [
        {
          "geometry": {
            "type": "MultiPolygon",
            "coordinates": [
              [
                [
                  [
                    -81.3940623151993,
                    41.115702695661
                  ],
                  [
                    -81.4008165811959,
                    41.1139914440836
                  ]
                ]
              ]
            ]
          },
          "type": "Feature",
          "properties": {
            "name": "zone_geometry"
          }
        },
        {
          "geometry": {
            "type": "Point",
            "coordinates": [
              -81.51892234358142,
              41.08024351088745
            ]
          },
          "type": "Feature",
          "properties": {
            "name": "map_center"
          }
        }
      ]
    }
  ]
}
There are two tasks to do:

  1. Extract all the latitudes and longitudes from the file (from the "MultiPolygon" geometry) and store them in a .csv file.
  2. Convert those latitudes and longitudes to zip codes.

The first task is easy, but we have to go deep into the JSON file to get the lat/long values, as there are a lot of nesting levels to pass through to reach them. Below is the code to do this.

import json
import pandas as pd

# load the GeoJSON file
with open('postmates_delivery_zones.json', 'r') as input_file:
    json_decode = json.load(input_file)

market = []
latitude = []
longitude = []

# walk every market, then every point of its MultiPolygon geometry
for i in range(len(json_decode['features'])):
    print(json_decode['features'][i]['properties'])

    coords = json_decode['features'][i]['features'][0]['geometry']['coordinates'][0][0]
    for point in coords:
        market.append(json_decode['features'][i]['properties']['market_name'])
        # GeoJSON stores each point as [longitude, latitude]
        longitude.append(point[0])
        latitude.append(point[1])

df = pd.DataFrame({'Market': market, 'Latitude': latitude, 'Longitude': longitude})
df.to_csv('marketpoint.csv', index=False)

Once we extract the latitudes and longitudes from the file, we have to convert them to zip codes. There are a lot of APIs available to do so, but all of them have limitations. For example, we can only convert about 15K lat/long pairs to zip codes using the Google API, and the same goes for the Yahoo API. The OpenStreetMap (OSM) API, however, has no such limit. Below is the code to convert them using OSM (via the geocoder package).

import geocoder
import csv
import pandas as pd

pcode = []

with open('marketpoint1.csv', 'r') as f:
    reader = csv.DictReader(f)
    for line in reader:
        lat = float(line['Latitude'])
        lon = float(line['Longitude'])
        try:
            # reverse geocode with OSM; geocoder expects [latitude, longitude]
            g = geocoder.osm([lat, lon], method='reverse')
            print(g.osm['addr:postal'])
            pcode.append(g.osm['addr:postal'])
        except Exception:
            print("No zip")
            pcode.append(None)  # keep the rows aligned when a lookup fails

df = pd.DataFrame()
df['Zipcode'] = pcode
df.to_csv('output.csv', index=False)
    

Please let me know if you have any suggestions.


Named entity recognition (NER) using Python regex

We had a problem building our own NER for a chatbot that handles automobile entities. I tried spaCy but was not successful, so I thought of creating my own NER using regular expressions in Python. This model identifies the Year, Zip, VIN and Model of a car. All of them are straightforward except finding the model: our model names are multi-word strings separated by spaces, and the other challenge is that even if the user enters the words of a model in a different order, it should still identify the correct model. Below is the code snippet I wrote, which works well.

# -*- coding: utf-8 -*-
"""
Created on Mon Dec  3 22:35:35 2018

@author: trinadh
"""

import re
import itertools

#Models which we have(Deleted most of them)
models = [u'xg350l',
 u'xg350',
 u'xg300l',
 u'xg300',
 u'xg',
 u'veracruz se',
 u'veracruz ltd',
 u'veracruz gls',
 u'veracruz',
 u'veloster', 
 u'3dr base 5sp',
 u'3dr base 5-spd',
 u'3-dr gs auto',
 u'3-dr gs 5-spd',
 u'3-dr base 5-spd',
 u'2dr turbo 5sp',
 u'2dr se a',
 u'2dr se 5spd',
 u'2dr scpe a',
 u'2dr scpe 5sp',
 u'2dr ls a',
 u'2dr ls 5sp']

#creating all the combinations of the models. 
modelList = []
for model in models:
    lst = model.split()
    per = set(itertools.permutations(lst))
    for p in per:
        modelList.append(' '.join(p))

#creating a collections dictionary with year, zip and vin patterns
collections = {}
collections[re.compile(r'\b(1989|199\d|20[01]\d|202[0-5])\b')] = 'Year'  # year should be between 1989 and 2025
collections[re.compile(r'\b\d{5}\b')] = 'Zip'  # zip should be a 5-digit number
collections[re.compile(r'\b(?:hu|gu)\d{6}\b')] = 'Vin'  # VIN should start with hu or gu, followed by 6 digits

dis = {'Model':[],
       'Year':[],
        'Zip':[],
        'Vin':[]}

def find_matching_regexen(words, dicts=collections, modelList = modelList, models = models):
    
    #compiling a pattern for all the combinations of the Models
    regex1one = re.compile(r'\b(?:%s)'%'|'.join(modelList)) 
    name4 = re.findall(regex1one, words)

    #finding the original model
    modelList1 = []
    for name in name4:
        lst1 = name.split()
        pers = set(itertools.permutations(lst1))
        for per1 in pers:
            modelList1.append(' '.join(per1))            
    dis['Model'].append(list(set(modelList1) & set(models)))

    #finding the year, vin and zip
    for word in words.split():
        for regex, description in dicts.items():
            if regex.match(word):
                dis[description].append(word)
    return dis

dis1 = find_matching_regexen('i have 2025 3dr base 5sp car when i was in 45678 with vin hu456788 and also i own gls 5dr a, 2018')  # the model '5dr gls a' appears shuffled in the input
print(dis1['Model'])
print(dis1['Year'])
print(dis1['Vin'])
print(dis1['Zip'])

Output:
[['5dr gls a', '3dr base 5sp']]
['2025', '2018']
['hu456788']
['45678']
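
One caveat I would add (my own note, not in the original post): if a model name ever contains regex metacharacters, joining the raw strings would produce a broken pattern. Escaping each candidate with re.escape, and closing the alternation with a word boundary, guards against that:

# hedged tweak to the pattern built inside find_matching_regexen
regex1one = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, modelList)))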




Retrieving Tweets from Twitter

I wanted a few hundred sentences about different automobile companies to use in my project. It is really hard to create hundreds of sentences on my own, so I wrote some code to retrieve tweets from Twitter.

# -*- coding: utf-8 -*-
"""
Created on Wed Oct  3 13:33:36 2018

@author: trinadh
"""
import tweepy 
import pandas as pd
import csv
import os
# Fill the X's with the credentials obtained by  
# following the above mentioned procedure. 
consumer_key = "xxxxxxxxxxxxxxxx" 
consumer_secret = "xxxxxxxxxxxxxxxx"
access_key = "xxxxxxxxxxxxxxxx"
access_secret = "xxxxxxxxxxxxxxxx"

# Authorization to consumer key and consumer secret 
auth = tweepy.OAuthHandler(consumer_key, consumer_secret) 
  
# Access to user's access key and access secret 
auth.set_access_token(access_key, access_secret) 
  
# Calling api 
api = tweepy.API(auth) 
# Function to extract tweets 
def get_tweets(username):
    # fetch the most recent tweets for the given handle
    # (note: the API returns at most 200 tweets per user_timeline call)
    tweets = api.user_timeline(screen_name=username, count=500)

    # keep just the tweet text
    tweets_for_csv = [tweet.text for tweet in tweets]

    # printing the tweets
    print(tweets_for_csv)

    # append the tweets to a CSV file, dropping any non-ASCII characters
    with open('Automobiletweets.csv', 'a', newline='') as f:
        writer = csv.writer(f)
        for item in tweets_for_csv:
            writer.writerow([item.encode('ascii', 'ignore').decode('ascii')])

        
# Driver code
if __name__ == '__main__':
    # remove any previous output file so we start fresh
    try:
        os.remove('Automobiletweets.csv')
    except OSError:
        pass
    # list of makes (loaded here but not used in this snippet)
    make = pd.read_csv("make.csv")
    # here goes the Twitter handle for the user whose tweets are to be extracted
    get_tweets("XXXX")  # XXXX -> the Twitter page you want to grab tweets from



Automatically creating a spaCy training set

While using spaCy, I had difficulty creating the training set. Initially I created it manually for some of the entities, but later I got frustrated and wrote a program to create the spaCy training set automatically.

Prerequisites:

  • Python
  • A dataset that has the conversations with entities
  • The entities we need to have recognised

In this code I want to create a spaCy training set for automobile entities. I got the sentences from the 'Automobiletweets-processed.csv' file, and each automobile has a total of 5 entities: 'Make', 'Model', 'Year', 'VIN' and 'Zip'.

"""
Created on Fri Sep 21 11:25:41 2018

@author: trinadh
"""
from __future__ import print_function
import pandas as pd
from langdetect import detect
import os

Autotweets = pd.read_csv('Automobiletweets-processed.csv',error_bad_lines=False)
Autotweets.columns = ['Index','Index1','tweets','tweet2']

cars = pd.read_excel('SAB Make and Models.xlsx')
cars.dropna(subset = ['CDS_Model'],inplace = True)
models = list(set(list(cars['CDS_Model'])))
makes = list(set(list(cars['CDS_Make'])))
vins = list(set(list(cars['VIN'])))
zips = list(set(list(cars['ZIP'])))
years = ['2000','2001','2002','2003','2004','2005','2006','2007','2008','2009','2010','2011','2012','2013','2014','2015','2016','2017','2018',]

file = open('AutomobileSpacyTraining.txt','w+') 
file.write('[')

subAuto = Autotweets[0:4]
     
for tweet in Autotweets['tweets']: 
    finish = 0
    flag = 0
    flag2 = 0
    flag3 = 0
    flag4 = 0
    flag5 = 0
        
    if detect(tweet) == 'en':
        word = tweet.split()
        
        for make in makes:                 
            if make in word:                              
                start_index = tweet.find(make)
                end_index = start_index + len(make)
                #print(tweet)
                #print(start_index,' ',end_index)
                if flag == 0:
                    file.write('\n')
                    file.write('(\'')
                    file.write(tweet)
                    file.write('\', { \'entities\': [(')
                else:
                    file.write(',(')
                file.write(str(start_index))
                file.write(',')
                file.write(str(end_index))
                #file.write(',\'Model\')]}),')
                file.write(',\'Make\')')
                flag = 1
                finish = 1

        for model in models:
            if model in word:
                #print make
                
                start_index = tweet.find(model)
                end_index = start_index + len(model)
                #print(tweet)
                #print(start_index,' ',end_index)
                if flag == 0 and flag2 == 0 :
                    file.write('\n')
                    file.write('(\'')
                    file.write(tweet)
                    file.write('\', { \'entities\': [(')
                else:
                    file.write(',(')
                file.write(str(start_index))
                file.write(',')
                file.write(str(end_index))
                #file.write(',\'Model\')]}),')
                file.write(',\'Model\')')
                #file.write(']}),')
                flag2 = 1
                finish = 1
                
        
        for year in years:
            if year in word:
                start_index = tweet.find(year)
                end_index = start_index + len(year)
                print(tweet)
                print(start_index,' ',end_index)
                if flag == 0 and flag2 == 0 and flag3 == 0:
                    file.write('\n')
                    file.write('(\'')
                    file.write(tweet)
                    file.write('\', { \'entities\': [(')
                else:
                    file.write(',(')
                file.write(str(start_index))
                file.write(',')
                file.write(str(end_index))
                file.write(',\'Year\')')
                #file.write(']}),')
                flag3 = 1
                finish = 1

        for vin in vins:
            if vin in word:
                start_index = tweet.find(vin)
                end_index = start_index + len(vin)
                print(tweet)
                print(start_index,' ',end_index)
                if flag == 0 and flag2 == 0 and flag3 == 0 and   flag4 == 0:
                    file.write('\n')
                    file.write('(\'')
                    file.write(tweet)
                    file.write('\', { \'entities\': [(')
                else:
                    file.write(',(')
                file.write(str(start_index))
                file.write(',')
                file.write(str(end_index))
                file.write(',\'VIN\')')
                #file.write(']}),')
                flag4 = 1
                finish = 1
                
        for zip1 in zips:
            zip1 = str(zip1)
            if zip1 in word:
                start_index = tweet.find(zip1)
                end_index = start_index + len(zip1)
                print(tweet)
                print(start_index,' ',end_index)
                if flag == 0 and flag2 == 0 and flag3 == 0 and flag4 == 0 and flag5 == 0:
                    file.write('\n')
                    file.write('(\'')
                    file.write(tweet)
                    file.write('\', { \'entities\': [(')
                else:
                    file.write(',(')
                file.write(str(start_index))
                file.write(',')
                file.write(str(end_index))
                file.write(',\'ZIP\')')
                #file.write(']}),')
                flag5 = 1
                finish = 1
                
        if finish == 1:
            file.write(']}),')
            
            
# close the list of training tuples; the trailing comma after the last tuple
# is harmless if the file is later read back with ast.literal_eval
file.write('\n')
file.write(']')
file.close()
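
As an aside, here is an alternative sketch of my own (not the original code; annotate and TRAIN_DATA are illustrative names) that collects the annotations as Python objects first and serialises them once at the end, which avoids the manual bracket-and-comma bookkeeping above. It assumes the same Autotweets, makes, models, years, vins and zips variables defined earlier and omits the language-detection step for brevity:

import json

def annotate(tweet):
    # return (start, end, label) spans for every entity value found in the tweet
    entities = []
    for values, label in [(makes, 'Make'), (models, 'Model'), (years, 'Year'),
                          (vins, 'VIN'), (zips, 'ZIP')]:
        for value in values:
            value = str(value)
            if value in tweet.split():
                start = tweet.find(value)
                entities.append((start, start + len(value), label))
    return entities

TRAIN_DATA = []
for tweet in Autotweets['tweets']:
    ents = annotate(tweet)
    if ents:
        TRAIN_DATA.append((tweet, {'entities': ents}))

with open('AutomobileSpacyTraining.json', 'w') as out:
    json.dump(TRAIN_DATA, out, indent=2)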




Python Tableau Integration

Tableau Integration with Python – Step by Step

This document is intended to show how to leverage Python to extend Tableau capabilities and visualize outputs from Python.

PREREQUISITES:

1. To integrate Python with Tableau we require Tableau 10.1 and Python 3.0 or greater.

2. Both Tableau and Python should be installed on the same machine.

Install the Tableau Python server: TabPy

1. pip install tabpy-server installs the TabPy server; the install location is shown in the last line of the command-prompt output.

2. Go to that location and copy the paths, e.g. C:\Users\Innova\Anaconda3 and C:\Users\Innova\Anaconda3\Scripts.

3. Open System Properties and add both paths to the environment variables by creating a new variable.

4. Click OK.

5. Open a command prompt and run pip install tabpy-server.

6. From then on, to start the Python Tableau server (TabPy), go to the path C:\Users\Innova\Anaconda3\Lib\site-packages\tabpy_server\Startup.

7. Configure a TabPy connection in Tableau: on the Help menu in Tableau Desktop, choose Settings and Performance > Manage External Service Connection to open the TabPy connection dialog box.

  • Enter or select a server name using a domain or an IP address. The drop-down list includes localhost and the server you most recently connected to.
  • Specify a port. Port 9004 is the default port for TabPy servers.
  • Click Test Connection.
  • Click OK.

Pass Expressions to Python

  • In order to let Tableau know that a calculation needs to go to Python, it must be passed through one of 4 functions.
  • These 4 functions are: SCRIPT_BOOL, SCRIPT_INT, SCRIPT_REAL, SCRIPT_STR.
  • Python functions are computed as table calculations in Tableau.
  • Since these are table calculations, all the fields being passed to Python must be aggregated, e.g. SUM([Profit]), MIN([Profit]), MAX([Profit]), ATTR([Category]), etc.

Python Functions in Tableau
Run a Python script on Tableau

SCRIPT_BOOL

Returns a Boolean result from the specified expression. The expression is passed directly to a running external service instance. In Python expressions, use _argN (with a leading underscore) to reference parameters (_arg1, _arg2, etc.).

  • In the Python example below, _arg1 is equal to SUM([Profit]).
  • All the fields being passed to Python must be aggregated, e.g. SUM([Profit]), MIN([Profit]), MAX([Profit]), ATTR([Category]), etc.

SCRIPT_BOOL – Example: finding whether Profit is greater than zero using Python

PythonBoolPositive : Python Calculated Field Function Code

SCRIPT_BOOL("
lst = []
for i in _arg1:
    lst.append(i > 0)
return lst
", SUM([Profit]))

SCRIPT_INT – Example: multiply Sales by 2 using Python

PythonIntegerMultiplyBy2 : Python Calculated Field Function Code

SCRIPT_INT("
lst = []
for i in _arg1:
    lst.append(i * 2)
return lst
", SUM([Sales]))

SCRIPT_REAL – Example: finding the log of Sales using Python

PythonRealLog: Python Calculated Field Function Code

SCRIPT_REAL("
import math
lst = []
for i in _arg1:
    if math.isnan(i):
        lst.append(0)
    else:
        lst.append(math.log(i))
return lst
", SUM([Sales]))

SCRIPT_STR – Concatenate Two Strings using Python

PythonStringConcatenate : Python Calculated Field Function Code

SCRIPT_STR("
lst = []
for i in range(0, len(_arg1)):
    lst.append(_arg1[i] + _arg2[i])
return lst
", ATTR([Category]), ATTR([Sub-Category]))

SCRIPT_REAL – Example: finding the correlation coefficient of Sales & Profit using Python

PythonCorrCoeff: Python Calculated Field Function Code

SCRIPT_REAL("
import numpy as np
return np.corrcoef(_arg1, _arg2)[0, 1]
", SUM([Sales]), SUM([Profit]))
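
To make the SCRIPT_* strings above easier to read, here is the correlation example written as a standalone Python function (my own illustration; corr_coeff is just an illustrative name, and in TabPy the aggregated column values arrive inside the script as lists bound to _arg1, _arg2, and so on):

import numpy as np

def corr_coeff(_arg1, _arg2):
    # _arg1 and _arg2 are lists of the aggregated values Tableau passes in
    return np.corrcoef(_arg1, _arg2)[0, 1]

# example with made-up numbers standing in for SUM([Sales]) and SUM([Profit])
print(corr_coeff([10.0, 20.0, 30.0], [1.0, 3.0, 2.0]))  # 0.5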


The Journey Begins

Thanks for joining me!

Good company in a journey makes the way seem shorter. — Izaak Walton
