Scrape content in json format - Python

By : Kẻ Rong Chơi
Date : October 25 2020, 04:08 PM
To fix the issue, you can:
1. Iterate over all script tags and find the one that contains the target JSON.
2. Use a regex to grab the text between the start and the end of the JSON.
code :
import json
import re

for i in soup.select('script'):
    if 'Product.Config' in str(i):
        # group(2) is the text between "Product.Config(" and the next ")"
        data = re.search(r'(?is)(Product\.Config\()(.*?)(\))', str(i)).group(2)
        break

json_data = json.loads(data)
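The non-greedy "(.*?)(\))" stops at the first ")", so it cuts the JSON short whenever a string value contains a parenthesis. A self-contained sketch of the same two steps that avoids this: after locating "Product.Config(", it lets json.JSONDecoder.raw_decode parse exactly one JSON value and ignore whatever follows. The sample HTML and the helper name extract_product_config are made up for illustration, and the page is assumed to call new Product.Config({...});.

```python
import json
import re

from bs4 import BeautifulSoup


def extract_product_config(html):
    """Hypothetical helper: return the object passed to Product.Config(...), or None."""
    soup = BeautifulSoup(html, "html.parser")
    decoder = json.JSONDecoder()
    for script in soup.select('script'):
        text = str(script)
        # Find the opening of the call and skip any whitespace after "("
        match = re.search(r'Product\.Config\(\s*', text)
        if match:
            # raw_decode parses one JSON value starting at match.end() and
            # returns (value, end_index); the trailing ");" is simply ignored.
            config, _ = decoder.raw_decode(text, match.end())
            return config
    return None


# Made-up page: the label contains ")" which breaks the non-greedy regex
html = """<html><body>
<script>var x = 1;</script>
<script>var spConfig = new Product.Config({"attributes": {"92": {"code": "size", "label": "Size (EU)"}}});</script>
</body></html>"""

config = extract_product_config(html)
print(config["attributes"]["92"]["label"])  # Size (EU)
```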

Python - Scrape Views Count from Instagram Video, load to JSON format

By : Arun kumar Shukla
Date : March 29 2020, 07:55 AM
I want to scrape the number of views that specific videos on Instagram have. I'm relatively new to Python, but I'm guessing there must be a way, given that the view counts can be found in the page source.

Since the format is always the same, you could simply strip the "window._sharedData = " prefix and the trailing semicolon from the script, then parse what remains as JSON:
code :
import json
from urllib.request import urlopen

from bs4 import BeautifulSoup as bs

page = urlopen('https://www.instagram.com/p/BOTU6rJhShv/')
soup = bs(page.read(), "html.parser")
body = soup.find('body', {'class': ''})
script = body.find('script', {'type': 'text/javascript'})
# Drop the JS assignment prefix and the trailing ';' so only JSON remains
data = json.loads(script.text.replace('window._sharedData = ', '')[:-1])
print(data)
print(data['entry_data']['PostPage'][0]['media']['video_views'])

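The replace(...)[:-1] trick silently produces invalid JSON if the prefix is missing or if there is whitespace after the semicolon. A slightly more defensive sketch of the same idea; the function name parse_shared_data and the sample script text are hypothetical, and the nesting simply mirrors the keys used above.

```python
import json


def parse_shared_data(script_text):
    """Hypothetical helper: turn 'window._sharedData = {...};' into a dict."""
    prefix = 'window._sharedData = '
    payload = script_text.strip()
    if not payload.startswith(prefix):
        raise ValueError('script does not start with the expected assignment')
    # Remove the assignment prefix and any trailing semicolon(s)
    payload = payload[len(prefix):].rstrip(';')
    return json.loads(payload)


# Made-up script body with the same key layout as the answer above
sample = 'window._sharedData = {"entry_data": {"PostPage": [{"media": {"video_views": 1234}}]}};  '
data = parse_shared_data(sample)
print(data['entry_data']['PostPage'][0]['media']['video_views'])  # 1234
```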
How to scrape each td tag content using Python 3

By : Alexis Santos
Date : March 29 2020, 07:55 AM
You can use .text instead of .string (.string returns None when a cell contains more than one child) and change your for loop. Below is the full code for reference:
code :
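from urllib.request import urlopen

from bs4 import BeautifulSoup

journal_ISSN = []
journal_name = []
journal_affecting_factors = []
journal_JCR_zone = []
journal_parent_class = []
journal_sub_class = []
journal_SCI = []
journal_acception = []
journal_period = []
url = "http://www.letpub.com.cn/index.php?page=journalapp&view=search&searchname=&searchissn=&searchfield=&searchimpactlow=&searchimpacthigh=&searchimpacttrend=&searchscitype=&searchcategory1=%E7%8E%AF%E5%A2%83%E7%A7%91%E5%AD%A6%E4%B8%8E%E7%94%9F%E6%80%81%E5%AD%A6&searchcategory2=%E7%8E%AF%E5%A2%83%E7%A7%91%E5%AD%A6&searchjcrkind=&searchopenaccess=&searchsort=relevance&searchsortorder=desc&currentsearchpage=2"
resp = urlopen(url)
soup = BeautifulSoup(resp.read().decode('utf-8'), "html.parser")  # decode to utf-8
journal_table = soup.find("table", {"class": "table_yjfx"})
rows = journal_table.find_all('tr')[2:-1]  # filter to get only table data
columns = [journal_ISSN, journal_name, journal_affecting_factors,
           journal_JCR_zone, journal_parent_class, journal_sub_class,
           journal_SCI, journal_acception, journal_period]
for row in rows:
    col = row.find_all('td')
    # Assumption: the td cells appear in the same order as the lists above
    for target, cell in zip(columns, col):
        target.append(cell.text.strip())  # .text also works for nested tags
print(journal_JCR_zone[0])
print(journal_parent_class[0])
# Text mode with an explicit encoding keeps the Chinese characters intact;
# one tab-separated line per journal (output layout is an assumption)
with open('chinesechar.txt', 'w', encoding='utf-8') as outf:
    for values in zip(*columns):
        outf.write('\t'.join(values) + '\n')
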
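A small offline check of the .string versus .text difference. The table HTML below is made up (it only borrows the table_yjfx class name from the code above); the second cell nests an <a> that itself contains a <b>, so .string gives None while .text still returns the cell's text.

```python
from bs4 import BeautifulSoup

# Made-up table: the second cell has nested tags
html = """<table class="table_yjfx">
<tr><th>ISSN</th><th>Name</th></tr>
<tr><td>1234-5678</td><td><a href="#">Journal <b>A</b></a></td></tr>
</table>"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find("table", {"class": "table_yjfx"}).find_all("td")

# .string is None when a tag (recursively) has more than one child
as_string = [td.string for td in cells]
# .text concatenates every piece of text inside the tag
as_text = [td.text.strip() for td in cells]

print(as_string)  # ['1234-5678', None]
print(as_text)    # ['1234-5678', 'Journal A']
```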
Python requests through API with variable URL in json to scrape content

By : Gazi Mahmud
Date : March 29 2020, 07:55 AM
Install PhantomJS (http://phantomjs.org/). This is not a full solution, but it should get you started:
pip install selenium
npm install phantomjs
code :
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium 3.x API: PhantomJS support was removed in Selenium 4
driver = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')  # path to phantomjs driver
driver.set_window_size(1120, 550)

url = 'https://example.com/'  # hypothetical: replace with the page you want to load
xpath = '//*[@id="homepage"]/div/div[3]/div[1]/div/form/label/input'

try:
    driver.get(url)
    # Wait until the page is rendered and the input exists (up to 10 s)
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, xpath)))

    if element.is_displayed():
        element.send_keys('search this')
    else:
        print('no element')

except Exception as e:
    print(e)

print(driver.current_url)
driver.quit()

Python - convert .txt content into json format

By : Ali Ünsal
Date : March 29 2020, 07:55 AM
Try json.dumps instead of json.load. Also, why do you wrap your dict in a list?
Write the result of json.dumps to the file, not the object itself:
code :
import json

with open(path, 'a+') as file:
    stringData = {"ContentUrl": link,
                  "Text": post,
                  "PublishDate": date.strip(),
                  "Title": "",
                  "SourceUrl": domainname,
                  "SocialNetwork": media,
                  "Source": "",
                  "Author": name,
                  "Like_count": likes.strip(),
                  "Replies_count": replies.strip(),
                  "Retweets_count": retweets.strip(),
                  "Schema": "SOCIAL_MEDIA"}

    objData = json.dumps(stringData)
    # Write the JSON string (one object per line), not the dict itself
    file.write(objData + '\n')

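Because the file is opened in append mode, writing one json.dumps string per line gives a "JSON Lines" file that can be read back record by record. A minimal sketch, assuming that one-object-per-line layout; the helper names append_record and read_records and the sample values are hypothetical.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Hypothetical helper: append one dict as a single JSON line."""
    with open(path, 'a', encoding='utf-8') as f:
        # ensure_ascii=False keeps non-ASCII text readable in the file
        f.write(json.dumps(record, ensure_ascii=False) + '\n')


def read_records(path):
    """Hypothetical helper: read back every JSON line written above."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


path = os.path.join(tempfile.mkdtemp(), 'posts.jsonl')
append_record(path, {"ContentUrl": "https://example.com/1", "Like_count": "3"})
append_record(path, {"ContentUrl": "https://example.com/2", "Like_count": "5"})

records = read_records(path)
print(len(records))  # 2
```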
Is there a way to scrape data from a PDF into a structured JSON format?

By : Robert Andre
Date : September 26 2020, 03:00 PM
If you are using UiPath, you need to install two packages:
1. UiPath PDF Activities - to read the PDF
