Scrape content in json format - Python



By : Kẻ Rong Chơi
Date : October 25 2020, 04:08 PM
To fix the issue:
1. Iterate over all script tags and find the one that contains the target JSON.
2. Use a regex to capture the text between the opening "Product.Config(" and the closing ")".
code :
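import json
import re

# `soup` is the BeautifulSoup object already built for the page
for i in soup.select('script'):
    if 'Product.Config' in str(i):
        # Non-greedy match: stops at the first ")", so this assumes
        # the JSON itself contains no ")" character
        data = re.search(r'(?is)(Product\.Config\()(.*?)(\))', str(i)).group(2)

json_data = json.loads(data)
print(len(json_data['attributes']['179']['options']))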
9
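
The non-greedy regex in the answer stops at the first closing parenthesis, so it can cut the JSON short if any value contains ")". Here is a minimal sketch of one alternative: find the start of the call with a regex, then let json.JSONDecoder.raw_decode consume exactly one JSON value from that position. SAMPLE_SCRIPT and the helper name extract_config are hypothetical and only stand in for the real page content.

```python
import json
import re

# Hypothetical script content; the real page's JSON will differ.
SAMPLE_SCRIPT = """
var spConfig = new Product.Config({"attributes": {"179": {"label": "Size (UK)",
    "options": [{"id": "1", "label": "S (36)"}, {"id": "2", "label": "M (38)"}]}}});
"""


def extract_config(script_text):
    """Return the JSON object passed to Product.Config(...), or None."""
    match = re.search(r'Product\.Config\(\s*', script_text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here the trailing ");").
    obj, _end = json.JSONDecoder().raw_decode(script_text, match.end())
    return obj


config = extract_config(SAMPLE_SCRIPT)
print(len(config['attributes']['179']['options']))
```

Because the JSON parser decides where the value ends, parentheses inside labels such as "S (36)" do not truncate the result.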


Python - Scrape Views Count from Instagram Video, load to JSON format



By : Arun kumar Shukla
Date : March 29 2020, 07:55 AM
Question: I want to scrape the number of views that specific videos on Instagram have. I'm relatively new to Python, but I'm guessing there must be a way, given that the views can be found in the source code.
Answer: Since the format is always the same, you could simply strip the "window._sharedData = " prefix and the trailing semicolon, then parse the rest as JSON:
code :
# The key line: drop the "window._sharedData = " prefix and the trailing ";"
data = json.loads(script.text.replace('window._sharedData = ', '')[:-1])

# Full script (Python 3)
import json
from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

page = urlopen('https://www.instagram.com/p/BOTU6rJhShv/')
soup = bs(page.read(), "html.parser")
body = soup.find('body', {'class': ''})
script = body.find('script', {'type': 'text/javascript'})
data = json.loads(script.text.replace('window._sharedData = ', '')[:-1])
print(data)
print(data['entry_data']['PostPage'][0]['media']['video_views'])
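
The [:-1] slice assumes the script text ends with exactly one ";". Below is a slightly more defensive sketch of the same idea. SAMPLE_TEXT is a hypothetical stand-in for script.text, and parse_shared_data is an illustrative helper name, not part of any library.

```python
import json

# Hypothetical stand-in for script.text; real Instagram data is far larger.
SAMPLE_TEXT = ('window._sharedData = {"entry_data": {"PostPage": '
               '[{"media": {"video_views": 1234}}]}};')

PREFIX = 'window._sharedData = '


def parse_shared_data(script_text):
    """Strip the JavaScript assignment around the JSON and parse it."""
    text = script_text.strip()
    if not text.startswith(PREFIX):
        raise ValueError('script does not contain window._sharedData')
    # Drop the prefix, then any trailing semicolon or whitespace.
    payload = text[len(PREFIX):].rstrip('; \n')
    return json.loads(payload)


data = parse_shared_data(SAMPLE_TEXT)
print(data['entry_data']['PostPage'][0]['media']['video_views'])
```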
How to scrape each td tag content using Python 3



By : Alexis Santos
Date : March 29 2020, 07:55 AM
You can use text instead of string and change your for loop. The full code is below for reference:
code :
from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 location of urlopen

# One list per table column
journal_ISSN = []
journal_name = []
journal_affecting_factors = []
journal_JCR_zone = []
journal_parent_class = []
journal_sub_class = []
journal_SCI = []
journal_acception = []
journal_period = []
url = "http://www.letpub.com.cn/index.php?page=journalapp&view=search&searchname=&searchissn=&searchfield=&searchimpactlow=&searchimpacthigh=&searchimpacttrend=&searchscitype=&searchcategory1=%E7%8E%AF%E5%A2%83%E7%A7%91%E5%AD%A6%E4%B8%8E%E7%94%9F%E6%80%81%E5%AD%A6&searchcategory2=%E7%8E%AF%E5%A2%83%E7%A7%91%E5%AD%A6&searchjcrkind=&searchopenaccess=&searchsort=relevance&searchsortorder=desc&currentsearchpage=2"
resp = urlopen(url)
soup = BeautifulSoup(resp.read().decode('utf-8'), "html.parser")  # decode to utf-8
journal_table = soup.find("table", {"class": "table_yjfx"})
rows = journal_table.find_all('tr')[2:-1]  # filter to get only table data
for row in rows:
    col = row.find_all('td')
    journal_ISSN.append(col[0].text.strip())
    journal_name.append(col[1].text.strip())
    journal_affecting_factors.append(col[2].text.strip())
    journal_JCR_zone.append(col[3].text.strip())
    journal_parent_class.append(col[4].text.strip())
    journal_sub_class.append(col[5].text.strip())
    journal_SCI.append(col[6].text.strip())
    journal_acception.append(col[7].text.strip())
    journal_period.append(col[8].text.strip())
print(journal_JCR_zone[0])
print(journal_parent_class[0])
# Output:
# 4区
# 环境科学与生态学

# Write a value to a file as UTF-8 bytes
with open('chinesechar.txt','wb') as outf:
    outf.write(journal_sub_class[0].encode("utf-8"))
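
Nine parallel lists are easy to get out of step. Here is a minimal sketch of an alternative that keeps each row together as a dict. It uses the standard library's html.parser on a hypothetical SAMPLE_HTML so that it runs without network access. The field names, the class name CellCollector, and the sample values are illustrative only.

```python
from html.parser import HTMLParser

# Hypothetical table fragment; the real page has more columns and rows.
SAMPLE_HTML = """
<table class="table_yjfx">
  <tr><td>0000-0001</td><td>Journal A</td><td> 4区 </td></tr>
  <tr><td>0000-0002</td><td>Journal B</td><td> 3区 </td></tr>
</table>
"""

FIELDS = ['issn', 'name', 'jcr_zone']  # illustrative column names


class CellCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag == 'td' and self.rows:
            self._in_cell = True
            self.rows[-1].append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self.rows[-1][-1] += data


parser = CellCollector()
parser.feed(SAMPLE_HTML)
# Pair each stripped cell with its field name, one dict per row.
records = [dict(zip(FIELDS, (cell.strip() for cell in row)))
           for row in parser.rows]
print(records[0]['jcr_zone'])
```

With BeautifulSoup, the same structure follows from zipping the field names with the stripped text of row.find_all('td').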
Python requests through API with variable URL in json to scrape content



By : Gazi Mahmud
Date : March 29 2020, 07:55 AM
This is not a full solution, but it may help. Install PhantomJS (http://phantomjs.org/) and Selenium:
pip install selenium
npm install phantomjs
test.py
code :
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')  # path to the PhantomJS driver
driver.set_window_size(1120, 550)

driver.get("https://www.grailed.com/")

try:
    # wait until the page is rendered
    element = WebDriverWait(driver,1).until(
        EC.presence_of_all_elements_located((By.XPATH,'//*[@id="homepage"]/div/div[3]/div[1]/div/form/label/input'))
    )
    element = driver.find_element_by_xpath('//*[@id="homepage"]/div/div[3]/div[1]/div/form/label/input')

    if element.is_displayed():
        element.send_keys('search this')
    else:
        print('no element')

except Exception as e:
    print(e)

print(driver.current_url)
driver.quit()
Python - convert .txt content into json format



By : Ali Ünsal
Date : March 29 2020, 07:55 AM
Try json.dumps instead of load. Why do you wrap your dict in a list?
Write the result of json.dumps to the file, not the object itself.
code :
with open(path, 'a+') as file:
    stringData = {"ContentUrl": link,
    "Text": post,
    "PublishDate": date.strip(),
    "Title": "",
    "SourceUrl": domainname,
    "SocialNetwork": media,
    "Source": "",
    "Author": name,
    "Like_count": likes.strip(),
    "Replies_count": replies.strip(),
    "Retweets_count": retweets.strip(),
    "Schema": "SOCIAL_MEDIA"}

    objData = json.dumps(stringData)
    file.write(objData)
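
Appending one json.dumps result after another with no separator produces a file that cannot be parsed back as a whole. Here is a minimal sketch of one common convention, a single JSON object per line. The helper names, the temporary file path, and the field values are hypothetical.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single JSON line."""
    with open(path, 'a', encoding='utf-8') as f:
        # ensure_ascii=False keeps non-ASCII text readable in the file.
        f.write(json.dumps(record, ensure_ascii=False) + '\n')


def read_records(path):
    """Read back every non-empty line as a JSON object."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical values; the original builds these from scraped posts.
path = os.path.join(tempfile.mkdtemp(), 'posts.jsonl')
append_record(path, {"Author": "Ali", "Like_count": "3", "Schema": "SOCIAL_MEDIA"})
append_record(path, {"Author": "Ünsal", "Like_count": "7", "Schema": "SOCIAL_MEDIA"})

records = read_records(path)
print(len(records), records[1]["Author"])
```

Each call appends exactly one line, so the file stays valid no matter how many times the script runs.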
Is there a way to scrape data from a PDF into a structured JSON format?



By : Robert Andre
Date : September 26 2020, 03:00 PM
If you are using UiPath, you need to install 2 packages:
1. UiPath PDF Activities - to read the PDF