Recommended reading
› Python Cookbook
1 INT21H      2012-06-14 19:47:46 +08:00  1
>>> # Sample text: 输出我 = "print me", 我来捣乱 = "I'm just here to cause trouble"
>>> from BeautifulSoup import BeautifulSoup
>>> html="""<html>
... <head>
... <title>Test</title>
... </head>
... <body>
... <p>输出我</p>
... <p>我来捣乱</p>
... </body>
... </html>"""
>>> bs = BeautifulSoup(html)
>>> bs.p
<p>输出我</p>
>>> bs.p.contents
[u'\u8f93\u51fa\u6211']
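2 vfasky      2012-06-14 20:56:33 +08:00
<code>
html = '''<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>'''

# Take the chunk before the first </p>, then keep only what follows its <p>.
# Stripping '<p>' alone would also print the <html>/<head> markup before it.
for t in html.split('</p>'):
    print(t.split('<p>', 1)[1])
    break
</code>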
3 vfasky      2012-06-14 20:58:41 +08:00  1
4 muzuiget      2012-06-14 21:03:12 +08:00
Keywords: regular expressions, DOM.
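For the DOM side of that hint, here is a minimal sketch that uses only the standard library's `html.parser` instead of a full DOM library. It assumes Python 3, and the names `FirstParagraphParser` and `first_paragraph_text` are hypothetical helpers made up for illustration. It handles only the simple case in this thread: the text of the first `<p>` element.

```python
from html.parser import HTMLParser


class FirstParagraphParser(HTMLParser):
    """Collect the text inside the first <p> element only (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.found = False   # has a <p> start tag been seen yet?
        self.in_p = False    # are we currently inside that first <p>?
        self.parts = []      # text chunks collected from the first <p>

    def handle_starttag(self, tag, attrs):
        # Only the very first <p> switches collection on.
        if tag == "p" and not self.found:
            self.found = True
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.parts.append(data)


def first_paragraph_text(html):
    """Return the text of the first <p>, or None if there is no <p> at all."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return "".join(parser.parts)


sample = """<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>"""

print(first_paragraph_text(sample))
```

Unlike the string-splitting approaches, this also copes with attributes on the tag, such as `<p class="x">`, because the parser reports the tag name separately from its attributes.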
5 goofansu      2012-06-14 21:05:13 +08:00
I've been playing with this lately too. BeautifulSoup is great.
6 yibin001      2012-06-14 21:16:34 +08:00
BeautifulSoup really is a killer tool.
7 likuku      2012-06-14 21:29:06 +08:00  1
#!/usr/bin/env python
# encoding: utf-8
"""
html.py

Created by likuku on 2012-06-14.
Copyright (c) 2012 __MyCompanyName__. All rights reserved.
"""

import sys
import os

html="""
<html>
<head>
<title>Test</title>
</head>
<body>
<p>输出我</p>
<p>我来捣乱</p>
</body>
</html>
"""

def main():
    # Scan line by line and print the text of the first line containing <p>.
    for text in html.split('\n'):
        if text.find('<p>') != -1:
            tmp = text.replace('</p>','').replace('<p>','')
            print(tmp)
            break

if __name__ == '__main__':
    main()
8 aa88kk      2012-06-14 21:51:15 +08:00  1
Use a regex:
import re
m = re.search(r'<p>(.*?)</p>', s, re.S)
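The `re.search` call in the reply above returns a match object, not the text itself, so one more step is needed to get the paragraph content. A minimal sketch, assuming Python 3 and a hypothetical helper name `first_p_regex`:

```python
import re

# Non-greedy match so we stop at the first </p>; re.S lets "." span newlines.
P_PATTERN = re.compile(r"<p>(.*?)</p>", re.S)


def first_p_regex(s):
    """Return the text of the first <p>...</p> pair, or None if none is found."""
    m = P_PATTERN.search(s)
    return m.group(1) if m else None


sample = "<body>\n<p>输出我</p>\n<p>我来捣乱</p>\n</body>"
print(first_p_regex(sample))
```

The pattern matches only a bare `<p>` tag. A tag with attributes, for example `<p class="x">`, would be skipped, which is one reason regexes tend to be fragile on real-world HTML.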
9 cute      2012-06-14 21:57:50 +08:00  1
start = s.find('<p>') + len('<p>')
end = s.find('</p>', start)
print(s[start:end])
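One thing to watch with the `find`-based slicing above: `str.find` returns -1 when the tag is missing, so the slice silently yields the wrong text instead of failing. A minimal sketch of the same idea with those cases checked, assuming Python 3. The function name `first_p_find` and its default arguments are hypothetical:

```python
def first_p_find(s, open_tag="<p>", close_tag="</p>"):
    """Return the text between the first open_tag and the next close_tag.

    Returns None when either tag is missing.
    """
    start = s.find(open_tag)
    if start == -1:
        return None          # no opening tag at all
    start += len(open_tag)   # move past the opening tag itself
    end = s.find(close_tag, start)
    if end == -1:
        return None          # opening tag was never closed
    return s[start:end]


sample = "<body>\n<p>输出我</p>\n<p>我来捣乱</p>\n</body>"
print(first_p_find(sample))
```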
10 ihciah OP
Thanks, everyone!!
11 ling0322      2012-06-18 21:46:38 +08:00
Actually, there's something even more badass than BeautifulSoup. It's called PyQuery.
12 binux      2012-06-18 21:49:04 +08:00
BeautifulSoup uses too much memory.
13 chairo      2012-06-18 22:25:00 +08:00
libxml user, just passing through.