Problem description
Can I convert a defaultdict or dict to an OrderedDict in Python?
I am trying to parse a FASTA file and want to create another file that contains every possible 100-base window of the ATGCN sequences in it.
Example:
chr1_1-100:ATGC.....GC
chr1_2-101:ATGC.....GC
chr1_3-102:ATGC.....GC
......................
chr22_1-100:ATGC....cG
chr22_2-101:ATGC....cG
......................
I did it with the following code:
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# i1 is the input FASTA path, out is the output path
records = SeqIO.to_dict(SeqIO.parse(open(i1), 'fasta'))
with open(out, 'w') as f:
    for key in records:
        long_seq_record = records[key]
        long_seq = long_seq_record.seq
        length = len(long_seq)
        alphabet = long_seq.alphabet
        # slide a 100-base window along the sequence
        for i in range(0, length - 99):
            short_seq = str(long_seq)[i:i+100]
            text = "@"+key+"_"+str(i)+"-"+str(i+100)+":"+"\n"+short_seq+"\n"+"+"+"\n"+"IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n"
            f.write(text)
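The problem is that the written file is not in order; for example, it may contain chr10 first and chr2 after it.
This happens because the parsing goes through a dict (e.g., SeqIO.to_dict(SeqIO.parse(open(i1), 'fasta'))).
Can I convert the dict to an ordered dict so that my file comes out in order? Or is there another way to solve this?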
Reference solutions
Method 1:
Can I convert a defaultdict or dict to an ordereddict in Python?
Yes, you can convert it with OrderedDict(any_dict), and if you need the keys in sorted order, you can sort them before creating the OrderedDict:
>>> from collections import OrderedDict
>>> d = {'c':'c', 'b':'b', 'a':'a'}
>>> o = OrderedDict((key, d[key]) for key in sorted(d))
>>> list(o.items())[0]
('a', 'a')
>>> list(o.items())[1]
('b', 'b')
>>> list(o.items())[2]
('c', 'c')
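Note that sorted(d) compares keys as plain strings, so 'chr10' would still sort before 'chr2'. A minimal sketch of a number-aware ordering follows, assuming the keys look like 'chr' followed by a number or a letter such as 'chrX'. The helper natural_key is a hypothetical name for illustration, not part of collections or Biopython.

```python
import re
from collections import OrderedDict

def natural_key(name):
    """Split a name into text and integer chunks so 'chr2' sorts before 'chr10'."""
    # re.split with a capturing group keeps the digit runs; they always
    # land at the odd indices of the result, text at the even indices.
    parts = re.split(r'(\d+)', name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

# Toy dict standing in for the parsed records (values are placeholders).
d = {'chr10': 'c', 'chr2': 'b', 'chr1': 'a', 'chrX': 'x'}
o = OrderedDict((key, d[key]) for key in sorted(d, key=natural_key))
print(list(o))  # ['chr1', 'chr2', 'chr10', 'chrX']
```

Because text and integers alternate at fixed positions in each key, two keys are never compared as int against str.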
Method 2:
Don't bother making any sort of dict at all. You don't need the properties a dict gives you, and you need the information the dict conversion loses. The record iterator from SeqIO.parse already gives you what you need:
from Bio import SeqIO

with open(i1) as infile, open(out, 'w') as f:
    for record in SeqIO.parse(infile, 'fasta'):
        # Do what you were going to do with the record.
        pass
If you need the information that was in the dict key, that's record.id.
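The streaming approach can be sketched end to end as follows. This is only a sketch under stated assumptions: write_windows and its window parameter are hypothetical names, the 1-based labels follow the chr1_1-100 example in the question, and the records only need .id and .seq attributes, which the objects yielded by SeqIO.parse provide. A namedtuple stands in for SeqRecord so the block runs without a FASTA file.

```python
import io
from collections import namedtuple

# Stand-in for Bio.SeqRecord.SeqRecord (assumption: only .id and .seq are used).
Record = namedtuple('Record', ['id', 'seq'])

def write_windows(records, handle, window=100):
    """Write every `window`-long slice of each record as a FASTQ-style entry."""
    quality = 'I' * window  # dummy quality string, as in the question
    for record in records:  # records are consumed in the order they arrive
        seq = str(record.seq)
        # Sequences shorter than `window` yield an empty range and are skipped.
        for start in range(len(seq) - window + 1):
            # 1-based, inclusive coordinates, e.g. chr1_1-100
            header = '@%s_%d-%d:' % (record.id, start + 1, start + window)
            handle.write('%s\n%s\n+\n%s\n'
                         % (header, seq[start:start + window], quality))

# Tiny demonstration with a window of 4 instead of 100.
out = io.StringIO()
write_windows([Record('chr2', 'ATGCN'), Record('chr10', 'GGCCA')], out, window=4)
print(out.getvalue())
```

With Biopython, the same function could be called as write_windows(SeqIO.parse(infile, 'fasta'), f) inside the with block, so entries are written in the order the records appear in the input file.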
Method 3:
You have correctly identified the cause of the problem: the to_dict function returns a dict, meaning that the order has been lost. From that point on, there is no way to recover the order.
Moreover, you do not really use the dict, because you process everything sequentially, so you could just iterate:
for record in SeqIO.parse(open(i1), 'fasta'):
    key = record.id
    long_seq = record.seq
    ...
(by Surachit Sarkar, Cyrbil, user2357112, Serge Ballesta)