Troubleshooting
Job is slow or OOMs (throws an OutOfMemoryError) while using an aggregate like collect_list or sample_call_summary_stats
Try disabling the ObjectHashAggregate by setting spark.sql.execution.useObjectHashAggregateExec to false.
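As a minimal sketch, the setting above could be applied to a running PySpark session as follows. The variable name spark is an assumption; in many notebook environments a session with that name already exists.

```python
from pyspark.sql import SparkSession

# Assumption: reuse the active session, or create one if none exists.
spark = SparkSession.builder.getOrCreate()

# Disable the object-based hash aggregate before running the aggregation.
spark.conf.set("spark.sql.execution.useObjectHashAggregateExec", "false")
```

Set the option before the query that uses collect_list or sample_call_summary_stats is executed, so that it is planned with the option in effect.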
Job is slow or OOMs while writing to a partitioned table
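This problem can occur when reading from highly compressed files, because each input partition expands to much more data in memory than its size on disk suggests. Try decreasing spark.files.maxPartitionBytes to a smaller value like 33554432 (32 MB).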
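One way to apply this setting, sketched here under the assumption that you create the Spark session yourself, is to pass it when the session is built. On a managed cluster the same key and value would normally go into the cluster's Spark configuration instead.

```python
from pyspark.sql import SparkSession

# 32 MB expressed in bytes (33554432), the example value from the text above.
max_partition_bytes = 32 * 1024 * 1024

# Assumption: no session is running yet, so the option is applied at startup.
spark = (
    SparkSession.builder
    .config("spark.files.maxPartitionBytes", str(max_partition_bytes))
    .getOrCreate()
)
```

If 32 MB does not help, a smaller value may be worth trying, at the cost of more, smaller tasks.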
My VCF looks weird after merging VCFs and saving with bigvcf
When saving to a VCF, the samples in the genotypes array must be in the same order for each row. This ordering is not guaranteed when using collect_list to join multiple VCFs. Try sorting the array using sort_array.